Aakanksha Saha


Aakanksha Saha

I joined TU Wien as a University Researcher in 2021. My research interests include the following areas:

  • Malware Analysis
  • Reverse Engineering
  • Phishing
  • Secrets in Source Code

Currently, I am researching the applicability of robust machine learning models on sophisticated malware attacks.

Prior to this I was working as a Security Software Engineer at Microsoft, Redmond USA. I graduated with a Masters in Computer Science from University of Utah, USA in 2019. You can find my Masters thesis at Secrets in Source Code

  • PreDoc Researcher
Publications (created while at TU Wien)
    • Connecting the .dotfiles: Checked-In Secret Exposure with Extra (Lateral Movement) Steps
      Jungwirth, G., Saha, A., Schröder, M., Fiebig, T., Lindorfer, M., & Cito, J. (2023). Connecting the .dotfiles: Checked-In Secret Exposure with Extra (Lateral Movement) Steps. In IEEE/ACM 20th International Conference on Mining Software Repositories (MSR) (pp. 322–333).
      DOI: 10.1109/MSR59073.2023.00051 Metadata
      Personal software configurations, known as dotfiles, are increasingly being shared in public repositories. To understand the security and privacy implications of this phenomenon, we conducted a large-scale analysis of dotfiles repositories on GitHub. Furthermore, we surveyed repository owners to understand their motivations for sharing dotfiles, and their awareness of the security implications. Our mixed-method approach consisted of two parts: (1) We mined 124,230 public dotfiles repositories and inductively searched them for security and privacy flaws. (2) We then conducted a survey of repository owners (n=1,650) to disclose our findings and learn more about the problems and implications. We found that 73.6 % of repositories leak potentially sensitive information, most commonly email addresses (of which we found 1.2 million), but also RSA private keys, API keys, installed software versions, browsing history, and even mail client inboxes. In addition, we found that sharing is mainly ideological (an end in itself) and to show off ("ricing"), in addition to easing machine setup. Most users are confident about the contents of their files and claim to understand the security implications. In response to our disclosures, a small minority (2.2%) will make their repositories private or delete them, but the majority of respondents will continue sharing their dotfiles after taking appropriate actions. Dotfiles repositories are a great tool for developers to share knowledge and communicate - if done correctly. We provide recommendations for users and platforms to make them more secure. Specifically, tools should be used to manage dotfiles. In addition, platforms should work on more sophisticated tests, to find weaknesses automatically and inform the users or control the damage.
    • Secrets in Source Code: Reducing False Positives using Machine Learning
      Saha, A., Denning, T., Srikumar, V., & Kasera, S. K. (2020). Secrets in Source Code: Reducing False Positives using Machine Learning. In 2020 International Conference on COMmunication Systems & NETworkS (COMSNETS). IEEE Xplore Digital Library.
      DOI: 10.1109/comsnets48256.2020.9027350 Metadata ⯈Fulltext (preprint)
      Private and public git repositories often contain unintentional sensitive information in the source code. Many tools have been developed to scan repositories looking for potential secrets and credentials committed in the code base, inadvertently or intentionally, for taking corrective action once these secrets and credentials are found. However, most of these existing works either target a specific type of secret or generate a large number of false positives. Our research aims to create a generalized framework to detect all kinds of secrets - which includes API keys, asymmetric private keys, client secrets, generic passwords - using an extensive regular expression list. We then apply machine learning models to intelligently distinguish between a real secret from a false positive. The combination of regular expression based approach and machine learning allows for the identification of different types of secrets, specifically generic passwords which are ignored by existing works, and subsequent reduction of possible false positives. We also evaluate our machine learning model using a precision-recall curve that can be used by an operator to find the optimal trade-off between the number of false positives and false negatives depending on their specific application. Using a Voting Classifier (combination of Logistic Regression, Naïve Bayes and SVM) we are able to reduce the number of false positives considerably.