Security & Privacy, TU Wien ‒

I joined TU Wien as a University Researcher in 2021. My research interests include the following areas:

Malware Analysis
Reverse Engineering
Phishing
Secrets in Source Code

Currently, I am researching the applicability of robust machine learning models on sophisticated malware attacks.

Prior to this I was working as a Security Software Engineer at Microsoft, Redmond USA. I graduated with a Masters in Computer Science from University of Utah, USA in 2019. You can find my Masters thesis at Secrets in Source Code

Roles

PreDoc Researcher

Contact

Publications (created while at TU Wien)

2024

Exploring the Malicious Document Threat Landscape: Towards a Systematic Approach to Detection and Analysis
Saha, A., Blasco Alís, J., & Lindorfer, M. (2024). Exploring the Malicious Document Threat Landscape: Towards a Systematic Approach to Detection and Analysis. In 2024 IEEE European Symposium on Security and Privacy Workshops (EuroS&PW) (pp. 533–544).
DOI: 10.1109/EuroSPW61312.2024.00065 Metadata
Abstract
Despite being the most common initial attack vector, document-based malware delivery remains understudied compared to research on malicious executables. This limits our understanding of how attackers leverage document file formats and exploit their functionalities for malicious purposes. In this paper, we perform a measurement study that leverages existing tools and techniques to detect, extract, and analyze malicious Office documents. We collect a substantial dataset of 9,086 malicious samples and reveal a critical gap in the understanding of how attackers utilize these documents. Our in-depth analysis highlights emerging tactics used in both targeted and large-scale cyberattacks while identifying weaknesses in common document analysis methods. Through a combination of analysis techniques, we gain crucial in-sights valuable for forensic analysts to assess suspicious files, pinpoint infection origins, and ultimately contribute to the development of more robust detection models. We make our dataset and source code available to the academic community to foster further research in this area.
ADAPT it! Automating APT Campaign and Group Attribution by Leveraging and Linking Heterogeneous Files
Saha, A., Blasco, J., Cavallaro, L., & Lindorfer, M. (2024). ADAPT it! Automating APT Campaign and Group Attribution by Leveraging and Linking Heterogeneous Files. In RAID ’24: Proceedings of the 27th International Symposium on Research in Attacks, Intrusions and Defenses (pp. 114–129). Association for Computing Machinery.
DOI: 10.1145/3678890.3678909 Metadata
Abstract
Recent years have witnessed a surge in the growth of Advanced Persistent Threats (APTs), with significant challenges to the security landscape, affecting industry, governance, and democracy. The ever- growing number of actors and the complexity of their campaigns have made it difficult for defenders to track and attribute these malicious activities effectively. Traditionally, researchers relied on threat intelligence to track APTs. However, this often led to fragmented information, delays in connecting campaigns with specific threat groups, and misattribution. In response to these challenges, we introduce ADAPT, a ma- chine learning-based approach for automatically attributing APTs at two levels: (1) the threat campaign level, to identify samples with similar objectives and (2) the threat group level, to identify samples operated by the same entity. ADAPT supports a variety of heterogeneous file types targeting different platforms, includ- ing executables and documents, and uses linking features to find connections between them. We evaluate ADAPT on a reference dataset from MITRE as well as a comprehensive, label-standardized dataset of 6,134 APT samples belonging to 92 threat groups. Using real-world case studies, we demonstrate that ADAPT effectively identifies clusters representing threat campaigns and associates them with their respective groups.

2023

Connecting the .dotfiles: Checked-In Secret Exposure with Extra (Lateral Movement) Steps
Jungwirth, G., Saha, A., Schröder, M., Fiebig, T., Lindorfer, M., & Cito, J. (2023). Connecting the .dotfiles: Checked-In Secret Exposure with Extra (Lateral Movement) Steps. In IEEE/ACM 20th International Conference on Mining Software Repositories (MSR) (pp. 322–333).
DOI: 10.1109/MSR59073.2023.00051 Metadata
Abstract
Personal software configurations, known as dotfiles, are increasingly being shared in public repositories. To understand the security and privacy implications of this phenomenon, we conducted a large-scale analysis of dotfiles repositories on GitHub. Furthermore, we surveyed repository owners to understand their motivations for sharing dotfiles, and their awareness of the security implications. Our mixed-method approach consisted of two parts: (1) We mined 124,230 public dotfiles repositories and inductively searched them for security and privacy flaws. (2) We then conducted a survey of repository owners (n=1,650) to disclose our findings and learn more about the problems and implications. We found that 73.6 % of repositories leak potentially sensitive information, most commonly email addresses (of which we found 1.2 million), but also RSA private keys, API keys, installed software versions, browsing history, and even mail client inboxes. In addition, we found that sharing is mainly ideological (an end in itself) and to show off ("ricing"), in addition to easing machine setup. Most users are confident about the contents of their files and claim to understand the security implications. In response to our disclosures, a small minority (2.2%) will make their repositories private or delete them, but the majority of respondents will continue sharing their dotfiles after taking appropriate actions. Dotfiles repositories are a great tool for developers to share knowledge and communicate - if done correctly. We provide recommendations for users and platforms to make them more secure. Specifically, tools should be used to manage dotfiles. In addition, platforms should work on more sophisticated tests, to find weaknesses automatically and inform the users or control the damage.

2020

Secrets in Source Code: Reducing False Positives using Machine Learning
Saha, A., Denning, T., Srikumar, V., & Kasera, S. K. (2020). Secrets in Source Code: Reducing False Positives using Machine Learning. In 2020 International Conference on COMmunication Systems & NETworkS (COMSNETS). IEEE Xplore Digital Library.
DOI: 10.1109/comsnets48256.2020.9027350 Metadata ⯈Fulltext (preprint)
Abstract
Private and public git repositories often contain unintentional sensitive information in the source code. Many tools have been developed to scan repositories looking for potential secrets and credentials committed in the code base, inadvertently or intentionally, for taking corrective action once these secrets and credentials are found. However, most of these existing works either target a specific type of secret or generate a large number of false positives. Our research aims to create a generalized framework to detect all kinds of secrets - which includes API keys, asymmetric private keys, client secrets, generic passwords - using an extensive regular expression list. We then apply machine learning models to intelligently distinguish between a real secret from a false positive. The combination of regular expression based approach and machine learning allows for the identification of different types of secrets, specifically generic passwords which are ignored by existing works, and subsequent reduction of possible false positives. We also evaluate our machine learning model using a precision-recall curve that can be used by an operator to find the optimal trade-off between the number of false positives and false negatives depending on their specific application. Using a Voting Classifier (combination of Logistic Regression, Naïve Bayes and SVM) we are able to reduce the number of false positives considerably.

Aakanksha Saha