Malware developers often leave unintentional hints about their development practices, goals, and identities in the executables they publish. These breadcrumbs can appear in a variety of locations, ranging from the code they use (or reuse) to the metadata of what they publish. Examining executables can provide insight into a sample’s malware family, its origins, and potentially the entity behind it. After the publication of FireEye’s recent blog series on Debug Details, we were inspired to take a closer look at what can be learned from one specific executable breadcrumb: Program Database (“PDB”) paths. Specifically, we wanted to explore methods for a scalable approach to PDB path analysis.
To do any type of analysis on PDB paths we needed a method to quickly extract any PDB path details. Our research led to the creation of PDBlaster, an open source tool for quickly bulk processing Portable Executable (“PE”) files, which we have made available on the PDBlaster GitHub.
Some background on PDB Paths
During the compilation of PE files, a Program Database (“PDB”) file may be generated depending on the project’s debugging settings. These files help developers debug their programs and typically store information called symbols. Symbols are intended to make debugging easier and include details about global and local variables, as well as function names and their associated entry points. By default, this PDB file is created in the same directory the PE was compiled in. Additionally, the location of the associated PDB file is embedded within the PE file. For example, if a program called RAT.exe was compiled in the following location: C:\Users\Nick\Programs\BadGuyStuff\RemoteAccess\v2\x86\, then that directory would contain the files RAT.exe and RAT.pdb. The PDB location would be embedded within RAT.exe, and can be extracted using a tool like pestudio as shown below:
When applied to malware, these file paths have been used to aid in creating and analyzing threat intelligence, helping analysts build relationships between malware samples and shedding light on the authors’ working style and environment, as demonstrated in the FireEye blog series linked above. The PDB path itself is not necessarily a reliable source of attribution, as it can easily be manipulated or removed. However, if a malware developer compiles their malware within their user profile (C:\Users\Nick\…), they may leak some information about who they are through the PDB path.
It is not uncommon for defenders to adopt the fallacy that attackers are OPSEC gurus, who leave few, if any, traces of their activity. Malware authors are not immune to the occasional slip-up, like re-using their username across multiple malware variants, social media platforms and forums like Stack Overflow.
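To make the user-profile case concrete, the sketch below pulls the profile folder name out of a PDB path. It is a minimal illustration and not PDBlaster's actual logic: the function name `username_from_pdb_path`, the regular expression, and the set of generic profile names to ignore are all assumptions made for this example.

```python
import re

# Assumed profile roots: modern Windows ("Users") and XP-era
# ("Documents and Settings"). Other layouts are not handled.
_PROFILE_RE = re.compile(
    r"^[A-Za-z]:\\(?:Users|Documents and Settings)\\([^\\]+)\\",
    re.IGNORECASE,
)

# Hypothetical list of profile folders that do not identify an individual.
_GENERIC_PROFILES = {"public", "default", "default user", "all users"}


def username_from_pdb_path(pdb_path):
    """Return the profile folder name in a PDB path, or None if there is none."""
    match = _PROFILE_RE.match(pdb_path.strip())
    if not match:
        return None
    name = match.group(1)
    # Skip shared profile folders, which say nothing about the author.
    return None if name.lower() in _GENERIC_PROFILES else name
```

A path outside a user profile, such as one on a dedicated build drive, simply yields `None`, which is a reminder that many PDB paths carry no username at all.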
Extracting PDB Paths at Scale
A large percentage of malware samples do not have PDB paths embedded within them. As an example, here are some figures on how many samples in theZoo contain PDB paths:
To gather a meaningful number of PDB paths and corresponding usernames, a large number of samples needs to be analyzed. To surgically extract PDB paths, we made use of pefile, a Python library for parsing PE files. To speed up the process, we take advantage of pefile’s fast_load option, which stops pefile from automatically parsing the entire executable, something that can take several seconds per file. For extracting PDB paths, we’re only interested in the debug directory, which is where the PDB path is stored. pefile makes this easy with the parse_debug_directory function. To use it, we need to supply the Relative Virtual Address (RVA) and size of the debug directory contained within the PE.
The PE format dictates that the data directory entry at index 6 (counting from zero) is always the debug directory (more information on the PE file format is available here). Knowing this, we can assume that the virtual address and size of that entry are the values we need to pass to the parse_debug_directory function. When we supply these values, pefile parses only the contents of the debug directory rather than the entire executable, which saves processing time when working with a large number of files. Once the debug directory has been parsed, we can loop through its entries looking for a value called “PdbFileName”. If the PE file contains a PDB path, this value will be non-null and will hold the extracted path.
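To illustrate where those two values live, here is a minimal standard-library sketch that reads the debug directory's RVA and size straight from the PE headers, without pefile. The function name is hypothetical, and the offsets follow the published PE/COFF layout, in which the debug directory sits at index 6 when counting from zero. The sketch performs only minimal validation, so treat it as an illustration rather than a robust parser.

```python
import struct

IMAGE_DIRECTORY_ENTRY_DEBUG = 6  # zero-based index in the data directory table


def debug_directory_location(data):
    """Return (rva, size) of the debug directory, or None if it is absent."""
    if len(data) < 0x40 or data[:2] != b"MZ":
        raise ValueError("not an MZ executable")
    # e_lfanew at offset 0x3C points to the "PE\0\0" signature.
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    if data[pe_offset:pe_offset + 4] != b"PE\x00\x00":
        raise ValueError("missing PE signature")
    # The optional header follows the 4-byte signature and 20-byte COFF header.
    opt = pe_offset + 4 + 20
    (magic,) = struct.unpack_from("<H", data, opt)
    if magic == 0x10B:      # PE32
        count_offset = 92
    elif magic == 0x20B:    # PE32+
        count_offset = 108
    else:
        raise ValueError("unknown optional header magic")
    # NumberOfRvaAndSizes says how many data directory entries are present.
    (count,) = struct.unpack_from("<I", data, opt + count_offset)
    if count <= IMAGE_DIRECTORY_ENTRY_DEBUG:
        return None
    # Each entry is 8 bytes: a 4-byte RVA followed by a 4-byte size.
    entry = opt + count_offset + 4 + 8 * IMAGE_DIRECTORY_ENTRY_DEBUG
    rva, size = struct.unpack_from("<II", data, entry)
    return (rva, size) if rva and size else None
```

The returned pair corresponds to the two arguments that parse_debug_directory expects. In practice pefile exposes the same values through its parsed optional header, so this manual lookup is only meant to show what is being read.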
Using pefile we can extract the PDB paths from approximately 70,000 files in less than a minute.
Below is a manual example of how this is achieved using the pefile library:
1. Fast-load the PE using pefile’s fast_load option
2. Extract the RVA of the debug directory
3. Extract the size of the debug directory
4. Parse the debug directory using the RVA and size values from above
5. Extract the PDB file path using the PdbFileName value from pefile
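The five steps above can be sketched roughly as follows. This is an illustrative outline rather than PDBlaster's actual code: the function names are hypothetical, and it assumes a pefile version whose parsed debug entries expose an `entry` attribute carrying `PdbFileName` for CodeView records. The pefile import sits inside the loader function so the parsing helper can be exercised on its own.

```python
DEBUG_DIRECTORY_INDEX = 6  # zero-based position of the debug data directory


def pdb_paths_from_pe(pe):
    """Return the PDB paths found in the debug directory of a loaded pefile.PE."""
    directories = pe.OPTIONAL_HEADER.DATA_DIRECTORY
    if len(directories) <= DEBUG_DIRECTORY_INDEX:
        return []
    # Steps 2 and 3: RVA and size of the debug directory.
    debug_dir = directories[DEBUG_DIRECTORY_INDEX]
    if not debug_dir.VirtualAddress or not debug_dir.Size:
        return []
    # Step 4: parse only the debug directory (the call may return None on errors).
    entries = pe.parse_debug_directory(debug_dir.VirtualAddress, debug_dir.Size) or []
    paths = []
    for item in entries:
        # Step 5: CodeView entries carry PdbFileName as null-terminated bytes.
        raw = getattr(getattr(item, "entry", None), "PdbFileName", None)
        if raw:
            name = raw.split(b"\x00", 1)[0]
            if name:
                paths.append(name.decode("utf-8", errors="replace"))
    return paths


def extract_pdb_paths(file_path):
    """Fast-load a PE file from disk and return any embedded PDB paths."""
    import pefile  # third-party dependency: pip install pefile

    # Step 1: fast_load defers parsing of the data directories.
    pe = pefile.PE(file_path, fast_load=True)
    try:
        return pdb_paths_from_pe(pe)
    finally:
        pe.close()
```

Returning a list rather than a single string is a design choice for this sketch, since a debug directory can hold more than one entry. Files that pefile cannot parse raise `pefile.PEFormatError`, which a bulk scanner would want to catch and skip.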
Using the steps above, we created a wrapper for pefile called PDBlaster which automates the extraction of PDB information. We also added a few features to gather some information we found interesting. The list below is an overview of the features we included within PDBlaster:
- Extract usernames from PDB paths
- Identify which PEs share the same PDB path or username
- Run discovered usernames through Sherlock, a tool that checks whether a username is in use across a large number of online sites
- Output scan results to a CSV
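As a rough idea of how the sharing and CSV features in the list above could work, here is a minimal sketch. It is not taken from PDBlaster: the function names, the `(file_name, pdb_path)` input shape, the CSV column names, and the choice to compare paths case-insensitively are all assumptions made for this example.

```python
import csv
import io
from collections import defaultdict


def group_by_pdb_path(results):
    """Map each PDB path to the files that share it.

    `results` is an iterable of (file_name, pdb_path) pairs, where pdb_path
    may be None. Paths are lowercased before comparison (an assumption,
    based on Windows paths being case-insensitive).
    """
    groups = defaultdict(list)
    for file_name, pdb_path in results:
        if pdb_path:
            groups[pdb_path.strip().lower()].append(file_name)
    # Keep only paths that appear in more than one file.
    return {path: sorted(files) for path, files in groups.items() if len(files) > 1}


def write_csv(results, stream):
    """Write one row per scanned file to a text stream."""
    writer = csv.writer(stream)
    writer.writerow(["file", "pdb_path"])
    for file_name, pdb_path in results:
        writer.writerow([file_name, pdb_path or ""])
```

The same grouping works for usernames by swapping the dictionary key for the extracted username. When writing to a real file, open it with `newline=""` as the csv module documentation recommends.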
Below is an example output from running PDBlaster against theZoo:
While running PDBlaster against several sample sets of malicious executables, we made some observations:
- Malware samples attributed to separate actors that contained the same, or similar, PDB paths, implying a possible connection which could be further explored.
- Active online handles, and a few full names, of malware authors whose samples have been recently submitted to community malware repositories.
- Multiple commodity malware variants with a shared PDB path, which may indicate one developer working on multiple projects.
These findings are likely the tip of the iceberg, as our scans covered a relatively small sample set of roughly 220,000 files. Tracking information like PDB paths may prove valuable for organizations that find themselves targeted by attackers who maintain and update their own tools. PDB paths are an interesting data point that intel teams can use to link malicious activity from the same actor across multiple campaigns.