Monday, August 15, 2016

Forensics Quickie: PowerShell Versions and the Registry

FORENSICS QUICKIES! These posts will consist of small tidbits of useful information that can be explained very succinctly.

I was chatting with Jared Atkinson and James Habben about PowerShell today and a question emerged from the discussion: is there a way to determine the version of PowerShell installed on a given machine without using the $PSVersionTable PowerShell command? We all agreed that it would be nice to have an offline source for finding this information.


You want to determine the version of PowerShell installed on a machine, but don't have a means by which to run the $PSVersionTable PowerShell command (e.g. you are working off of a forensic image -- not a live machine).

The Solution

Right off the bat, Jared suggested that there had to be something in the registry related to this information and subsequently pointed us to the following registry key: HKLM\SOFTWARE\Microsoft\PowerShell. James noted that he found a subkey named "1" inside. Within the "1" subkey is yet another subkey named PowerShellEngine. As we can see in the screenshot below, there is a value named PowerShellVersion that will tell us the version of PowerShell installed on the machine.

 Note that PowerShell version 2.0 is shown in this registry key

There was a nuance, however. While James was only seeing one subkey (with the name "1"), I was seeing a second subkey named "3" on my machine in addition to "1." I took a look to find the following:

 A second subkey named "3" shows a different, more recent version of PowerShell

We wondered what this could mean. It wasn't until Jared noted that having the "1" subkey would indicate the existence of PowerShell v1 or v2 and that having the "3" subkey would indicate PowerShell v3-5 that this all started to make more sense.

James's machine was a Windows XP workstation. My machines were Windows 10 workstations. Therefore, James's SOFTWARE hive only had a single "1" subkey. It only had PowerShell v2 on it. But why did the Windows 10 workstations have both a "1" subkey and a "3" subkey? Jared, once again, suggested that a previous version of Windows being upgraded to Windows 10 may have been the reason. Sure enough, I had upgraded my Windows 7 machines to Windows 10 and had NOT done a fresh Windows 10 install. Note that this may not be the reason for seeing both subkeys; I reviewed a machine with a fresh Windows 10 install and observed that it also had both subkeys.
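Once the SOFTWARE hive has been parsed, the check described above can be scripted. The following is a minimal sketch, not a finished tool: `read_value` is a hypothetical callable that you would back with whatever offline registry parser you prefer, and the hive contents in the example are made up. Only the key and value names come from this post.

```python
# Engine subkeys discussed above: "1" (PowerShell v1/v2) and "3" (v3-v5).
ENGINE_SUBKEYS = ("1", "3")


def powershell_versions(read_value):
    """Return {subkey: PowerShellVersion} for each engine key that exists.

    read_value is a hypothetical callable (key_path, value_name) -> str or None,
    backed by an offline parser for the SOFTWARE hive.
    """
    found = {}
    for subkey in ENGINE_SUBKEYS:
        key_path = "\\".join(["Microsoft", "PowerShell", subkey, "PowerShellEngine"])
        version = read_value(key_path, "PowerShellVersion")
        if version is not None:
            found[subkey] = version
    return found


# Stand-in for a parsed hive, using made-up data for illustration.
fake_hive = {
    (r"Microsoft\PowerShell\1\PowerShellEngine", "PowerShellVersion"): "2.0",
    (r"Microsoft\PowerShell\3\PowerShellEngine", "PowerShellVersion"): "5.0.10240.16384",
}
print(powershell_versions(lambda key, name: fake_hive.get((key, name))))
```

Checking both subkeys rather than stopping at the first hit matters here, since a machine can carry more than one engine key.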

The bottom line is that, yes, the version of PowerShell can be found in the registry and not just by running the $PSVersionTable PowerShell command. But keep in mind that you might find more than one registry key containing PowerShell version information.
Note: Beware the PowerShell.exe Location

Do not be fooled by the default location of PowerShell.exe. The executable's path will show %SystemRoot%\system32\WindowsPowerShell\v1.0\powershell.exe. Unless manually changed, this path will show "v1.0" regardless of the PowerShell versions installed on the machine.


Great! We solved our problem. But what about some of this other stuff we see in the PowerShellEngine subkey? What's that RuntimeVersion value and why doesn't it match the PowerShellVersion value? If two PowerShell engines exist on the Windows 10 machines, how do I use the older, v2 engine instead of the v5 engine?

To answer these questions, let's first use the easiest way possible to determine the version of PowerShell installed on a machine: the $PSVersionTable PowerShell command. (I ran everything below on the Windows 10 machine.)

PS C:\Users\4n6k> $PSVersionTable

Name                           Value
----                           -----
PSVersion                      5.0.10240.16384
WSManStackVersion              3.0
CLRVersion                     4.0.30319.42000
BuildVersion                   10.0.10240.16384
PSCompatibleVersions           {1.0, 2.0, 3.0, 4.0...}
PSRemotingProtocolVersion      2.3

First, I looked to see if there was an easier way to figure out what all of this output meant. And, what do you know, a quick Google search and ServerFault answer were able to point me in the right direction. Instead of looking at the help files in a PowerShell session, I just looked up what I needed online here. We come back with this:
  • CLRVersion: 
    • The version of the common language runtime (CLR).
  • BuildVersion: 
    • The build number of the current version.
  • PSVersion: 
    • The Windows PowerShell version number.
  • WSManStackVersion: 
    • The version number of the WS-Management stack.
  • PSCompatibleVersions: 
    • Versions of Windows PowerShell that are compatible with the current version.
  • SerializationVersion:  
    • The version of the serialization method.
  • PSRemotingProtocolVersion: 
    • The version of the Windows PowerShell remote management protocol.
And there you have it. Full explanations of what we're looking at here.
Note: CLRVersion & RuntimeVersion

Notice that when we run the $PSVersionTable command, we see a line named CLRVersion. The value associated with this name is the same as the value that we see when we look in the registry at the RuntimeVersion value. This is because both of these entries are related to the "Common Language Runtime (CLR)" used in the .NET Framework. You can read more about that here. Since I'm using Windows 10, I have .NET 4.6, which uses CLR version 4.0.30319.42000.

So, what about the two PowerShell engines that exist on my Windows 10 machines? What if I want to use a different engine than v5? Well, it's as easy as running a PowerShell command. To quote this MSDN article:

"When you start Windows PowerShell, the newest version starts by default. To start Windows PowerShell with the Windows PowerShell 2.0 Engine, use the Version parameter of PowerShell.exe. You can run the command at any command prompt, including Windows PowerShell and Cmd.exe."

PowerShell.exe -Version 2

Let's give it a shot.

PS C:\Users\4n6k> $psversiontable
Name                           Value
----                           -----
PSVersion                      5.0.10240.16384
WSManStackVersion              3.0
CLRVersion                     4.0.30319.42000
BuildVersion                   10.0.10240.16384
PSCompatibleVersions           {1.0, 2.0, 3.0, 4.0...}
PSRemotingProtocolVersion      2.3

PS C:\Users\4n6k> PowerShell.exe -Version 2
Windows PowerShell
Copyright (C) 2009 Microsoft Corporation. All rights reserved.

PS C:\Users\4n6k> $psversiontable
Name                           Value
----                           -----
CLRVersion                     2.0.50727.8669
BuildVersion                   6.1.7600.16385
PSVersion                      2.0
WSManStackVersion              2.0
PSCompatibleVersions           {1.0, 2.0}
PSRemotingProtocolVersion      2.1

As you can see, our PowerShell session is now using the v2 engine instead of v5. Note that when I tried PowerShell.exe -Version 3, the output I received was the same output I received for v5. This may be due to jumping from PowerShell v2 on Windows 7 to PowerShell v5 on Windows 10. This could also be because of the split between v1/v2 and v3/v4/v5 (thanks to James and Jared for this possible explanation).

A big thanks goes out to Jared Atkinson and James Habben. This post wouldn't exist without their involvement and discussion.

-Dan Pullega (@4n6k)

1. What do the contents of PowerShell's $PSVersionTable represent? (ServerFault)
2. Common Language Runtime (CLR)
3. MSDN: about_Automatic_Variables - PowerShell
4. Environment.Version Property (.NET)
5. Starting the Windows PowerShell 2.0 Engine

Wednesday, March 16, 2016

Jump List Forensics: AppID Master List (400+ AppIDs)

TL;DR: The list of 400+ manually generated Jump List application IDs can be found HERE.

About 5 years ago, I wrote two blog posts related to Windows Jump Lists [1] [2]. These two posts covered jump list basics and focused mainly on how each application that is run on a Windows machine has the potential to generate a %uniqueAppID%.automaticDestinations-ms file in the C:\Users\%user%\AppData\Roaming\Microsoft\Windows\Recent\AutomaticDestinations\ directory. The AppID lists I created in 2011 have been useful to me in the past, so I decided to expand them. Many tools use these lists, as well. With that, I've recently added over 100 more unique application AppIDs and combined them into one list.

As a refresher, each application (depending on the executable's path and filename) will have a unique Application ID (i.e. AppID) that will be included in the name of the .automaticDestinations-ms jump list file. Jump lists provide additional avenues in determining which applications were run on the system, the paths from which they were run, and the files recently used by them.

The catch is that you need to know which AppIDs will be generated for certain applications. And, at this point in the game, the only way to know that is to either (a) manually generate the .automaticDestinations-ms files or (b) know the executable's absolute path and use Hexacorn's AppID Calculator. Either way, you need to have some kind of starting information to come back with an answer.

As we already know, two ways in which the .automaticDestinations-ms files are generated are:


Both of these methods will show you the application's jump list, thereby generating/modifying the application's .automaticDestinations-ms file. In this case, that file is named:


...with 9b9cdc69c1c24e2b being the AppID for 64-bit Notepad.

In the AppID list, you will notice a few entries containing multiple versions of applications. Many of these applications retain their default installation location as they are updated to new versions. This essentially means that the AppID will stay the same. As an example, if we take a look at iTunes, we'll see that iTunes has an AppID of 83b03b46dcd30a0e; I tested and verified that in 2011. If we take a look at a more recent version, we can see that the AppID has remained the same. This is because the newer version is installed to (and then run from) the same location as the old version, which causes the AppID to remain the same among different versions. If you want to learn more about how the AppID is actually generated, I highly recommend that you read through Hexacorn's blog post here.
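To show how an AppID list gets used in practice, here is a minimal sketch that pulls the AppID out of a jump list filename and looks it up. The lookup table holds only the two AppIDs quoted in this post, and the `C:\Users\dan` path is a made-up example; a real script would load the full list.

```python
from pathlib import PureWindowsPath

# Only the two AppIDs quoted in this post; a real lookup would load the full list.
KNOWN_APPIDS = {
    "9b9cdc69c1c24e2b": "Notepad (64-bit)",
    "83b03b46dcd30a0e": "iTunes",
}
SUFFIX = ".automaticdestinations-ms"


def identify_jump_list(path):
    """Return (app_id, application name or None) for a jump list file path."""
    name = PureWindowsPath(path).name.lower()
    if not name.endswith(SUFFIX):
        raise ValueError(f"not an automaticDestinations-ms file: {path}")
    app_id = name[: -len(SUFFIX)]
    return app_id, KNOWN_APPIDS.get(app_id)


# Hypothetical user profile path, for illustration only.
print(identify_jump_list(
    r"C:\Users\dan\AppData\Roaming\Microsoft\Windows\Recent"
    r"\AutomaticDestinations\9b9cdc69c1c24e2b.automaticDestinations-ms"
))
```

An unknown AppID comes back with `None` as the name, which is the cue to fall back on manual generation or an AppID calculator.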

With that, you can find the AppID master list at the following location:

Note that with the release of Eric Zimmerman's JLECmd (Jump List Explorer Command Line), an investigator can gain better insight into the applications for which the jump list files were generated.

As Eric explains in his Jump Lists In-Depth post, jump lists are (more or less) collections of LNK files. So, for example, if you have a jump list .automaticDestinations-ms file that has an unknown AppID and you see that the LNK files contained within it all point to a specific file type (say, AutoCAD .dwg drawing files), you might be able to conclude that the jump list belongs to an AutoCAD-related program. Obviously, this is a very simple example, but you get the idea. You have more information to work with now.

The AppID master list is a work in progress and will likely be updated occasionally throughout its life cycle.

-Dan (@4n6k)

1. Jump Lists In Depth (by Eric Zimmerman)
2. Introducing JLECmd! (by Eric Zimmerman)
3. JumpLists file names and AppID calculator (by Hexacorn)
4. Jump List Forensics: AppIDs Part 1 (by 4n6k)
5. Jump List Forensics: AppIDs Part 2 (by 4n6k)

Saturday, May 23, 2015

Forensics Quickie: NTUSER.DAT Analysis (SANS CEIC 2015 Challenge #1 Write-Up)

FORENSICS QUICKIES! These posts will consist of small tidbits of useful information that can be explained very succinctly.

SANS posted a quick challenge at CEIC this year. I had some downtime before the conference, so I decided to take part. In short, SANS provided an NTUSER.DAT hive and asked three questions about it. Below is a look at my process for answering these questions and ultimately solving the challenge. It's time to once again refresh our memories with the raw basics.

The Questions

Given an NTUSER.DAT hive [download], the questions were as follows:
  1. What was the most recent keyword that the user vibranium searched for using Windows Search on the nromanoff system?
  2. How many times did the vibranium account run excel.exe on the nromanoff system?
  3. What is the most recent Typed URL in the vibranium NTUSER.DAT?

The Answers

Right off the bat, we can see that these questions are pretty standard when it comes to registry analysis. Let's start with the first question.

Question #1: Find the most recent keyword searched using Windows Search.

First, we must understand what the question is asking. "Windows Search" refers to searches run using the following search fields within Windows:

Windows Search via the Start Menu.


Windows Search via Explorer.

The history of terms searched using Windows Search can be found in the following registry key:


For manual registry hive analysis, I use Eric Zimmerman's Registry Explorer. Once I open up the program and drag/drop the NTUSER.DAT onto it, I typically click the root hive (in the left sidebar) and just start typing whichever key I'd like to analyze. In this case, I started to type "wordwheel" and Registry Explorer quickly jumped to the registry key in question. Note that you can also use the "available bookmarks" tab in the top left to find a listing of some common artifacts within the loaded hive (pretty neat feature; try it out).

Registry Explorer displaying the WordWheelQuery regkey (MRUListEx value selected).

In the screenshot above, notice that the MRUListEx value is highlighted (Sound familiar? We also saw use of the MRUListEx value upon analyzing shellbags and recentdocs artifacts). This value will show us the order in which the Windows Search terms were searched. The first entry in the MRUListEx value is "01 00 00 00." This means that the registry value that is marked as "1" is the most recently searched item. If we analyze the MRUListEx value further, we notice that the next entry is "05 00 00 00," indicating that the value marked as "5" is the term that was searched before the most recently searched item marked as "1." But we're only concerned with the most recently searched term, so let's look at what the value marked as "1" contains:

Registry Explorer displaying the WordWheelQuery regkey (value "1" selected).

We note that the Unicode representation of the hex values is "alloy." And just like that, we have our answer to question #1. The most recent Windows Search term is "alloy."
Note: MRUListEx Item Entries

Each entry in the MRUListEx value will be 4 bytes in length stored in little endian. That is, each entry is going to be a 32-bit integer with the least significant byte stored at the beginning of the entry. E.g. an entry for "7" would be shown as "07 00 00 00."
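If you are parsing this by hand, the byte layout described above translates into a few lines of code. This is a minimal sketch: the two entries match the screenshot, the rest of the real list is omitted, and the `FF FF FF FF` end-of-list marker and the null-terminated UTF-16LE encoding of the search term are assumptions based on how these values are commonly stored.

```python
import struct


def parse_mrulistex(data):
    """Return value names (as ints) in most-recent-first order."""
    order = []
    for (entry,) in struct.iter_unpack("<I", data):  # little-endian 32-bit
        if entry == 0xFFFFFFFF:  # assumed end-of-list marker
            break
        order.append(entry)
    return order


# First two entries match the screenshot; the rest of the list is omitted.
mrulistex = bytes.fromhex("01000000 05000000 ffffffff")
print(parse_mrulistex(mrulistex))

# Assumption: search terms are stored as null-terminated UTF-16LE strings.
value_1 = "alloy".encode("utf-16-le") + b"\x00\x00"
print(value_1.decode("utf-16-le").rstrip("\x00"))
```

The first element of the returned list is the name of the value holding the most recent search term.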

Question #2: Find the number of times excel.exe was run.

For question #2, we are concerned with program execution. And, as we already know, there is no shortage of artifacts that can be used to determine this (.lnk files, Windows Error Reporting crash logs, Prefetch, AppCompatCache, etc.). However, we are limited to only the NTUSER.DAT hive for this challenge. As such, the artifact we will want to look at will be UserAssist.

Remember that unlike Prefetch, UserAssist artifacts will show us run counts per user instead of globally per system. Since we would like to determine how many times excel.exe has been run by a specific user, UserAssist is the perfect candidate.

UserAssist artifacts can be found in the following registry key:


Just as with question #1, let's open up Registry Explorer and start typing "userassist" to quickly find our way to this key.

Registry Explorer displaying the UserAssist regkey. ROT13'd EXCEL.EXE, run counter, and last run time highlighted. 

Within the UserAssist key, there will be two subkeys that each contain a "Count" subkey. For this challenge, we will be looking in the {CEBFF5CD-ACE2-4F4F-9178-9926F41749EA} subkey. Each value's name within the "Count" subkey is ROT13 encoded, so let's decode the value for Excel.

{7P5N40RS-N0SO-4OSP-874N-P0S2R0O9SN8R}\Zvpebfbsg Bssvpr\Bssvpr14\RKPRY.RKR

  ↕     ↕     ↕     ↕     ↕     ↕     ↕     ↕     ↕     ↕     ↕     ↕     ↕     ↕     ↕     ↕     ↕     ↕     ↕     ↕     ↕     ↕

{7C5A40EF-A0FB-4BFC-874A-C0F2E0B9FA8E}\Microsoft Office\Office14\EXCEL.EXE

The first part of the decoded path (the GUID in braces) is the Windows KnownFolderID GUID that translates to the %ProgramFiles% (x86) location.
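The decoding step is easy to reproduce. A minimal sketch using Python's built-in ROT13 codec on the value name shown above (digits, braces, and punctuation pass through unchanged, which is why the GUID keeps its shape):

```python
import codecs

encoded = r"{7P5N40RS-N0SO-4OSP-874N-P0S2R0O9SN8R}\Zvpebfbsg Bssvpr\Bssvpr14\RKPRY.RKR"

# ROT13 only rotates letters, so applying it again would re-encode the name.
decoded = codecs.decode(encoded, "rot_13")
print(decoded)
```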

We've now pinpointed the application in question. Next, we need to find out how many times it has been run.

The run counter for the EXCEL.EXE UserAssist entry can be found at offset 0x04. In this instance, the run counter happens to be 4, indicating that EXCEL.EXE was run four times. Note that on an XP machine, this counter would start at 5 instead of 1. So if you are manually parsing this data, remember to subtract 5 from the run counter when dealing with XP machines.

We've successfully answered question #2. EXCEL.EXE was run 4 times.

But wait, there's more! Check the 8-byte value starting at offset 0x3C (60d). That's the last time the program was run. Convert this 64-bit FILETIME value to a readable date/time using DCode.

DCode showing the decoded last run time of EXCEL.EXE

EXCEL.EXE was last run Wed, 04 April 2012 15:43:14 UTC. On to question #3.
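The two offsets used above can be read with a short script. This is a minimal sketch for 72-byte entries only; since the raw bytes from the challenge hive are not reproduced in this post, the example builds a synthetic entry carrying the run count and last run time reported above, then parses it back.

```python
import struct
from datetime import datetime, timedelta, timezone

FILETIME_EPOCH = datetime(1601, 1, 1, tzinfo=timezone.utc)


def filetime_to_datetime(filetime):
    """Convert a 64-bit FILETIME (100 ns ticks since 1601-01-01 UTC)."""
    return FILETIME_EPOCH + timedelta(microseconds=filetime // 10)


def parse_userassist_entry(data):
    """Return (run_count, last_run) from a 72-byte Windows 7-style entry."""
    if len(data) != 72:
        raise ValueError("expected a 72-byte UserAssist entry")
    (run_count,) = struct.unpack_from("<I", data, 0x04)  # run counter
    (filetime,) = struct.unpack_from("<Q", data, 0x3C)   # last run FILETIME
    return run_count, filetime_to_datetime(filetime)


# Build a synthetic entry carrying the values reported above.
last_run = datetime(2012, 4, 4, 15, 43, 14, tzinfo=timezone.utc)
ticks = int((last_run - FILETIME_EPOCH).total_seconds()) * 10_000_000
entry = bytearray(72)
struct.pack_into("<I", entry, 0x04, 4)
struct.pack_into("<Q", entry, 0x3C, ticks)

print(parse_userassist_entry(bytes(entry)))
```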

Note: Determining OS version with NTUSER.DAT only

As a side note, we can now tell that the machine housing this NTUSER.DAT was a post XP/Server 2003 machine. How? Well, there are a few indicators: UserAssist entries on Windows XP are 16 bytes in length while Windows 7 UserAssist entries are 72 bytes in length; the two subkeys under the root UserAssist key ({CEBFF5CD-ACE2-4F4F-9178-9926F41749EA} and {F4E57C4B-2036-45F0-A9AB-443BCFE33D9F}) are typically found on machines running Windows 7; we see references to the "C:\Users" folder in other UserAssist entries instead of the "Documents and Settings" folder that is typically found on XP machines; the run counter for EXCEL.EXE is less than 5 -- on an XP machine, this counter would be at LEAST 5.
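The entry-size indicator and the XP counter offset can be folded into one small helper. A rough sketch, assuming the raw counter has already been read from the entry; the function name and labels are made up for illustration, and entry size alone is only one of the indicators listed above.

```python
def describe_userassist_entry(data, raw_count):
    """Guess the entry format from its size and normalize the run counter."""
    if len(data) == 16:
        # XP-style entry: the counter starts at 5, so subtract the offset.
        return "Windows XP-style", raw_count - 5
    if len(data) == 72:
        return "Windows 7-style", raw_count
    return "unknown", raw_count


print(describe_userassist_entry(bytes(72), 4))
print(describe_userassist_entry(bytes(16), 9))
```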

Question #3: Find the most recent TypedURL.

The TypedURLs registry key stores URLs that are manually typed/pasted into Internet Explorer. Clicked links are not stored here.

Using our tried and true Registry Explorer process, let's look at what the TypedURLs registry key has to offer. Navigate to the NTUSER.DAT\Software\Microsoft\Internet Explorer\TypedURLs key by typing "typedurls" in Registry Explorer.

Registry Explorer showing the TypedURLs regkey.

As we can see above, the most recent TypedURL is the one stored in the value labeled "url1." This value will always be the most recent TypedURL. Also, check the LastWrite time on the TypedURLs key to determine the last time "url1" was typed.
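If you have already dumped the TypedURLs values into a dictionary, ordering them is a matter of sorting on the numeric suffix (a plain string sort would put "url10" ahead of "url2"). A minimal sketch with placeholder URLs, since the real values come from the parsed hive:

```python
def typed_urls_most_recent_first(values):
    """Sort TypedURLs values (url1, url2, ...) by their numeric suffix."""
    numbered = {
        int(name[3:]): url
        for name, url in values.items()
        if name.lower().startswith("url") and name[3:].isdigit()
    }
    return [numbered[n] for n in sorted(numbered)]


# Placeholder URLs; the real values come from the parsed hive.
sample = {
    "url2": "http://example.org/older",
    "url10": "http://example.net/oldest",
    "url1": "http://example.com/newest",
}
print(typed_urls_most_recent_first(sample)[0])
```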

Question #3: answered. Challenge complete.


Now, let's assume you've actually got some work to finish. You've gone through this stuff manually at least once and understand it inside and out. It's time to automate.

Harlan Carvey's RegRipper has plugins to quickly pull/parse the registry keys covered here and much, much more. Yes, we can answer all three of these questions with a one-liner.

C:\RegRipper2.8>rip.exe -r Vibranium-NTUSER.DAT -p wordwheelquery >> ntuser.log && rip.exe -r Vibranium-NTUSER.DAT -p userassist >> ntuser.log && rip.exe -r Vibranium-NTUSER.DAT -p typedurls >> ntuser.log
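The same one-liner can be expressed as a loop if you would rather drive RegRipper from a script. This is a sketch under assumptions: it uses only the -r and -p options shown above, expects rip.exe to be on the PATH, and the helper names are mine.

```python
import subprocess

PLUGINS = ("wordwheelquery", "userassist", "typedurls")


def build_commands(hive, rip="rip.exe"):
    """Return one argv list per plugin, mirroring the one-liner above."""
    return [[rip, "-r", hive, "-p", plugin] for plugin in PLUGINS]


def run_plugins(hive, log_path, rip="rip.exe"):
    """Run each plugin and append its output to log_path."""
    with open(log_path, "a", encoding="utf-8") as log:
        for argv in build_commands(hive, rip):
            # check=True stops at the first failure, much like && does.
            subprocess.run(argv, stdout=log, check=True)


# Print the commands without running them.
for argv in build_commands("Vibranium-NTUSER.DAT"):
    print(" ".join(argv))
```

Calling `run_plugins("Vibranium-NTUSER.DAT", "ntuser.log")` would then produce the same log file as the command above.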

Remember, though: it's one thing to be able to run RegRipper. It's another to know where the output is coming from and why you're seeing what you're seeing.

Again, this is nothing new; this challenge is actually on the easier side of analysis. But, if at any point you had doubts about the artifacts covered here, it's worth going back and refreshing your memory.

-Dan Pullega (@4n6k)

1. UserAssist Forensics (by 4n6k)
2. INSECURE Magazine #10 (by Didier Stevens)
3. ROT13 is used in Windows? You’re joking! (by Didier Stevens)
4. KNOWNFOLDERID (by Microsoft)
5. FILETIME structure (by Microsoft)

Tuesday, August 26, 2014

Forensic FOSS: Install Volatility For Linux Automatically

Introducing FORENSIC FOSS! These posts will consist of open source software for use in everyday forensic investigations.

[UPDATE #01 11/12/2015]: Volatility 2.5 was released recently. A standalone Linux executable is included with the 2.5 release. This installer is for Volatility 2.4. If you want to work with source code and get an idea of the dependencies needed by Volatility, review this installer's code. Otherwise, I'd recommend using the official Linux standalone executable. If you really want a working 2.5 installer update, see the fork of this project by @wzod.

What Is It?
A bash script that installs Volatility 2.4 (and all dependencies) for Ubuntu Linux with one command.

Why Do I Need It?
Compiling source on Linux can be a pain. Dependency hunting wastes time and drives people away from considering Linux builds of cross-platform software. With this script, you can (1) skip all of the dependency frustration, (2) get right into using the newest version of Volatility, and (3) leverage the arguably more powerful and versatile *nix shell. No longer do you have to worry about whether or not you "have everything."

What's Required?
An internet connection and an APT-based Linux distribution [for the time being]. This script has been tested on stock Ubuntu 12.04 and Ubuntu 14.04. Some testing has been done to support SIFT Workstation 3.0.

What Does It Do?
Specifically, the script does the following:
  • Downloads, verifies, extracts, and installs source archives for everything you will need to complete a full installation of Volatility 2.4:
    • Volatility 2.4
    • diStorm3
    • Yara (+ magic module) + Yara-Python
    • PyCrypto
    • Python Imaging Library + Library Fixes
    • OpenPyxl
    • IPython
    • pytz
  •  Adds "" to your system PATH so that you can run Volatility from any location.
  • Checks to see if you are using SIFT 3.0 and applies some fixes. 
How Do I Use It?
Volatility will be installed to the directory you specify.
  • From a terminal, run: 
    • sudo bash /home/dan
In the above example, the following directories will be created:
  • /home/dan/volatility_setup 
    • Contains dependency source code and the install_log.txt file.
  • /home/dan/volatility_2.4
    • Contains the Volatility 2.4 install.
You must enter a full path for your install directory.
Where Can I Download It?
You can download the script from my Github page.

Check the Github page for the script's SHA256 hash.
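Comparing the hash takes only a few lines if you do not have a hashing tool handy. A minimal sketch using the standard library; the helper names are mine, and the published hash is whatever the Github page lists at the time you download.

```python
import hashlib


def sha256_of_file(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, published_hex):
    """Compare case-insensitively, ignoring surrounding whitespace."""
    return sha256_of_file(path) == published_hex.strip().lower()
```

Call `matches_published_hash` with the path to the downloaded script and the hash copied from the page; anything other than `True` means the file should not be run.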

Bottom Line?
Don't be afraid of the terminal. Read the source for this script and understand how it works. Automation is acceptable only after you understand what is happening behind the scenes.

I'm open to making this script better. If you see a problem with the code or can suggest improvements, let me know and I'll see what I can do.

Thanks to the Volatility team for writing an amazing tool. See the Volatility project's site for more info. Thanks to @The_IMOL, Joachim Metz, @dunit50, and @iMHLv2 for feedback.

Wednesday, April 16, 2014

Forensics Quickie: Merging VMDKs & Delta/Snapshot Files (2 Solutions)

FORENSICS QUICKIES! These posts will consist of small tidbits of useful information that can be explained very succinctly.

I had a VM that was suspended. I needed to see the most recent version of the filesystem. Upon mounting the base .vmdk file, I was presented with the filesystem that existed before the snapshot was taken.

Solution #1
(turns out I ran into a similar problem before...see my post on Mounting Split VMDKs).

The issue lies within the fact that when you create a VM snapshot, "all writes to the original -flat.vmdk are halted and it becomes read-only; changes to the virtual disk are then written to these -delta files instead"[2]. So basically, when I was mounting the base .vmdk file (the "someVM-flat.vmdk"), I wasn't seeing anything that was written to the disk after the snapshot was created. I needed a way to merge the delta files into the -flat file.

To further explain what I was working with, I had three .vmdk files:
  1. Win7.vmdk (1KB in size; disk descriptor file; simply pointed to the -flat.vmdk)
  2. Win7-flat.vmdk (large file size; the base .vmdk file)
  3. Win7-000001.vmdk (delta/snapshot file; every write after the snapshot is stored here)
*Note that this image was not split. If it were, I would just use the method detailed in my other post.

As mentioned, I needed to merge these all into one to be able to mount the .vmdk and see the most recent version of the filesystem. VMware includes a CLI tool for this. It is stored in the "Program Files\VMware" folder. Run this command.

vmware-vdiskmanager.exe -r Win7-000001.vmdk -t 0 singleFileResult.vmdk

Note that the .vmdk file being used as input should be the .vmdk for the latest snapshot. You can confirm which .vmdk file this is by checking the VM's settings.

You can also choose what kind of disk to output with the -t option; 0 produces a single growable virtual disk. I have never found it necessary to use anything other than 0.

You can now mount the new .vmdk to see the most recent version of the file system. I *imagine* you could do this for previous snapshots if you define the proper .vmdk. But I have not tested that.

Solution #2
The other solution, which I wound up using after finding out how to do it correctly, is to import the .vmdk files in order within X-Ways. If you try to import a delta file before the base .vmdk, X-Ways will throw an error saying:

"In order to read a differencing VMDK/VHD image, the corresponding parent must be (and stay) opened first. They should be opened in order of their last modified dates - oldest first, skipping none."

So, I did as it said; I imported Win7.vmdk, then Win7-000001.vmdk. It's that easy.
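With more than a couple of delta files, working out the import order by eye gets tedious. Following the X-Ways message quoted above, a minimal sketch that orders image files by last-modified time, oldest first. It assumes the timestamps were preserved when the files were copied; the demo creates empty stand-in files with made-up timestamps.

```python
import os
import tempfile


def oldest_first(paths):
    """Order image files by last-modified time, oldest first."""
    return sorted(paths, key=os.path.getmtime)


# Demo with empty stand-in files and made-up modification times.
with tempfile.TemporaryDirectory() as folder:
    base = os.path.join(folder, "Win7.vmdk")
    delta = os.path.join(folder, "Win7-000001.vmdk")
    for path, mtime in ((delta, 2_000_000_000), (base, 1_000_000_000)):
        open(path, "wb").close()
        os.utime(path, (mtime, mtime))
    print([os.path.basename(p) for p in oldest_first([delta, base])])
```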

Though this method may become cumbersome with many snapshots/delta files, you would be able to incrementally see what writes had been made to each snapshot. Just be careful when adding delta files for snapshots that depend on previous snapshots (see below).

VMware's Snapshot Manager showing snapshot dependencies.

Thanks to Eric Zimmerman, Jimmy Weg, and Tom Yarrish for helping with the X-Ways method.

-Dan Pullega (@4n6k)

1. Consolidating snapshots in VMware Fusion
2. Understanding the files that make up a VMware virtual machine

Saturday, March 29, 2014

Forensics Quickie: Verifying Program Behavior Using Source Code

FORENSICS QUICKIES! These posts will consist of small tidbits of useful information that can be explained very succinctly.

Recently, I stumbled upon a question related to the behavior of a given program. That program was Mozilla Firefox and the behavior in question was how profile directory names were generated. Through this post, I will cover how to approach this question and how to solve it with available resources (source code).

The Question
How are Firefox profile directory names generated?

The Answer (and the road to get there)
To answer this question, we first have to understand which artifacts we are examining. In this case, we are dealing with Firefox profiles. Those are located in a user's AppData folder. By navigating AppData, we eventually find C:\Users\someUser\AppData\Roaming\Mozilla\Firefox\Profiles.

The Firefox 'Profiles' folder showing the directory for the profile named "default."

We see a folder for the only Firefox profile used on the system. Reasonably, this profile is named "default" by default. But what about that seemingly random prefix?

Let's see if we can glean anything from trying to create a profile in Firefox. First, let's open up the Profile Manager by typing "firefox.exe -p" in the 'Run...' dialog (Windows key + R).

We can confirm that there is only one profile and it is named "default."

When we try to create a profile, we see the following window:

Great. We can actually see where it is going to be saved. And no matter what we enter in the text input field, the prefix stays the same. This tells us that the folder for this new profile isn't generated based on the profile name we enter. But there are other possibilities, such as the folder name being based on the current time.

There are many other tests we could run, but we actually don't need to -- the source code for Firefox is freely available online. Once we download and extract the source code, we can try to find the function that handles the generation of the profile's folder name.

Uncompressed, the Firefox source code is about 585MB. That's a lot of code to review. A better way to sift through all of this data is to either index it all and search it or to just recursively grep the root folder. I decided on the latter.

To find out where to look first, we can try to find a constant, unique phrase around the text in question. In the above image, the string "Your user settings, preferences and other user-related data will be stored in:" is right before the path name with which we are concerned. So let's grep for that and see if we can find anything.

There are many ways to grep data, but this was a quick and dirty job where I wasn't doing any parsing or modifying, so I used AstroGrep. I went ahead and searched for a watered-down version of the aforementioned unique phrase: "will be stored in." The results showed that the file named CreateProfileWizard.dtd contained this unique phrase (there were many files that were responsive, but based on the phrase's context and the filename for the file in which the phrase was found, we can determine relevancy).
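If you don't have a grep tool at hand, the same recursive search is a short script. This is a minimal sketch rather than a replacement for AstroGrep: it does a plain substring match, ignores undecodable bytes, and the "firefox-source" folder name is a placeholder for wherever you extracted the source.

```python
import os


def grep_tree(root, phrase):
    """Yield (path, line_number, line) for every line containing phrase."""
    for folder, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(folder, name)
            try:
                with open(path, encoding="utf-8", errors="ignore") as handle:
                    for number, line in enumerate(handle, start=1):
                        if phrase in line:
                            yield path, number, line.rstrip()
            except OSError:
                continue  # unreadable file; skip it


# "firefox-source" is a placeholder for the extracted source directory.
for hit in grep_tree("firefox-source", "will be stored in"):
    print(hit)
```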

A snippet of the responsive "CreateProfileWizard.dtd"  file containing our unique phrase.

Now, it's just a matter of tracing this phrase back to its source. So we grab another unique phrase that is relevant to our discovery, such as "profileDirectoryExplanation," and see if we can find any more relevant files. Grepping for it comes back with more results -- one of which is createProfileWizard.xul. I didn't see much in this file at first glance (though there is quite a bit of good info such as "<wizardpage id="createProfile" onpageshow="initSecondWizardPage();">" -- which will be seen soon), so I decided to see what else was in the same directory as this file. There, I found createProfileWizard.js.

A snippet of the "createProfileWizard.js" file showing the functions used to generate the profile directory names.

Skimming the code, we can see that a salt table and Math.random() are used to determine the 8-character prefix. At line 68, we see the concatenation of kSaltString ('0n38y60t' in the animated gif), a period ('.'), and aName (our text input field variable -- in our example, "abcdefg" or "4n6k").
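To make the logic concrete, here is a rough Python re-creation of what the JavaScript appears to do. It is a sketch, not a port: the salt table is assumed to be lowercase letters plus digits (consistent with the '0n38y60t' example), and the function names are mine.

```python
import random
import string

# Assumed salt table: lowercase letters plus digits.
SALT_TABLE = string.ascii_lowercase + string.digits


def make_salt(length=8, rng=random):
    """Pick `length` random characters from the salt table."""
    return "".join(rng.choice(SALT_TABLE) for _ in range(length))


def salt_name(profile_name, salt):
    """Join the salt, a period, and the profile name."""
    return f"{salt}.{profile_name}"


print(salt_name("4n6k", make_salt()))
```

Nothing in this process draws on the profile name, the user, or the clock, which is why the prefix carries no investigative meaning on its own.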

In the end, the point is that having source code available makes research more streamlined. By looking at the Firefox source, we were able to determine that the profile directory names aren't based on a telling artifact.

Using what is available to you is invaluable. By having a program's source code [and approaching an issue with an analytic mindset], verifying program behavior is made so much easier.

I'll end with this: open source development is GOOD -- especially for forensics. Whether it's a tool meant to output forensic artifacts or just another Firefox, we are able to more confidently prove why we are seeing what we're seeing.

-Dan Pullega (@4n6k)

* Special thanks to Tom Yarrish (@CdtDelta) for spawning this post.

Monday, February 17, 2014

Forensics Quickie: PDF Metadata Forensics (Sunday Funday Answer)

FORENSICS QUICKIES! These posts will consist of small tidbits of useful information that can be explained very succinctly.

David Cowen's Sunday Funday contests are great. In short, a question is posed about a subject, and readers get the opportunity to answer and explain how it is relevant in a forensic investigation. Much can be learned from these contests, as a great answer is almost always delivered and posted publicly. I highly recommend checking it out and submitting answers (it's a weekly thing, so the subjects vary widely).

This week's (2/16/14) subject was PDF Metadata. I took a stab at answering and wound up winning, so I figured it wouldn't hurt to post my answer here. The original answer, along with a preface by David Cowen, can be found here.

The Questions
The subject's prompt was as follows:

1. What metadata can be present within a PDF document?
2. What affects the types of metadata that will be present within a PDF document?

The Answer
The following is my answer to the prompt. Disclaimer: this was written in the wee-hours of one night, so if there are any typos or errors, please feel free to contact me.

Much can be gleaned from PDF metadata. Of course, there are the standard fields that will provide you with relatively common metadata...but depending on the program you use to create the PDF, there could be much, much more.

Let's start with the most common metadata. Most PDFs will have embedded metadata showing the PDF version, creation date, creation program, document ID, and last modified date. There are definitely more, but I have left them out as they wouldn't be very useful in an investigation (e.g. page count). The use for the aforementioned metadata is fairly obvious, but I will explain nonetheless.

The more obvious PDF metadata entries are:
  • Creation program: program used to create the PDF (was it through desktop software, was it scanned, etc.)
  • Creator/Author/Producer: Username or full name of the PDF's author OR further details on the program used to create the PDF (is it a previous employer?)
  • Title: the title of the PDF that usually provides an outdated name for the document; good for identifying previous employer documents or documents that have been converted from one format into a PDF (e.g. SecretBluePrint.eps or oldCompanyFinances.doc shows up in the 'title' metadata entry)
Those are the easy ones. But what about the more overlooked metadata? As I mentioned before, the program used to create or modify the PDF may have a huge impact on what information you are given. With that, let's look into it.

First, timestamps. We know that file metadata could potentially serve as a better indicator of when a given document was created. If the PDF has been transferred across various volumes and systems -- and we would like to find the origin of the document -- the creation date in the file metadata is going to be more reliable than the file system creation date (as the filesystem date/time will have been updated with the copies/moves).
A More Reliable Creation Date

The metadata 'creation' date will [usually] preserve the REAL date of the file's creation. That is, if the PDF has been transferred across various volumes and systems, the 'creation' date in the file's metadata is going to give us a better idea of when the document was initially created.

The 'modified' date can be used in a similar way. We might even be able to tell how many programs the PDF was modified/saved through. Say we have a PDF created using Adobe InDesign. If we were to open this PDF, modify it, and then save it as a new file using 'Save As...' in a program like Adobe Acrobat, we would see that the 'creation' date is still unchanged, but the 'modified' date has been updated (file system creation dates will tell us differently). Pretty standard stuff. Even if the PDF is saved using 'Save As...' (essentially creating a new file altogether with an updated file system creation time) AND it is moved from one system to the next, we will still have a genuine 'creation' date. Not only that, but we will have a metadata 'modified' date AND a new file system creation time to work with. Correlation among file metadata and file system timestamps is beyond the scope of this answer, but you get the point; 'creation' and 'modified' metadata dates are powerful and can be used creatively.
Also, with many PDF timestamps, we will be able to see a timezone offset. For example, a creation timestamp could be 2013:02:22 11:21:34-06:00. We now have a potential indication that the system that produced this PDF was set to a UTC-06:00 time zone (US Central Standard Time, for a February date).
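
As a quick illustration, a timestamp in this form can be parsed so that the offset is available as data instead of being read by eye. This is a minimal sketch, assuming the exiftool-style "YYYY:MM:DD HH:MM:SS+HH:MM" layout shown above and Python 3.7 or later (where %z accepts an offset with a colon). parse_pdf_timestamp is a hypothetical helper name.

```python
from datetime import datetime, timedelta, timezone


def parse_pdf_timestamp(value):
    """Parse a 'YYYY:MM:DD HH:MM:SS+HH:MM' string into an aware datetime."""
    # %z accepts offsets written with a colon (e.g. -06:00) on Python 3.7+.
    return datetime.strptime(value, "%Y:%m:%d %H:%M:%S%z")


created = parse_pdf_timestamp("2013:02:22 11:21:34-06:00")

# Offset from UTC in hours, as recorded by the producing program.
offset_hours = created.utcoffset() / timedelta(hours=1)

# The same instant expressed in UTC, handy for building a timeline.
created_utc = created.astimezone(timezone.utc)

print(offset_hours, created_utc.isoformat())
```

Keep in mind that an offset only narrows things down to the set of time zones sharing that offset on that date; it does not identify a location by itself.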

I mentioned that we might be able to determine if a PDF was created and modified through more than one program. As a quick side note, and if we really wanted to dive into the PDF analysis, we could take a look at some of the other telling metadata. The example above suggested creating a PDF in InDesign, opening it up in Acrobat, modifying it, and then saving it as a new file. When this happens, some of the metadata in the new file (like the 'modified' time) is updated while all of the InDesign metadata stays intact. However, there is a significant difference this time around: the 'XMP Toolkit' metadata value is different. Adobe implements their XMP Toolkit in all of their applications and plugins. They even open sourced it, so other programs can use it (and many do). The point is, "the XMPCore component of this toolkit is what allows for the creation and modification of the metadata that follows the XMP Data Model" (more here and here). So we have two PDFs, but the metadata for each was manipulated by two different versions of Adobe's XMP Core.

InDesign used:
"Adobe XMP Core 5.3-c011 66.145661, 2012/02/06-14:56:27" and...

Acrobat used:
"Adobe XMP Core 5.4-c005 78.147326, 2012/08/23-13:03:03"
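
If there are many files to compare, the two strings above can be broken apart programmatically. The sketch below assumes the 'XMP Toolkit' value always follows the layout seen in these two examples, and it treats the trailing date as a build timestamp, which is my reading of the string and not something confirmed here. parse_xmp_core and the field names are hypothetical.

```python
import re
from datetime import datetime

# Layout assumed from the two example strings:
# "Adobe XMP Core <major>.<minor>-c<nnn> <build>, <YYYY/MM/DD-HH:MM:SS>"
XMP_CORE_RE = re.compile(
    r"Adobe XMP Core (\d+)\.(\d+)-c(\d+) ([\d.]+), "
    r"(\d{4}/\d{2}/\d{2}-\d{2}:\d{2}:\d{2})"
)


def parse_xmp_core(value):
    """Split an XMP Toolkit string into comparable parts (hypothetical helper)."""
    match = XMP_CORE_RE.search(value)
    if match is None:
        raise ValueError("unrecognized XMP Toolkit string: %r" % value)
    major, minor, candidate = (int(match.group(i)) for i in (1, 2, 3))
    return {
        "version": (major, minor, candidate),
        "build": match.group(4),
        # Assumption: the trailing date is when the toolkit was built.
        "built": datetime.strptime(match.group(5), "%Y/%m/%d-%H:%M:%S"),
    }


indesign = parse_xmp_core("Adobe XMP Core 5.3-c011 66.145661, 2012/02/06-14:56:27")
acrobat = parse_xmp_core("Adobe XMP Core 5.4-c005 78.147326, 2012/08/23-13:03:03")

# Tuples compare element by element, so (5, 4, 5) sorts after (5, 3, 11).
print(acrobat["version"] > indesign["version"])
```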

But why is this important? Well, we can now more accurately pinpoint the program used to create the PDF. Sure, we will likely already have a metadata entry that tells us the 'Creation Program,' but consider the above example; that tool (InDesign) may have been used to initially create the PDF, but it was NOT used to open, modify, and save a new version of it (Acrobat did that). Let's keep this in mind as we explain some other interesting metadata...

Remember: the amount of metadata that a program can use when creating files is limitless. XMP is built on XML, so any metadata tags can be defined. Let's take a real-world example of how powerful PDF metadata can be when created from certain programs. Download Trustwave's Global Security Report PDF from 2013. Run it through exiftool. What do you see? That's right, the "History" metadata fields will show you not only that the document was saved 497 times, but it will also show you the exact times that it was saved, the program used to save it each time, and the Document Instance ID for each save (less exciting).

While you have that open, take a look at the creation date (2013:02:22 11:21:34-06:00) and modify date (2013:05:09 10:47:39-07:00). The modify date is much later, but the last "History" save on the file was 2013:02:22 11:18:06-06:00. What's up with that? This is because the PDF was modified in a different program; one newer than InDesign CS5.5. How do I know this? Well, look at the XMP Core version. The XMP Core version recorded in this InDesign CS5.5 PDF is "Adobe XMP Core 5.4-c005 78.147326, 2012/08/23-13:03:03." I just so happen to have a PDF created with InDesign CS6 and that PDF uses "Adobe XMP Core 5.3-c011 66.145661, 2012/02/06-14:56:27." How can it be that a CS5.5 PDF shows a later XMP Core version than a CS6 PDF?! Because another program was used to modify the CS5.5 PDF after the last save. On 2013:05:09 10:47:39-07:00 (the modify date), some program (let's just say it's Acrobat to satisfy my example from before) modified the PDF. The XMP Core version shown in the metadata is NOT from CS5.5.

Also from the 'History' metadata, we can tell that the creation date is actually "2012:12:29 11:20:49-06:00" and NOT "2013:02:22 11:21:34-06:00". My guess is that InDesign was keeping track of the saves, but when it came down to exporting the PDF, it tacked on the export date as the "Create Date" (as the last 'History' save of the file is about 3 minutes before the alleged "Create Date").
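
The timestamp reasoning above can be checked with a few lines of Python. This is only a sketch that hard-codes the four values quoted in this answer. It assumes Python 3.7 or later for parsing the colon-separated offset, and the variable names are mine, not exiftool tag names.

```python
from datetime import datetime

FMT = "%Y:%m:%d %H:%M:%S%z"  # exiftool-style timestamp with UTC offset

# Values quoted in the text above.
first_history_save = datetime.strptime("2012:12:29 11:20:49-06:00", FMT)
last_history_save = datetime.strptime("2013:02:22 11:18:06-06:00", FMT)
create_date = datetime.strptime("2013:02:22 11:21:34-06:00", FMT)
modify_date = datetime.strptime("2013:05:09 10:47:39-07:00", FMT)

# Gap between the last recorded 'History' save and the alleged Create Date.
export_gap_seconds = (create_date - last_history_save).total_seconds()

# How long after the Create Date the file was modified by another program.
days_until_modified = (modify_date - create_date).days

# The earliest 'History' entry predates the Create Date.
history_predates_create = first_history_save < create_date

print(export_gap_seconds, days_until_modified, history_predates_create)
```

Because the datetimes carry their offsets, the subtraction is done on real instants, so the -06:00 and -07:00 values can be compared directly.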

If we really wanted to, we could use another metadata field (the PDF version) to further pinpoint the program used. If the PDF version is 1.7, we could look for programs on a suspect computer that save PDFs to version 1.7 by default. Believe it or not, many programs still save PDFs as version 1.4, 1.5, and 1.6.
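
For reference, the PDF version is also visible without any metadata tool, since a PDF file begins with a header of the form "%PDF-1.x". The sketch below reads that header from raw bytes. pdf_header_version is a hypothetical helper, and the 1024-byte search window is an assumption to tolerate a little leading junk. Note that a document can also declare a version in its catalog that overrides the header, which this sketch does not check.

```python
import re

# A PDF file starts with a header such as "%PDF-1.7".
PDF_HEADER_RE = re.compile(rb"%PDF-(\d)\.(\d)")


def pdf_header_version(data):
    """Return the header version (e.g. '1.7') from raw PDF bytes, or None."""
    # Only look near the start of the file (window size is an assumption).
    match = PDF_HEADER_RE.search(data[:1024])
    if match is None:
        return None
    return match.group(1).decode("ascii") + "." + match.group(2).decode("ascii")


# Usage with in-memory bytes; for a real file, pass open(path, "rb").read(1024).
print(pdf_header_version(b"%PDF-1.4\n%\xe2\xe3\xcf\xd3\n"))
```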

After all of this, I think it's safe to say that PDF metadata can be pretty valuable. You just need to know what's available to you and how to interpret it.

(Thanks to David Cowen for posing the question and offering up the G-C Partners Multi-Boot USB 3.0 flash drive prize!)

-Dan Pullega (@4n6k)