4n6k: 2014

Tuesday, August 26, 2014

Forensic FOSS: 4n6k_volatility_installer.sh - Install Volatility For Linux Automatically

Introducing FORENSIC FOSS! These posts will consist of open source software for use in everyday forensic investigations.

----------------------------------
[UPDATE #01 11/12/2015]: Volatility 2.5 was released recently. A standalone Linux executable is included with the 2.5 release. This installer is for Volatility 2.4. If you want to work with source code and get an idea of the dependencies needed by Volatility, review this installer's code. Otherwise, I'd recommend using the official Linux standalone executable. If you really want a working 2.5 installer update, see the fork of this project by @wzod).
----------------------------------

What Is It?
4n6k_volatility_installer.sh is a bash script that installs Volatility 2.4 (and all dependencies) for Ubuntu Linux with one command.

Why Do I Need It?
Compiling source on Linux can be a pain. Dependency hunting wastes time and drives people away from considering Linux builds of cross-platform software. With this script, you can (1) skip all of the dependency frustration, (2) get right into using the newest version of Volatility, and (3) leverage the arguably more powerful and versatile *nix shell. No longer do you have to worry about whether or not you "have everything."

What's Required?
An internet connection and an APT-based Linux distribution [for the time being]. This script has been tested on stock Ubuntu 12.04 and Ubuntu 14.04. Some testing has been done to support SIFT Workstation 3.0.

What Does It Do?
Specifically, 4n6k_volatility_installer.sh does the following:

Downloads, verifies, extracts, and installs source archives for everything you will need to complete a full installation of Volatility 2.4:

Volatility 2.4
diStorm3
Yara (+ magic module) + Yara-Python
PyCrypto
Python Imaging Library + Library Fixes
OpenPyxl
IPython
pytz

Adds "vol.py" to your system PATH so that you can run Volatility from any location.
Checks to see if you are using SIFT 3.0 and applies some fixes.

How Do I Use It?
Volatility will be installed to the directory you specify.

From a terminal, run:

sudo bash 4n6k_volatility_installer.sh /home/dan

In the above example, the following directories will be created:

/home/dan/volatility_setup

Contains dependency source code and the install_log.txt file.

/home/dan/volatility_2.4

Contains the Volatility 2.4 install.

You must enter a full path for your install directory.

Where Can I Download It?
You can find the full script at the bottom of this page.

Bottom Line?
Don't be afraid of the terminal. Read the source for this script and understand how it works. Automation is acceptable only after you understand what is happening behind the scenes.

I'm open to making this script better. If you see a problem with the code or can suggest improvements, let me know and I'll see what I can do.

Thanks to the Volatility team for writing an amazing tool. Go to http://www.volatilityfoundation.org for more info. Thanks to @The_IMOL, Joachim Metz, @dunit50, and @iMHLv2 for feedback.

#!/bin/bash

# 4n6k_volatility_installer.sh
# v1.1.0 (8/28/2014)
# Installs Volatility for Ubuntu Linux with one command.
# Run this script from the directory in which you'd like to install Volatility.
# Tested on stock Ubuntu 12.04 + 14.04 + SIFT 3
# More at http://www.4n6k.com + http://www.volatilityfoundation.org

# Copyright (C) 2014 4n6k (4n6k.dan@gmail.com)
# 
# This program is free software; you can redistribute it and/or
# modify it under the terms of the GNU General Public License
# as published by the Free Software Foundation; either version 2
# of the License, or (at your option) any later version.
# 
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
# GNU General Public License for more details.
# 
# You should have received a copy of the GNU General Public License
# along with this program; if not, write to the Free Software
# Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.

# Define constants
PROGNAME="${0}"
INSTALL_DIR="${1}"
SETUP_DIR="${INSTALL_DIR}"/"volatility_setup"
LOGFILE="${SETUP_DIR}"/"install_vol.log"
ARCHIVES=('distorm3.zip' 'pycrypto-2.6.1.tar.gz' 'ipython-2.1.0.tar.gz' \
          '2.0.5.tar.gz' 'setuptools-5.7.tar.gz' 'Imaging-1.1.7.tar.gz' \
          'v3.1.0.tar.gz' 'volatility-2.4.tar.gz'                       )
HASHES=('2cd594169fc96b4442056b7494c09153' '55a61a054aa66812daf5161a0d5d7eda' \
        '785c7b6364c6a0dd34aa4ea970cf83b9' '05df2ec474a40afd5f84dff94392e36f' \
        '81f980854a239d60d074d6ba052e21ed' 'fc14a54e1ce02a0225be8854bfba478e' \
        '1d4bb952a4f72cd985a2e59e5306f277' '4f9ad730fb2174c90182cc29cb249d20' )

# Program usage dialog
usage() {
  echo -e "\nHere is an example of how you should run this script:"
  echo -e "  > sudo bash ${PROGNAME} /home/4n6k"
  echo -e "Result: Volatility will be installed to /home/4n6k/volatility_2.4"
  echo -e "***NOTE*** Be sure to use a FULL PATH for the install directory.\n"
}

# Usage check; determine if usage should be printed
chk_usage() {
  if [[ "${INSTALL_DIR}" =~ ^(((-{1,2})([Hh]$|[Hh][Ee][Ll][Pp]$))|$) ]]; then
    usage ; exit 1
  elif ! [[ "${INSTALL_DIR}" =~ ^/.*+$ ]]; then
    usage ; exit 1
  else
    :
  fi
}

# Status header for script progress
status() {
  echo ""
  phantom "===================================================================="
  phantom "#  ${*}"
  phantom "===================================================================="
  echo ""
}

# Setup for initial installation environment
setup() {
  if [[ -d "${SETUP_DIR}" ]]; then
    echo "" ; touch "${LOGFILE}"
    phantom "Setup directory already exists. Skipping..."
  else
    mkdir -p "${SETUP_DIR}" ; touch "${LOGFILE}"
    echo "/usr/local/lib" >> /etc/ld.so.conf.d/volatility.conf
  fi
  cd "${SETUP_DIR}"
}

# Download Volatility and its dependencies
download() {
  if [[ -a "${ARCHIVES[7]}" && $(md5sum "${ARCHIVES[7]}" | cut -d' ' -f1) \
    == "${HASHES[7]}" ]]; then
      phantom "Files already downloaded. Skipping..."
  else
    phantom "This will take a while. Tailing install_vol.log for progress..."
    tail_log
    wget -o "${LOGFILE}" \
      "https://distorm.googlecode.com/files/distorm3.zip" \
      "https://ftp.dlitz.net/pub/dlitz/crypto/pycrypto/pycrypto-2.6.1.tar.gz" \
      "https://github.com/plusvic/yara/archive/v3.1.0.tar.gz" \
      "http://effbot.org/downloads/Imaging-1.1.7.tar.gz" \
      "https://pypi.python.org/packages/source/s/setuptools/setuptools-5.7.tar.gz" \
      "https://bitbucket.org/openpyxl/openpyxl/get/2.0.5.tar.gz" \
      "https://github.com/ipython/ipython/releases/download/rel-2.1.0/ipython-2.1.0.tar.gz" \
      "http://downloads.volatilityfoundation.org/releases/2.4/volatility-2.4.tar.gz"
    kill_tail
  fi
}

# Verify md5 hashes of the downloaded archives
verify() {
  local index=0
  for hard_md5 in "${HASHES[@]}"; do
    local archive ; archive="${ARCHIVES[$index]}"
    local archive_md5 ; archive_md5=$(md5sum "${archive}" | cut -d' ' -f1)
    if [[ "$hard_md5" == "$archive_md5" ]]; then
      phantom "= Hash MATCH for ${archive}."
      let "index++"
    else
      phantom "= Hash MISMATCH for ${archive}. Exiting..."
      exit 0
    fi
  done
}

# Extract the downloaded archives
extract() {
  for archive in "${ARCHIVES[@]}"; do
    local ext ; ext=$(echo "${archive}" | sed 's|.*\.||')
    if [[ "${ext}" =~ ^(tgz|gz)$ ]]; then
      tar -xvf "${archive}"
    elif [[ "${ext}" == "zip" ]]; then
      unzip "${archive}"
    else
      :
    fi
  done
} >>"${LOGFILE}"

# Install Volatility and its dependencies
install() {
  # Python
    aptget_install
  # distorm3
    cd distorm3 && py_install
  # pycrypto
    cd pycrypto-2.6.1 && py_install
  # yara + yara-python
    cd yara-3.1.0 && chmod +x bootstrap.sh && ./bootstrap.sh && \
      ./configure --enable-magic ; make ; make install
    cd yara-python && py_install && ldconfig && cd "${SETUP_DIR}"
  # OpenPyxl
    cd setuptools-5.7 && python ez_setup.py && cd "${SETUP_DIR}"
    cd openpyxl-openpyxl-2ed17dbd3445 && py_install
  # Python Imaging Library
    ln -s -f /lib/$(uname -i)-linux-gnu/libz.so.1 /usr/lib/
    ln -s -f /usr/lib/$(uname -i)-linux-gnu/libfreetype.so.6 /usr/lib/
    ln -s -f /usr/lib/$(uname -i)-linux-gnu/libjpeg.so.8 /usr/lib/
  # pytz
    easy_install --upgrade pytz
  # iPython
    cd ipython-2.1.0 && py_install
  # SIFT 3.0 check + fix
    sift_fix
  # Volatility
    mv -f volatility-2.4 .. ; cd ../volatility-2.4 && chmod +x vol.py
    ln -f -s "${PWD}"/vol.py /usr/local/bin/vol.py
    kill_tail
} &>>"${LOGFILE}"

# Shorthand for make/install routine
make_install() {
  ./configure; make; make install; cd ..
}

# Shorthand for build/install Python routine
py_install() {
  python setup.py build install; cd ..
}

# Log script progress graphically
tail_log() {
  if [[ -d /usr/bin/X11 ]]; then
    xterm -e "tail -F ${LOGFILE} | sed "/kill_tail/q" && pkill -P $$ tail;" &
  else
  phantom "No GUI detected. Still running; not showing progress..."
  fi
}

# Kill the graphical script progress window
kill_tail() {
  echo -e "kill_tail" >> "${LOGFILE}"
}

# Install required packages from APT
aptget_install() {
  apt-get update
  apt-get install \
    build-essential libreadline-gplv2-dev libjpeg8-dev zlib1g zlib1g-dev \
    libgdbm-dev libc6-dev libbz2-dev libfreetype6-dev libtool automake \
    python-dev libjansson-dev libmagic-dev -y --force-yes
}

# Shorthand for done message
done_msg() {
  phantom "Done."
}

# Check for SIFT 3.0 and fix
sift_fix() {
  if [[ -d /usr/share/sift ]]; then
    apt-get install libxml2 libxml2-dev libxslt1.1 libxslt1-dev -y --force-yes
    pip install lxml --upgrade
  else
    :
  fi
}

# Text echo enhancement
phantom() {
  msg="${1}"
    if [[ "${msg}" =~ ^=.*+$ ]]; then 
      speed=".01" 
    else 
      speed=".03"
    fi
  let lnmsg=$(expr length "${msg}")-1
  for (( i=0; i <= "${lnmsg}"; i++ )); do
    echo -n "${msg:$i:1}" | tee -a "${LOGFILE}"
    sleep "${speed}"
  done ; echo ""
}

# Main program execution flow
main() {
  chk_usage
  setup
  status "Downloading Volatility 2.4 and dependency source code..."
    download && done_msg
  status "Verifying archive hash values..."
    verify && done_msg
  status "Extracting archives..."
    extract && done_msg
  status "Installing Volatility and dependencies..."
    phantom "This will take a while. Tailing install_vol.log for progress..."
      tail_log
      install ; done_msg
  status "Finished. You can now run "vol.py" from anywhere."
  phantom "Volatility location: ${PWD}"
  phantom "Dependency location: ${SETUP_DIR}"
  echo ""
}

main "$@"

Wednesday, April 16, 2014

Forensics Quickie: Merging VMDKs & Delta/Snapshot Files (2 Solutions)

FORENSICS QUICKIES! These posts will consist of small tidbits of useful information that can be explained very succinctly.

Scenario
I had a VM that was suspended. I needed to see the most recent version of the filesystem. Upon mounting the base .vmdk file, I was presented with the filesystem that existed before the snapshot was taken.

Solution #1
(turns out I ran into a similar problem before...see my post on Mounting Split VMDKs).

The issue lies within the fact that when you create a VM snapshot, "all writes to the original -flat.vmdk are halted and it becomes read-only; changes to the virtual disk are then written to these -delta files instead"[2]. So basically, when I was mounting the base .vmdk file (the "someVM-flat.vmdk"), I wasn't seeing anything that was written to the disk after the snapshot was created. I needed a way to merge the delta files into the -flat file.

To further explain what I was working with, I had three .vmdk files:

Win7.vmdk (1KB in size; disk descriptor file; simply pointed to the -flat.vmdk)
Win7-flat.vmdk (large file size; the base .vmdk file)
Win7-000001.vmdk (delta/snapshot file; every write after the snapshot is stored here)

*Note that this image was not split. If it was, I would just use the method detailed in my other post.

As mentioned, I needed to merge these all into one to be able to mount the .vmdk and see the most recent version of the filesystem. VMware includes a CLI tool for this. It is stored in the "Program Files\VMware" folder. Run this command.

vmware-vdiskmanager.exe –r Win7-000001.vmdk –t 0 singleFileResult.vmdk

Note that the .vmdk file being used as input should be the .vmdk for the latest snapshot. You can confirm which .vmdk file this is by checking the VM's settings.

You can also define what kind of disk you want to output, as well. I have never found it necessary to use anything other than 0.

You can now mount the new .vmdk to see the most recent version of the file system. I *imagine* you could do this for previous snapshots if you define the proper .vmdk. But I have not tested that.

Solution #2
The other solution, which I wound up using after finding out how to do it correctly, is to import the .vmdk files in order within X-Ways. If you try to import a delta file before the base .vmdk, X-Ways will throw an error saying:

"In order to read a differencing VMDK/VHD image, the corresponding parent must be (and stay) opened first. They should be opened in order of their last modified dates - oldest first, skipping none."

So, I did as it said; I imported Win7.vmdk, then Win7-000001.vmdk. It's that easy.

Though this method may become cumbersome with many snapshots/delta files, you would be able to incrementally see what writes had been made to each snapshot. Just be careful when adding delta files for snapshots that depend on previous snapshots (see below).

VMware's Snapshot Manager showing snapshot dependencies.

Thanks to Eric Zimmerman, Jimmy Weg, and Tom Yarrish for helping with the X-Ways method.

-4n6k

References
1. Consolidating snapshots in VMware Fusion
2. Understanding the files that make up a VMware virtual machine

Saturday, March 29, 2014

Forensics Quickie: Verifying Program Behavior Using Source Code

FORENSICS QUICKIES! These posts will consist of small tidbits of useful information that can be explained very succinctly.

Recently, I stumbled upon a question related to the behavior of a given program. That program was Mozilla Firefox and the behavior in question was how profile directory names were generated. Through this post, I will cover how to approach this question and how to solve it with available resources (source code).

The Question
How are Firefox profile directory names generated?

The Answer (and the road to get there)
To answer this question, we first have to understand which artifacts we are examining. In this case, we are dealing with Firefox profiles. Those are located in a user's AppData folder. By navigating AppData, we eventually find C:\Users\someUser\AppData\Roaming\Mozilla\Firefox\Profiles.

The Firefox 'Profiles' folder showing the directory for the profile named "default."

We see a folder for the only Firefox profile used on the system. Reasonably, this account is named "default" by default. But what about that seemingly random prefix?

Let's see if we can glean anything from trying to create a profile in Firefox. First, let's open up the Profile Manager by typing "firefox.exe -p" in the 'Run...' dialog (Windows key + R).

We can confirm that there is only one profile and it is named "default."

When we try to create a profile, we see the following window:

Great. We can actually see where it is going to be saved. And no matter what we enter in the text input field, the prefix stays the same. This tells us that the folder for this new profile isn't generated based on the profile name we enter. But there are other possibilities, such as the folder name being based on the current time.

There are many other tests we could run, but we actually don't need to -- the source code for Firefox is freely available online. Once we download and extract the source code, we can try to find the function that handles the generation of the profile's folder name.

Uncompressed, the Firefox source code is about 585MB. That's a lot of code to review. A better way to sift through all of this data is to either index it all and search it or to just recursively grep the root folder. I decided on the latter.

To find out where to look first, we can try to find a constant, unique phrase around the text in question. In the above image, the string "Your user settings, preferences and other user-related data will be stored in:" is right before the path name with which we are concerned. So let's grep for that and see if we can find anything.

There are many ways to grep data, but this was a quick and dirty job where I wasn't doing any parsing or modifying, so I used AstroGrep. I went ahead and searched for the a watered down version of the aforementioned unique phrase: "will be stored in." The results showed that the file named CreateProfileWizard.dtd contained this unique phrase (there were many files that were responsive, but based on the phrase's context and the filename for the file in which the phrase was found, we can determine relevancy).

A snippet of the responsive "CreateProfileWizard.dtd" file containing our unique phrase.

Now, it's just a matter of tracing this phrase back to its source. So we grab another unique phrase that is relevant to our discovery, such as "profileDirectoryExplanation," and see if we can find any more relevant files. Grepping for it comes back with more results -- one of which is createProfileWizard.xul. I didn't see much in this file at first glance (though there is quite bit of good info such as "<wizardpage id="createProfile" onpageshow="initSecondWizardPage();">" -- which will be seen soon), so I decided to see what else was in same directory as this file. There, I found createProfileWizard.js.

A snippet of the "createProfileWizard.js" file showing the functions used to generate the profile directory names.

Skimming the code, we can see that a salt table and Math.random() are used to determine the 8 character prefix. At line 68, we see the concatination of kSaltString ('0n38y60t' in the animated gif), a period ('.'), and aName (our text input field variable -- in our example, "abcdefg" or "4n6k").

In the end, the point is that having source code available makes research more streamlined. By looking at the Firefox source, we were able to determine that the profile directory names aren't based on a telling artifact.

Using what is available to you is invaluable. By having a program's source code [and approaching an issue with an analytic mindset], verifying program behavior is made so much easier.

I'll end with this: open source development is GOOD -- especially for forensics. Whether it's a tool meant to output forensic artifacts or just another Firefox, we are able to more confidently prove why we are seeing what we're seeing.

-4n6k

* Special thanks to Tom Yarrish (@CdtDelta) for spawning this post.

Monday, February 17, 2014

Forensics Quickie: PDF Metadata Forensics (Sunday Funday Answer)

FORENSICS QUICKIES! These posts will consist of small tidbits of useful information that can be explained very succinctly.

David Cowen's Sunday Funday contests are great. In short, a question is posed about a subject, and readers get the opportunity to answer and explain how it is relevant in a forensic investigation. Much can be learned from these contests, as a great answer is almost always delivered and posted publicly. I highly recommend checking it out and submitting answers (it's a weekly thing, so the subjects vary widely).

This week's (2/16/14) subject was PDF Metadata. I took a stab at answering and wound up winning, so I figured it wouldn't hurt to post my answer here. The original answer, along with a preface by David Cowen, can be found here.

The Questions
The subject's prompt was as follows:

1. What metadata can be present within a PDF document?
2. What affects the types of metadata that will be present within a PDF document?

The Answer
The following is my answer to the prompt. Disclaimer: this was written in the wee-hours of one night, so if there are any typos or errors, please feel free to contact me.

----
Much can be gleaned from PDF metadata. Of course, there are the standard fields that will provide you with relatively common metadata...but depending on the program you use to create the PDF, there could be much, much more.

Let's start with the most common metadata. Most PDFs will have embedded metadata showing the PDF version, creation date, creation program, document ID, and last modified date. There are definitely more, but I have left them out as they wouldn't be very useful in an investigation (e.g. page count). The use for the aforementioned metadata is fairly obvious, but I will explain nonetheless.

The more obvious PDF metadata entries are:

Creation program: program used to create the PDF (was it through desktop software, was it scanned, etc.)
Creator/Author/Producer: Username or full name of the PDF's author OR further details on the program used to create the PDF (is it a previous employer?)
Title: the title of the PDF that usually provides an outdated name for the document; good for identifying previous employer documents or documents that have been converted from one format into a PDF (e.g. SecretBluePrint.eps or oldCompanyFinances.doc shows up in the 'title' metadata entry)

Those are the easy ones. But what about the more overlooked metadata? As I mentioned before, the program used to create or modify the PDF may have a huge impact on what information you are given. With that, let's look into it.

First, timestamps. We know that file metadata could potentially serve as a better indicator of when a given document was created. If the PDF has been transferred across various volumes and systems -- and we would like to find the origin of the document -- the creation date in the file metadata is going to be more reliable than the file system creation date (as the filesystem date/time will have been updated with the copies/moves).

A More Reliable Creation Date

The metadata 'creation' date will [usually] preserve the REAL date of the file's creation. That is, if the PDF has been transferred across various volumes and systems, the 'creation' date in the file's metadata is going to give us a better idea of when the document was initially created.

The 'modified' date can be used in a similar way. We might even be able to tell how many programs through which the PDF was modified/saved. Say we have a PDF created using Adobe InDesign. If we were to open this PDF, modify it, and then save it as a new file using 'Save As...' in a program like Adobe Acrobat, we would see that the 'creation' date is still unchanged, but the 'modified' date had been updated (file system creation dates will tell us differently). Pretty standard stuff. Even if the PDF is saved using 'Save As...' (essentially creating a new file altogether with an updated file system creation time) AND it is moved from one system to the next, we will still have a genuine 'creation' date. Not only that, but we will have a metadata 'modified' date AND a new file system creation time to work with. Correlation among file metadata and file system timestamps is beyond the scope of this answer, but you get the point; 'creation' and 'modified' metadata dates are powerful and can be used creatively.

Also, with many PDF timestamps, we will be able to see a timezone offset. For example, a creation timestamp could be 2013:02:22 11:21:34-06:00. We now have a potential indication that the program that produced this PDF was set in Mountain time.

I mentioned that we might be able to determine if a PDF was created and modified through more than one program. As a quick side note, and if we really wanted to dive into the PDF analysis, we could take a look at some of the other telling metadata. The example above suggested creating a PDF in InDesign, opening it up in Acrobat, modifying it, and then saving it as a new file. When this happens, some of the metadata in the new file (like the 'modified' time) is updated while all of the InDesign metadata stays intact. However, there is a significant difference this time around: the 'XMP Toolkit' metadata value is different. Adobe implements their XMP Toolkit in all of their applications and plugins. They even open sourced it, so other programs can use it (and many do). The point is, "the XMPCore component of this toolkit is what allows for the creation and modification of the metadata that follows the XMP Data Model" (more here and here). So we have two PDFs, but the metadata for each was manipulated by two different versions of Adobe's XMP Core.

InDesign used:

"Adobe XMP Core 5.3-c011 66.145661, 2012/02/06-14:56:27" and...

Acrobat used:
"Adobe XMP Core 5.4-c005 78.147326, 2012/08/23-13:03:03"

But why is this important? Well, we can now more accurately pinpoint the program used to create the PDF. Sure, we will likely already have a metadata entry that tells us the 'Creation Program,' but consider the above example; that tool (InDesign) may have been used to initially create the PDF, but it was NOT used to open, modify, and save a new version of it (Acrobat did that). Let's keep this in mind as we explain some other interesting metadata...

Remember: the amount of metadata that a program uses when creating files is limitless. XMP is built on XML, so any metadata tags can be defined. Let's take a real-world example of how powerful PDF metadata can be when created from certain programs. Download Trustwave's Global Security Report PDF from 2013. Run it in exiftool. What do you see? That's right, the "History" metadata fields will show you not only that the document was saved 497 times, but it will also show you the exact times that is was saved, the program used to save it each time, and the Document Instance ID for each save (less exciting).

While you have that open, take a look at the creation date (2013:02:22 11:21:34-06:00) and modify date (2013:05:09 10:47:39-07:00). The modify date is much later, but the last "History" save on the file was 2013:02:22 11:18:06-06:00. What's up with that? This is because the PDF was modified in a different program; one newer than InDesign CS5.5. How do I know this? Well, look at the XMP Core version. The XMP Core version used for InDesign CS5.5 is "Adobe XMP Core 5.4-c005 78.147326, 2012/08/23-13:03:03." I just so happen to have a PDF created with InDesign CS6 and that PDF uses "Adobe XMP Core 5.3-c011 66.145661, 2012/02/06-14:56:27." How can it be that CS5.5 is using a later XMP Core version than CS6?! Because another program was used to modify the CS5.5 PDF after the last save. On 2013:05:09 10:47:39-07:00 (the modify date), some program (let's just say it's Acrobat to satisfy my example from before) modified the PDF. The XMP Core version shown in the metadata is NOT from CS5.5.

Also from the 'History' metadata, we can tell that the creation date is actually "2012:12:29 11:20:49-06:00." and NOT "2013:02:22 11:21:34-06:00." My guess is that InDesign was keeping track of the saves, but when it came down to exporting the PDF, it tacked on the export date as the "Create Date" (as the last 'History' save of the file is 3 minutes before the alleged "Create Date").

If we really wanted to, we could use another metadata field (the PDF version) to further pinpoint the program used. If the PDF version is 1.7, we could look for programs on a suspect computer that save PDFs to version 1.7 by default. Believe it or not, many programs still save PDFs as version 1.4, 1.5, and 1.6.

After all of this, I think it's safe to say that PDF metadata can be pretty valuable. You just need to know what's available to you and how to interpret it.
----

-4n6k

Thursday, February 6, 2014

Forensics Quickie: Pinpointing Recent File Activity - RecentDocs

FORENSICS QUICKIES! These posts will consist of small tidbits of useful information that can be explained very succinctly.

[UPDATE #02 06/09/2016]: Juan Aranda (@jharanda1) wrote a more portable Python script to automate the process detailed in this post. It uses Willi Ballenthin's (@williballenthin) python-registry. The script by Eric (below) used a module that could only be run on Windows (and left some clutter when run against an NTUSER hive). Being able to run Juan's script on OS X, Windows, and Linux is helpful. I really like the output format and that it doesn't leave any extra clutter after parsing, so this is my go-to script for this kind of analysis now. You can find more info about the script at Juan's blog (RaptIR) here.

Note that Phill Moore (@phillmoore) also has a RegRipper plugin built for this technique, as well. There are many options out there, so test which one you like best. You can find Phill's plugin here.

[UPDATE #01 05/01/2015]: A Python script has been created to automate the process detailed in this post. You can download Eric Opdyke's (@EricOpdyke) script here.

Let's revisit the basics...

Scenario
A suspicious employee left your company on January 28, 2014. You'd like to know which files were most recently used (opened, saved) on the employee's system right before he/she left.

The Solution
Pull the user's NTUSER.DAT. Run RegRipper to easily output the the values within the Software\Microsoft\Windows\CurrentVersion\Explorer\RecentDocs subkey.

To give some background, the RecentDocs subkey will contain a few things:

Subkeys named after every file extension used on the system (e.g. a subkey for .zip, .doc, .mp4, etc.).

Each of these subkeys will contain its own MRUListEx value that keeps track of the order in which files were opened. (Sound familiar? We also saw use of the MRUListEx value upon analyzing shellbags artifacts).
Each subkey will store up to 10 numbered values; each numbered value represents a recently opened file with the extension found in the subkey's name.

The .gif subkey showing the 10 most recently opened .gif files for a specific user. The order is stored in MRUListEx.

A subkey for folders most recently opened.

Can store up to 30 numbered values (i.e. the 30 most recently opened folders).

A root MRUListEx value

Stores the 149 most recently opened documents and folders across all file extensions.

With that, take note that MRU times are powerful; here's why:

Since every extension has its own subkey, we will have quite a few registry key LastWrite times with which to work. But so what? When dealing with RecentDocs, the LastWrite time only applies to the most recently opened file within that subkey. While this is certainly the case, we mustn't forget that we also have an all-encompassing MRUListEx in the root of the RecentDocs subkey.

So what does this mean? By using the LastWrite times of each subkey and the root MRUListEx list, we can more accurately determine when certain files were last opened. While we won't be able to get an EXACT time for non-MRU (i.e. files that are not the first entry in the MRUListEx value), we may be able to determine a time range in which these files were opened. Consider the following example:

Our root MRUListEx shows:

RecentDocs
**All values printed in MRUList\MRUListEx order.
Software\Microsoft\Windows\CurrentVersion\Explorer\RecentDocs
LastWrite Time Thu Feb 6 14:08:49 2014 (UTC)

26 = smiley.gif
99 = strange_bird.jpg
42 = books_to_read.csv
130 = secret_clients.csv
55 = zip_passwords.rtf
87 = my_diet.csv
3 = contacts.csv
74 = phillip.csv
72 = notes.txt
...

At this point, if we look at the first few entries of the root MRUListEx, we can really only tell that smiley.gif was last opened on Feb 6, 2014. So let's mark it:

  26 = smiley.gif ............................Feb 6 14:08:49 2014
99 = strange_bird.jpg
42 = books_to_read.csv
130 = secret_clients.csv
55 = zip_passwords.rtf
87 = my_diet.csv
3 = contacts.csv
74 = phillip.csv
72 = notes.txt
...

That secret_clients.csv looks pretty interesting. Let's look at the .csv subkey to see if we can find out when it was opened.

The .csv subkey's MRUListEx shows:

Software\Microsoft\Windows\CurrentVersion\Explorer\RecentDocs\.csv
LastWrite Time Sat Jan 18 19:19:22 2014 (UTC)
MRUListEx = 6,2,1,7,3,5,4

6 = books_to_read.csv
2 = secret_clients.csv
1 = my_diet.csv
7 = contacts.csv
3 = phillip.csv
...

Dang. Looks like the books_to_read.csv file is the most recently opened .csv, which means that the Jan 18 19:19:22 2014 LastWrite time of the .csv subkey is going to represent the last time that that file was opened -- not secret_clients.csv. I guess we can give up on this one...

...or...we could look at the surrounding files to glean some more information. First, let's mark the known LastWrite time in our root MRUListEx list.

  26 = smiley.gif ............................Feb 6 14:08:49 2014
99 = strange_bird.jpg
42 = books_to_read.csv...............Jan 18 19:19:22 2014
130 = secret_clients.csv
55 = zip_passwords.rtf
87 = my_diet.csv
3 = contacts.csv
74 = phillip.csv
72 = notes.txt
...

Looking at our list, we can see that there are two open times that we know for sure. We can also see that there is another file that was opened between these times. It looks like strange_bird.jpg was opened after smiley.gif, but before books_to_read.csv. Since this list is an all-encompassing list, it would be safe to say that between January 18 and Feb 6, strange_bird.jpg was opened. But let's go ahead and confirm that.

The .jpg subkey's MRUListEx shows:

Software\Microsoft\Windows\CurrentVersion\Explorer\RecentDocs\.jpg
LastWrite Time Sat Jan 18 21:54:57 2014 (UTC)
MRUListEx = 4,2,3,1

4 = strange_bird.jpg
2 = stone_tanooki.jpg
3 = camel.jpg
1 = fighting_frogs.jpg

It looks like strange_bird.jpg is the most recently opened .jpg on the system. That means that the Jan 18 21:54:57 2014 LastWrite time of the .jpg subkey represents the the time at which strange_bird.jpg was opened. Again, let's mark it in our root MRUListEx list:

  26 = smiley.gif ............................Feb 6 14:08:49 2014
99 = strange_bird.jpg...................Jan 18 21:54:57 2014
42 = books_to_read.csv...............Jan 18 19:19:22 2014
130 = secret_clients.csv
55 = zip_passwords.rtf
87 = my_diet.csv
3 = contacts.csv
74 = phillip.csv
72 = notes.txt
...

At this point, we still don't know when secret_clients.csv was last opened. Upon finding its LNK file, we note that it had been first opened in late 2013 (LNK creation time). If this machine was running XP, we would be able to grab the access time of this LNK file to see when it *might* have been last opened. Alas, we are using Windows 7 (disclaimer: there are tons of other things you should be looking at anyway; this is merely one example of why we would want to look at RecentDocs). Let's keep moving.

Taking a second look at our root MRUListEx list, we notice that the next item after secret_clients.csv is zip_password.rtf. That also looks interesting, so let's see when that was opened.

The .rtf subkey's MRUListEx shows:

Software\Microsoft\Windows\CurrentVersion\Explorer\RecentDocs\.rtf
LastWrite Time Sat Jan 17 14:15:31 2014 (UTC)
MRUListEx = 2,1

2 = zip_passwords.rtf
1 = fiction_novel.rtf

It looks to be the first one, so we can use the LastWrite time to mark it in our list.

  26 = smiley.gif ............................Feb 6 14:08:49 2014
99 = strange_bird.jpg...................Jan 18 21:54:57 2014
42 = books_to_read.csv...............Jan 18 19:19:22 2014
130 = secret_clients.csv
55 = zip_passwords.rtf.................Jan 17 14:15:31 2014
87 = my_diet.csv
3 = contacts.csv
74 = phillip.csv
72 = notes.txt
...

With that, we have found a pretty tight time frame as to when secret_clients.csv was last opened. We don't have the exact date and time, but we can now say that is was last opened somewhere between Jan 17 and Jan 18.

It looks as though there are some more .csv files that were recently opened, and by the looks of it, that .txt file might be able to help us hone in on a time frame.

The .txt subkey's MRUListEx shows:

Software\Microsoft\Windows\CurrentVersion\Explorer\RecentDocs\.txt
LastWrite Time Sat Jan 10 18:23:08 2014 (UTC)
MRUListEx = 4,5,2,1,3

4 = notes.txt
5 = groceries.txt
2 = how_to_make_toast.txt
1 = my_novel.txt
3 = address.txt
...

Just as before, let's mark the LastWrite time in our root MRUListEx list.

  26 = smiley.gif ............................Feb 6 14:08:49 2014
99 = strange_bird.jpg...................Jan 18 21:54:57 2014
42 = books_to_read.csv...............Jan 18 19:19:22 2014
130 = secret_clients.csv
55 = zip_passwords.rtf.................Jan 17 14:15:31 2014
87 = my_diet.csv
3 = contacts.csv
74 = phillip.csv
72 = notes.txt...............................Jan 10 18:23:08 2014
...

Again, we don't have an exact date and time as to when the my_diet.csv, contacts.csv, and phillip.csv files were last opened. But, based on the LastWrite time of the .txt subkey, we do know the time frame in which these files were opened.

Now imagine that a USB device was also plugged in at some point within this time frame. It is possible that the suspicious employee may have done a "Save As..." on contacts.csv and saved it to the external device. It is also possible that the employee opened the file and copy/pasted the information into an email. Of course, the possibilities are endless, but you get the point.

And it goes without saying that other artifacts, such as LNK files, jump lists, ComDlg32, OfficeDocs, and more should also be used to complement RecentDocs analysis. In particular, ComDlg32 would be of interest in this scenario, as it will show you files recently opened/saved. It offers more information (e.g. the executable used to open the file, the file's path, etc.) and stores saves/opens by extension, as well. As such, we can use this same technique to pinpoint file open/save times (though, it is a bit limited, as the * extension subkey will only show up to 20 entries).

This post quickly became longer than expected, but it's still a quickie. I'll end with this:

This is nothing new. All of this data is available to you in the Windows registry. This is a fairly basic technique, but it is also one that shouldn't be overlooked. And as always, use more than one artifact to make your case.

-4n6k

Pages