Monday, March 25, 2013

Parsing Binary File Formats with PowerShell

I'm giving a presentation on "Parsing Binary File Formats with PowerShell" for MiSec on Tuesday, March 26. For those who will not be attending, the slides and code are available for download.

In the presentation, I cover the following:

1) Why you would want to parse binary data
2) Why PowerShell is a powerful tool to accomplish this task
3) A brief overview of data types and how they differ across languages: C/C++, C#, PowerShell
4) Conversion from C data types to PowerShell/.NET data types
5) Applying all of these concepts by parsing the DOS header of a PE file
6) DOS header overview
7) The three strategies for parsing binary data in PowerShell:
   a) Pure PowerShell-based approach using only PowerShell cmdlets (no .NET)
   b) C# compilation using the Add-Type cmdlet
   c) Reflection
8) Reading in binary data in PowerShell
9) Building a DOS header parser using each of the three strategies
10) Brief overview of reflection and .NET application layout
11) Applications of a DOS header parser
12) Bonus: Intro to the Rich signature
13) Bonus: Extending the DOS header parser to decode and parse the Rich signature
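
To give a feel for items 4, 8, and 9, here is a minimal sketch of reading two DOS header fields out of a byte array. It is not one of the scripts from the talk: the function name Get-DosHeaderSketch and the synthetic 64-byte buffer are made up for illustration, and it leans on the .NET BitConverter class rather than cmdlets alone. The type mapping it assumes is the usual one: a C WORD becomes a UInt16 and a C LONG becomes an Int32.

```powershell
# Hypothetical helper for illustration; not part of the presentation code.
function Get-DosHeaderSketch {
    param (
        [Parameter(Mandatory = $true)]
        [byte[]] $Bytes
    )

    # IMAGE_DOS_HEADER is 64 bytes long
    if ($Bytes.Length -lt 64) { throw 'Need at least 64 bytes for a DOS header.' }

    # e_magic: WORD at offset 0x00. 'MZ' reads as 0x5A4D in little-endian order.
    $magic = [BitConverter]::ToUInt16($Bytes, 0)
    if ($magic -ne 0x5A4D) { throw 'Missing MZ signature.' }

    # e_lfanew: LONG at offset 0x3C, the file offset of the PE header
    New-Object PSObject -Property @{
        e_magic  = $magic
        e_lfanew = [BitConverter]::ToInt32($Bytes, 0x3C)
    }
}

# Synthetic header: 'MZ' at offset 0 and e_lfanew = 0x80, everything else zero
$header = New-Object byte[] 64
$header[0] = 0x4D
$header[1] = 0x5A
$header[0x3C] = 0x80

$dos = Get-DosHeaderSketch -Bytes $header
'{0:X4} {1:X8}' -f $dos.e_magic, $dos.e_lfanew   # 5A4D 00000080
```

For a real file, the bytes could come from [System.IO.File]::ReadAllBytes($path), where $path is a placeholder for the file you want to inspect.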

I also provide the following code:

1) Get-DosHeader_Pure_PowerShell.ps1 - A pure PowerShell-based implementation of the DOS header parser
2) Get-DosHeader_CSharp.ps1 - A DOS header parser using Add-Type to compile C# code
3) Get-DosHeader_Reflection.ps1 - A DOS header parser implemented using reflection
4) Get-DosHeader_Reflection_Bonus.ps1 - Same as #3 but extended to include a Rich signature decoder/parser
5) Get-DosHeader.format.ps1xml - A formatting file used to display a proper hexadecimal representation of the parsed DOS header

While the example I use throughout the presentation is a simple one, you would be surprised how much information can be gleaned by analyzing known-good DOS headers in PE files. For example, after scanning 6695 DOS headers, I found that the following fields were always 0: e_crlc, e_cparhdr, e_minalloc, e_ss, e_csum, e_ip, e_cs, e_ovno, e_oemid, e_oeminfo, e_res2. This simple heuristic alone could serve as a signature for detecting a malformed DOS header or PE file. TinyPE is the perfect example. A simple DOS header parser can also be used to scan for all PE files on disk. In doing so, you'll discover some non-standard PE file extensions that you may not be familiar with: .lrc, .ax, .rs, .tlb, .acm, .tsp, .efi, .rll, .ime, .old, .dat, .iec, etc.
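
The zero-field heuristic can be sketched in a few lines. This is an illustration under assumptions, not the code from the talk: the function name Test-DosHeaderZeroFields is hypothetical, the field list is copied from the scan results above, and the offsets follow the standard IMAGE_DOS_HEADER layout. It is worth checking the list against your own sample set before treating a hit as a sign of a malformed file.

```powershell
# Hypothetical helper: returns the names of fields that are expected to be
# zero (per the scan described above) but are not.
function Test-DosHeaderZeroFields {
    param (
        [Parameter(Mandatory = $true)]
        [byte[]] $Bytes
    )

    if ($Bytes.Length -lt 64) { throw 'Need at least 64 bytes for a DOS header.' }

    # Byte offsets of the WORD-sized fields in IMAGE_DOS_HEADER
    $fields = @{
        e_crlc = 0x06; e_cparhdr = 0x08; e_minalloc = 0x0A; e_ss = 0x0E
        e_csum = 0x12; e_ip = 0x14; e_cs = 0x16; e_ovno = 0x1A
        e_oemid = 0x24; e_oeminfo = 0x26
    }

    $nonZero = @()
    foreach ($name in $fields.Keys) {
        if ([BitConverter]::ToUInt16($Bytes, $fields[$name]) -ne 0) {
            $nonZero += $name
        }
    }

    # e_res2 is an array of 10 WORDs occupying offsets 0x28 through 0x3B
    for ($i = 0x28; $i -lt 0x3C; $i++) {
        if ($Bytes[$i] -ne 0) { $nonZero += 'e_res2'; break }
    }

    $nonZero | Sort-Object
}

# Synthetic header with only the MZ signature set
$clean = New-Object byte[] 64
$clean[0] = 0x4D
$clean[1] = 0x5A

# Same header with e_ip (offset 0x14) set to a non-zero value
$bad = New-Object byte[] 64
$bad[0] = 0x4D
$bad[1] = 0x5A
$bad[0x14] = 0x01

@(Test-DosHeaderZeroFields -Bytes $clean).Count    # 0
@(Test-DosHeaderZeroFields -Bytes $bad) -join ','  # e_ip
```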

The techniques that I describe can easily be used to parse any binary format - from a stupid DOS header parser to a PowerShell implementation of binwalk. The sky is the limit.
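
Scanning a disk for PE files, mentioned earlier, is a small case of the same signature matching that a tool like binwalk does at scale. A rough sketch, assuming a hypothetical function name Test-PeSignature and a buffer that already holds the file contents: check for 'MZ' at offset 0, follow e_lfanew, and check for 'PE\0\0' there.

```powershell
# Hypothetical helper: $true if the buffer starts with a DOS header whose
# e_lfanew field points at a PE signature.
function Test-PeSignature {
    param ([byte[]] $Bytes)

    # Too short to hold a DOS header (also covers a missing or empty buffer)
    if ($Bytes.Length -lt 64) { return $false }

    # 'MZ' at offset 0
    if ([BitConverter]::ToUInt16($Bytes, 0) -ne 0x5A4D) { return $false }

    # e_lfanew must point at 4 readable bytes inside the buffer
    $peOffset = [BitConverter]::ToInt32($Bytes, 0x3C)
    if (($peOffset -lt 0) -or ($peOffset -gt ($Bytes.Length - 4))) { return $false }

    # 'PE\0\0' reads as 0x00004550 in little-endian order
    return ([BitConverter]::ToUInt32($Bytes, $peOffset) -eq 0x00004550)
}

# Synthetic buffer: DOS header with e_lfanew = 0x80, PE signature at 0x80
$pe = New-Object byte[] 132
$pe[0] = 0x4D
$pe[1] = 0x5A
$pe[0x3C] = 0x80
$pe[0x80] = 0x50
$pe[0x81] = 0x45

# DOS header only: e_lfanew is 0, so no PE signature is found
$mzOnly = New-Object byte[] 64
$mzOnly[0] = 0x4D
$mzOnly[1] = 0x5A

Test-PeSignature -Bytes $pe      # True
Test-PeSignature -Bytes $mzOnly  # False
```

To scan a directory, the same test could be applied to the bytes of each file. Reading only the first part of each file instead of the whole thing would be the obvious optimization, at the cost of missing headers with a large e_lfanew.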



  1. thanks - have always liked your posts ... I've done a fair amount of scripting and light traditional programming, but have a bit of a knowledge gap when it comes to low level languages and the actual sequence of events an OS runs through when beginning to execute an executable.

    I've had a hard time finding an answer to this one: Is it possible to run an executable in PS entirely from memory (e.g., bytes read via a TCP stream into memory, then executed)? Or am I thinking about this the wrong way?

    1. Thanks. I'm glad you're enjoying my posts. :D

      Yes. It is absolutely possible to load and execute an executable (exe or dll) entirely in memory. This technique is called reflective loading. Essentially, you have to implement the loading process yourself since the only supported way in Windows to load an executable is from disk (with the exception of managed dlls -

      Since you asked the question, you may be excited to hear that I'll be adding a reflective loading script to PowerSploit very shortly. It was written by a new contributor to PowerSploit and I've been extremely impressed with his code!

    2. awesome ... your posts are starting to cost me more and more time as I digest & update code with new ideas =) ... looking forward to it

  2. So it doesn't create temporary files on the system or new processes? Everything stays in memory?

    1. Correct. No temp files and no new processes. Everything is in memory.

  3. I'm sharing my videos. I've uploaded them to SecurityTube (advantage: no temporary files are created on the system and no new processes are created, everything is in memory). I hope you like them. Do you know any techniques for bypassing an IDS?

  4. I'm new to PowerShell but I like the idea of this. Would this same process work for parsing other binary files? I'm looking for a pure PowerShell script to parse "forensic" related files (Prefetch, Job files, Index.dat, etc.) - Something like this ->

    Thoughts on this?