Sunday, July 13, 2014

.NET Method Internals - Common Intermediate Language (CIL) Basics

For those who have had the privilege of reverse engineering heavily obfuscated .NET code, you've probably encountered cases where your decompiler of choice completely fails (or even crashes in an epic fashion) upon attempting to decompile certain methods. Decompilation failure is often one of the intended goals of .NET obfuscator developers. Fortunately, all of the decompiler utilities are also disassemblers and it is exceedingly rare that your tool of choice will fail to disassemble an unruly method. In the cases where you're forced to work with a disassembled method, a basic understanding of .NET bytecode - i.e. Common Intermediate Language (CIL) is required.

Nearly all .NET methods are comprised of an array of CIL instructions and arguments to the instructions. These instructions are all thoroughly documented. CIL instructions manipulate values on what is referred to as the evaluation stack. CIL instructions can push values onto the stack, pop them off the stack, and perform operations on the values at the top off the stack. Let's see this in action. For this example, we're going to analyze a .NET method that implements a bitwise right circular shift - System.Security.Cryptography.SHA256Managed.RotateRight(uint x, int n).

Here is the decompiled method as seen in ILSpy:

This method takes two arguments - a uint32 (x) which represents the value to be rotated and an int32 (n) which represents the number of rotations to perform. For those unfamiliar with the bitwise rotate operation, please read about it here. Since there is no .NET rotate right operator, the code above is the logical equivalent.

Now, let's assume for a moment that the decompiler failed to decompile the method. In that case, change C# in the language tab to IL in ILSpy. You'll be presented with the following CIL disassembly listing:

Before delving into the CIL instructions, there are some additional properties described in the disassembly that were hidden to you in the decompiled output:

1) cil managed

This indicates that the method is implemented with CIL instructions and that "the body of the method is not defined, but is produced by the runtime." - ECMA-335

2) RVA 0x152A65

The relative virtual address of the method in the DLL or EXE that implements the method - i.e. location of the method within the DLL or EXE.

3) .maxstack 8

Specifies the maximum number of elements required on the evaluation stack during the execution of the method.
This number is emitted by the compiler and is required by the .NET runtime. Note: this value can be higher than what is actually required. As we will see, this method actually only requires four stack slots.

4) Code size 17 (0x11)

The total size of all CIL instructions and arguments.

What's interesting in the disassembly is the presence of binary and instructions which are not present in the decompiled output. As you will see, and'ing the shift value (n) by 31 (0x1F) compensates for when the shift value is larger than 31 (the size, in bits on a uint32 minus 1). In our example, we will perform the following operation: 8 ROR 33 which is the equivalent of 8 ROR 1 (32 + 1). The binary and operations serve to convert values greater than 31 to their equivalent value that lies between 0 and 31.

Now, lets validate that when executed, 8 ROR 33 and 8 ROR 1 generate the same result - 4. Since RotateRight is a Nonpublic (i.e. private) method, we'll need to use reflection to invoke it In PowerShell.

$SHA256Managed = [IntPtr].Assembly.GetType('System.Security.Cryptography.SHA256Managed')
$BindingFlags = [Reflection.BindingFlags] 'NonPublic, Static'
$ROR = $SHA256Managed.GetMethod('RotateRight', $BindingFlags)
$ROR.Invoke($null, @([UInt32] 8, 1))
$ROR.Invoke($null, @([UInt32] 8, 33))

They do indeed result in the same value, as expected.

Let's step through each CIL instruction, observing the effect of each instruction on the evaluation stack.

For more information on CIL and .NET internals, I highly recommend you check out the following:
  1. ECMA C# and Common Language Infrastructure Standards
  2. OpCodes Class
  3. The Common Language Infrastructure Annotated Standard
  4. Metaprogramming in .NET