x86 Disassembly/Branches

x86 Disassembly

Branching

Computer science professors tell their students to avoid jumps and goto instructions, to avoid the proverbial "spaghetti code." Unfortunately, assembly only has jump instructions to control program flow. This chapter will explore the subject that many people avoid like the plague, and will attempt to show how the spaghetti of assembly can be translated into the more familiar control structures of high-level language. Specifically, this chapter will focus on If-Then-Else and Switch branching instructions.

If-Then

Let's consider a generic if statement in pseudo-code followed by its equivalent form using jumps:

if (condition) then
 do_action;

if not (condition) then goto end;
  do_action;
end:

What does this code do? In English, the code checks the condition and performs a jump only if it is false. With that in mind, let's compare some actual C code and its Assembly translation:

 if(x == 0)
 {
   x = 1;
 }
 x++;

 mov eax, $x
 cmp eax, 0
 jne end
 mov eax, 1
 end:
 inc eax
 mov $x, eax

Note that when we translate to assembly, we need to negate the condition of the jump because--like we said above--we only jump if the condition is false. To recreate the high-level code, simply negate the condition once again.

Negating a comparison may be tricky if you're not paying attention. Here are the correct dual forms:

Instruction	Meaning
JNE	Jump if not equal
JE	Jump if equal
JG	Jump if greater
JLE	Jump if less than or equal
JL	Jump if less than
JGE	Jump if greater or equal

And here are some examples.

 mov eax, $x                     //move x into eax
 cmp eax, $y                     //compare eax with y
 jg end                          //jump if greater than
 inc eax
 mov $x, eax                    //increment x
 end:
 ...

Is produced by these C statements:

 if(x <= y)
 {
    x++;
 }

As you can see, x is incremented only if it is less than or equal to y. Thus, if it is greater than y, it will not be incremented as in the assembler code. Similarly, the C code

 if(x < y)
 {
    x++;
 }

produces this assembler code:

 mov eax, $x                        //move x into eax
 cmp eax, $y                        //compare eax with y
 jge end                            //jump if greater than or equal to
 inc eax
 mov $x, eax                       //increment x
 end:
 ...

X is incremented in the C code only if it is less than y, so the assembler code now jumps if it's greater than or equal to y. This kind of thing takes practice, so we will try to include lots of examples in this section.

If-Then-Else

Let us now look at a more complicated case: the If-Then-Else instruction.

if (condition) then
  do_action
else
  do_alternative_action;

if not (condition) goto else;
  do_action;
  goto end;
else:
  do_alternative_action;
end:

Now, what happens here? Like before, the if statement only jumps to the else clause when the condition is false. However, we must also install an unconditional jump at the end of the "then" clause, so we don't perform the else clause directly afterwards.

Now, here is an example of a real C If-Then-Else:

 if(x == 10)
 {
    x = 0;
 }
 else
 {
    x++;
 }

Which gets translated into the following assembly code:

 mov eax, $x
 cmp eax, 0x0A ;0x0A = 10
 jne else
 mov eax, 0
 jmp end
 else:
 inc eax
 end:
 mov $x, eax

As you can see, the addition of a single unconditional jump can add an entire extra option to our conditional.

Switch-Case

Switch-Case structures can be very complicated when viewed in assembly language, so we will examine a few examples. First, keep in mind that in C, there are several keywords that are commonly used in a switch statement. Here is a recap:

Switch: This keyword tests the argument, and starts the switch structure
Case: This creates a label that execution will switch to, depending on the value of the argument.
Break: This statement jumps to the end of the switch block
Default: This is the label that execution jumps to if and only if it doesn't match up to any other conditions

Lets say we have a general switch statement, but with an extra label at the end, as such:

 switch (x)
 {
 //body of switch statement
 }
 end_of_switch:

Now, every break statement will be immediately replaced with the statement

 jmp end_of_switch

But what do the rest of the statements get changed to? The case statements can each resolve to any number of arbitrary integer values. How do we test for that? The answer is that we use a "Switch Table". Here is a simple, C example:

 int main(int argc, char **argv)
 { //line 10
 	switch(argc)
 	{
 		case 1:
 			MyFunction(1);
 			break;
 		case 2:
 			MyFunction(2);
 			break;
 		case 3:
 			MyFunction(3);
 			break;
 		case 4:
 			MyFunction(4);
 			break;
 		default:
 			MyFunction(5);
 	}
 	return 0;
 }

And when we compile this with cl.exe, we can generate the following listing file:

 tv64 = -4		; size = 4
 _argc$ = 8		; size = 4
 _argv$ = 12		; size = 4
 _main	PROC NEAR
 ; Line 10
 	push	ebp
 	mov	ebp, esp
 	push	ecx
 ; Line 11
 	mov	eax, DWORD PTR _argc$[ebp]
 	mov	DWORD PTR tv64[ebp], eax
 	mov	ecx, DWORD PTR tv64[ebp]
 	sub	ecx, 1
 	mov	DWORD PTR tv64[ebp], ecx
 	cmp	DWORD PTR tv64[ebp], 3
 	ja	SHORT $L810
 	mov	edx, DWORD PTR tv64[ebp]
 	jmp	DWORD PTR $L818[edx*4]
 $L806:
 ; Line 14
 	push	1
 	call	_MyFunction
 	add	esp, 4
 ; Line 15
 	jmp	SHORT $L803
 $L807:
 ; Line 17
 	push	2
 	call	_MyFunction
 	add	esp, 4
 ; Line 18
        jmp     SHORT $L803
 $L808:
 ; Line 19
 	push	3
 	call	_MyFunction
 	add	esp, 4
 ; Line 20
 	jmp	SHORT $L803
 $L809:
 ; Line 22
 	push	4
 	call	_MyFunction
 	add	esp, 4
 ; Line 23
 	jmp	SHORT $L803
 $L810:
 ; Line 25
 	push	5
 	call	_MyFunction
 	add	esp, 4
 $L803:
 ; Line 27
 	xor	eax, eax
 ; Line 28
 	mov	esp, ebp
 	pop	ebp
 	ret	0
 $L818:
 	DD	$L806
 	DD	$L807
 	DD	$L808
 	DD	$L809
 _main	ENDP

Lets work our way through this. First, we see that line 10 sets up our standard stack frame, and it also saves ecx. Why does it save ecx? Scanning through the function, we never see a corresponding "pop ecx" instruction, so it seems that the value is never restored at all. In fact, the compiler isn't saving ecx at all, but is instead simply reserving space on the stack: it's creating a local variable. The original C code didn't have any local variables, however, so perhaps the compiler just needed some extra scratch space to store intermediate values. Why doesn't the compiler execute the more familiar "sub esp, 4" command to create the local variable? push ecx is just a faster instruction that does the same thing. This "scratch space" is being referenced by a negative offset from ebp. tv64 was defined in the beginning of the listing as having the value -4, so every call to "tv64[ebp]" is a call to this scratch space.

There are a few things that we need to notice about the function in general:

Label $L803 is the end_of_switch label. Therefore, every "jmp SHORT $L803" statement is a break. This is verifiable by comparing with the C code line-by-line.
Label $L818 contains a list of hard-coded memory addresses, which here are labels in the code section! Remember, labels resolve to the memory address of the instruction. This must be an important part of our puzzle.

To solve this puzzle, we will take an in-depth look at line 11:

 mov	eax, DWORD PTR _argc$[ebp]
 mov	DWORD PTR tv64[ebp], eax
 mov	ecx, DWORD PTR tv64[ebp]
 sub	ecx, 1
 mov	DWORD PTR tv64[ebp], ecx
 cmp	DWORD PTR tv64[ebp], 3
 ja	SHORT $L810
 mov	edx, DWORD PTR tv64[ebp]
 jmp	DWORD PTR $L818[edx*4]

This sequence performs the following pseudo-C operation:

if( argc - 1 >= 4 )
{
   goto $L810;   /* the default */
}
label *L818[] = { $L806, $L807, $L808, $L809 };  /* define a table of jumps, one per each case */
//
goto L818[argc - 1];   /* use the address from the table to jump to the correct case */

Here's why...

The Setup

 mov	eax, DWORD PTR _argc$[ebp]
 mov	DWORD PTR tv64[ebp], eax
 mov	ecx, DWORD PTR tv64[ebp]
 sub	ecx, 1
 mov	DWORD PTR tv64[ebp], ecx

The value of argc is moved into eax. The value of eax is then immediately moved to the scratch space. The value of the scratch space is then moved into ecx. Sounds like an awfully convoluted way to get the same value into so many different locations, but remember: I turned off the optimizations. The value of ecx is then decremented by 1. Why didn't the compiler use a dec instruction instead? Perhaps the statement is a general statement, that in this case just happens to have an argument of 1. We don't know why exactly, all we know is this:

eax = "scratch pad"
ecx = eax - 1

Finally, the last line moves the new, decremented value of ecx back into the scratch pad. Very inefficient.

The Compare and Jumps

 cmp	DWORD PTR tv64[ebp], 3
 ja	SHORT $L810

The value of the scratch pad is compared with the value 3, and if the unsigned value is above 3 (4 or more), execution jumps to label $L810. How do I know the value is unsigned? I know because ja is an unsigned conditional jump. Let's look back at the original C code switch:

        switch(argc)
 	{
 		case 1:
 			MyFunction(1);
 			break;
 		case 2:
 			MyFunction(2);
 			break;
 		case 3:
 			MyFunction(3);
 			break;
 		case 4:
 			MyFunction(4);
 			break;
 		default:
 			MyFunction(5);
 	}

Remember, the scratch pad contains the value (argc - 1), which means that this condition is only triggered when argc > 4. What happens when argc is greater than 4? The function goes to the default condition. Now, let's look at the next two lines:

 mov	edx, DWORD PTR tv64[ebp]
 jmp	DWORD PTR $L818[edx*4]

edx gets the value of the scratch pad (argc - 1), and then there is a very weird jump that takes place: execution jumps to a location pointed to by the value (edx * 4 + $L818). What is $L818? We will examine that right now.

The Switch Table

 $L818:
  	DD	$L806
  	DD	$L807
 	DD	$L808
 	DD	$L809

$L818 is a pointer, in the code section, to a list of other code section pointers. These pointers are all 32bit values (DD is a DWORD). Let's look back at our jump statement:

 jmp	DWORD PTR $L818[edx*4]

In this jump, $L818 isn't the offset, it's the base, edx*4 is the offset. As we said earlier, edx contains the value of (argc - 1). If argc == 1, we jump to [$L818 + 0] which is $L806. If argc == 2, we jump to [$L818 + 4], which is $L807. Get the picture? A quick look at labels $L806, $L807, $L808, and $L809 shows us exactly what we expect to see: the bodies of the case statements from the original C code, above. Each one of the case statements calls the function "MyFunction", then breaks, and then jumps to the end of the switch block.

Ternary Operator ?:

Again, the best way to learn is by doing. Therefore we will go through a mini example to explain the ternary operator. Consider the following C code program:

 int main(int argc, char **argv)
 {
    return (argc > 1)?(5):(0);
 }

cl.exe produces the following assembly listing file:

 _argc$ = 8						; size = 4
 _argv$ = 12						; size = 4
 _main	PROC NEAR
 ; File c:\documents and settings\andrew\desktop\test2.c
 ; Line 2
 	push	ebp
 	mov	ebp, esp
 ; Line 3
 	xor	eax, eax
 
        cmp	DWORD PTR _argc$[ebp], 1
        setle	al
        dec	eax
        and	eax, 5
 ; Line 4
 	pop	ebp
 	ret	0
 _main	ENDP

Line 2 sets up a stack frame, and line 4 is a standard exit sequence. There are no local variables. It is clear that Line 3 is where we want to look.

The instruction "xor eax, eax" simply sets eax to 0. For more information on that line, see the chapter on unintuitive instructions. The cmp instruction tests the condition of the ternary operator. The setle function is one of a set of x86 functions that works like a conditional move: al gets the value 1 if argc <= 1. Isn't that the exact opposite of what we wanted? In this case, it is. Let's look at what happens when argc = 0: al gets the value 1. al is decremented (al = 0), and then eax is logically anded with 5. 5 & 0 = 0. When argc == 2 (greater than 1), the setle instruction doesn't do anything, and eax still is zero. eax is then decremented, which means that eax == -1. What is -1?

In x86 processors, negative numbers are stored in two's-complement format. For instance, let's look at the following C code:

 BYTE x;
 x = -1;

At the end of this C code, x will have the value 11111111: all ones!

When argc is greater than 1, setle sets al to zero. Decrementing this value sets every bit in eax to a logical 1. Now, when we perform the logical and function we get:

 ...11111111
&...00000101     ;101 is 5 in binary
------------
 ...00000101

eax gets the value 5. In this case, it's a roundabout method of doing it, but as a reverser, this is the stuff you need to worry about.

For reference, here is the GCC assembly output of the same ternary operator from above:

 _main:
 pushl	%ebp
 movl	%esp, %ebp
 subl	$8, %esp
 xorl	%eax, %eax
 andl	$-16, %esp
 call	__alloca
 call	___main
 xorl	%edx, %edx
 cmpl	$2, 8(%ebp)
 setge	%dl
 leal	(%edx,%edx,4), %eax
 leave
 ret

Notice that GCC produces slightly different code than cl.exe produces. However, the stack frame is set up the same way. Notice also that GCC doesn't give us line numbers, or other hints in the code. The ternary operator line occurs after the instruction "call __main". Let's highlight that section here:

 xorl	%edx, %edx
 cmpl	$2, 8(%ebp)
 setge	%dl
 leal	(%edx,%edx,4), %eax

Again, xor is used to set edx to 0 quickly. Argc is tested against 2 (instead of 1), and dl is set if argc is greater than or equal. If dl gets set to 1, the leal instruction directly thereafter will move the value of 5 into eax (because lea (edx,edx,4) means edx + edx * 4, i.e. edx * 5).