Reverse Engineering/Stack Overflows

We frequently hear about malicious code exploiting a vaguely described problem called a stack overflow. This page explains what a stack overflow is, and how to recognize and prevent one.

What It Is

A stack-based overflow attack writes more data into a buffer than the buffer can hold, in order to overwrite a return address and hijack the control flow. The overwritten return address will, in most cases, point to code somewhere in the program's address space. That code may already exist in the application, or the attacker may supply it by injecting it onto the stack.

If we remember the chapter on the stack, we know a few fundamental facts about the stack when we enter a new function:

  1. The stack "grows" downward, toward lower memory addresses.
  2. Local data sits at the top of the stack, at the lowest addresses of the frame.
  3. The old value of bp is stored just past the local data, at a higher address.
  4. The return address is stored just past the old bp value, at a higher address still.

Consider the following buggy C code snippet:

void MyFunction(void)
{
   int a[100];
   int i;
   for(i = 0; i <= 100; i++)
   {
      a[i] = 0;
   }
   ...

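What happens when i reaches 100? As discussed earlier, local arrays are created on the stack. The valid indices of "a" run from 0 to 99, so a[100] lies past the end of the array, and the write lands on whatever is stored next on the stack. In the simple layout described above, with 4-byte integers and a 4-byte bp, a[100] overwrites the saved bp, and a write to a[101] would overwrite the return address. A real compiler may place i or padding in between, so the exact offsets vary.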

If an attacker controls the value written over the return address, program flow is redirected to an address of the attacker's choosing when the function returns. This is a stack overflow vulnerability, and it stems from bad programming: the programmer does not check the array bounds before writing data to the array.
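The off-by-one loop can be reproduced safely by modeling the frame as an ordinary array instead of touching the real stack. The sketch below is a simplified model, not how any particular compiler lays out MyFunction: the slot numbers, the SENTINEL value and the helper name saved_bp_after_loop are all assumptions made for illustration.

```c
#include <assert.h>

/* Hypothetical model of the frame as a flat array of ints:
 * slots 0..99 stand for a[0]..a[99], slot 100 for the saved bp,
 * slot 101 for the return address. Real layouts vary by compiler. */
enum { ARRAY_LEN = 100, SAVED_BP_SLOT = 100, RET_ADDR_SLOT = 101, FRAME_SLOTS = 102 };

#define SENTINEL 0x1234   /* marks slots that have not been overwritten */

/* Runs "for (i = 0; i <= last_index; i++) a[i] = 0;" against the model
 * and returns the contents of the saved-bp slot afterwards. */
int saved_bp_after_loop(int last_index)
{
    int frame[FRAME_SLOTS];

    /* Keep every write inside the model so the demo itself is well defined. */
    assert(last_index >= 0 && last_index < FRAME_SLOTS);

    for (int i = 0; i < FRAME_SLOTS; i++)
        frame[i] = SENTINEL;

    for (int i = 0; i <= last_index; i++)
        frame[i] = 0;

    return frame[SAVED_BP_SLOT];
}
```

Calling saved_bp_after_loop(99) models the correct bound (i < 100) and leaves the saved-bp slot holding SENTINEL, while saved_bp_after_loop(100) models the buggy bound (i <= 100) and returns 0, showing that the extra iteration clobbered the slot next to the array.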

Spotting a Vulnerability

How do reversers spot a stack overflow vulnerability? Let's take a look at some example ASM code:

push ebp
mov ebp, esp
sub esp, 100

This is a standard entry sequence, and we can see that this function allocates 100 bytes on the stack: either 25 four-byte integers or an array of some sort. We examine the rest of the function to see what kind of data it is:

call _gets
push eax
push esp
call _strcpy
...

Clearly we are accessing the data on the stack as an array, specifically an array of chars. The above assembly code fragment gets a text string from the console, and copies that data into the local variable on the stack.

Unfortunately, the standard C library functions used here have a well-known weakness: they do not check the bounds of their arguments. In fact, many functions in <string.h> and <stdio.h> never even ask the programmer to supply the size of the destination array, or the maximum available memory size!

Some of the most common stack vulnerabilities stem from this fact. Offenders to look out for are gets, strcpy, strcat and sprintf, functions whose output strings can be larger than the buffer supplied to hold them.
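For contrast, a bounded copy takes the destination size as an explicit argument. The following is a minimal sketch built on the standard snprintf function; the helper name copy_bounded and its return convention are assumptions made for this example, not part of the C library.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: copies src into dst without ever writing more
 * than dst_size bytes, terminating null included.
 * Returns 1 if the whole string fit, 0 if it had to be truncated. */
int copy_bounded(char *dst, size_t dst_size, const char *src)
{
    int needed;

    assert(dst != NULL && src != NULL && dst_size > 0);

    /* snprintf writes at most dst_size bytes and returns the length
     * the full string would have had, excluding the terminator. */
    needed = snprintf(dst, dst_size, "%s", src);

    return needed >= 0 && (size_t)needed < dst_size;
}
```

With a 100-byte local buffer, copy_bounded(buf, sizeof buf, input) leaves the saved ebp and the return address untouched no matter how long input is; the caller only has to decide how to handle a truncated result.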

The local variable is only 100 chars (1 char = 1 byte) wide. What happens if we input a string 100 characters long? Remember, ASCIIZ strings are terminated by a null char (00h), which requires an extra slot in the array. That means the 101st byte written is a null byte, which overwrites the lowest byte of the saved ebp value and corrupts it. Now imagine what would happen if we input 104 characters, or even 108 (enough to overwrite the entire return address). An attacker who inputs just the right values can redirect program execution to malicious code that may help take over the computer.
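The byte counts in this paragraph can be checked with a little arithmetic. The sketch below assumes the simplest possible 32-bit layout, with the 100-byte buffer immediately followed by a 4-byte saved ebp and a 4-byte return address; the offsets, the enum names and the function reach_of_input are illustrative assumptions, and a real frame may contain padding or other locals.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed byte offsets from the start of the buffer. */
enum { SAVED_EBP_OFF = 100, RET_ADDR_OFF = 104 };

enum overflow_reach { REACH_NONE, REACH_SAVED_EBP, REACH_RET_ADDR };

/* Reports the furthest field touched by an unchecked copy of a string
 * of input_len characters, counting the terminating null byte. */
enum overflow_reach reach_of_input(size_t input_len)
{
    size_t end;

    assert(input_len < (size_t)-1);   /* input_len + 1 must not wrap around */

    end = input_len + 1;              /* bytes written: characters plus 00h */

    if (end <= SAVED_EBP_OFF)
        return REACH_NONE;            /* everything fits in the buffer */
    if (end <= RET_ADDR_OFF)
        return REACH_SAVED_EBP;       /* saved ebp is partly or fully overwritten */
    return REACH_RET_ADDR;            /* the return address is touched as well */
}
```

Under these assumptions a 99-character input fits, a 100-character input already damages the saved ebp through its terminator, and a 104-character input reaches the first byte of the return address.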

Further Reading

"Smashing The Stack For Fun And Profit", Aleph One, Phrack, 7(49), November 1996.