x86 Disassembly/Windows Executable Files
MS-DOS COM Files
COM files are loaded into RAM exactly as they appear on disk; no change is made between the disk image and the copy in memory. This is possible because of the segmented memory model of the early x86 line. Two 16-bit values determine the address used for a memory access: a “segment” register, which selects a 64K-byte window into the 1M+64K-byte address space (in 16-byte increments), and an “offset” into that window. DOS sets the segment registers, and a COM file is expected to respect that setting and never change them. The offset registers, however, are fair game and serve (for COM files) the same purpose as a modern 32-bit register. The downside is that the offset registers are only 16 bits wide, so a COM file that never changes the segment registers is limited to 64K of RAM. The advantage of this approach is that DOS needs no extra work to load and run a COM file: it loads the file, sets the segment registers, and jumps to it. (Programs can perform 'near' jumps by giving just an offset to jump to.)
COM files are loaded into RAM at offset $100. The space before that would be used for passing data to and from DOS (for example, the contents of the command line used to invoke the program).
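The segment and offset arithmetic described above can be sketched in a few lines of C. This is only an illustration: the function names `real_mode_linear` and `com_entry_linear` are made up for this example and are not part of any DOS API.

```c
#include <assert.h>
#include <stdint.h>

/* Linear address formed by a segment:offset pair.
   The segment selects a 64K window in 16-byte steps. */
static uint32_t real_mode_linear(uint16_t segment, uint16_t offset)
{
    uint32_t linear = ((uint32_t)segment << 4) + offset;
    assert(linear <= 0x10FFEFu);   /* highest address a pair can form */
    return linear;
}

/* Linear address of the first byte of a COM image, assuming DOS set
   every segment register to `segment` and loaded the file at 0x100. */
static uint32_t com_entry_linear(uint16_t segment)
{
    return real_mode_linear(segment, 0x0100);
}
```

For example, a COM file loaded with its segment registers set to 0x2000 starts executing at linear address 0x20100.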
Note that COM files, by definition, cannot be 32-bit. Windows provides support for COM files via a special CPU mode (virtual 8086 mode on 32-bit x86 versions of Windows).
Note that MS-DOS COM files (short for "command" files) are not the same as Component Object Model files, which are an object-oriented library technology.
MS-DOS EXE Files
One way MS-DOS compilers got around the 64K memory limitation was with the introduction of memory models. The basic concept is to cleverly set different segment registers in the x86 CPU (CS, DS, ES, SS) to point to the same or different segments, thus allowing varying degrees of access to memory. Typical memory models were:
- tiny
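- Code and data share a single 64K segment, so all memory accesses are 16-bit (segment registers unchanged). Produces a .COM file instead of an .EXE file.
- small
- Code and data each get one segment of up to 64K. All memory accesses are still 16-bit (segment registers unchanged after startup).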
- compact
- Data addresses include both segment and offset, reloading the DS or ES registers on access and allowing up to 1M of data. Code accesses don't change the CS register, allowing 64K of code.
- medium
- Code addresses include the segment address, reloading CS on access and allowing up to 1M of code. Data accesses don't change the DS and ES registers, allowing 64K of data.
- large
- Both code and data addresses are (segment, offset) pairs, always reloading the segment addresses. The whole 1M byte memory space is available for both code and data.
- huge
- Same as the large model, with additional arithmetic being generated by the compiler to allow access to arrays larger than 64K.
When looking at an EXE file, one has to decide which memory model was used to build that file.
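The "additional arithmetic" of the huge model can be pictured as pointer normalisation: after every adjustment the (segment, offset) pair is rewritten so that the offset stays small and cannot wrap around at 64K. The sketch below shows one common way of doing this. The `far_ptr` type and `huge_add` function are hypothetical names, and a real compiler would emit equivalent inline code or a runtime helper rather than this exact routine.

```c
#include <assert.h>
#include <stdint.h>

/* A far pointer as stored in memory: 16-bit offset, 16-bit segment. */
struct far_ptr {
    uint16_t offset;
    uint16_t segment;
};

/* Advance a far pointer by `delta` bytes and renormalise it so that
   the resulting offset is always below 16. */
static struct far_ptr huge_add(struct far_ptr p, uint32_t delta)
{
    uint32_t linear = ((uint32_t)p.segment << 4) + p.offset + delta;
    struct far_ptr out;
    assert(linear < 0x100000u);          /* stay inside the 1M space */
    out.segment = (uint16_t)(linear >> 4);
    out.offset  = (uint16_t)(linear & 0xFu);
    return out;
}
```

When disassembling, sequences that shift an offset right by 4 and add it to a segment value are a hint that huge pointers may be in use.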
PE Files
A Portable Executable (PE) file is the standard binary file format for an executable or DLL under Windows NT, Windows 95, and other Win32 platforms. The Win32 SDK contains a file, winnt.h, which declares various structs and variables used in PE files. Some functions for manipulating PE files are also included in imagehlp.dll. PE files are broken down into various sections which can be examined.
Relative Virtual Addressing (RVA)
In a Windows environment, executable modules can be loaded at any point in memory, and are expected to run without problem. To allow multiple programs to be loaded at seemingly random locations in memory, PE files use a scheme called RVA: Relative Virtual Addresses. RVAs assume that the "base address" at which a module is loaded into memory is not known at compile time. So, PE files describe the location of data in memory as an offset from the base address, wherever that may be in memory.
Some processor instructions require the code itself to directly identify where in memory some data is. This is not possible when the location of the module in memory is not known at compile time. The solution to this problem is described in the section on "Relocations".
It is important to remember that the addresses obtained from a disassembly of a module will not always match up to the addresses seen in a debugger as the program is running.
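Converting between an RVA and the address a debugger shows is a single addition or subtraction once the actual load base is known. A minimal sketch, with the function names `rva_to_va` and `va_to_rva` chosen purely for illustration:

```c
#include <assert.h>
#include <stdint.h>

/* Virtual address of an item, given where the module was actually loaded. */
static uint64_t rva_to_va(uint64_t load_base, uint32_t rva)
{
    return load_base + rva;
}

/* Recover the RVA from an address seen in a debugger. */
static uint32_t va_to_rva(uint64_t load_base, uint64_t va)
{
    assert(va >= load_base && va - load_base <= 0xFFFFFFFFu);
    return (uint32_t)(va - load_base);
}
```

The same RVA 0x1000 therefore appears as 0x00401000 in a module loaded at 0x00400000 and as 0x10001000 in one loaded at 0x10000000.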
File Format
The PE (Portable Executable) file format includes a number of informational headers, and is arranged in the following format:
The basic format of a Microsoft PE file
MS-DOS header
Open any Win32 binary executable in a hex editor, and note what you see: the first 2 bytes are always the letters "MZ", the initials of Mark Zbikowski, one of the developers of MS-DOS and the designer of its executable file format. To some people, the first few bytes in a file that determine the type of file are called the "magic number," although there is no rule that states that the "magic number" needs to be a single number. Instead, we will use the term "File ID Tag", or simply, File ID. Sometimes this is also known as the File Signature.
After the File ID, the hex editor will show several bytes of either random-looking symbols, or whitespace, before the human-readable string "This program cannot be run in DOS mode".
What is this?
Hex Listing of an MS-DOS file header
What you are looking at is the MS-DOS header of the Win32 PE file. To ensure either a) backwards compatibility, or b) graceful failure of new file types on old systems, Microsoft places a short series of machine instructions (an example program is listed below the DOS header structure) at the head of each PE file. When a 32-bit Windows file is run in a 16-bit DOS environment, the program displays the error message "This program cannot be run in DOS mode." and then terminates.
The DOS header is also known by some as the EXE header. Here is the DOS header presented as a C data structure:
struct DOS_Header
{
// short is 2 bytes, long is 4 bytes
char signature[2]; // always 'M', 'Z'
short lastsize;
short nblocks;
short nreloc;
short hdrsize;
short minalloc;
short maxalloc;
unsigned short ss; // 2 byte value: initial SS
unsigned short sp; // 2 byte value: initial SP
short checksum;
unsigned short ip; // 2 byte value: initial IP
unsigned short cs; // 2 byte value: initial CS
short relocpos;
short noverlay;
short reserved1[4];
short oem_id;
short oem_info;
short reserved2[10];
long e_lfanew; // Offset to the 'PE\0\0' signature relative to the beginning of the file
};
After the DOS header comes the stub program mentioned in the paragraph above the DOS header structure. Listed below is a commented example of that program, taken from a program compiled with GCC.
;# Using NASM with Intel syntax
push cs ;# Push CS onto the stack
pop ds ;# Set DS to CS
mov dx,message ; point to our message "This program cannot be run in DOS mode.", 0x0d, 0x0d, 0x0a, '$'
mov ah, 09
int 0x21 ;# when AH = 9, DOS interrupt to write a string
;# terminate the program
mov ax,0x4c01
int 0x21
message db "This program cannot be run in DOS mode.", 0x0d, 0x0d, 0x0a, '$'
PE Header
At offset 60 (0x3C) from the beginning of the DOS header is a pointer to the Portable Executable (PE) File header (e_lfanew in MZ structure). DOS will print the error message and terminate, but Windows will follow this pointer to the next batch of information.
Hex Listing of a PE signature, and the pointer to it
The PE header consists only of a File ID signature, with the value "PE\0\0" where each '\0' character is an ASCII NUL character. This signature indicates a) that this file is a legitimate PE file, and b) the byte order of the file. Byte order will not be considered in this chapter, and all PE files are assumed to be in "little endian" format.
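Following the pointer at offset 0x3C by hand in a hex editor gets tedious, so here is a rough sketch of the same steps in C. It works on a file that has already been read into a byte buffer. The names `read_le32` and `find_pe_signature` are made up for this example, and the bounds checks are the minimum needed to avoid reading outside the buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Read a little-endian 32-bit value from a byte buffer. */
static uint32_t read_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Return the file offset of the "PE\0\0" signature, or -1 if the
   buffer does not look like a PE image. */
static int64_t find_pe_signature(const uint8_t *file, size_t size)
{
    uint32_t e_lfanew;
    assert(file != NULL);
    if (size < 0x40 || file[0] != 'M' || file[1] != 'Z')
        return -1;                        /* no MS-DOS header */
    e_lfanew = read_le32(file + 0x3C);    /* pointer stored at offset 60 */
    if (e_lfanew > size || size - e_lfanew < 4)
        return -1;                        /* pointer runs past the file */
    if (memcmp(file + e_lfanew, "PE\0\0", 4) != 0)
        return -1;                        /* signature mismatch */
    return (int64_t)e_lfanew;
}
```

The COFF header then starts 4 bytes after the offset this function returns.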
The first big chunk of information lies in the COFF header, directly after the PE signature.
COFF Header
The COFF header is present in both COFF object files (before they are linked) and in PE files where it is known as the "File header". The COFF header has some information that is useful to an executable, and some information that is more useful to an object file.
Here is the COFF header, presented as a C data structure:
struct COFFHeader
{
short Machine;
short NumberOfSections;
long TimeDateStamp;
long PointerToSymbolTable;
long NumberOfSymbols;
short SizeOfOptionalHeader;
short Characteristics;
}
- Machine
- This field determines what machine the file was compiled for. A hex value of 0x14C (332 in decimal) is the code for an Intel 80386.
Here is a list of possible values:
Value | Description |
---|---|
0x14c | Intel 386 |
0x162 | MIPS R3000 |
0x168 | MIPS R10000 |
0x169 | MIPS little endian WCI v2 |
0x183 | old Alpha AXP |
0x184 | Alpha AXP |
0x1a2 | Hitachi SH3 |
0x1a3 | Hitachi SH3 DSP |
0x1a6 | Hitachi SH4 |
0x1a8 | Hitachi SH5 |
0x1c0 | ARM little endian |
0x1c2 | Thumb |
0x1c4 | ARMv7 (Thumb-2) |
0x1d3 | Matsushita AM33 |
0x1f0 | PowerPC little endian |
0x1f1 | PowerPC with floating point support |
0x1f2 | PowerPC 64-bit little endian |
0x200 | Intel IA64 |
0x266 | MIPS16 |
0x268 | Motorola 68000 series |
0x284 | Alpha AXP 64-bit |
0x366 | MIPS with FPU |
0x466 | MIPS16 with FPU |
0xebc | EFI Byte Code |
0x8664 | AMD64 (x64) |
0x9041 | Mitsubishi M32R little endian |
0xaa64 | ARM64 little endian |
0xc0ee | clr pure MSIL |
- NumberOfSections
- The number of sections that are described at the end of the PE headers.
- TimeDateStamp
- The time at which this header was generated, stored as a 32-bit count of seconds since 00:00 January 1, 1970 UTC. It is used in the process of "Binding", see below.
- SizeOfOptionalHeader
- This field gives the size, in bytes, of the "PE Optional Header" that follows the COFF header.
- Characteristics
- This is a field of bit flags, that show some characteristics of the file.
Constant Name | Bit Position / Mask | Description |
---|---|---|
IMAGE_FILE_RELOCS_STRIPPED | 1 / 0x0001 | Relocation information was stripped from file |
IMAGE_FILE_EXECUTABLE_IMAGE | 2 / 0x0002 | The file is executable |
IMAGE_FILE_LINE_NUMS_STRIPPED | 3 / 0x0004 | COFF line numbers were stripped from file |
IMAGE_FILE_LOCAL_SYMS_STRIPPED | 4 / 0x0008 | COFF symbol table entries were stripped from file |
IMAGE_FILE_AGGRESIVE_WS_TRIM | 5 / 0x0010 | Aggressively trim the working set (obsolete) |
IMAGE_FILE_LARGE_ADDRESS_AWARE | 6 / 0x0020 | The application can handle addresses greater than 2 GB |
IMAGE_FILE_BYTES_REVERSED_LO | 8 / 0x0080 | The bytes of the word are reversed (obsolete) |
IMAGE_FILE_32BIT_MACHINE | 9 / 0x0100 | The computer supports 32-bit words |
IMAGE_FILE_DEBUG_STRIPPED | 10 / 0x0200 | Debugging information was removed and stored separately in another file |
IMAGE_FILE_REMOVABLE_RUN_FROM_SWAP | 11 / 0x0400 | If the image is on removable media, copy it to and run it from the swap file |
IMAGE_FILE_NET_RUN_FROM_SWAP | 12 / 0x0800 | If the image is on the network, copy it to and run it from the swap file |
IMAGE_FILE_SYSTEM | 13 / 0x1000 | The image is a system file |
IMAGE_FILE_DLL | 14 / 0x2000 | The image is a DLL file |
IMAGE_FILE_UP_SYSTEM_ONLY | 15 / 0x4000 | The image should only be run on a single processor computer |
IMAGE_FILE_BYTES_REVERSED_HI | 16 / 0x8000 | The bytes of the word are reversed (obsolete) |
PE Optional Header
The "PE Optional Header" is not really optional: it is required in executable files, and is optional only in COFF object files. There are two versions of the optional header, one for 32-bit files and one for 64-bit files. The optional header holds a great deal of information that can be used to pick apart the file structure and learn more about the file.
The PE Optional Header occurs directly after the COFF header, and some sources even show the two headers as being part of the same structure. This wikibook separates them out for convenience.
Here is the 64 bit PE Optional Header presented as a C data structure:
struct PEOptHeader
{
/* 64 bit version of the PE Optional Header also known as IMAGE_OPTIONAL_HEADER64
char is 1 byte
short is 2 bytes
long is 4 bytes
long long is 8 bytes
*/
short signature; //decimal number 267 for 32 bit, 523 for 64 bit, and 263 for a ROM image.
char MajorLinkerVersion;
char MinorLinkerVersion;
long SizeOfCode;
long SizeOfInitializedData;
long SizeOfUninitializedData;
long AddressOfEntryPoint; //The RVA of the code entry point
long BaseOfCode;
/*The next 21 fields are an extension to the COFF optional header format*/
long long ImageBase;
long SectionAlignment;
long FileAlignment;
short MajorOSVersion;
short MinorOSVersion;
short MajorImageVersion;
short MinorImageVersion;
short MajorSubsystemVersion;
short MinorSubsystemVersion;
long Win32VersionValue;
long SizeOfImage;
long SizeOfHeaders;
long Checksum;
short Subsystem;
short DLLCharacteristics;
long long SizeOfStackReserve;
long long SizeOfStackCommit;
long long SizeOfHeapReserve;
long long SizeOfHeapCommit;
long LoaderFlags;
long NumberOfRvaAndSizes;
data_directory DataDirectory[NumberOfRvaAndSizes]; // Can have any number of elements, matching the number in NumberOfRvaAndSizes. However, it is always 16 in PE files.
};
This is the 32 bit version of the PE Optional Header presented as a C data structure:
struct PEOptHeader
{
/* 32 bit version of the PE Optional Header also known as IMAGE_OPTIONAL_HEADER
char is 1 byte
short is 2 bytes
long is 4 bytes
*/
short signature; //decimal number 267 for 32 bit, 523 for 64 bit, and 263 for a ROM image.
char MajorLinkerVersion;
char MinorLinkerVersion;
long SizeOfCode;
long SizeOfInitializedData;
long SizeOfUninitializedData;
long AddressOfEntryPoint; //The RVA of the code entry point
long BaseOfCode;
long BaseOfData;
/*The next 21 fields are an extension to the COFF optional header format*/
long ImageBase;
long SectionAlignment;
long FileAlignment;
short MajorOSVersion;
short MinorOSVersion;
short MajorImageVersion;
short MinorImageVersion;
short MajorSubsystemVersion;
short MinorSubsystemVersion;
long Win32VersionValue;
long SizeOfImage;
long SizeOfHeaders;
long Checksum;
short Subsystem;
short DLLCharacteristics;
long SizeOfStackReserve;
long SizeOfStackCommit;
long SizeOfHeapReserve;
long SizeOfHeapCommit;
long LoaderFlags;
long NumberOfRvaAndSizes;
data_directory DataDirectory[NumberOfRvaAndSizes]; // Can have any number of elements, matching the number in NumberOfRvaAndSizes. However, it is always 16 in PE files.
};
This is the data_directory (also known as IMAGE_DATA_DIRECTORY) structure as found in the two structures above:
/*
long is 4 bytes
*/
struct data_directory
{
long VirtualAddress;
long Size;
}
- Signature
- Contains a signature that identifies the image.
Constant Name | Value | Description |
---|---|---|
IMAGE_NT_OPTIONAL_HDR32_MAGIC | 0x10b | 32 bit executable image. |
IMAGE_NT_OPTIONAL_HDR64_MAGIC | 0x20b | 64 bit executable image |
IMAGE_ROM_OPTIONAL_HDR_MAGIC | 0x107 | ROM image |
- MajorLinkerVersion
- The major version number of the linker.
- MinorLinkerVersion
- The minor version number of the linker.
- SizeOfCode
- The size of the code section, in bytes, or the sum of all such sections if there are multiple code sections.
- SizeOfInitializedData
- The size of the initialized data section, in bytes, or the sum of all such sections if there are multiple initialized data sections.
- SizeOfUninitializedData
- The size of the uninitialized data section, in bytes, or the sum of all such sections if there are multiple uninitialized data sections.
- AddressOfEntryPoint
- A pointer to the entry point function, relative to the image base address. For executable files, this is the starting address. For device drivers, this is the address of the initialization function. The entry point function is optional for DLLs. When no entry point is present, this member is zero.
- BaseOfCode
- A pointer to the beginning of the code section, relative to the image base.
- BaseOfData
- A pointer to the beginning of the data section, relative to the image base.
- ImageBase
- The preferred address of the first byte of the image when it is loaded in memory. This value is a multiple of 64K bytes. The default value for DLLs is 0x10000000. The default value for applications is 0x00400000, except on Windows CE where it is 0x00010000.
- SectionAlignment
- The alignment of sections loaded in memory, in bytes. This value must be greater than or equal to the FileAlignment member. The default value is the page size for the system.
- FileAlignment
- The alignment of the raw data of sections in the image file, in bytes. The value should be a power of 2 between 512 and 64K (inclusive). The default is 512. If the SectionAlignment member is less than the system page size, this member must be the same as SectionAlignment.
- MajorOSVersion
- The major version number of the required operating system.
- MinorOSVersion
- The minor version number of the required operating system.
- MajorImageVersion
- The major version number of the image.
- MinorImageVersion
- The minor version number of the image.
- MajorSubsystemVersion
- The major version number of the subsystem.
- MinorSubsystemVersion
- The minor version number of the subsystem.
- Win32VersionValue
- This member is reserved and must be 0.
- SizeOfImage
- The size of the image, in bytes, including all headers. Must be a multiple of SectionAlignment.
- SizeOfHeaders
- The combined size of the following items, rounded to a multiple of the value specified in the FileAlignment member.
- e_lfanew member of DOS_Header
- 4 byte signature
- size of COFFHeader
- size of optional header
- size of all section headers
- CheckSum
- The image file checksum. The following files are validated at load time: all drivers, any DLL loaded at boot time, and any DLL loaded into a critical system process.
- Subsystem
- The Subsystem that will be invoked to run the executable
Constant Name | Value | Description |
---|---|---|
IMAGE_SUBSYSTEM_UNKNOWN | 0 | Unknown subsystem |
IMAGE_SUBSYSTEM_NATIVE | 1 | No subsystem required (device drivers and native system processes) |
IMAGE_SUBSYSTEM_WINDOWS_GUI | 2 | Windows graphical user interface (GUI) subsystem |
IMAGE_SUBSYSTEM_WINDOWS_CUI | 3 | Windows character-mode user interface (CUI) subsystem |
IMAGE_SUBSYSTEM_OS2_CUI | 5 | OS/2 CUI subsystem |
IMAGE_SUBSYSTEM_POSIX_CUI | 7 | POSIX CUI subsystem |
IMAGE_SUBSYSTEM_WINDOWS_CE_GUI | 9 | Windows CE system |
IMAGE_SUBSYSTEM_EFI_APPLICATION | 10 | Extensible Firmware Interface (EFI) application |
IMAGE_SUBSYSTEM_EFI_BOOT_SERVICE_DRIVER | 11 | EFI driver with boot services |
IMAGE_SUBSYSTEM_EFI_RUNTIME_DRIVER | 12 | EFI driver with run-time services |
IMAGE_SUBSYSTEM_EFI_ROM | 13 | EFI ROM image |
IMAGE_SUBSYSTEM_XBOX | 14 | Xbox system |
IMAGE_SUBSYSTEM_WINDOWS_BOOT_APPLICATION | 16 | Boot application |
- DLLCharacteristics
- The DLL characteristics of the image
Constant Name | Value | Description |
---|---|---|
No constant name | 0x0001 | Reserved |
No constant name | 0x0002 | Reserved |
No constant name | 0x0004 | Reserved |
No constant name | 0x0008 | Reserved |
IMAGE_DLLCHARACTERISTICS_DYNAMIC_BASE | 0x0040 | The DLL can be relocated at load time |
IMAGE_DLLCHARACTERISTICS_FORCE_INTEGRITY | 0x0080 | Code integrity checks are forced |
IMAGE_DLLCHARACTERISTICS_NX_COMPAT | 0x0100 | The image is compatible with data execution prevention (DEP) |
IMAGE_DLLCHARACTERISTICS_NO_ISOLATION | 0x0200 | The image is isolation aware, but should not be isolated |
IMAGE_DLLCHARACTERISTICS_NO_SEH | 0x0400 | The image does not use structured exception handling (SEH). No handlers can be called in this image |
IMAGE_DLLCHARACTERISTICS_NO_BIND | 0x0800 | Do not bind the image |
IMAGE_DLLCHARACTERISTICS_APPCONTAINER | 0x1000 | The image must be executed within an App container |
IMAGE_DLLCHARACTERISTICS_WDM_DRIVER | 0x2000 | A WDM driver |
No constant name | 0x4000 | Reserved |
IMAGE_DLLCHARACTERISTICS_TERMINAL_SERVER_AWARE | 0x8000 | The image is terminal server aware |
- SizeOfStackReserve
- The number of bytes to reserve for the stack. Only the memory specified by the SizeOfStackCommit member is committed at load time; the rest is made available one page at a time until this reserve size is reached.
- SizeOfStackCommit
- The number of bytes to commit for the stack.
- SizeOfHeapReserve
- The number of bytes to reserve for the local heap. Only the memory specified by the SizeOfHeapCommit member is committed at load time; the rest is made available one page at a time until this reserve size is reached.
- SizeOfHeapCommit
- The number of bytes to commit for the local heap.
- LoaderFlags
- This member is obsolete.
- NumberOfRvaAndSizes
- The number of directory entries in the remainder of the optional header. Each entry describes a location and size.
- DataDirectory
- Possibly the most interesting member of this structure. It provides RVAs and sizes that locate various data structures used to set up the execution environment of a module. The structures that the DataDirectory entries point to live in the sections of the file described by the Section Table, and what they do is covered in other sections of this page. The most interesting entries are the Export Directory, Import Directory, Resource Directory, and Bound Import directory. The .NET descriptor table (CLI Header) describes the metadata of a .NET assembly; it is an IMAGE_COR20_HEADER structure, which is defined in winnt.h. Note that the byte offsets in the table below are relative to the beginning of the optional header.
Constant Name | Value | Description | Offset PE(32 bit) | Offset PE32+(64 bit) |
---|---|---|---|---|
IMAGE_DIRECTORY_ENTRY_EXPORT | 0 | Export Directory | 96 | 112 |
IMAGE_DIRECTORY_ENTRY_IMPORT | 1 | Import Directory | 104 | 120 |
IMAGE_DIRECTORY_ENTRY_RESOURCE | 2 | Resource Directory | 112 | 128 |
IMAGE_DIRECTORY_ENTRY_EXCEPTION | 3 | Exception Directory | 120 | 136 |
IMAGE_DIRECTORY_ENTRY_SECURITY | 4 | Security Directory | 128 | 144 |
IMAGE_DIRECTORY_ENTRY_BASERELOC | 5 | Base Relocation Table | 136 | 152 |
IMAGE_DIRECTORY_ENTRY_DEBUG | 6 | Debug Directory | 144 | 160 |
IMAGE_DIRECTORY_ENTRY_ARCHITECTURE | 7 | Architecture specific data | 152 | 168 |
IMAGE_DIRECTORY_ENTRY_GLOBALPTR | 8 | Global pointer register relative virtual address | 160 | 176 |
IMAGE_DIRECTORY_ENTRY_TLS | 9 | Thread Local Storage directory | 168 | 184 |
IMAGE_DIRECTORY_ENTRY_LOAD_CONFIG | 10 | Load Configuration directory | 176 | 192 |
IMAGE_DIRECTORY_ENTRY_BOUND_IMPORT | 11 | Bound Import directory | 184 | 200 |
IMAGE_DIRECTORY_ENTRY_IAT | 12 | Import Address Table | 192 | 208 |
IMAGE_DIRECTORY_ENTRY_DELAY_IMPORT | 13 | Delay Import table | 200 | 216 |
IMAGE_DIRECTORY_ENTRY_COM_DESCRIPTOR | 14 | COM or .net descriptor table (CLI Header) | 208 | 224 |
No constant name | 15 | Reserved | 216 | 232 |
Section Table
Immediately after the PE Optional Header we find a section table. The section table consists of an array of IMAGE_SECTION_HEADER structures. The number of structures is given by the NumberOfSections member of the COFF Header. Each structure is 40 bytes in length. Pictured below is a hex dump depicting the section table of a program:
The outlined areas correlate to the Name member of three IMAGE_SECTION_HEADER structures
The IMAGE_SECTION_HEADER defined as a C structure is as follows:
struct IMAGE_SECTION_HEADER
{
// short is 2 bytes
// long is 4 bytes
char Name[IMAGE_SIZEOF_SHORT_NAME]; // IMAGE_SIZEOF_SHORT_NAME is 8 bytes
union {
long PhysicalAddress;
long VirtualSize;
} Misc;
long VirtualAddress;
long SizeOfRawData;
long PointerToRawData;
long PointerToRelocations;
long PointerToLinenumbers;
short NumberOfRelocations;
short NumberOfLinenumbers;
long Characteristics;
}
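Because every header before the section table has a known size, the file offset of any section header can be computed directly. The sketch below assumes the values of e_lfanew, SizeOfOptionalHeader and NumberOfSections have already been read from the file. The function and constant names are illustrative only.

```c
#include <assert.h>
#include <stdint.h>

enum {
    PE_SIGNATURE_SIZE   = 4,    /* "PE\0\0" */
    COFF_HEADER_SIZE    = 20,
    SECTION_HEADER_SIZE = 40
};

/* File offset of section header number `index` (0-based). */
static uint32_t section_header_offset(uint32_t e_lfanew,
                                      uint16_t size_of_optional_header,
                                      uint16_t number_of_sections,
                                      uint16_t index)
{
    assert(index < number_of_sections);
    return e_lfanew + PE_SIGNATURE_SIZE + COFF_HEADER_SIZE
         + size_of_optional_header
         + (uint32_t)index * SECTION_HEADER_SIZE;
}
```

With e_lfanew at 0x80 and a 0xE0-byte optional header, the first section header lands at file offset 0x178.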
- Name
- 8-byte null-padded UTF-8 string (the string may not be null terminated if all 8 characters are used). For longer names, this member will contain '/' followed by an ASCII representation of a decimal number that is an offset into the string table. Executable images do not use a string table and do not support section names greater than 8 characters.
- Misc
- PhysicalAddress - The file address.
- VirtualSize - The total size of the section when loaded into memory, in bytes. If this value is greater than the SizeOfRawData member, the section is filled with zeroes. This field is valid only for executable images and should be set to 0 for object files.
- The Misc member should be considered unreliable unless the linker and the behavior of the linker that was used is known.
- VirtualAddress
- The address of the first byte of the section when loaded into memory, relative to the image base. For object files, this is the address of the first byte before relocation is applied.
- SizeOfRawData
- The size of the initialized data on disk, in bytes. This value must be a multiple of the FileAlignment member of the PE Optional Header structure. If this value is less than the VirtualSize member, the remainder of the section is filled with zeroes. If the section contains only uninitialized data, the value is 0.
- PointerToRawData
- A file pointer relative to the beginning of the file to the first page within the COFF file. This value must be a multiple of the FileAlignment member of the PE Optional Header structure. If a section contains only uninitialized data this value should be 0.
- PointerToRelocations
- A file pointer to the beginning of the relocation entries for the section. If there are no relocations, this value is 0.
- PointerToLinenumbers
- A file pointer to the beginning of the line-number entries for the section. If there are no COFF line numbers, this value is 0.
- NumberOfRelocations
- The number of relocation entries for the section. This value is 0 for executable images.
- NumberOfLinenumbers
- The number of line-number entries for the section.
- Characteristics
- The characteristics of the section.
The table below defines the possible 32-bit mask values for this member:
Constant Name | Value | Description |
---|---|---|
No Constant Name | 0x00000000 | Reserved |
No Constant Name | 0x00000001 | Reserved |
No Constant Name | 0x00000002 | Reserved |
No Constant Name | 0x00000004 | Reserved |
IMAGE_SCN_TYPE_NO_PAD | 0x00000008 | The section should not be padded to the next boundary. This flag is obsolete and is replaced by IMAGE_SCN_ALIGN_1BYTES |
No Constant Name | 0x00000010 | Reserved |
IMAGE_SCN_CNT_CODE | 0x00000020 | The section contains executable code (.text section) |
IMAGE_SCN_CNT_INITIALIZED_DATA | 0x00000040 | The section contains initialized data |
IMAGE_SCN_CNT_UNINITIALIZED_DATA | 0x00000080 | The section contains uninitialized data |
IMAGE_SCN_LNK_OTHER | 0x00000100 | Reserved |
IMAGE_SCN_LNK_INFO | 0x00000200 | The section contains comments or other information. This is valid only for object files (.drectve section) |
No Constant Name | 0x00000400 | Reserved |
IMAGE_SCN_LNK_REMOVE | 0x00000800 | The section will not become part of the image. This is valid only for object files |
IMAGE_SCN_LNK_COMDAT | 0x00001000 | The section contains COMDAT data. This is valid only for object files |
No Constant Name | 0x00002000 | Reserved |
IMAGE_SCN_NO_DEFER_SPEC_EXC | 0x00004000 | Reset speculative exceptions handling bits in the TLB entries for this section |
IMAGE_SCN_GPREL | 0x00008000 | The section contains data referenced through the global pointer |
No Constant Name | 0x00010000 | Reserved |
IMAGE_SCN_MEM_PURGEABLE | 0x00020000 | Reserved |
IMAGE_SCN_MEM_LOCKED | 0x00040000 | Reserved |
IMAGE_SCN_MEM_PRELOAD | 0x00080000 | Reserved |
IMAGE_SCN_ALIGN_1BYTES | 0x00100000 | Align data on a 1-byte boundary. This is valid only for object files |
IMAGE_SCN_ALIGN_2BYTES | 0x00200000 | Align data on a 2-byte boundary. This is valid only for object files |
IMAGE_SCN_ALIGN_4BYTES | 0x00300000 | Align data on a 4-byte boundary. This is valid only for object files |
IMAGE_SCN_ALIGN_8BYTES | 0x00400000 | Align data on a 8-byte boundary. This is valid only for object files |
IMAGE_SCN_ALIGN_16BYTES | 0x00500000 | Align data on a 16-byte boundary. This is valid only for object files |
IMAGE_SCN_ALIGN_32BYTES | 0x00600000 | Align data on a 32-byte boundary. This is valid only for object files |
IMAGE_SCN_ALIGN_64BYTES | 0x00700000 | Align data on a 64-byte boundary. This is valid only for object files |
IMAGE_SCN_ALIGN_128BYTES | 0x00800000 | Align data on a 128-byte boundary. This is valid only for object files |
IMAGE_SCN_ALIGN_256BYTES | 0x00900000 | Align data on a 256-byte boundary. This is valid only for object files |
IMAGE_SCN_ALIGN_512BYTES | 0x00A00000 | Align data on a 512-byte boundary. This is valid only for object files |
IMAGE_SCN_ALIGN_1024BYTES | 0x00B00000 | Align data on a 1024-byte boundary. This is valid only for object files |
IMAGE_SCN_ALIGN_2048BYTES | 0x00C00000 | Align data on a 2048-byte boundary. This is valid only for object files |
IMAGE_SCN_ALIGN_4096BYTES | 0x00D00000 | Align data on a 4096-byte boundary. This is valid only for object files |
IMAGE_SCN_ALIGN_8192BYTES | 0x00E00000 | Align data on a 8192-byte boundary. This is valid only for object files |
IMAGE_SCN_LNK_NRELOC_OVFL | 0x01000000 | The section contains extended relocations. The count of relocations for the section exceeds the 16 bits that is reserved for it in the section header. If the NumberOfRelocations field in the section header is 0xffff, the actual relocation count is stored in the VirtualAddress field of the first relocation. It is an error if IMAGE_SCN_LNK_NRELOC_OVFL is set and there are fewer than 0xffff relocations in the section |
IMAGE_SCN_MEM_DISCARDABLE | 0x02000000 | The section can be discarded as needed |
IMAGE_SCN_MEM_NOT_CACHED | 0x04000000 | The section cannot be cached |
IMAGE_SCN_MEM_NOT_PAGED | 0x08000000 | The section cannot be paged |
IMAGE_SCN_MEM_SHARED | 0x10000000 | The section can be shared in memory |
IMAGE_SCN_MEM_EXECUTE | 0x20000000 | The section can be executed as code (.text, etc. section) |
IMAGE_SCN_MEM_READ | 0x40000000 | The section can be read |
IMAGE_SCN_MEM_WRITE | 0x80000000 | The section can be written to |
A PE loader will place the sections of the executable image at the locations specified by these section descriptors (relative to the base address) and usually the alignment is 0x1000, which matches the size of pages on the x86.
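Since sections sit at different positions in the file and in memory, a disassembler constantly has to translate an RVA back into a file offset. One way to do this is to search the section descriptors for the section that contains the RVA, as in the sketch below. The `section_span` structure is a cut-down, hypothetical stand-in for IMAGE_SECTION_HEADER that holds only the four fields needed here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Only the section header fields this sketch needs. */
struct section_span {
    uint32_t VirtualSize;
    uint32_t VirtualAddress;
    uint32_t SizeOfRawData;
    uint32_t PointerToRawData;
};

/* Translate an RVA to a file offset. Returns 0 on success, -1 when the
   RVA is outside every section or lies in zero-filled space that has
   no bytes in the file. */
static int rva_to_file_offset(const struct section_span *sec, size_t count,
                              uint32_t rva, uint32_t *file_offset)
{
    size_t i;
    assert(sec != NULL && file_offset != NULL);
    for (i = 0; i < count; i++) {
        uint32_t delta;
        if (rva < sec[i].VirtualAddress)
            continue;
        delta = rva - sec[i].VirtualAddress;
        if (delta < sec[i].VirtualSize && delta < sec[i].SizeOfRawData) {
            *file_offset = sec[i].PointerToRawData + delta;
            return 0;
        }
    }
    return -1;
}
```

The second size check matters for sections whose VirtualSize is larger than SizeOfRawData: the tail of such a section exists only in memory.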
Common sections are:
- .text/.code/CODE/TEXT - Contains executable code (machine instructions)
- .textbss/TEXTBSS - Present if Incremental Linking is enabled
- .data/DATA - Contains initialised data
- .idata/IDATA - Conventionally contains the import tables
- .bss/BSS - Contains uninitialised data
- .rsrc - Contains resource data
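When matching section names like these in code, remember that the 8-byte Name field is not guaranteed to be null terminated. A small sketch of a safe comparison follows. The function names are invented for this example, and the list of code section names is simply the one given above.

```c
#include <assert.h>
#include <string.h>

#define IMAGE_SIZEOF_SHORT_NAME 8

/* Copy a section name into a 9-byte buffer and terminate it, since the
   on-disk field has no terminator when all 8 bytes are used. */
static void section_name(const char raw[IMAGE_SIZEOF_SHORT_NAME], char out[9])
{
    assert(raw != NULL && out != NULL);
    memcpy(out, raw, IMAGE_SIZEOF_SHORT_NAME);
    out[IMAGE_SIZEOF_SHORT_NAME] = '\0';
}

/* Nonzero if the name is one of the conventional code section names. */
static int is_code_section_name(const char raw[IMAGE_SIZEOF_SHORT_NAME])
{
    char name[9];
    section_name(raw, name);
    return strcmp(name, ".text") == 0 || strcmp(name, ".code") == 0 ||
           strcmp(name, "CODE") == 0  || strcmp(name, "TEXT") == 0;
}
```

Names are only a convention, so the Characteristics flags remain the more reliable indicator of what a section contains.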
Imports and Exports - Linking to other modules
What is linking?
Whenever a developer writes a program, there are a number of subroutines and functions which are expected to be implemented already, saving the writer the hassle of having to write out more code or work with complex data structures. Instead, the coder need only declare one call to the subroutine, and the linker will decide what happens next.
There are two types of linking that can be used: static and dynamic. Static linking uses a library of precompiled functions, whose code is inserted into the final executable to implement a function, saving the programmer a lot of time. In contrast, dynamic linking allows subroutine code to reside in a different file (or module), which is loaded at runtime by the operating system. Such a file is known as a "Dynamically linked library", or DLL. A library is a module containing a series of functions or values that can be exported. This is different from an executable, which imports things from libraries to do what it wants. From here on, "module" means any file of PE format, and a "Library" is any module which exports and imports functions and values.
Dynamically linking has the following benefits:
- It saves disk space, if more than one executable links to the library module
- Allows instant updating of routines, without providing new executables for all applications
- Can save space in memory by mapping the code of a library into more than one process
- Increases abstraction of implementation. The method by which an action is achieved can be modified without the need for reprogramming of applications. This is extremely useful for backward compatibility with operating systems.
This section discusses how this is achieved using the PE file format. Note that anything can be imported or exported between modules: variables as well as subroutines.
Loading
The downside of dynamically linking modules together is that, at runtime, the software which is initialising an executable must link these modules together. For various reasons, you cannot declare that "The function in this dynamic library will always exist in memory here". If that memory address is unavailable or the library is updated, the function will no longer exist there, and the application trying to use it will break. Instead, each module (library or executable) must declare what functions or values it exports to other modules, and also what it wishes to import from other modules.
As said above, a module cannot declare where in memory it expects a function or value to be. Instead, it declares where in its own memory it expects to find a pointer to the value it wishes to import. This permits the module to address any imported value, wherever it turns up in memory.
Exports
Exports are functions and values in one module that have been declared to be shared with other modules. This is done through the use of the "Export Directory", which is used to translate between the name of an export (or "Ordinal", see below), and a location in memory where the code or data can be found. The start of the export directory is identified by the IMAGE_DIRECTORY_ENTRY_EXPORT entry of the data directory. All export data must exist in the same section. The directory is headed by the following structure:
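struct IMAGE_EXPORT_DIRECTORY {
    long Characteristics;        /* generally unused */
    long TimeDateStamp;          /* when the export directory was generated */
    short MajorVersion;
    short MinorVersion;
    long Name;                   /* RVA of the module name (zero terminated ASCII) */
    long Base;                   /* ordinal of the first export */
    long NumberOfFunctions;      /* entries in AddressOfFunctions */
    long NumberOfNames;          /* entries in AddressOfNames and AddressOfNameOrdinals */
    long AddressOfFunctions;     /* RVA of an array of export RVAs */
    long AddressOfNames;         /* RVA of an array of name RVAs */
    long AddressOfNameOrdinals;  /* RVA of an array of 16-bit indices */
};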
The "Characteristics" value is generally unused. TimeDateStamp describes the time the export directory was generated. MajorVersion and MinorVersion should describe the version details of the directory, but their nature is undefined. These values have little or no impact on the actual exports themselves. The "Name" value is an RVA to a zero terminated ASCII string: the name of this library or module.
Names and Ordinals
Each exported value has an "ordinal" (a kind of index) and, usually, a name. The actual exports themselves are described through AddressOfFunctions, which is an RVA to an array of RVAs, each pointing to a different function or value to be exported. The size of this array is in the value NumberOfFunctions. Each of these functions has an ordinal. The "Base" value is used as the ordinal of the first export, and the next RVA in the array is Base+1, and so forth.
Entries in the AddressOfFunctions array are identified by name through the RVA AddressOfNames. The data that AddressOfNames points to is an array of RVAs, of the size NumberOfNames. Each RVA points to a zero terminated ASCII string, each being the name of an export. There is also a second array, pointed to by the RVA in AddressOfNameOrdinals. This is also of size NumberOfNames, but each value is a 16 bit word, each value being an ordinal. These two arrays are parallel and are used to get an export value from AddressOfFunctions. To find an export by name, search the AddressOfNames array for the correct string and then take the corresponding value from the AddressOfNameOrdinals array. This value is then used as an index into AddressOfFunctions. Note that it is a 0-based index, not a Base-biased ordinal, contrary to what the official documentation suggests.
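The lookup described above can be sketched in C as follows. This is a minimal illustration, not the loader's actual code: the `export_view` structure, the function names and the sample data are all hypothetical, and the three arrays are assumed to have already been converted from RVAs into ordinary pointers.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical in-memory view of an export directory. In a real image the
   arrays are reached through RVAs; here they are plain pointers. */
struct export_view {
    uint32_t base;                  /* ordinal of functions[0] */
    uint32_t number_of_functions;
    uint32_t number_of_names;
    const uint32_t *functions;      /* AddressOfFunctions: export RVAs */
    const char *const *names;       /* AddressOfNames, resolved to strings */
    const uint16_t *name_ordinals;  /* AddressOfNameOrdinals: 0-based indices */
};

/* Look an export up by name. Returns its RVA, or 0 if it is not found. */
uint32_t find_export_by_name(const struct export_view *ev, const char *name)
{
    assert(ev != NULL && name != NULL);
    for (uint32_t i = 0; i < ev->number_of_names; i++) {
        if (strcmp(ev->names[i], name) == 0) {
            /* The parallel array yields a 0-based index, not an ordinal. */
            uint16_t index = ev->name_ordinals[i];
            if (index >= ev->number_of_functions)
                return 0;
            return ev->functions[index];
        }
    }
    return 0;
}

/* Look an export up by ordinal: subtract Base to get the array index. */
uint32_t find_export_by_ordinal(const struct export_view *ev, uint32_t ordinal)
{
    assert(ev != NULL);
    if (ordinal < ev->base)
        return 0;
    uint32_t index = ordinal - ev->base;
    if (index >= ev->number_of_functions)
        return 0;
    return ev->functions[index];
}

/* Made-up sample data: three exports, two of them named, Base = 5. */
static const uint32_t sample_functions[] = { 0x1000, 0x2000, 0x3000 };
static const char *const sample_names[] = { "Alpha", "Beta" };
static const uint16_t sample_ordinals[] = { 2, 0 };
static const struct export_view sample_exports = {
    5, 3, 2, sample_functions, sample_names, sample_ordinals
};
```

With this sample data, "Alpha" maps to index 2 of the function array and "Beta" to index 0, while the export at index 1 has no name and can only be reached by its ordinal, 6.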
Forwarding
As well as being able to export functions and values in a module, the export directory can forward an export to another library. This allows more flexibility when re-organising libraries: perhaps some functionality has branched into another module. If so, an export can be forwarded to that library, instead of messy reorganising inside the original module.
Forwarding is achieved by making an RVA in the AddressOfFunctions array point into the section which contains the export directory, something that normal exports should not do. At that location, there should be a zero terminated ASCII string of format "LibraryName.ExportName" for the appropriate place to forward this export to.
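A rough sketch of how a tool might recognise and split a forwarder is given below. It rests on two assumptions that go beyond the text above: the range test is made against the export directory's own RVA and size (as recorded in its data directory entry), and the string is split at its last dot. All function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* An export RVA that lands inside the export directory's own range is
   treated as a forwarder string rather than as code or data. */
int is_forwarder_rva(uint32_t export_rva, uint32_t dir_rva, uint32_t dir_size)
{
    return export_rva >= dir_rva && export_rva - dir_rva < dir_size;
}

/* Returns a pointer to the "ExportName" part of "LibraryName.ExportName",
   or NULL if the string does not have that shape.
   Assumption: the split happens at the last dot. */
const char *forwarder_export_name(const char *fwd)
{
    assert(fwd != NULL);
    const char *dot = strrchr(fwd, '.');
    if (dot == NULL || dot == fwd || dot[1] == '\0')
        return NULL;
    return dot + 1;
}

/* Returns the length of the "LibraryName" part, or 0 if malformed. */
size_t forwarder_library_length(const char *fwd)
{
    const char *name = forwarder_export_name(fwd);
    return name != NULL ? (size_t)(name - fwd) - 1 : 0;
}
```

For a forwarder string such as "NTDLL.RtlAllocateHeap", the library part is the first five characters and the export name is "RtlAllocateHeap".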
Imports
The other half of dynamic linking is importing functions and values into an executable or other module. Before runtime, compilers and linkers do not know where in memory a value that needs to be imported could exist. The import table solves this by creating an array of pointers at runtime, each one pointing to the memory location of an imported value. This array of pointers exists inside of the module at a defined RVA location. In this way, the linker can use addresses inside of the module to access values outside of it.
The Import directory
The start of the import directory is pointed to by the IMAGE_DIRECTORY_ENTRY_IMPORT entry of the data directory. (The separate IMAGE_DIRECTORY_ENTRY_IAT entry locates the import address table, that is, the FirstThunk arrays that the loader fills in.) At that location, there is an array of IMAGE_IMPORT_DESCRIPTOR structures. Each of these identifies a library or module that has a value we need to import. The array continues until an entry where all the values are zero. The structure is as follows:
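struct IMAGE_IMPORT_DESCRIPTOR {
    long OriginalFirstThunk;  /* RVA of the import lookup array */
    long TimeDateStamp;       /* zero unless the imports are bound */
    long ForwarderChain;      /* see "Forwarding and binding" */
    long Name;                /* RVA of the library name (ASCII) */
    long FirstThunk;          /* RVA of the array the loader fills in */
};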
The TimeDateStamp is relevant to the act of "Binding", see below. The Name value is an RVA to an ASCII string, naming the library to import. ForwarderChain will be explained later. The only things of interest at this point are the RVAs OriginalFirstThunk and FirstThunk. Both these values point to arrays of RVAs, each of which points to an IMAGE_IMPORT_BY_NAME struct. The arrays are terminated with an entry that is equal to zero. These two arrays are parallel and point to the same structures, in the same order. The reason for this will become apparent shortly.
Each of these IMAGE_IMPORT_BY_NAME structs has the following form:
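struct IMAGE_IMPORT_BY_NAME {
    short Hint;    /* suggested index into the exporter's AddressOfNames */
    char Name[1];  /* first byte of a zero terminated ASCII string */
};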
"Name" is an ASCII string of any size that names the value to be imported. This is used when looking up a value in the export directory (see above) through the AddressOfNames array. The "Hint" value is an index into the AddressOfNames array; to save searching for a string, the loader first checks the AddressOfNames entry corresponding to "Hint".
To summarise: The import table consists of a large array of IMAGE_IMPORT_DESCRIPTORs, terminated by an all-zero entry. These descriptors identify a library to import things from. There are then two parallel RVA arrays, each pointing at IMAGE_IMPORT_BY_NAME structures, which identify a specific value to be imported.
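The role of the "Hint" value can be sketched as below. This is only an illustration under stated assumptions: the name table is taken to be already resolved from RVAs to strings, the fallback is a plain linear search (a real loader may search differently), and `find_name_index` and the sample table are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Returns the index of `wanted` in the exporter's name table, or -1.
   The hint is tried first; a stale hint only costs one extra comparison. */
int32_t find_name_index(const char *const *names, uint32_t count,
                        uint16_t hint, const char *wanted)
{
    assert(names != NULL && wanted != NULL);
    if (hint < count && strcmp(names[hint], wanted) == 0)
        return (int32_t)hint;
    /* Assumed fallback: scan the whole table. */
    for (uint32_t i = 0; i < count; i++) {
        if (strcmp(names[i], wanted) == 0)
            return (int32_t)i;
    }
    return -1;
}

/* Made-up name table standing in for an exporter's AddressOfNames. */
static const char *const demo_names[] = {
    "CloseHandle", "CreateFileA", "ReadFile"
};
```

The index returned here is the position in the name array, which is then used to pick the matching entry of the AddressOfNameOrdinals array.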
Imports at runtime
Using the above import directory at runtime, the loader finds the appropriate modules, loads them into memory, and seeks the correct export. However, to be able to use the export, a pointer to it must be stored somewhere in the importing module's memory. This is why there are two parallel arrays, OriginalFirstThunk and FirstThunk, identifying IMAGE_IMPORT_BY_NAME structures. Once an imported value has been resolved, the pointer to it is stored in the FirstThunk array. It can then be used at runtime to address imported values.
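The following sketch shows the shape of that resolution step for a single imported library. It assumes 32-bit thunks, imports by name only, and a caller-supplied `resolve` callback standing in for the export lookup; the tiny byte array used as an "image" and every identifier here are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical callback: maps an import name to its address (0 = failure). */
typedef uint32_t (*resolve_fn)(const char *name);

/* Walks the zero-terminated OriginalFirstThunk array, reads each
   IMAGE_IMPORT_BY_NAME (2-byte Hint, then the name) and stores the resolved
   address in the parallel FirstThunk slot.
   Returns the number of imports resolved, or -1 on the first failure. */
int resolve_imports(const uint8_t *image, const uint32_t *original_first_thunk,
                    uint32_t *first_thunk, resolve_fn resolve)
{
    assert(image != NULL && original_first_thunk != NULL);
    assert(first_thunk != NULL && resolve != NULL);
    int count = 0;
    for (size_t i = 0; original_first_thunk[i] != 0; i++) {
        /* Skip the 2-byte Hint to reach the name string. */
        const char *name = (const char *)(image + original_first_thunk[i] + 2);
        uint32_t address = resolve(name);
        if (address == 0)
            return -1;
        first_thunk[i] = address;
        count++;
    }
    return count;
}

/* Made-up image: RVA 4 holds {Hint 0, "Foo"}, RVA 10 holds {Hint 1, "Bar"}. */
static const uint8_t demo_image[] = {
    0, 0, 0, 0,
    0, 0, 'F', 'o', 'o', 0,
    1, 0, 'B', 'a', 'r', 0
};
static const uint32_t demo_oft[] = { 4, 10, 0 };
static uint32_t demo_first_thunk[] = { 4, 10, 0 };

/* Stand-in resolver returning invented addresses. */
static uint32_t demo_resolve(const char *name)
{
    if (strcmp(name, "Foo") == 0) return 0x401000;
    if (strcmp(name, "Bar") == 0) return 0x402000;
    return 0;
}
```

Before the call both arrays hold the same RVAs; afterwards OriginalFirstThunk is unchanged while FirstThunk holds the resolved addresses, which is why the original copy has to be kept.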
Bound imports
The PE file format also supports a peculiar feature known as "binding". The process of loading and resolving import addresses can be time consuming, and in some situations this is to be avoided. If a developer is fairly certain that a library is not going to be updated or changed, then the addresses in memory of imported values will not change each time the application is loaded. So, the import address can be precomputed and stored in the FirstThunk array before runtime, allowing the loader to skip resolving the imports - the imports are "bound" to a particular memory location. However, if the version numbers between modules do not match, or the imported library needs to be relocated, the loader will assume the bound addresses are invalid, and resolve the imports anyway.
The "TimeDateStamp" member of the import directory entry for a module controls binding; if it is set to zero, then the import directory is not bound. If it is non-zero, then it is bound to another module. However, the TimeDateStamp in the import table must match the TimeDateStamp in the bound module's FileHeader, otherwise the bound values will be discarded by the loader.
Forwarding and binding
Binding can of course be a problem if the bound library / module forwards its exports to another module. In these cases, the non-forwarded imports can be bound, but the values which get forwarded must be identified so the loader can resolve them. This is done through the ForwarderChain member of the import descriptor. The value of "ForwarderChain" is an index into the FirstThunk and OriginalFirstThunk arrays. The OriginalFirstThunk for that index identifies the IMAGE_IMPORT_BY_NAME structure for an import that needs to be resolved, and the FirstThunk for that index is the index of another entry that needs to be resolved. This continues until the FirstThunk value is -1, indicating no more forwarded values to import.
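The chain can be followed with a short loop, sketched below under a few assumptions: thunks are 32 bits wide, -1 is represented as 0xFFFFFFFF, and a ForwarderChain that is itself -1 means there are no forwarded imports. The function name, the guard against malformed chains and the sample array are illustrative only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CHAIN_END 0xFFFFFFFFu  /* -1 as a 32-bit value */

/* Collects the indices of forwarded imports into `out`.
   Returns how many were found, or -1 if the chain is malformed
   (index out of range, a cycle, or `out` too small). */
int collect_forwarded(uint32_t forwarder_chain, const uint32_t *first_thunk,
                      size_t thunk_count, uint32_t *out, size_t out_cap)
{
    assert(first_thunk != NULL && out != NULL);
    size_t n = 0;
    uint32_t index = forwarder_chain;
    while (index != CHAIN_END) {
        /* n >= thunk_count can only happen if the chain loops. */
        if (index >= thunk_count || n >= out_cap || n >= thunk_count)
            return -1;
        out[n++] = index;
        index = first_thunk[index];  /* bound slot holds the next index */
    }
    return (int)n;
}

/* Made-up bound FirstThunk array: entries 0 and 2 are bound addresses,
   entries 1 and 3 belong to the forwarder chain 1 -> 3 -> end. */
static const uint32_t demo_thunks[] = { 0x77001000, 3, 0x77002000, CHAIN_END };
static uint32_t demo_out[4];
```

Each collected index would then be resolved through the OriginalFirstThunk entry at the same position.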
Resources
Resources are data items in modules which are difficult to store or describe using the chosen programming language. They require a separate compiler or resource builder, which allows insertion of dialog boxes, icons, menus, images, and other types of resources, including arbitrary binary data. Although a number of Resource API calls can be used to retrieve resources from a PE file, we are going to look at the resources without the use of those APIs.
Locating the Resource Section
The first thing we need to do when we want to manually manipulate a file's resources is to find the resource section. To do this we need some information found in the DataDirectory array and the Section Table. We need the RVA stored in the VirtualAddress member of the DataDirectory[IMAGE_DIRECTORY_ENTRY_RESOURCE] structure found in the PE Optional Header. Once the RVA is known, we can look it up in the Section Table by comparing it with the VirtualAddress member of each IMAGE_SECTION_HEADER. Except in rare cases, the VirtualAddress member of the DataDirectory structure will equal the VirtualAddress member of one IMAGE_SECTION_HEADER, and the Name member of that IMAGE_SECTION_HEADER will be '.rsrc'. Once the correct IMAGE_SECTION_HEADER structure is found, the PointerToRawData member can be used to locate the resource section: it contains an offset from the beginning of the file that leads to the first byte of the resource section.
The picture below depicts an example of a DataDirectory array on the left with an IMAGE_SECTION_HEADER on the right populated with the information for the resource section. The highlighted line underneath Directory Information, '2 RVA: 20480 Size: 3512', has a VirtualAddress (RVA) of 20480, which corresponds to the VirtualAddress of 20480 for the .rsrc (resource) section. The value of PointerToRawData is 7168, so in this particular PE file the resource section starts at offset 7168 from the beginning of the file.
Once the resource section is found we can start looking at the structures and data contained in that section.
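The RVA to file offset translation used above can be sketched as a small helper. The `section_info` structure is a cut-down, hypothetical stand-in for IMAGE_SECTION_HEADER, and in the sample table only the .rsrc values 20480 and 7168 come from the example above; every other number is invented for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Cut-down stand-in for IMAGE_SECTION_HEADER (hypothetical layout). */
struct section_info {
    char name[9];
    uint32_t virtual_address;      /* RVA where the section is mapped */
    uint32_t virtual_size;
    uint32_t pointer_to_raw_data;  /* offset of the section in the file */
    uint32_t size_of_raw_data;
};

/* Translates an RVA to a file offset by finding the section that contains
   it. Returns -1 if no section covers the RVA. */
int64_t rva_to_file_offset(const struct section_info *sections, size_t count,
                           uint32_t rva)
{
    assert(sections != NULL || count == 0);
    for (size_t i = 0; i < count; i++) {
        uint32_t start = sections[i].virtual_address;
        uint32_t span = sections[i].virtual_size != 0
                            ? sections[i].virtual_size
                            : sections[i].size_of_raw_data;
        if (rva >= start && rva - start < span)
            return (int64_t)sections[i].pointer_to_raw_data + (rva - start);
    }
    return -1;
}

/* Sample table: the .text row and both sizes of .rsrc are made up. */
static const struct section_info demo_sections[] = {
    { ".text", 4096, 5000, 1024, 5120 },
    { ".rsrc", 20480, 3512, 7168, 3584 },
};
```

With this table, RVA 20480 translates to file offset 7168, matching the example, and an RVA 16 bytes into the section translates to 7184.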
Resource structures
The IMAGE_RESOURCE_DIRECTORY is the very first structure we come across and it starts at the first byte of the resource section.
IMAGE_RESOURCE_DIRECTORY structure:
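struct IMAGE_RESOURCE_DIRECTORY
{
    long Characteristics;        /* unused */
    long TimeDateStamp;          /* normally the time of creation */
    short MajorVersion;
    short MinorVersion;
    short NumberOfNamedEntries;  /* named entries come first */
    short NumberOfIdEntries;     /* ID entries follow */
};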
Characteristics is unused, and TimeDateStamp is normally the time of creation, although it doesn't matter if it's set or not. MajorVersion and MinorVersion relate to the versioning info of the resources: the fields have no defined values. Immediately following the IMAGE_RESOURCE_DIRECTORY structure is a series of IMAGE_RESOURCE_DIRECTORY_ENTRYs, the number of which is the sum of NumberOfNamedEntries and NumberOfIdEntries. The first NumberOfNamedEntries entries are for named resources, and the remaining NumberOfIdEntries entries are for ID resources. The actual shape of the resource entry structure is as follows:
struct IMAGE_RESOURCE_DIRECTORY_ENTRY
{
    long NameId;  /* ID, or flagged offset to a name string */
    long Data;    /* flagged offset to a subdirectory or data entry */
};
The NameId value has dual purpose: if the most significant bit (or sign bit) is clear, then the lower 16 bits are an ID number of the resource. Alternately, if the top bit is set, then the lower 31 bits make up an offset from the start of the resource data to the name string of this particular resource. The Data value also has a dual purpose: if the most significant bit is set, the remaining 31 bits form an offset from the start of the resource data to another IMAGE_RESOURCE_DIRECTORY (i.e. this entry is an interior node of the resource tree). Otherwise, this is a leaf node, and Data contains the offset from the start of the resource data to a structure which describes the specifics of the resource data itself (which can be considered to be an ordered stream of bytes):
struct IMAGE_RESOURCE_DATA_ENTRY
{
    long Data;      /* RVA of the actual resource data */
    long Size;      /* size of the resource data in bytes */
    long CodePage;
    long Reserved;  /* should be 0 */
};
The Data value contains an RVA to the actual resource data, Size is self-explanatory, and CodePage contains the Unicode codepage to be used for decoding Unicode-encoded strings in the resource (if any). Reserved should be set to 0.
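The bit tests on NameId and Data described above can be written as a handful of helpers. This is a sketch with hypothetical function names, operating on the raw 32-bit field values.

```c
#include <assert.h>
#include <stdint.h>

#define RES_HIGH_BIT   0x80000000u
#define RES_LOW31_MASK 0x7FFFFFFFu

/* NameId: top bit set means the entry is identified by a name string. */
int resource_entry_is_named(uint32_t name_id)
{
    return (name_id & RES_HIGH_BIT) != 0;
}

/* Offset of the name string from the start of the resource data. */
uint32_t resource_entry_name_offset(uint32_t name_id)
{
    assert(resource_entry_is_named(name_id));
    return name_id & RES_LOW31_MASK;
}

/* Numeric ID, valid only when the top bit is clear. */
uint16_t resource_entry_id(uint32_t name_id)
{
    assert(!resource_entry_is_named(name_id));
    return (uint16_t)(name_id & 0xFFFFu);
}

/* Data: top bit set means the entry points at another directory. */
int resource_entry_is_directory(uint32_t data)
{
    return (data & RES_HIGH_BIT) != 0;
}

/* Offset (from the start of the resource data) of either the next
   IMAGE_RESOURCE_DIRECTORY or the IMAGE_RESOURCE_DATA_ENTRY. */
uint32_t resource_entry_data_offset(uint32_t data)
{
    return data & RES_LOW31_MASK;
}
```

For instance, an entry with NameId 3 and Data 0x80000020 is an ID entry whose subdirectory starts 0x20 bytes into the resource data.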
Layout
The above system of resource directory and entries allows simple storage of resources, by name or ID number. However, this can get very complicated very quickly. Different types of resources, the resources themselves, and instances of resources in other languages can become muddled in just one directory of resources. For this reason, the resource directory has been given a structure to work by, allowing separation of the different resources.
For this purpose, the "Data" value of resource entries points at another IMAGE_RESOURCE_DIRECTORY structure, forming a tree-diagram like organisation of resources. The first level of resource entries identifies the type of the resource: cursors, bitmaps, icons and similar. They use the ID method of identifying the resource entries, of which there are twelve defined values in total. More user defined resource types can be added. Each of these resource entries points at a resource directory, naming the actual resources themselves. These can be of any name or value. These point at yet another resource directory, which uses ID numbers to distinguish languages, allowing different specific resources for systems using a different language. Finally, the entries in the language directory actually provide the offset to the resource data itself, the format of which is not defined by the PE specification and can be treated as an arbitrary stream of bytes.
Relocations
Alternate Bound Import Structure
Windows DLL Files
Windows DLL files are a brand of PE file with a few key differences:
- A .DLL file extension
- A DllMain() entry point, instead of a WinMain() or main().
- The DLL flag set in the PE header.
DLLs may be loaded in one of two ways: a) at load time, when the loader resolves the importing module's import directory, or b) at run time, by calling the LoadLibrary() Win32 API function.
Function Exports
Functions are exported from a DLL file by using the following syntax:
__declspec(dllexport) void MyFunction() ...
The "__declspec" keyword here is not part of the C language standard, but is implemented by many compilers to set extendable, compiler-specific options for functions and variables. The Microsoft C compiler and GCC versions that run on Windows support the __declspec keyword and the dllexport property.
Functions may also be exported from regular .exe files, and .exe files with exported functions may be called dynamically in a similar manner to .dll files. This is a rare occurrence, however.
Identifying DLL Exports
There are several ways to determine which functions are exported by a DLL. A common approach is to use dumpbin in the following manner:
dumpbin /EXPORTS <dll file>
This will print a list of the function exports to the console, along with their ordinals and RVAs.
Function Imports
In a similar manner to function exports, a program may import a function from an external DLL file. The DLL file will load into the process memory when the program is started, and the function will be used like a local function. DLL imports need to be prototyped in the following manner, for the compiler and linker to recognize that the function is coming from an external library:
__declspec(dllimport) void MyFunction();
Identifying DLL Imports
It is often useful to determine which functions are imported from external libraries when examining a program. To list the imported functions to the console, use dumpbin in the following manner:
dumpbin /IMPORTS <dll file>
You can also use depends.exe to list imported and exported functions. Depends is a GUI tool and comes with the Microsoft Platform SDK.