X86 Assembly/High-Level Languages
From Wikibooks, the open-content textbooks collection
Compilers
The first compilers were simply text translators that converted a high-level language into assembly language. The assembly language code was then fed into an assembler, to create the final machine code output. The GCC compiler still performs this sequence (code is compiled into assembly, and fed to the AS assembler). However, many modern compilers will skip the assembly language and create the machine code directly.
Assembly language has the benefit of a one-to-one correspondence with the underlying machine code: each machine instruction maps directly to a single assembly instruction. Because of this, even when a compiler creates the machine code directly, it is still possible to interface that code with an assembly language program. The important part is knowing exactly how the language implements its data structures, control structures, and functions. The way in which a high-level language compiler implements function calls is called a calling convention.
C Calling Conventions
CDECL
In most C compilers, the CDECL calling convention is the de facto standard. The programmer can also request it explicitly by adding the keyword __cdecl to the function declaration. Some compilers can be instructed to use a different default calling convention; an explicit __cdecl declaration forces CDECL for that function regardless of the default setting.
CDECL calling convention specifies a number of different requirements:
- Function arguments are passed on the stack, in right-to-left order.
- Function result is stored in EAX/AX/AL
- The function name is prefixed with an underscore.
- The arguments are popped from the stack by the caller itself.
CDECL functions are capable of accepting variable argument lists.
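The link between caller cleanup and variable argument lists can be illustrated with a short C sketch. The function name sum_ints is hypothetical and chosen only for this example; the comments describe what a 32-bit CDECL compiler would be expected to do, and other targets (such as 64-bit ones) may pass arguments differently.

```c
#include <assert.h>
#include <stdarg.h>

/* Hypothetical example function: adds up `count` int arguments.
 * Under 32-bit CDECL the caller pushes the arguments right to left,
 * so `count` ends up closest to the return address and the callee can
 * always find it, however many arguments follow. */
int sum_ints(int count, ...)
{
    va_list args;
    int total = 0;

    assert(count >= 0);

    va_start(args, count);
    for (int i = 0; i < count; i++) {
        total += va_arg(args, int);
    }
    va_end(args);

    /* The callee returns without removing the arguments. Only the caller
     * knows how many it pushed, so the caller adjusts the stack pointer. */
    return total;
}
```

A call such as sum_ints(3, 1, 2, 3) returns 6. Because the number of pushed bytes varies from call to call, a callee-cleanup convention could not encode a fixed cleanup size for this function.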
STDCALL
STDCALL is the calling convention that is used when interfacing with the Win32 API on Microsoft Windows systems. STDCALL was created by Microsoft, and therefore isn't always supported by non-Microsoft compilers. STDCALL functions can be declared using the __stdcall keyword on many compilers. STDCALL has the following requirements:
- Function arguments are passed on the stack in right-to-left order.
- Function result is stored in EAX/AX/AL
- Function name is prefixed with an underscore
- Function name is suffixed with an "@" sign, followed by the number of bytes of arguments being passed to it.
- The arguments are popped from the stack by the callee (the called function).
STDCALL functions are not capable of accepting variable argument lists.
For example, the following function declaration in C:
void __stdcall MyFunction(int, int, short);
would be accessed in assembly using the following function label:
_MyFunction@12
Remember that on a 32-bit machine, a 16-bit argument (a C "short") passed on the stack still takes up a full 32 bits of space, so the three arguments above occupy 12 bytes.
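The naming rule above can be sketched in C. The helper names stack_slot_bytes and stdcall_label are hypothetical and exist only for this illustration; the sketch assumes every argument is rounded up to a multiple of 4 bytes, as described for the 32-bit stack.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Bytes an argument occupies on a 32-bit stack: its size rounded up
 * to the next multiple of 4. */
static size_t stack_slot_bytes(size_t arg_size)
{
    return (arg_size + 3u) / 4u * 4u;
}

/* Hypothetical helper: writes the STDCALL-decorated label for `name`
 * into `out`. Returns what snprintf returns, i.e. the label length
 * excluding the terminating NUL. */
int stdcall_label(char *out, size_t out_len, const char *name,
                  const size_t *arg_sizes, size_t arg_count)
{
    size_t total = 0;

    assert(out != NULL && name != NULL);
    assert(arg_count == 0 || arg_sizes != NULL);

    for (size_t i = 0; i < arg_count; i++) {
        total += stack_slot_bytes(arg_sizes[i]);
    }
    /* Underscore prefix, then "@" and the total argument bytes. */
    return snprintf(out, out_len, "_%s@%zu", name, total);
}
```

Passing the name "MyFunction" with argument sizes 4, 4, and 2 (int, int, short on a 32-bit target) produces the label _MyFunction@12.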
FASTCALL
In many compilers, FASTCALL functions can be specified with the __fastcall keyword. FASTCALL functions pass the first two arguments in registers, so that the slower stack operations can be avoided. FASTCALL has the following requirements:
- The first 32-bit (or smaller) argument is passed in ECX/CX/CL (see [1])
- The second 32-bit (or smaller) argument is passed in EDX/DX/DL
- The remaining function arguments (if any) are passed on the stack in right-to-left order
- The function result is returned in EAX/AX/AL
- The function name is prefixed with an "@" symbol
- The function name is suffixed with an "@" symbol, followed by the size of passed arguments, in bytes.
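The register assignment rules above can be sketched in C as a small classifier. The names arg_location and fastcall_assign are hypothetical. The sketch also assumes that an argument wider than 32 bits goes on the stack without using up a register, which is one reading of "the first 32-bit (or smaller) argument"; a particular compiler may behave differently.

```c
#include <assert.h>
#include <stddef.h>

enum arg_location { ARG_IN_ECX, ARG_IN_EDX, ARG_ON_STACK };

/* Hypothetical helper: for each argument size (in bytes, listed left
 * to right), records where the FASTCALL rules would place it. */
void fastcall_assign(const size_t *arg_sizes, size_t arg_count,
                     enum arg_location *locations)
{
    int registers_used = 0;

    assert(arg_count == 0 || (arg_sizes != NULL && locations != NULL));

    for (size_t i = 0; i < arg_count; i++) {
        if (arg_sizes[i] <= 4 && registers_used < 2) {
            /* First eligible argument goes to ECX, the second to EDX. */
            locations[i] = (registers_used == 0) ? ARG_IN_ECX : ARG_IN_EDX;
            registers_used++;
        } else {
            /* Too wide for a register (assumption: it does not consume
             * one), or both registers are already taken. */
            locations[i] = ARG_ON_STACK;
        }
    }
}
```

For argument sizes 4, 8, 2, and 4, this sketch assigns ECX, stack, EDX, and stack respectively.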
C++ Calling Conventions (THISCALL)
The C++ THISCALL calling convention is the standard calling convention for non-static C++ member functions. In THISCALL, the function is called almost identically to the CDECL convention, but the this pointer (the pointer to the current object) must also be passed.
The way the this pointer is passed is compiler-dependent. Microsoft Visual C++ passes it in ECX. GCC passes it as if it were the first parameter of the function (i.e., between the return address and the first formal parameter).
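The GCC approach can be pictured in plain C by writing the hidden parameter out explicitly. The struct Counter and the function Counter_add are hypothetical stand-ins for a C++ class and one of its member functions; this is only an analogy for how the pointer is placed, not the code a C++ compiler actually emits.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a C++ class with one data member. */
struct Counter {
    int value;
};

/* Stand-in for a member function `int Counter::add(int amount)`.
 * The explicit `self` parameter plays the role of the hidden `this`
 * pointer, placed ahead of the first formal parameter. */
int Counter_add(struct Counter *self, int amount)
{
    assert(self != NULL);
    self->value += amount;
    return self->value;
}
```

With a Counter whose value is 10, calling Counter_add(&c, 5) returns 15, much as c.add(5) would in C++.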
Ada Calling Conventions
Pascal Calling Conventions
The Pascal convention is essentially identical to cdecl, differing only in that:
- The parameters are pushed left to right (logical western-world reading order)
- The routine being called must clean the stack before returning
Additionally, each parameter on the 32-bit stack must use all four bytes of the DWORD, regardless of the actual size of the datum.
This was the main calling convention used by 16-bit Windows API routines, as it is slightly more efficient with regard to memory usage, stack access and calling speed. The Win32 API uses STDCALL instead.
Note: the Pascal convention is NOT the same as the Borland Pascal convention, which is a form of fastcall that passes the first three parameters in registers (EAX, EDX, ECX) and is also known as the Register convention.
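The effect of the push order on where a callee finds its parameters can be sketched in C. The names push_order and arg_offset_from_esp are hypothetical. The sketch assumes a 32-bit near call, so the return address occupies the 4 bytes at [ESP] on entry, and every parameter occupies one 4-byte slot.

```c
#include <assert.h>
#include <stddef.h>

enum push_order { PUSH_RIGHT_TO_LEFT, PUSH_LEFT_TO_RIGHT };

/* Hypothetical helper: byte offset from ESP, at callee entry, of the
 * parameter at `index` (0 = leftmost in the source code). */
size_t arg_offset_from_esp(enum push_order order, size_t index,
                           size_t arg_count)
{
    assert(index < arg_count);

    if (order == PUSH_RIGHT_TO_LEFT) {
        /* CDECL and STDCALL: the leftmost parameter is pushed last,
         * so it sits directly above the return address. */
        return 4 + 4 * index;
    }
    /* Pascal: the leftmost parameter is pushed first, so it ends up
     * furthest from ESP. */
    return 4 + 4 * (arg_count - 1 - index);
}
```

For a three-parameter routine, the leftmost parameter is at [ESP+4] with right-to-left pushing and at [ESP+12] with Pascal's left-to-right pushing.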
Fortran Calling Conventions
Inline Assembly
C/C++
Further Reading
For an in-depth discussion of how high-level programming constructs are translated into assembly language, see Reverse Engineering.