X86 Assembly/NASM Syntax
The Netwide Assembler is an x86 and x86-64 assembler that uses syntax similar to Intel. It supports a variety of object file formats, including:
- Linux a.out
- NetBSD/FreeBSD a.out
- MS-DOS 16-bit/32-bit object files
- Win32/64 object files
- Mach-O 32/64
NASM runs on both Unix and Windows/DOS.
The Netwide Assembler (NASM) uses a syntax "designed to be simple and easy to understand, similar to Intel's but less complex". This means that the operand order is dest then src, as opposed to the AT&T style used by the GNU Assembler. For example,
mov ax, 9
loads the number 9 into register ax.
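The destination-first order applies to every operand form. The short sketch below shows a few common cases; the label counter is a hypothetical name introduced only for illustration. It also shows a NASM-specific point: a bare label stands for an address, and square brackets are needed to access the memory at that address.

```nasm
section .data
counter: dd 5                ; hypothetical 32-bit variable

section .text
    mov eax, 9               ; immediate into register: eax = 9
    mov ebx, eax             ; register into register: ebx = eax
    mov ecx, counter         ; ecx = address of counter
    mov edx, [counter]       ; edx = 32-bit value stored at counter
    mov dword [counter], 7   ; store 7 at counter; the size must be given
                             ; because neither operand implies one
```

In AT&T syntax the first instruction would be written movl $9, %eax, with the source first.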
When debugging NASM programs with gdb, you can switch gdb to Intel-style disassembly by issuing the command:
set disassembly-flavor intel
A single semicolon starts a comment and works like a double slash in C++: the assembler ignores everything from the semicolon to the end of the line.
NASM has powerful macro functions, similar to C's preprocessor. For example,
%define newline 0xA
%define func(a, b) ((a) * (b) + 2)

func(1, 22)             ; expands to ((1) * (22) + 2)

%macro print 1          ; macro with one argument
    push dword %1       ; %1 means first argument
    call printf
    add esp, 4
%endmacro

print mystring          ; will call printf
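Multi-line macros can also define labels that are unique to each expansion by prefixing them with %%, so the same macro can be used several times without label clashes. A minimal sketch, in which the macro name clamp_to_zero is hypothetical:

```nasm
; clamp_to_zero reg : set reg to 0 if it holds a negative value
%macro clamp_to_zero 1
    test %1, %1          ; set flags from the register's value
    jns  %%done          ; non-negative: leave it unchanged
    xor  %1, %1          ; negative: replace with zero
%%done:
%endmacro

section .text
    clamp_to_zero eax    ; each use gets its own copy of %%done
    clamp_to_zero ebx
```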
Example I/O (Linux and BSD)
To make a system call on 32-bit Linux, you load the system call number into eax and its arguments into ebx, ecx and edx, and then raise software interrupt 0x80. To read in a single character from standard input (such as from a user at their keyboard), do the following:
; read a byte from stdin
mov eax, 3          ; 3 is recognized by the system as meaning "read"
mov ebx, 0          ; read from standard input
mov ecx, variable   ; address of the buffer to read into
mov edx, 1          ; input length (one byte)
int 0x80            ; call the kernel
After the call, eax contains the number of bytes read. A negative value means that a read error occurred.
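A minimal sketch of checking that result is shown below as a complete program. The labels end_of_input, read_failed and quit, and the exit status chosen for each outcome, are assumptions made for illustration; a result of zero is treated as end of input.

```nasm
global _start

section .bss
variable: resb 1         ; one-byte input buffer

section .text
_start:
    mov eax, 3           ; sys_read
    mov ebx, 0           ; standard input
    mov ecx, variable    ; buffer address
    mov edx, 1           ; read at most one byte
    int 0x80

    test eax, eax        ; set flags from the return value
    js   read_failed     ; negative: the kernel reported an error
    jz   end_of_input    ; zero: nothing left to read

    mov ebx, 0           ; exit status 0: one byte was read
    jmp quit
end_of_input:
    mov ebx, 1           ; exit status 1: end of input
    jmp quit
read_failed:
    mov ebx, 2           ; exit status 2: read error
quit:
    mov eax, 1           ; sys_exit
    int 0x80
```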
Outputting follows a similar convention:
; print a byte to stdout
mov eax, 4          ; the system interprets 4 as "write"
mov ebx, 1          ; standard output (print to terminal)
mov ecx, variable   ; pointer to the value being passed
mov edx, 1          ; length of output (in bytes)
int 0x80            ; call the kernel
BSD systems (Mac OS X included) use similar system calls, but the convention for invoking them is different. While Linux takes system call arguments in registers, BSD systems take them on the stack (except the system call number, which is put into eax, the same way as on Linux). The kernel also expects one extra dword on top of the arguments, where a return address would normally sit, which is why eax is pushed last as a placeholder. The BSD version of the code above:
; read a byte from stdin
mov eax, 3            ; sys_read system call
push dword 1          ; input length
push dword variable   ; address of the buffer to read into
push dword 0          ; read from standard input
push eax
int 0x80              ; call the kernel
add esp, 16           ; move back the stack pointer

; write a byte to stdout
mov eax, 4            ; sys_write system call
push dword 1          ; output length
push dword variable   ; memory address
push dword 1          ; write to standard output
push eax
int 0x80              ; call the kernel
add esp, 16           ; move back the stack pointer

; quit the program
mov eax, 1            ; sys_exit system call
push dword 0          ; program return value
push eax
int 0x80              ; call the kernel
Hello World (Linux)
Below is a simple Hello world example. It lays out the basic structure of a NASM program:
global _start

section .data
    ; Align to the nearest 2-byte boundary; the alignment must be a power of two
    align 2
    ; String, which is just a collection of bytes, 0xA is newline
    str: db 'Hello, world!',0xA
    strLen: equ $-str

section .bss

section .text
_start:
    ;
    ; op dst, src
    ;

    ;
    ; Call write(2) syscall:
    ; ssize_t write(int fd, const void *buf, size_t count)
    ;
    mov edx, strLen   ; Arg three: the length of the string
    mov ecx, str      ; Arg two: the address of the string
    mov ebx, 1        ; Arg one: file descriptor, in this case stdout
    mov eax, 4        ; Syscall number, in this case the write(2) syscall
    int 0x80          ; Interrupt 0x80

    ;
    ; Call _exit(2) syscall:
    ; void _exit(int status)
    ;
    mov ebx, 0        ; Arg one: the status
    mov eax, 1        ; Syscall number
    int 0x80
To assemble, link and run the program, do the following (on a 64-bit host, ld additionally needs the -m elf_i386 option to link a 32-bit object):
$ nasm -felf32 -g helloWorld.asm
$ ld -g helloWorld.o
$ ./a.out
Hello World (Using only Win32 system calls)
In this example we are going to rewrite the hello world example using Win32 system calls. There are several major differences:
- The intermediate file will be a Microsoft Win32 (i386) object file
- We avoid interrupts, since they may not be portable, and instead import several functions from the kernel32 DLL
global _start

extern _GetStdHandle@4
extern _WriteConsoleA@20
extern _ExitProcess@4

section .data
    str: db 'hello, world',0xA
    strLen: equ $-str

section .bss
    numCharsWritten: resd 1      ; DWORD that receives the number of characters written

section .text
_start:
    ;
    ; HANDLE WINAPI GetStdHandle( _In_ DWORD nStdHandle )
    ;
    push dword -11               ; Arg1: request handle for standard output
    call _GetStdHandle@4         ; Result: in eax

    ;
    ; BOOL WINAPI WriteConsole(
    ;     _In_       HANDLE  hConsoleOutput,
    ;     _In_       const VOID *lpBuffer,
    ;     _In_       DWORD   nNumberOfCharsToWrite,
    ;     _Out_      LPDWORD lpNumberOfCharsWritten,
    ;     _Reserved_ LPVOID  lpReserved )
    ;
    push dword 0                 ; Arg5: Unused so just use zero
    push numCharsWritten         ; Arg4: push pointer to numCharsWritten
    push dword strLen            ; Arg3: push length of output string
    push str                     ; Arg2: push pointer to output string
    push eax                     ; Arg1: push handle returned from _GetStdHandle
    call _WriteConsoleA@20

    ;
    ; VOID WINAPI ExitProcess( _In_ UINT uExitCode )
    ;
    push dword 0                 ; Arg1: push exit code
    call _ExitProcess@4
To assemble, link and run the program, do the following. This example was run under Cygwin; in a Windows command prompt the link step would be different. We use the -e command line option when invoking ld to specify the entry point for program execution. Otherwise we would have to use _WinMain@16 as the entry point rather than _start. One last note: WriteConsole() does not behave well within a Cygwin console, so in order to see the output, the final exe should be run within a Windows command prompt:
$ nasm -f win32 -g helloWorldWin32.asm
$ ld -e _start helloWorldWin32.obj -lkernel32 -o helloWorldWin32.exe
Hello World (Using C libraries and Linking with gcc)
In this example we will rewrite Hello World to use printf(3) from the C library and link using gcc. This has the advantage that going from Linux to Windows requires minimal source code changes and only slightly different assemble and link steps. On Windows this has the additional benefit that the linking step is the same in the Windows command prompt and in Cygwin. There are several major changes:
- The "hello, world" string now becomes the format string for printf(3) and therefore needs to be null terminated. This also means we no longer need to specify its length explicitly.
- gcc expects the entry point for execution to be main
- Microsoft prefixes functions using the cdecl calling convention with an underscore, so main and printf become _main and _printf respectively in the Windows development environment.
global main
extern printf

section .data
    fmtStr: db 'hello, world',0xA,0

section .text
main:
    sub esp, 4          ; Allocate space on the stack for one 4 byte parameter
    lea eax, [fmtStr]
    mov [esp], eax      ; Arg1: pointer to format string
    call printf         ; Call printf(3):
                        ; int printf(const char *format, ...);
    add esp, 4          ; Pop stack once
    ret
To assemble and link the program, do the following (on a 64-bit Linux host, gcc additionally needs the -m32 option and a 32-bit C library to link a 32-bit object):
$ nasm -felf32 helloWorldgcc.asm
$ gcc helloWorldgcc.o -o helloWorldgcc
The Windows version with prefixed underscores:
global _main
extern _printf          ; cdecl names carry a leading underscore on Windows

section .data
    fmtStr: db 'hello, world',0xA,0

section .text
_main:
    sub esp, 4          ; Allocate space on the stack for one 4 byte parameter
    lea eax, [fmtStr]
    mov [esp], eax      ; Arg1: pointer to format string
    call _printf        ; Call printf(3):
                        ; int printf(const char *format, ...);
    add esp, 4          ; Pop stack once
    ret
In order to assemble, link and run the program we need to do the following.
$ nasm -fwin32 helloWorldgcc.asm
$ gcc helloWorldgcc.obj -o helloWorldgcc
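Because the string handed to printf(3) is a format string, further arguments can be passed after it. The sketch below uses the Linux symbol names (main and printf; on Windows they would carry a leading underscore) and pushes the arguments right to left, as the cdecl convention requires. The value 42 and the extra 4 bytes of stack padding are illustrative choices; the padding assumes the toolchain expects a 16-byte aligned stack at the call, as current 32-bit Linux toolchains do.

```nasm
global main
extern printf

section .data
fmtStr: db 'The answer is %d',0xA,0

section .text
main:
    sub  esp, 4          ; padding so the stack is 16-byte aligned at the call
    push dword 42        ; Arg2: integer substituted for %d
    push dword fmtStr    ; Arg1: pointer to format string
    call printf          ; prints "The answer is 42"
    add  esp, 12         ; remove both arguments and the padding
    xor  eax, eax        ; return 0 from main
    ret
```

It is assembled and linked the same way as the Linux example above. On recent 64-bit distributions the gcc step may also need -m32 and -no-pie.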