x86 Assembly/NASM Syntax

From Wikibooks, open books for an open world
Jump to navigation Jump to search

The Netwide Assembler is an x86 and x86-64 assembler that uses syntax similar to Intel. It supports a variety of object file formats, including:

  1. ELF32/64
  2. Linux a.out
  3. NetBSD/FreeBSD a.out
  4. MS-DOS 16-bit/32-bit object files
  5. Win32/64 object files
  6. COFF
  7. Mach-O 32/64
  8. rdf
  9. binary

NASM runs on both Unix/Linux and Windows/DOS.

NASM Syntax[edit | edit source]

The Netwide Assembler (NASM) uses a syntax "designed to be simple and easy to understand, similar to Intel's but less complex". This means that the operand order is dest then src, as opposed to the AT&T style used by the GNU Assembler. For example,

mov ax, 9

loads the number 9 into register ax.

For those using gdb with nasm, you can set gdb to use Intel-style disassembly by issuing the command:

set disassembly-flavor intel

Comments[edit | edit source]

A single semi-colon is used for comments, and functions the same as double slash in C++: the compiler ignores from the semicolon to the next newline.

Macros[edit | edit source]

NASM has powerful macro functions, similar to C's preprocessor. For example,

%define newline 0xA
%define func(a, b) ((a) * (b) + 2)

func (1, 22) ; expands to ((1) * (22) + 2)

%macro print 1 ; macro with one argument
  push dword %1 ; %1 means first argument
  call printf
  add  esp, 4
%endmacro

print mystring ; will call printf

Example I/O (Linux and BSD)[edit | edit source]

To pass the kernel a simple input command on Linux, you would pass values to the following registers and then send the kernel an interrupt signal. To read in a single character from standard input (such as from a user at their keyboard), do the following:

; read a byte from stdin
mov eax, 3		 ; 3 is recognized by the system as meaning "read"
mov ebx, 0		 ; read from standard input
mov ecx, variable        ; address to pass to
mov edx, 1		 ; input length (one byte)
int 0x80                 ; call the kernel

After the int 0x80, eax will contain the number of bytes read. If this number is < 0, there was a read error of some sort.

Outputting follows a similar convention:

; print a byte to stdout
mov eax, 4           ; the system interprets 4 as "write"
mov ebx, 1           ; standard output (print to terminal)
mov ecx, variable    ; pointer to the value being passed
mov edx, 1           ; length of output (in bytes)
int 0x80             ; call the kernel

BSD systems (MacOS X included) use similar system calls, but convention to execute them is different. While on Linux you pass system call arguments in different registers, on BSD systems they are pushed onto stack (except the system call number, which is put into eax, the same way as in Linux). BSD version of the code above:

; read a byte from stdin
mov eax, 3		; sys_read system call
push dword 1		; input length
push dword variable	; address to pass to
push dword 0		; read from standard input
push eax
int 0x80		; call the kernel
add esp, 16		; move back the stack pointer

; write a byte to stdout
mov eax, 4		; sys_write system call
push dword 1		; output length
push dword variable	; memory address
push dword 1		; write to standard output
push eax
int 0x80		; call the kernel
add esp, 16		; move back the stack pointer

; quit the program
mov eax, 1		; sys_exit system call
push dword 0		; program return value
push eax
int 0x80		; call the kernel

Hello World (Linux)[edit | edit source]

Below we have a simple Hello world example, it lays out the basic structure of a nasm program:

global _start

section .data
        ; Align to the nearest 2 byte boundary, must be a power of two
        align 2
        ; String, which is just a collection of bytes, 0xA is newline
        str:     db 'Hello, world!',0xA
        strLen:  equ $-str

section .bss

section .text
        _start:

;
;       op      dst,  src
;
                                ;
                                ; Call write(2) syscall:
                                ;       ssize_t write(int fd, const void *buf, size_t count)
                                ;
        mov     edx, strLen     ; Arg three: the length of the string
        mov     ecx, str        ; Arg two: the address of the string
        mov     ebx, 1          ; Arg one: file descriptor, in this case stdout
        mov     eax, 4          ; Syscall number, in this case the write(2) syscall: 
        int     0x80            ; Interrupt 0x80        

                                ;
                                ; Call exit(3) syscall
                                ;       void exit(int status)
                                ;
        mov     ebx, 0          ; Arg one: the status
        mov     eax, 1          ; Syscall number:
        int     0x80

In order to assemble, link and run the program we need to do the following:

$ nasm -f elf32 -g helloWorld.asm
$ ld -g helloWorld.o
$ ./a.out

Hello World (Using only Win32 system calls)[edit | edit source]

In this example we are going to rewrite the hello world example using Win32 system calls. There are several major differences:

  1. The intermediate file will be a Microsoft Win32 (i386) object file
  2. We will avoid using interrupts since they may not be portable and, this is Windows, not DOS, therefore we need to bring in several calls from kernel32 DLL


global _start

extern _GetStdHandle@4
extern _WriteConsoleA@20
extern _ExitProcess@4

section .data
        str:     db 'hello, world',0x0D,0x0A
        strLen:  equ $-str

section .bss
        numCharsWritten:        resd 1

section .text
        _start:

        ;
        ; HANDLE WINAPI GetStdHandle( _In_  DWORD nStdHandle ) ;
        ;
        push    dword -11       ; Arg1: request handle for standard output
        call    _GetStdHandle@4 ; Result: in eax

        ;
        ; BOOL WINAPI WriteConsole(
        ;       _In_        HANDLE hConsoleOutput,
        ;       _In_        const VOID *lpBuffer,
        ;       _In_        DWORD nNumberOfCharsToWrite,
        ;       _Out_       LPDWORD lpNumberOfCharsWritten,
        ;       _Reserved_  LPVOID lpReserved ) ;
        ;
        push    dword 0         ; Arg5: Unused so just use zero
        push    numCharsWritten ; Arg4: push pointer to numCharsWritten
        push    dword strLen    ; Arg3: push length of output string
        push    str             ; Arg2: push pointer to output string
        push    eax             ; Arg1: push handle returned from _GetStdHandle
        call    _WriteConsoleA@20


        ;
        ; VOID WINAPI ExitProcess( _In_  UINT uExitCode ) ;
        ;
        push    dword 0         ; Arg1: push exit code
        call    _ExitProcess@4

In order to assemble, link and run the program we need to do the following:

$ nasm -f win32 -g helloWorldWin32.asm
$ ld -e _start helloWorldwin32.obj -lkernel32 -o helloWorldWin32.exe

In this example we use the -e command line option when invoking ld to specify the entry point for program execution. Otherwise we would have to use _WinMain@16 as the entry point rather than _start. This example was run under cygwin, in a Windows command prompt the link step would be different. One last note, WriteConsole() does not behave well within a cygwin console, so in order to see output the final exe should be run within a Windows command prompt.

Hello World (Using C libraries and Linking with gcc)[edit | edit source]

In this example we will rewrite Hello World to use printf(3) from the C library and link using gcc. This has the advantage that going from Linux to Windows requires minimal source code changes and a slightly different assemble and link steps. In the Windows world this has the additional benefit that the linking step will be the same in the Windows command prompt and cygwin. There are several major changes:

  1. The "hello, world" string now becomes the format string for printf(3) and therefore needs to be null terminated. This also means we do not need to explicitly specify its length anymore.
  2. gcc expects the entry point for execution to be main
  3. Microsoft will prefix functions using the cdecl calling convention with a underscore. So main and printf will become _main and _printf respectively in the Windows development environment.


global main

extern printf

section .data
        fmtStr:  db 'hello, world',0xA,0

section .text
        main:

        sub     esp, 4          ; Allocate space on the stack for one 4 byte parameter

        lea     eax, [fmtStr]
        mov     [esp], eax      ; Arg1: pointer to format string
        call    printf         ; Call printf(3):
                                ;       int printf(const char *format, ...);

        add     esp, 4          ; Pop stack once

        ret

In order to assemble, link and run the program we need to do the following.

$ nasm -felf32 helloWorldgcc.asm
$ gcc helloWorldgcc.o -o helloWorldgcc

The Windows version with prefixed underscores:

global _main

extern _printf                ; Uncomment under Windows

section .data
        fmtStr:  db 'hello, world',0xA,0

section .text
        _main:

        sub     esp, 4          ; Allocate space on the stack for one 4 byte parameter

        lea     eax, [fmtStr]
        mov     [esp], eax      ; Arg1: pointer to format string
        call    _printf         ; Call printf(3):
                                ;       int printf(const char *format, ...);

        add     esp, 4          ; Pop stack once

        ret

In order to assemble, link and run the program we need to do the following.

$ nasm -fwin32 helloWorldgcc.asm
$ gcc helloWorldgcc.o -o helloWorldgcc.exe