X86 Assembly/Basic FAQ

From Wikibooks, open books for an open world
Jump to: navigation, search

This page is going to serve as a basic FAQ for people who are new to assembly language programming.

How Does the Computer Read/Understand Assembly?[edit]

The computer doesn't really "read" or "understand" anything per se, since a computer has no awareness nor consciousness, but that's beside the point. The fact is that the computer cannot read the assembly language that you write. Your assembler will convert the assembly language into a form of binary information called "machine code" that your computer uses to perform its operations. If you don't assemble the code, it's complete gibberish to the computer.

That said, assembly is important because each assembly instruction usually relates to just a single machine code, and it is possible for "mere mortals" to do this task directly with nothing but a blank sheet of paper, a pencil, and an assembly instruction reference book. Indeed, in the early days of computers this was a common task and even required in some instances "hand assembling" machine instructions for some basic computer programs. A classical example of this was done by Steve Wozniak, when he hand assembled the entire Integer BASIC interpreter into 6502 machine code for use on his initial Apple I computer. It should be noted, however, that such tasks done for commercially distributed software are so rare that they deserve special mention from that fact alone. Very, very few programmers have actually done this for more than a few instructions, and even then only for a classroom assignment.

Is it the Same On Windows/DOS/Linux?[edit]

The answers to this question are yes and no. The basic x86 machine code is dependent only on the processor. The x86 versions of Windows and Linux are obviously built on the x86 machine code. There are a few differences between Linux and Windows programming in x86 Assembly:

  1. On a Linux computer, the most popular assemblers are the GAS assembler, which uses the AT&T syntax for writing code, and the Netwide Assembler, also known as NASM, which uses a syntax similar to MASM.
  2. On a Windows computer, the most popular assembler is MASM, which uses the Intel syntax.
  3. The available software interrupts, and their functions, are different on Windows and Linux.
  4. The available code libraries are different on Windows and Linux.

Using the same assembler, the basic assembly code written on each Operating System is basically the same, except you interact with Windows differently than you interact with Linux.

Which Assembler is Best?[edit]

The short answer is that none of the assemblers is better than any other; it's a matter of personal preference.

The long answer is that different assemblers have different capabilities, drawbacks, etc. If you only know GAS syntax, then you will probably want to use GAS. If you know Intel syntax and are working on a Windows machine, you might want to use MASM. If you don't like some of the quirks or complexities of MASM and GAS, you might want to try FASM or NASM. We will cover the differences between the different assemblers in section 2.

Do I Need to Know Assembly?[edit]

You don't need to know assembly for most computer tasks, but it can definitely be useful. Learning assembly is not about learning a new programming language. If you are going to start a new programming project (unless that project is a bootloader, a device driver, or a kernel), then you will probably want to avoid assembly like the plague. An exception to this could be if you absolutely need to squeeze the last bits of performance out of a congested inner loop and your compiler is producing suboptimal code. Keep in mind, though, that premature optimization is the root of all evil, although some computing-intense realtime tasks can only be optimized sufficiently if optimization techniques are understood and planned for from the start.

However, learning assembly gives you a particular insight into how your computer works on the inside. When you program in a higher-level language like C or Ada, all your code will eventually need to be converted into machine code instructions so your computer can execute them. Understanding the limits of exactly what the processor can do, at the most basic level, will also help when programming a higher-level language.

How Should I Format my Code?[edit]

Most assemblers require that assembly code instructions each appear on their own line and are separated by a carriage return. Most assemblers also allow for whitespace to appear between instructions, operands, etc. Exactly how you format code is up to you, although there are some common ways:

One way keeps everything lined up:

Label1:
mov ax, bx
add ax, bx
jmp Label3
Label2:
mov ax, cx
...

Another way keeps all the labels in one column and all the instructions in another column:

Label1: mov ax, bx
        add ax, bx
        jmp Label3
Label2: mov ax, cx
...

Another way puts labels on their own lines and indents instructions slightly:

Label1:
   mov ax, bx
   add ax, bx
   jmp Label3
Label2:
   mov ax, cx
...

Yet another way separates labels and instructions into separate columns AND keeps labels on their own lines:

Label1:
        mov ax, bx
        add ax, bx
        jmp Label3
Label2:
        mov ax, cx
...

So there are a million different ways to do it, but there are some general rules that assembly programmers generally follow:

  1. Make your labels obvious, so other programmers can see where they are.
  2. More structure (indents) will make your code easier to read.
  3. Use comments to explain what you are doing. The meaning of a piece of assembly code can often not be immediately clear.