X86 Assembly/16 32 and 64 Bits
When using x86 assembly, it is important to consider the differences between architectures that are 16, 32, and 64 bits. This page will talk about some of the basic differences between architectures with different bit widths.
The 8086 Registers
The 8086 registers are the following: AX, BX, CX, DX, BP, SP, DI, SI, CS, SS, ES, DS, IP and FLAGS. They are all 16 bits wide.
On any Windows-based system (except 64-bit versions), you can run a program called "debug.exe" from a DOS shell, which is very handy for learning about the 8086.
- AX, BX, CX, DX
- These general purpose registers can also be addressed as 8-bit registers. So AX = AH (high 8-bit) and AL (low 8-bit).
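As a rough illustration of how the two halves relate to the full register, the following Python sketch splits a 16-bit value into its high and low bytes and joins them again. The function names `split_ax` and `join_ax` are made up for this example; in real assembly you would simply name AH or AL in an instruction.

```python
def split_ax(ax):
    """Return (AH, AL) for a 16-bit AX value."""
    ax &= 0xFFFF            # keep only the low 16 bits
    ah = (ax >> 8) & 0xFF   # high 8 bits
    al = ax & 0xFF          # low 8 bits
    return ah, al


def join_ax(ah, al):
    """Rebuild AX from its two 8-bit halves."""
    return ((ah & 0xFF) << 8) | (al & 0xFF)


ah, al = split_ax(0x1234)
print(hex(ah), hex(al))       # 0x12 0x34
print(hex(join_ax(ah, al)))   # 0x1234
```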
The 8086 has 20 address lines but only 16-bit registers, which raised a problem: how can a 20-bit address space be referred to by 16-bit registers? To solve this problem, the designers came up with the segment registers CS (Code Segment), DS (Data Segment), ES (Extra Segment), and SS (Stack Segment). To convert a 20-bit address into this form, one would divide it by 16 and place the quotient in the segment register and the remainder in the offset register. The result is written as CS:IP (meaning that CS is the segment and IP is the offset). Likewise, when an address is written SS:SP, SS is the segment and SP is the offset.
If CS = 0x258C and IP = 0x0012 (the "0x" prefix denotes hexadecimal notation), then CS:IP will point to a 20 bit address equivalent to "CS * 16 + IP" which will be = 0x258C * 0x10 + 0x0012 = 0x258C0 + 0x0012 = 0x258D2 (Remember: 16 decimal = 0x10). The 20-bit address is known as an absolute (or linear) address and the Segment:Offset representation (CS:IP) is known as a segmented address.
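The arithmetic above can be checked with a few lines of Python. This is only a sketch of the address calculation, not of anything the processor exposes; the function name `linear_address` is an assumption made for the example.

```python
def linear_address(segment, offset):
    """Compute Segment * 16 + Offset for 16-bit segment and offset values."""
    segment &= 0xFFFF
    offset &= 0xFFFF
    return (segment << 4) + offset   # shifting left by 4 multiplies by 16


# CS = 0x258C, IP = 0x0012, as in the example above
print(hex(linear_address(0x258C, 0x0012)))   # 0x258d2
```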
It is important to note that there is not a one-to-one mapping of physical addresses to segmented addresses; for any physical address, there is more than one possible segmented address. For example, consider the segmented representations B000:8000 and B200:6000 (all values in hexadecimal). Evaluated, they both map to physical address B8000 (B000:8000 = B000 x 10 + 8000 = B0000 + 8000 = B8000 and B200:6000 = B200 x 10 + 6000 = B2000 + 6000 = B8000). However, a fixed conversion rule avoids this ambiguity: if every physical address is always converted in the same way, exactly one segmented address is produced for each, and applying the rule in reverse recovers the physical address.
For example, if the segment portion is equal to the physical address divided by 0x10 and the offset is equal to the remainder, only one segmented address will be generated. (No offset will be greater than 0x0F.) Physical address B8000 maps to (B8000 / 10):(B8000 % 10), or B800:0. This segmented representation is given a special name: such addresses are said to be "normalized addresses".
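To make the aliasing and the normalization concrete, here is a minimal Python sketch. The helper names `linear_address` and `normalize` are invented for this illustration.

```python
def linear_address(segment, offset):
    """Segment * 16 + Offset."""
    return (segment << 4) + offset


def normalize(physical):
    """Split a 20-bit physical address into (segment, offset) with offset <= 0xF."""
    physical &= 0xFFFFF
    return physical >> 4, physical & 0xF   # quotient and remainder of division by 16


# Two different segmented addresses that name the same physical byte
print(hex(linear_address(0xB000, 0x8000)))   # 0xb8000
print(hex(linear_address(0xB200, 0x6000)))   # 0xb8000

# The single normalized form of that physical address
seg, off = normalize(0xB8000)
print(f"{seg:04X}:{off:04X}")                # B800:0000
```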
CS:IP (Code Segment: Instruction Pointer) represents the 20-bit physical address from which the next instruction will be fetched. Likewise, SS:SP (Stack Segment: Stack Pointer) points to the 20-bit absolute address that is treated as the top of the stack (the 8086 uses this for pushing and popping values).
When chips began to support a 32-bit data bus, the registers were widened to 32 bits to match. The names for the 32-bit registers are simply the 16-bit names with an 'E' prepended.
- EAX, EBX, ECX, EDX
- These are the 32-bit versions of the registers shown above.
The names of the 64-bit registers are the same as those of the 16-bit registers, except that they begin with an 'R'.
- RAX, RBX, RCX, RDX, RSI, RDI, RBP, RSP
- These are the 64-bit versions of the registers shown above.
- RIP
- This is the full instruction pointer and should be used instead of EIP (which will be inaccurate if the address space is larger than 4 GiB, which may happen even with 4 GiB or less of RAM).
- R8-R15
- These are new extra registers for 64-bit. They are numbered as if the registers above were registers zero through seven, inclusive, rather than one through eight.
64-bit x86 includes SSE2 (an extension to 32-bit x86), which provides 128-bit registers for specific instructions.
- XMM0-XMM7
- SSE2 and newer.
- XMM8-XMM15
- SSE3 and newer and AMD (but not Intel) SSE2.
Most CPUs made since 2011 also have AVX, a further extension that lengthens these registers to 256 bits.
The A20 Gate Saga
As was said earlier, the 8086 processor had 20 address lines (A0 to A19), so the total memory addressable by it was 1 MB (2 to the power 20 bytes). But a single 16-bit register can address no more than 64 KB (2 to the power 16 bytes), so the designers came up with the segment:offset scheme. This made it possible for a program to access the whole 1 MB of memory.
But with this segmentation scheme also came a side effect. Not only could your code refer to the whole of 1 MB with this scheme, but actually a little more than that. Let's see how...
Let's keep in mind how we convert from a Segment:Offset representation to Linear 20 bit representation.
Segment:Offset = Segment x 16 + Offset
Now to see the maximum amount of memory that can be addressed, let's fill in both Segment and Offset to their maximum values and then convert that value to its 20-bit absolute physical address.
So, max value for Segment = FFFF and max value for Offset = FFFF
Now, let's convert FFFF:FFFF into its linear address, bearing in mind that 16 is written as 10 in hexadecimal:
So we get, FFFF:FFFF = FFFF x 10h + FFFF = FFFF0 + FFFF = FFFF0 + (F + FFF0) = FFFFF + FFF0 = 10FFEF
- Note: FFFFF is the last byte of the first megabyte (1 MB - 1), and FFF0 is equal to 64 KB minus 16 bytes. The highest address is therefore 1 MB + 64 KB - 17, and counting from address 0 that gives 1 MB + 64 KB - 16 distinct addresses.
Moral of the story: From Real mode a program can actually refer to (1 MB + 64 KB - 16) bytes of memory.
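The figure can be double-checked numerically. The following Python sketch only repeats the arithmetic above; the constant names are chosen for this example.

```python
KB = 1 << 10
MB = 1 << 20

# Highest address that a segment:offset pair can name: FFFF:FFFF
top = (0xFFFF << 4) + 0xFFFF
print(hex(top))                          # 0x10ffef

# Addresses 0 through `top` inclusive give the number of referable bytes
referable = top + 1
print(referable == MB + 64 * KB - 16)    # True
```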
Notice the use of the word "refer" and not "access". A program can refer to this much memory, but whether it can access it depends on the number of address lines actually present. On the 8086 this was definitely not possible: when a program referred to memory at or above 1 MB, the computed address was more than 20 bits wide, and the extra bit was dropped, so the address wrapped around.
For example, if code refers to address 1 MB (100000 in hexadecimal), this wraps around and points to location 0 in memory; likewise 1 MB + 1 wraps around to address 1 (or 0000:0001).
Now there were some super funky programmers around that time who exploited this wrap-around in their code to make it a little faster and a few bytes shorter. Using this technique they could access the top 32 KB of memory (the 32 KB just below the 1 MB boundary) and the bottom 32 KB of memory without reloading their segment registers!
Simple maths you see, if in Segment:Offset representation you make Segment constant, then since Offset is a 16-bit value therefore you can roam around in a 64 KB (or 2 to the power 16) area of memory. Now if you make your segment register point to 32 KB below the 1 MB mark you can access 32 KB upwards to touch 1 MB boundary and then 32 KB further which will ultimately get wrapped to the bottom most 32 KB.
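The trick can be simulated in a few lines of Python. This is a simplified model that assumes the only effect of having 20 address lines is that bit 20 of the computed address is dropped; the function name `address_8086` and the choice of segment F800 (which starts exactly 32 KB below 1 MB) are made for this illustration.

```python
def address_8086(segment, offset):
    """Address that ends up on the 8086's 20 address lines (bit 20 is lost)."""
    return ((segment << 4) + offset) & 0xFFFFF


SEG = 0xF800   # F800:0000 is F8000, 32 KB below the 1 MB mark

print(hex(address_8086(SEG, 0x0000)))   # 0xf8000  start of the top 32 KB
print(hex(address_8086(SEG, 0x7FFF)))   # 0xfffff  last byte below 1 MB
print(hex(address_8086(SEG, 0x8000)))   # 0x0      wrapped to the bottom of memory
print(hex(address_8086(SEG, 0xFFFF)))   # 0x7fff   end of the bottom 32 KB
```

With one segment value, offsets 0000 to 7FFF reach the top 32 KB and offsets 8000 to FFFF reach the bottom 32 KB.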
Now these super funky programmers overlooked the fact that processors with more address lines would be created. (Note: Bill Gates is often credited with saying, "Who would need more than 640 KB memory?"; these programmers were probably thinking similarly.) In 1982, just 4 years after the 8086, Intel released the 80286 processor with 24 address lines. Though it was theoretically backward compatible with legacy 8086 programs, since it also supported Real Mode, many 8086 programs did not function correctly because they depended on out-of-bounds addresses wrapping around to low memory. So for the sake of compatibility, IBM engineers routed the A20 address line (the 8086 had lines A0 to A19) through the keyboard controller and provided a mechanism to enable or disable the A20 compatibility mode. If you are wondering why the keyboard controller, the answer is that it had an unused pin. Since the 80286 was to be marketed as completely compatible with the 8086 (which had not been out for very long), customers who upgraded would have been furious if the 80286 were not bug-for-bug compatible, so that code designed for the 8086 would run just as well on the 80286, only faster.
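The effect of the gate can be modelled roughly as follows. This Python sketch assumes a deliberately simplified picture in which disabling the gate just forces address line A20 to zero on a 24-line address bus; the function name `address_286` and its `a20_enabled` parameter are hypothetical and do not correspond to any real hardware interface.

```python
def address_286(segment, offset, a20_enabled):
    """Real-mode address on a 24-line bus, with an optional A20 mask."""
    linear = (segment << 4) + offset   # at most 0x10FFEF in real mode
    if not a20_enabled:
        linear &= ~(1 << 20)           # gate closed: A20 is forced low
    return linear & 0xFFFFFF           # 24 address lines


# FFFF:0010 refers to exactly 1 MB
print(hex(address_286(0xFFFF, 0x0010, a20_enabled=False)))   # 0x0       wraps like an 8086
print(hex(address_286(0xFFFF, 0x0010, a20_enabled=True)))    # 0x100000  reaches memory above 1 MB
```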
32-bit addresses can cover memory up to 4 GB in size. This means that we don't need segment:offset pairs to reach all of memory on 32-bit processors. Instead, we use what is called the "flat addressing" scheme, where the address in a register points directly to a memory location. The segment registers are still used to define different segments, so that programs don't try to execute the stack section or accidentally perform stack operations on the data section.