x86 Assembly/Other Instructions
Stack Instructions
There are dedicated instructions for interacting with the stack.
Generic
push arg
This instruction decrements the stack pointer and stores the data specified as the argument into the location pointed to by the stack pointer.
pop arg
This instruction loads the data stored in the location pointed to by the stack pointer into the argument specified and then increments the stack pointer. For example:
mov eax, 5
mov ebx, 6
push eax ; The stack is now: [5]
push ebx ; The stack is now: [6] [5]
pop eax ; The topmost item (which is 6) is now stored in eax. The stack is now: [5]
pop ebx ; ebx is now equal to 5. The stack is now empty.
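The order of operations inside push and pop can be modeled in a few lines of C. This is only a sketch under stated assumptions: the `Stack` type, `STACK_WORDS`, and the function names are hypothetical, the stack pointer is modeled as an array index rather than a byte address, and no bounds checking is done.

```c
#include <stdint.h>

#define STACK_WORDS 16

/* Hypothetical model: a small stack of 32-bit words that grows downward. */
typedef struct {
    uint32_t mem[STACK_WORDS];
    int sp;                      /* index of the top item; STACK_WORDS means empty */
} Stack;

void stack_init(Stack *s) { s->sp = STACK_WORDS; }

/* push: decrement the stack pointer first, then store the value. */
void push(Stack *s, uint32_t value) {
    s->sp -= 1;
    s->mem[s->sp] = value;
}

/* pop: load the value first, then increment the stack pointer. */
uint32_t pop(Stack *s) {
    uint32_t value = s->mem[s->sp];
    s->sp += 1;
    return value;
}
```

Pushing 5 and then 6 and popping twice returns 6 and then 5, matching the walkthrough above.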
GPRs
pusha
This instruction pushes all the general purpose registers onto the stack in the following order: AX, CX, DX, BX, SP, BP, SI, DI. The value of SP pushed is the value before the instruction is executed. It is useful for saving state before an operation that could potentially change these registers.
popa
This instruction pops all the general purpose registers off the stack in the reverse order of PUSHA, that is, DI, SI, BP, SP, BX, DX, CX, AX. The stored SP value is skipped rather than loaded into SP. It is used to restore state after a PUSHA.
pushad
This instruction works similarly to pusha, but pushes the 32-bit general purpose registers onto the stack instead of their 16-bit counterparts.
popad
This instruction works similarly to popa, but pops the 32-bit general purpose registers off of the stack instead of their 16-bit counterparts.
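The push order, and the fact that the saved SP is the value from before the instruction, can be illustrated with a small C model. It is a sketch only: `Regs16`, `pusha`, and `popa` are hypothetical names, and SP is treated as an index into an array of 16-bit words instead of a byte address.

```c
#include <stdint.h>

/* Hypothetical register file holding the eight 16-bit general purpose registers. */
typedef struct { uint16_t ax, cx, dx, bx, sp, bp, si, di; } Regs16;

/* Model of pusha: AX, CX, DX, BX, original SP, BP, SI, DI are pushed in that order. */
void pusha(Regs16 *r, uint16_t *mem) {
    uint16_t old_sp = r->sp;     /* value of SP before the instruction */
    uint16_t vals[8] = { r->ax, r->cx, r->dx, r->bx, old_sp, r->bp, r->si, r->di };
    for (int i = 0; i < 8; i++) {
        r->sp -= 1;
        mem[r->sp] = vals[i];
    }
}

/* Model of popa: registers come back in reverse order; the saved SP is skipped. */
void popa(Regs16 *r, const uint16_t *mem) {
    r->di = mem[r->sp++];
    r->si = mem[r->sp++];
    r->bp = mem[r->sp++];
    r->sp++;                     /* stored SP is discarded, not loaded */
    r->bx = mem[r->sp++];
    r->dx = mem[r->sp++];
    r->cx = mem[r->sp++];
    r->ax = mem[r->sp++];
}
```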
Flags
Because a good deal of all instructions somehow alter flags, the flags register is considered to be very volatile.
As a consequence of this design, it can neither be queried nor altered directly (except for a few individual flags, such as DF).
Instead, dedicated push and pop instructions (attempt to) store its value on the stack or load it from the stack.
Using them is “slow” because there is only one flags register: all pending (potential) reads and writes must complete before the actual value can be obtained or overwritten.
Furthermore, what can be read or overwritten depends on privileges.
pushf
This instruction decrements the stack pointer and then stores a masked copy of the flags register’s contents at the location pointed to by the stack pointer. The RF and VM flags are always cleared in the copy. Under certain conditions a general protection fault (GPF) may arise.
popf
This instruction attempts, as far as possible, to load the flags register with the contents of the memory location pointed to by the stack pointer and then increments the stack pointer. Some flags may retain their original values even if the popped value requests a change. If there is a lack of privileges to change certain or any values at all, a GPF occurs.
Outside OS development (like threading), a standard use case for these instructions is checking whether the cpuid instruction is available.
If you can alter the ID flag in the EFLAGS register, the cpuid instruction is supported.
Example of a function checking for cpuid
Here, we assume we have the proper privileges to retrieve and overwrite the flags register.
In this example the programming language using this function requires
pushfq ; put RFLAGS on top of stack
mov rax, [rsp] ; preserve copy for comparison
xor qword [rsp], 0x200000 ; flip the ID flag (bit 21) in copy
popfq ; _attempt_ to overwrite RFLAGS
pushfq ; obtain possibly altered RFLAGS
pop rcx ; rcx ≔ rsp↑; inc(rsp, 8)
xor rax, rcx ; cancel out any _unchanged_ bits
shr eax, 21 ; move ID flag into bit position 0
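The bit manipulation in this check can be followed in portable C. The ID flag is bit 21 of EFLAGS (mask 0x200000). The sketch below does not touch the real flags register: `write_then_read_flags` is a hypothetical stand-in for the popfq/pushfq pair, and the `id_writable` parameter is an assumption that models whether the processor lets the bit change.

```c
#include <stdint.h>

#define ID_FLAG_BIT  21
#define ID_FLAG_MASK (UINT64_C(1) << ID_FLAG_BIT)   /* 0x200000 */

/* Hypothetical stand-in for popfq followed by pushfq: the modeled CPU accepts
   the requested ID bit only if id_writable is nonzero. */
uint64_t write_then_read_flags(uint64_t current, uint64_t requested, int id_writable) {
    if (id_writable)
        return (current & ~ID_FLAG_MASK) | (requested & ID_FLAG_MASK);
    return current;
}

/* Mirrors the flip / compare / shift sequence: returns 1 if the ID flag changed. */
int cpuid_available(uint64_t flags, int id_writable) {
    uint64_t before = flags;
    uint64_t after = write_then_read_flags(flags, flags ^ ID_FLAG_MASK, id_writable);
    return (int)(((before ^ after) >> ID_FLAG_BIT) & 1);
}
```

The XOR of the two snapshots leaves only the bits that changed, so shifting right by 21 places the answer in bit 0.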
Flags instructions
While the flags register is used to report on results of executed instructions (overflow, carry, etc.), it also contains flags that affect the operation of the processor. These flags are set and cleared with special instructions.
Interrupt Flag
The IF flag tells a processor if it should accept hardware interrupts. It should be kept set under normal execution. In fact, in protected mode, neither sti nor cli can normally be executed by user-level programs.
sti
Sets the interrupt flag. If set, the processor can accept interrupts from peripheral hardware.
cli
Clears the interrupt flag. Hardware interrupts cannot interrupt execution. Programs can still generate interrupts, called software interrupts, and change the flow of execution. Non-maskable interrupts (NMI) cannot be blocked using this instruction.
Direction Flag
The DF flag tells the processor which way to read data when using string instructions, that is, whether to decrement or increment the esi and edi registers after a movs instruction.
std
Sets the direction flag. Registers will decrement, reading backwards.
cld
Clears the direction flag. Registers will increment, reading forwards.
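The effect of the direction flag matters most when source and destination overlap. The following C sketch models a byte-wise repeated movs. The function name `rep_movsb` and its index parameters are hypothetical: `src_i` and `dst_i` play the roles of esi and edi, `count` the role of ecx, and `df` the direction flag.

```c
#include <stddef.h>

/* Model of a repeated byte move: copies count bytes inside mem, stepping the
   two indices forward (df == 0, as after cld) or backward (df != 0, as after std). */
void rep_movsb(unsigned char *mem, size_t dst_i, size_t src_i, size_t count, int df) {
    while (count-- > 0) {
        mem[dst_i] = mem[src_i];
        if (df) { src_i--; dst_i--; }   /* registers decrement */
        else    { src_i++; dst_i++; }   /* registers increment */
    }
}
```

Moving six bytes two positions toward higher addresses works when copying backwards from the last byte. Copying forwards in the same situation overwrites source bytes before they have been read.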
Carry Flag
The CF flag is often modified after arithmetic instructions, but it can be set or cleared manually as well.
stc
Sets the carry flag.
clc
Clears the carry flag.
cmc
Complements (inverts) the carry flag.
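One common reason to control the carry flag by hand is multi-word arithmetic, where the carry out of one addition feeds into the next. The C sketch below models this. The helper names `adc32` and `add64` are hypothetical, and the carry flag is represented by an ordinary variable.

```c
#include <stdint.h>

/* Model of an add-with-carry: returns a + b + *cf and updates *cf to the carry out. */
uint32_t adc32(uint32_t a, uint32_t b, unsigned *cf) {
    uint64_t wide = (uint64_t)a + b + *cf;
    *cf = (unsigned)(wide >> 32);
    return (uint32_t)wide;
}

/* Adds two 64-bit values held as 32-bit halves, low half first. */
void add64(uint32_t a_lo, uint32_t a_hi, uint32_t b_lo, uint32_t b_hi,
           uint32_t *r_lo, uint32_t *r_hi) {
    unsigned cf = 0;                 /* like clc: start the chain with CF clear */
    *r_lo = adc32(a_lo, b_lo, &cf);  /* may produce a carry */
    *r_hi = adc32(a_hi, b_hi, &cf);  /* consumes that carry */
}
```

Starting the chain with the carry set instead (as stc would do) adds one to the result, which is why the flag is cleared first here.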
Other
sahf
Stores the content of AH register into the lower byte of the flag register.
lahf
Loads the AH register with the contents of the lower byte of the flag register.
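The lower byte of the flags register holds SF (bit 7), ZF (bit 6), AF (bit 4), PF (bit 2) and CF (bit 0). Bit 1 always reads as 1, and bits 3 and 5 read as 0. The C sketch below models both instructions on a plain integer. The function and constant names are hypothetical.

```c
#include <stdint.h>

/* Status flags that live in the low byte of the flags register. */
enum {
    FLAG_CF = 1 << 0,
    FLAG_PF = 1 << 2,
    FLAG_AF = 1 << 4,
    FLAG_ZF = 1 << 6,
    FLAG_SF = 1 << 7
};
#define STATUS_MASK (FLAG_CF | FLAG_PF | FLAG_AF | FLAG_ZF | FLAG_SF)

/* Model of lahf: AH receives SF:ZF:0:AF:0:PF:1:CF. */
uint8_t lahf(uint32_t eflags) {
    return (uint8_t)((eflags & STATUS_MASK) | 0x02);
}

/* Model of sahf: only SF, ZF, AF, PF and CF are replaced from AH. */
uint32_t sahf(uint32_t eflags, uint8_t ah) {
    return (eflags & ~(uint32_t)STATUS_MASK) | (uint32_t)(ah & STATUS_MASK);
}
```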
I/O Instructions
in src, dest | GAS Syntax |
in dest, src | Intel Syntax |
The IN instruction reads data from an I/O port. The destination (dest) is always the accumulator (AL, AX, or EAX), and the port address (src) is given either in DX or as an 8-bit immediate. In protected-mode operating systems, the IN instruction is usually restricted to privileged code, so normal users can't use it in their programs.
out src, dest | GAS Syntax |
out dest, src | Intel Syntax |
The OUT instruction is very similar to the IN instruction. OUT writes data from the accumulator (AL, AX, or EAX; src) to an output port (dest), given either in DX or as an 8-bit immediate. In protected mode, the OUT instruction is usually restricted to privileged code, so normal users can't use it.
No-op Instructions
The x86 instruction set has a NOP (no operation) instruction mnemonic:
nop
It has a single-byte opcode, 0x90. This instruction has no side effects other than incrementing the instruction pointer (EIP). Despite its name, a "do nothing" instruction is useful for execution speed optimizations. It is routinely used by optimizing compilers/assemblers, and can be seen scattered around in disassembled code, but is almost never used in manually written assembly code.
For illustration, some applications of nop instructions are:
- aligning the following instruction to the start of a memory block;
- aligning series of jump targets;
- filling in space when binary patching an executable, e.g. removing a branch, instead of keeping dead code (code that is never executed).
Multi-byte no-op instructions
x86 extensions (including x86-64) from AMD[1] and Intel[2] include multi-byte no-op instructions. Actually, any valid instruction that doesn't have side effects can serve as a no-op. Some of the versions recommended by both referenced manuals are listed below.
Size (bytes)  Opcode (hexadecimal)     Encoding
---------------------------------------------------------------------------------
1             90                       NOP
2             66 90                    66 NOP
3             0F 1F 00                 NOP DWORD ptr [EAX]
4             0F 1F 40 00              NOP DWORD ptr [EAX + 00H]
5             0F 1F 44 00 00           NOP DWORD ptr [EAX + EAX*1 + 00H]
6             66 0F 1F 44 00 00        66 NOP DWORD ptr [EAX + EAX*1 + 00H]
7             0F 1F 80 00 00 00 00     NOP DWORD ptr [EAX + 00000000H]
8             0F 1F 84 00 00 00 00 00  NOP DWORD ptr [EAX + EAX*1 + 00000000H]
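As an illustration of the alignment use case, the C sketch below computes how many padding bytes are needed to reach an aligned address and fills a buffer with the byte sequences from the table above, longest first. The names `NOPS`, `padding_for`, and `fill_nops` are hypothetical, and real assemblers may choose their padding differently.

```c
#include <stddef.h>
#include <string.h>

/* No-op byte sequences of length 1 to 8, taken from the table above. */
static const unsigned char NOPS[8][8] = {
    {0x90},
    {0x66, 0x90},
    {0x0F, 0x1F, 0x00},
    {0x0F, 0x1F, 0x40, 0x00},
    {0x0F, 0x1F, 0x44, 0x00, 0x00},
    {0x66, 0x0F, 0x1F, 0x44, 0x00, 0x00},
    {0x0F, 0x1F, 0x80, 0x00, 0x00, 0x00, 0x00},
    {0x0F, 0x1F, 0x84, 0x00, 0x00, 0x00, 0x00, 0x00},
};

/* Bytes needed to bring addr up to a multiple of align (align must be a power of two). */
size_t padding_for(size_t addr, size_t align) {
    return (align - (addr & (align - 1))) & (align - 1);
}

/* Writes exactly len bytes of no-ops into out, using the longest sequences first. */
void fill_nops(unsigned char *out, size_t len) {
    while (len > 0) {
        size_t n = len < 8 ? len : 8;
        memcpy(out, NOPS[n - 1], n);
        out += n;
        len -= n;
    }
}
```

For example, code ending at address 0x1003 needs 13 bytes to reach the next 16-byte boundary, which this sketch emits as one 8-byte and one 5-byte no-op.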
System Instructions
These instructions were added with the Pentium II.
sysenter
This instruction causes the processor to enter protected system mode (supervisor mode or "kernel mode").
sysexit
This instruction causes the processor to leave protected system mode, and enter user mode.
Misc Instructions
Read time stamp counter
RDTSC
RDTSC was introduced with the Pentium processor. The instruction reads the number of clock cycles since reset and returns the value in EDX:EAX. This provides low-overhead, high-resolution CPU timing. However, on modern CPU microarchitectures (multi-core, hyperthreading) and multi-CPU machines, the cycle counters are not guaranteed to be synchronized between cores and CPUs. The CPU frequency may also vary due to power saving or dynamic overclocking. The instruction may therefore be less reliable than when it was first introduced, and it should be used with care for performance measurements.
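It is possible to use just the lower 32 bits of the result, but it should be noted that on a 600 MHz processor the register would overflow every 7.16 seconds:
$$\frac{2^{32}\ \text{cycles}}{600{,}000{,}000\ \text{cycles/s}} \approx 7.16\ \text{s}$$
Using the full 64 bits allows for 974.9 years between overflows:
$$\frac{2^{64}\ \text{cycles}}{600{,}000{,}000\ \text{cycles/s}} \approx 3.07 \times 10^{10}\ \text{s} \approx 974.9\ \text{years}$$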
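The two overflow figures can be reproduced with a short calculation. The C sketch below uses hypothetical helper names and assumes a constant clock frequency and a 365-day year.

```c
/* Seconds until a counter of the given width wraps around at hz cycles per second. */
double seconds_until_wrap(int bits, double hz) {
    double counts = 1.0;
    for (int i = 0; i < bits; i++)
        counts *= 2.0;               /* counts = 2^bits */
    return counts / hz;
}

/* Converts seconds to years, assuming 365 days per year. */
double seconds_to_years(double seconds) {
    return seconds / (365.0 * 24.0 * 3600.0);
}
```

With `hz` set to 600e6, `seconds_until_wrap(32, hz)` is roughly 7.16 seconds and `seconds_to_years(seconds_until_wrap(64, hz))` is roughly 974.9 years.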
The following program (using NASM syntax) is an example of using RDTSC to measure the number of cycles a small block takes to execute:
global main
extern printf

section .data
    align 4
    a:      dd 10.0
    b:      dd 5.0
    c:      dd 2.0
    fmtStr: db "edx:eax = %llu edx = %u eax = %u", 0x0A, 0

section .bss
    alignb 4
    cycleLow:   resd 1
    cycleHigh:  resd 1
    result:     resd 1

section .text
main:                           ; Using main since we are using gcc to link
    ;
    ; op dst, src
    ;
    xor     eax, eax
    cpuid                       ; serialize before reading the counter
    rdtsc
    mov     [cycleLow], eax
    mov     [cycleHigh], edx
    ;
    ; Work to be measured: result = sqrt(a*c + b*b)
    ;
    fld     dword [a]
    fld     dword [c]
    fmulp   st1                 ; st0 = a*c
    fld     dword [b]
    fld     dword [b]
    fmulp   st1                 ; st0 = b*b, st1 = a*c
    faddp   st1                 ; st0 = a*c + b*b
    fsqrt
    fstp    dword [result]
    ;
    ; Done work
    ;
    xor     eax, eax            ; rdtsc overwrote eax, so select leaf 0 again
    cpuid
    rdtsc
    ;
    ; break points so we can examine the values
    ; before we alter the data in edx:eax and
    ; before we print out the results.
    ;
break1:
    sub     eax, [cycleLow]     ; low halves; sets CF on borrow
    sbb     edx, [cycleHigh]    ; high halves minus the borrow
break2:
    push    eax                 ; third value: eax
    push    edx                 ; second value: edx
    push    edx                 ; high half of the 64-bit %llu value
    push    eax                 ; low half of the 64-bit %llu value
    push    dword fmtStr
    call    printf
    add     esp, 20             ; Pop stack 5 times 4 bytes
    ;
    ; Call _exit(2) syscall
    ; noreturn void _exit(int status)
    ;
    mov     ebx, 0              ; Arg one: the 8-bit status
    mov     eax, 1              ; Syscall number:
    int     0x80
In order to assemble, link and run the program we need to do the following:
$ nasm -felf -g rdtsc.asm -l rdtsc.lst
$ gcc -m32 -o rdtsc rdtsc.o
$ ./rdtsc
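The sub/sbb pair in the program subtracts two 64-bit counter values that are held as 32-bit halves. The C sketch below models that subtraction and shows how the halves combine into one 64-bit number. The function names are hypothetical, and the borrow is an ordinary variable standing in for the carry flag.

```c
#include <stdint.h>

/* Combines the two halves returned in EDX:EAX into one 64-bit value. */
uint64_t tsc_combine(uint32_t edx, uint32_t eax) {
    return ((uint64_t)edx << 32) | eax;
}

/* Models `sub eax, [cycleLow]` followed by `sbb edx, [cycleHigh]`. */
void tsc_delta(uint32_t start_lo, uint32_t start_hi,
               uint32_t end_lo, uint32_t end_hi,
               uint32_t *delta_lo, uint32_t *delta_hi) {
    unsigned borrow = end_lo < start_lo;        /* carry flag after the sub */
    *delta_lo = end_lo - start_lo;              /* wraps modulo 2^32 */
    *delta_hi = end_hi - start_hi - borrow;     /* sbb subtracts the borrow too */
}
```

The result matches a plain 64-bit subtraction of the combined values, including when the low half wraps around between the two readings.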
References
[1] "5.8 Code Padding with Operand-Size Override and Multibyte NOP". AMD Software Optimization Guide for AMD Family 15h Processors, document #47414. p. 94. http://support.amd.com/TechDocs/47414_15h_sw_opt_guide.pdf.
[2] "NOP". Intel 64 and IA-32 Architectures Software Developer's Manual. 2B: Instruction Set Reference. http://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-software-developer-vol-2b-manual.pdf.