x86 Assembly/Floating Point

The ALU is only capable of dealing with integer values. While integers are sufficient for some applications, it is often necessary to work with numbers that have fractional parts. A specialized coprocessor, the FPU (floating-point unit), allows you to manipulate such numbers.

x87 Coprocessor

The original x86 family members had a separate math coprocessor that handled floating point arithmetic. The original coprocessor was the 8087, and all FPUs since have been dubbed “x87” chips. Later variants integrated the FPU into the microprocessor itself. Having the capability to manage floating point numbers means a few things:

The microprocessor must have space to store floating point numbers.
The microprocessor must have instructions to manipulate floating point numbers.

The FPU, even when it is integrated into an x86 chip, is still called the “x87” section. For instance, literature on the subject will frequently call the FPU Register Stack the “x87 Stack”, and the FPU operations will frequently be called the “x87 instruction set”.

The presence of an integrated x87 FPU can be checked using the cpuid instruction.

; after you have verified
; that the cpuid instruction is indeed available:
mov eax, 1     ; argument: request feature report
cpuid          ; overwrites eax, ebx, ecx and edx
xor eax, eax   ; wipe clean accumulator register
bt edx, eax    ; CF ≔ edx[eax]    retrieve bit 0 (x87 FPU on chip)
setc al        ; al ≔ CF
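
The same bit test can be written in C. The following is only a minimal sketch: it assumes the EDX value returned by cpuid for the feature report has already been obtained by some other means, and the helper names feature_bit and has_x87_fpu are hypothetical, not part of any library.

```c
#include <stdint.h>

/* Hypothetical helper: return 1 if the given bit of a CPUID feature word
   is set, 0 otherwise. This mirrors "bt edx, eax" followed by "setc al". */
static int feature_bit(uint32_t feature_word, unsigned bit)
{
    /* bt with a register destination uses the bit offset modulo 32 */
    return (int)((feature_word >> (bit & 31u)) & 1u);
}

/* Bit 0 of EDX in the feature report signals an on-chip x87 FPU. */
static int has_x87_fpu(uint32_t feature_edx)
{
    return feature_bit(feature_edx, 0);
}
```

Passing a feature word whose lowest bit is set makes has_x87_fpu return 1; any value with that bit clear gives 0.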

FPU Register Stack

The FPU has an array of eight registers that can be accessed as a stack. A single top index indicates the current top of the stack. Pushing an item decrements the top index and stores the data in the register it then points to; popping an item marks that register as empty and increments the top index. The data itself is never moved between registers.

st(0) or simply st refers to the register that is currently at the top of the stack. If eight values were stored on the stack, st(7) refers to the last element on the stack (i.e. the bottom).
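
The top-index mechanism can be modeled in a few lines of C. This is a toy model under stated assumptions, not a description of the hardware's exact fault behavior: the struct and function names are invented for illustration, the used array stands in for the FPU's record of which registers are occupied, and overflow or underflow simply make the function return false.

```c
#include <stdbool.h>

/* Toy model of the x87 register stack: eight slots plus a top index. */
typedef struct {
    long double reg[8];  /* the eight physical registers */
    bool used[8];        /* is the slot currently occupied? */
    unsigned top;        /* physical index of st(0) */
} X87Model;

/* Start with an empty stack, as after fninit. */
static void x87_init(X87Model *f)
{
    for (int i = 0; i < 8; i++) {
        f->reg[i] = 0.0L;
        f->used[i] = false;
    }
    f->top = 0;
}

/* Map the stack-relative name st(i) to a physical slot. */
static unsigned x87_slot(const X87Model *f, unsigned i)
{
    return (f->top + i) & 7u;
}

/* Push (what fld does): decrement top, then store the value there. */
static bool x87_push(X87Model *f, long double v)
{
    unsigned slot = (f->top - 1u) & 7u;
    if (f->used[slot])
        return false;    /* stack overflow: this model just refuses */
    f->top = slot;
    f->reg[slot] = v;
    f->used[slot] = true;
    return true;
}

/* Pop (as fstp does): read st(0), mark it empty, increment top. */
static bool x87_pop(X87Model *f, long double *out)
{
    unsigned slot = f->top;
    if (!f->used[slot])
        return false;    /* stack underflow: this model just refuses */
    *out = f->reg[slot];
    f->used[slot] = false;
    f->top = (slot + 1u) & 7u;
    return true;
}
```

Note that nothing is copied between slots: after two pushes the first value is still in the slot it was written to, and only the meaning of st(0) and st(1) has shifted.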

Numbers are pushed onto the stack from memory, and are popped off the stack back to memory. There is no instruction that transfers values directly to or from ALU registers: the x87 stack can only be accessed by FPU instructions, so you cannot write mov eax, st(0). If you want to print a value, for example, you must first store it to memory.

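Most FPU arithmetic instructions operate on st(0) and one other operand, which is either another stack register or a memory location. The popping forms used without operands, such as faddp, behave like a classic stack machine: they take the top two items off the stack, act on them, and leave the answer on the top of the stack.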
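
The stack-machine style of evaluation can be traced with a small C sketch. It is an illustration only: the type and function names are hypothetical, there is no bounds checking, and the last array element in use plays the role of st(0).

```c
/* Minimal stack of doubles; slot[depth - 1] plays the role of st(0). */
typedef struct {
    double slot[8];
    int depth;
} FpStack;

/* Like fld: push a value onto the stack. */
static void fp_push(FpStack *s, double v)
{
    s->slot[s->depth++] = v;
}

/* Like fstp: take the top value off the stack. */
static double fp_pop(FpStack *s)
{
    return s->slot[--s->depth];
}

/* Like fmul st0, st0: square the top of the stack in place. */
static void fp_square_top(FpStack *s)
{
    s->slot[s->depth - 1] *= s->slot[s->depth - 1];
}

/* Like faddp without operands: st(1) = st(1) + st(0), then pop. */
static void fp_addp(FpStack *s)
{
    double top = fp_pop(s);
    s->slot[s->depth - 1] += top;
}

/* Evaluate a*a + b*b the way an x87 program would. */
static double sum_of_squares(double a, double b)
{
    FpStack s = { {0.0}, 0 };
    fp_push(&s, a);      /* st0 = a                */
    fp_square_top(&s);   /* st0 = a^2              */
    fp_push(&s, b);      /* st0 = b,   st1 = a^2   */
    fp_square_top(&s);   /* st0 = b^2, st1 = a^2   */
    fp_addp(&s);         /* st0 = a^2 + b^2        */
    return fp_pop(&s);   /* stack is empty again   */
}
```

Calling sum_of_squares(3.0, 4.0) walks through five stack operations and leaves the stack empty, which is the discipline x87 code is expected to follow.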

Floating point numbers may generally be either 32 bits long, the float data type in the programming language C, or 64 bits long, double in C. However, in order to reduce round-off errors, the FPU stack registers are all 80 bits wide.
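
The round-off that wider formats reduce can be made visible in C. This is a sketch under the assumption that float and double are the 32-bit and 64-bit IEEE formats, which is usual on x86 but not guaranteed by the C standard; the function names are hypothetical.

```c
/* Round a double to single precision and widen it back to double. */
static double through_float(double x)
{
    volatile float narrow = (float)x;  /* volatile forces a real single-precision store */
    return (double)narrow;
}

/* Error introduced by keeping x in a float instead of a double. */
static double float_roundoff(double x)
{
    return x - through_float(x);
}
```

Under that assumption, float_roundoff(0.5) is exactly zero because 0.5 is representable in both formats, while float_roundoff(123.45) is small but nonzero. The same effect, at a smaller scale, separates double from the 80-bit format used inside the FPU.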

Most 32-bit x86 calling conventions return floating point values in the st(0) register. The common 64-bit conventions return float and double values in xmm0 instead.

Examples

The following program (using NASM syntax) calculates the square root of 123.45.

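; 32-bit Linux program; assemble and link for example with
;   nasm -f elf32 sqrt.asm && ld -m elf_i386 -o sqrt sqrt.o
; (the file name is only an example)
global _start

section .data
	val: dq 123.45   ; define quadword (double precision)

section .bss
	res: resq 1      ; reserve 1 quadword for result

section .text

_start:
	; initialize the FPU to avoid inconsistent behavior
	fninit
	; load value into st(0)
	fld qword [val]  ; treat val as an address to a qword
	; compute square root of st(0) and store the result in st(0)
	fsqrt
	; store st(0) at res, and pop it off the x87 stack
	fstp qword [res]
	; the FPU stack is now empty again

	; end of program: exit instead of running into whatever bytes follow
	mov eax, 1       ; __NR_exit
	xor ebx, ebx
	int 0x80         ; i386 Linux sys_exit(0)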

Essentially, programs that use the FPU load values onto the stack with fld and its variants, perform operations on these values, and then store the results to memory with one of the forms of fst. When you are done with the x87, the popping form fstp is the most common choice, because it leaves the x87 stack empty, as most calling conventions require.

Here is a more complex example that evaluates the Law of Cosines:

;; c^2 = a^2 + b^2 - cos(C)*2*a*b
;; C is stored in ang

global _start

section .data
    a: dq 4.56   ;length of side a
    b: dq 7.89   ;length of side b
    ang: dq 1.5  ;opposite angle to side c (around 85.94 degrees)

section .bss
    c: resq 1    ;the result ‒ length of side c

section .text
    _start:

    fld    qword [a]   ;load a into st0
    fmul   st0, st0    ;st0 = a * a = a^2

    fld    qword [b]   ;load b into st0   (pushing the a^2 result up to st1)
    fmul   st0, st0    ;st0 = b * b = b^2,   st1 = a^2

    faddp              ;add and pop, leaving st0 = old_st0 + old_st1 = a^2 + b^2.  (st1 is freed / empty now)

    fld    qword [ang] ;load angle into st0.  (st1 = a^2 + b^2 which we'll leave alone until later)
    fcos               ;st0 = cos(ang)

    fmul   qword [a]   ;st0 = cos(ang) * a
    fmul   qword [b]   ;st0 = cos(ang) * a * b
    fadd   st0, st0    ;st0 = cos(ang) * a * b + cos(ang) * a * b = 2(cos(ang) * a * b)

    fsubp  st1, st0    ;st1 = st1 - st0 = (a^2 + b^2) - (2 * a * b * cos(ang))
                       ;and pop st0

    fsqrt              ;take square root of st0 = c

    fstp   qword [c]   ;store st0 in c and pop, leaving the x87 stack empty again ‒ and we're done!

    ; don't forget to make an exit system call for your OS,
    ; or execution will fall off the end and decode whatever garbage bytes are next.
    mov   eax, 1                ; __NR_exit
    xor   ebx, ebx
    int   0x80                  ; i386 Linux sys_exit(0)
    ;end program

Floating-Point Instruction Set

You may notice that some of the instructions below differ from one another in name by just one letter: a P appended to the end. This suffix signifies that, in addition to performing the normal operation, the instruction also pops the x87 stack after execution is complete.

Original 8087 instructions

FDISI, FENI, FLDENVW, FLDPI, FNCLEX, FNDISI, FNENI, FNINIT, FNSAVEW, FNSTENVW, FRSTORW, FSAVEW, FSTENVW

Data Transfer Instructions

fld: load floating-point value
fild: load integer
fbld: load binary-coded decimal (BCD) integer
fbstp: store BCD integer and pop
load a constant on top of the stack
- fld1: $+1$
- fldl2e: $\log _{2}e$
- fldl2t: $\log _{2}10$
- fldlg2: $\log _{10}2$
- fldln2: $\ln 2$
- fldpi: $\pi$
- fldz: “positive” $0$

fst, fstp: store floating-point value
fist, fistp: store integer
fxch: exchange
fisttp: store a truncated integer

Arithmetic Instructions

fabs: absolute value
fchs: change sign
fxtract: split exponent and significand

fadd, faddp, fiadd: addition
fsub, fsubp, fisub: subtraction
fsubr, fsubrp, fisubr: reverse subtraction

fmul, fmulp, fimul: multiplication
fsqrt: square root
fdiv, fdivp, fidiv: division (see also fdiv bug on Wikipedia)
fdivr, fdivrp, fidivr: reverse division
fprem: partial remainder
fptan: partial tangent
fpatan: partial arctangent
frndint: round to integer
fscale: multiply/divide by integral powers of 2
f2xm1: $2^{x}-1$
fyl2x: $y\log _{2}x$
fyl2xp1: $y\log _{2}\left(x+1\right)$

FPU Internal and Other Instructions

finit: initialize FPU
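fldcw: load control word
fldenv: load FPU environment
frstor: restore FPU state
fsave, fnsave: save FPU state
fstcw, fnstcw: store control word
fstenv, fnstenv: store FPU environment
fstsw, fnstsw: store status word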

fincstp and fdecstp: increment or decrement top
ffree: tag a register as free

ftst: test
fcom, fcomp, fcompp: compare floating-point values
ficom, ficomp: compare with an integer
fxam: examine a register

fclex: clear exceptions
fnop: no operation
fwait: wait for pending unmasked floating-point exceptions to be handled; it is the same instruction as wait

Added in specific processors

Added with 80287

FSETPM

Added with 80387

FCOS, FLDENVD, FNSAVED, FNSTENVD, FPREM1, FRSTORD, FSAVED, FSIN, FSINCOS, FSTENVD, FUCOM, FUCOMP, FUCOMPP

Added with Pentium Pro

FCMOVB, FCMOVBE, FCMOVE, FCMOVNB, FCMOVNBE, FCMOVNE, FCMOVNU, FCMOVU, FCOMI, FCOMIP, FUCOMI, FUCOMIP

Added with SSE

FXRSTOR, FXSAVE

These are also supported on later Pentium IIs, which do not contain SSE support.

Added with SSE3

FISTTP (x87 to integer conversion with truncation, regardless of the rounding mode set in the control word)
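
The difference between a truncating store and a store that honors the rounding mode can be illustrated in C. This is a sketch, assuming the default x87 rounding mode (round to nearest, ties to even) and values that fit into a long; the function names are hypothetical. A C cast from floating point to integer truncates toward zero, which is the behavior fisttp provides.

```c
/* Truncate toward zero, as fisttp does. */
static long store_truncated(double x)
{
    return (long)x;
}

/* Round to nearest, ties to even, as fistp does in the default rounding mode. */
static long store_nearest_even(double x)
{
    long t = (long)x;              /* truncated toward zero */
    double frac = x - (double)t;   /* fractional part, same sign as x */

    if (frac > 0.5 || (frac == 0.5 && t % 2 != 0))
        return t + 1;              /* round up; a tie on an odd t goes to the even neighbor */
    if (frac < -0.5 || (frac == -0.5 && t % 2 != 0))
        return t - 1;              /* round down; a tie on an odd t goes to the even neighbor */
    return t;
}
```

For 2.7 the two functions give 2 and 3 respectively, and for the tie 2.5 both give 2 because 2 is the even neighbor.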

Undocumented instructions

ffreep: performs ffree st(i) and pops the stack