
Floating-Point Formats

The IEEE 754-1985 standard defines four floating-point formats:

Single-Precision
Double-Precision
Single, Extended-Precision
Double, Extended-Precision

Single-Precision

Single-precision floating-point numbers are 32 bits wide. The first bit (bit 31, the MSB) is the sign bit, the next 8 bits (bits 30-23) are the exponent, and the remaining 23 bits (bits 22-0) are the significand. Although only 23 significand bits are stored, the precision (${\displaystyle p}$) is actually 24 bits. This is possible because, in a normalized floating-point system with base ${\displaystyle b=2}$, the leading significand bit is always 1 and so does not need to be stored. The exponent is stored with a bias of 127 (the stored field is the true exponent plus 127), so that negative exponents can be expressed without a separate exponent sign bit.
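As an illustration of the layout described above, the following is a minimal Python sketch that splits a single-precision value into its three fields and rebuilds a normalized number from them. The function names `decode_single` and `value_of_normal_single` are made up for this example, and the reconstruction only covers normalized numbers (exponent field 1 to 254), not zeros, denormals, infinities, or NaNs.

```python
import struct

def decode_single(x):
    """Round x to single precision and return (sign, exponent, fraction)."""
    # Pack as a big-endian 32-bit float, then reread the same bytes as an integer.
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    sign = bits >> 31                # bit 31
    exponent = (bits >> 23) & 0xFF   # bits 30-23, stored with a bias of 127
    fraction = bits & 0x7FFFFF       # bits 22-0, the stored significand bits
    return sign, exponent, fraction

def value_of_normal_single(sign, exponent, fraction):
    """Rebuild a normalized value from its fields (exponent field 1..254 only)."""
    significand = 1 + fraction / 2**23   # restore the implicit leading 1
    return (-1) ** sign * significand * 2.0 ** (exponent - 127)

print(decode_single(-6.5))                       # (1, 129, 5242880)
print(value_of_normal_single(1, 129, 5242880))   # -6.5
```

Here -6.5 is -1.625 × 2^2, so the stored exponent is 2 + 127 = 129 and the fraction field holds only the .625 part; the leading 1 is never stored.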

Double-Precision

Double-precision numbers are 64 bits wide. The MSB (bit 63) is the sign bit, the next 11 bits (bits 62-52) are the exponent, and the remaining 52 bits (bits 51-0) are the significand. Again, the precision is actually 53 bits (not 52) because of the same normalization trick: the leading 1 of a normalized significand is not stored. The exponent is biased by 1023.
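The same field extraction can be sketched for double precision. The snippet below is an illustrative example only: `decode_double` and `DOUBLE_BIAS` are names chosen for this sketch, and the bias of 1023 is the standard IEEE 754 value for doubles. The last two lines probe the 53-bit precision by checking which integers near 2^53 survive conversion to a Python `float`, which is assumed here to be an IEEE 754 double (true on common platforms).

```python
import struct

DOUBLE_BIAS = 1023  # assumed: the standard IEEE 754 double-precision exponent bias

def decode_double(x):
    """Return (sign, exponent, fraction) for a double-precision value."""
    # Pack as a big-endian 64-bit float, then reread the same bytes as an integer.
    (bits,) = struct.unpack(">Q", struct.pack(">d", x))
    sign = bits >> 63                  # bit 63
    exponent = (bits >> 52) & 0x7FF    # bits 62-52, biased
    fraction = bits & ((1 << 52) - 1)  # bits 51-0, the stored significand bits
    return sign, exponent, fraction

sign, exponent, fraction = decode_double(-6.5)
print(sign, exponent - DOUBLE_BIAS, hex(fraction))   # 1 2 0xa000000000000

# With 53 bits of precision, every integer up to 2**53 is exactly representable,
# but 2**53 + 1 is not and rounds to a neighbouring value.
print(float(2**53 - 1) == 2**53 - 1)   # True
print(float(2**53 + 1) == 2**53 + 1)   # False
```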

Review

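| Format | Width | Precision | Exponent | Significand |
|--------|---------|---------------------|------------|-------------|
| Single | 32 bits | 24 bits (23 stored) | bits 30-23 | bits 22-0 |
| Double | 64 bits | 53 bits (52 stored) | bits 62-52 | bits 51-0 |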