Floating Point/Floating Point Formats

From Wikibooks, open books for an open world

Floating-Point Formats

The IEEE 754 standard, in its original 1985 edition, defines four floating-point number formats:

Single-Precision
Double-Precision
Single-Extended Precision
Double-Extended Precision

Single-Precision

Single-precision floating-point numbers are 32 bits wide. The first bit (bit 31, the MSB) is the sign bit, the next 8 bits (bits 30-23) hold the exponent, and the remaining 23 bits (bits 22-0) hold the significand. Although only 23 significand bits are stored, the precision is actually 24 bits. This trick is possible because, in a normalized binary (base-2) floating-point system, the leading significand bit is always 1 and therefore does not need to be stored. The exponent is biased by 127: the stored field equals the true exponent plus 127, so that negative exponents can be expressed.
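The field layout described above can be checked with a short Python sketch. The function names `decode_single` and `value_from_fields` are illustrative choices, not part of any standard API, and the reconstruction only covers normalized numbers (it ignores zeros, subnormals, infinities, and NaNs).

```python
import struct

def decode_single(x):
    """Split a number, rounded to single precision, into its three bit fields."""
    # Pack as a 32-bit float, then reinterpret the same 4 bytes as an unsigned integer.
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    sign = bits >> 31                # bit 31
    exponent = (bits >> 23) & 0xFF   # bits 30-23, stored with a bias of 127
    fraction = bits & 0x7FFFFF       # bits 22-0, the 23 stored significand bits
    return sign, exponent, fraction

def value_from_fields(sign, exponent, fraction):
    """Rebuild the value of a normalized number from its fields."""
    significand = 1 + fraction / 2**23   # restore the implicit leading 1 (the 24th bit)
    return (-1) ** sign * significand * 2.0 ** (exponent - 127)

sign, exponent, fraction = decode_single(-6.25)
print(sign, exponent, hex(fraction))                 # 1 129 0x480000
print(value_from_fields(sign, exponent, fraction))   # -6.25
```

Here -6.25 is -1.5625 x 2^2, so the stored exponent is 2 + 127 = 129 and only the fractional part .5625 of the significand appears in the 23 stored bits.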

Double-Precision

Double-precision numbers are 64 bits wide. The MSB (bit 63) is the sign bit. The next 11 bits (bits 62-52) are the exponent, and the rest of the bits (bits 51-0) are for the significand. Again, the precision is actually 53 bits (not 52) because of the same normalization trick.
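As a rough illustration of the double-precision layout, the sketch below extracts the three fields and then probes the 53-bit precision. It assumes that Python's `float` is an IEEE 754 double, which holds on mainstream platforms, and it uses the exponent bias of 1023, the standard value for this format, which the text above does not state. The name `decode_double` is an illustrative choice.

```python
import struct

def decode_double(x):
    """Split a double-precision number into its three bit fields."""
    # Pack as a 64-bit float, then reinterpret the same 8 bytes as an unsigned integer.
    (bits,) = struct.unpack(">Q", struct.pack(">d", x))
    sign = bits >> 63                   # bit 63
    exponent = (bits >> 52) & 0x7FF     # bits 62-52, assumed bias of 1023
    fraction = bits & ((1 << 52) - 1)   # bits 51-0, the 52 stored significand bits
    return sign, exponent, fraction

print(decode_double(1.0))   # (0, 1023, 0)

# With 53 bits of precision, 2**53 - 1 fits exactly in the significand,
# but 2**53 + 1 needs 54 bits and is rounded during conversion.
print(float(2**53 - 1) == 2**53 - 1)   # True
print(float(2**53 + 1) == 2**53 + 1)   # False
```

The last two lines are the practical consequence of the hidden bit: with only 52 bits of precision, 2**53 - 1 would already fail to round-trip.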

Extended-Precision

Review

Format   Width     Precision             Exponent     Significand
Single   32 bits   24 bits (23 stored)   bits 30-23   bits 22-0
Double   64 bits   53 bits (52 stored)   bits 62-52   bits 51-0