Floating Point/Floating Point Arithmetic

From Wikibooks, open books for an open world

Arithmetic

Floating point numbers would be useless if we couldn't operate on them. Fortunately, there are algorithms for performing the basic arithmetic operations (addition, subtraction, multiplication, and division), as well as other operations such as exponentials, square roots, and transcendental functions. This page introduces some of the basic arithmetic operations; more advanced algorithms are saved for a later page.

Multiplication

Let's multiply the following two numbers:

Variable 	sign 	exponent 	fraction
X 	        0 	  1001 	          010
Y 	        0 	  0111 	          110
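To make the bit fields concrete, here is a minimal Python sketch that turns a sign, exponent, and fraction into an ordinary number. It assumes a one-byte layout with 1 sign bit, 4 exponent bits, and 3 fraction bits, and an exponent bias of 7; the bias is not stated on this page and is inferred from the worked example, which treats 1001 as exponent 2 and 0111 as exponent 0. The name `decode` is hypothetical, and only normalized numbers are handled.

```python
BIAS = 7  # assumed bias, inferred from 1001 -> 2 and 0111 -> 0


def decode(sign: int, exponent: int, fraction: int) -> float:
    """Value of a normalized number in the assumed 1-4-3 bit layout."""
    # Restore the hidden 1; the 3 fraction bits are worth fraction / 2**3.
    significand = 1 + fraction / 8
    return (-1) ** sign * significand * 2.0 ** (exponent - BIAS)


x = decode(0, 0b1001, 0b010)  # 1.01 (binary) * 2**2
y = decode(0, 0b0111, 0b110)  # 1.11 (binary) * 2**0
print(x, y, x * y)  # 5.0 1.75 8.75
```

The exact product, 8.75, is the value the multiplication procedure should approximate within the three available fraction bits.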

Here are the steps again:

  1. First, convert the two representations to scientific notation, explicitly writing out the hidden 1.
  2. In this case, X is 1.01 × 2^2 and Y is 1.11 × 2^0.
  3. Let x be the exponent of X and y be the exponent of Y. The resulting exponent (call it z) is the sum of the two exponents. z may need to be adjusted after the next step.
  4. For now, the resulting exponent is 2 + 0 = 2.
  5. Multiply the mantissa of X by the mantissa of Y. Call this result m.
  6. Multiplying 1.01 by 1.11 results in 10.0011.
  7. If m does not have a single 1 to the left of the radix point, then move the radix point so that it does, and adjust the exponent z to compensate.
  8. Here, we renormalize 10.0011 to 1.00011 and increase the exponent by 1, to 3.
  9. Add the sign bits, mod 2, to get the sign of the product.
  10. The sign bit is 0 + 0 = 0.
  11. Convert back to the one-byte floating point representation, truncating bits if needed.
  12. We truncate 1.00011 × 2^3 to 1.000 × 2^3 and convert. Stored in the same biased form as the inputs, the exponent 3 becomes 1010.
Product 	sign 	exponent 	fraction
X * Y 	        0 	  1010 	          000
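The steps above can be sketched in Python using integer arithmetic on the bit fields. This is an illustrative sketch, not a complete implementation: it assumes the same 1-4-3 bit layout with an exponent bias of 7 (inferred from the example, not stated on this page), truncates instead of rounding, and ignores zero, denormals, overflow, and underflow. The names `multiply`, `BIAS`, and `FRAC_BITS` are hypothetical.

```python
BIAS = 7       # assumed exponent bias
FRAC_BITS = 3  # width of the fraction field


def multiply(a, b):
    """Multiply two normalized (sign, exponent, fraction) triples, truncating."""
    sa, ea, fa = a
    sb, eb, fb = b
    # Steps 1-2: restore the hidden 1, giving 4-bit integer significands.
    ma = (1 << FRAC_BITS) | fa
    mb = (1 << FRAC_BITS) | fb
    # Steps 3-4: add the unbiased exponents.
    z = (ea - BIAS) + (eb - BIAS)
    # Steps 5-6: multiply the significands; m has 2 * FRAC_BITS fraction bits.
    m = ma * mb
    # Steps 7-8: if m >= 2.0, renormalize by bumping the exponent.
    drop = FRAC_BITS  # low-order bits to truncate away
    if m >= 1 << (2 * FRAC_BITS + 1):
        z += 1
        drop += 1
    # Steps 9-10: add the sign bits mod 2.
    sign = (sa + sb) % 2
    # Steps 11-12: truncate to 3 fraction bits and re-bias the exponent.
    fraction = (m >> drop) & ((1 << FRAC_BITS) - 1)
    return sign, z + BIAS, fraction


s, e, f = multiply((0, 0b1001, 0b010), (0, 0b0111, 0b110))
print(s, format(e, "04b"), format(f, "03b"))  # 0 1010 000
```

The result encodes 1.000 × 2^3 = 8, whereas the exact product of 5 and 1.75 is 8.75; the difference is the error introduced by truncating the low-order bits of the mantissa.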

Division

Addition

Subtraction