Fundamentals of Data Representation: Floating point numbers

From Wikibooks, open books for an open world


If you study other subjects such as Physics or Chemistry, you may come across floating point numbers like this:

$6.63 \times 10^{-34}$ (Planck's constant)

The first part defines the non-zero digits of the number and is called the Mantissa. The second part defines how many positions we want to move the decimal point. This is known as the Exponent: it is positive when the decimal point moves to the right and negative when it moves to the left.

If you wanted to write out that number in full, you would have to move the decimal point 34 places to the left, as the exponent tells you, resulting in:

0.000000000000000000000000000000000663

This would take a long time to write, and it is very hard for the human eye to count how many zeros there are. Therefore, when we can accept a certain level of accuracy (6.63 has 3 significant figures), we can store a number with many digits, like Planck's constant, in a small number of digits. You are always weighing up the scope (or range) of the number against its accuracy (number of significant figures).
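To see the saving in practice, here is a small Python sketch that uses the standard `decimal` module purely as an illustration. It stores Planck's constant as `6.63E-34`, writes the same value out in full, and compares the lengths. It also counts the significant figures that are kept.

```python
from decimal import Decimal

planck = Decimal("6.63E-34")      # mantissa 6.63, exponent -34
full = format(planck, "f")        # the same value written out with no exponent

print(str(planck), len(str(planck)))    # 6.63E-34 8
print(full, len(full))                  # 0.000000000000000000000000000000000663 38
print(len(planck.as_tuple().digits))    # 3 significant figures
```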

The same is true of binary numbers, and there it is even more important. When you are dealing with numbers and their computational representation, you must always be aware of how much space the numbers will take up in memory. As we saw with the example above, writing a number out without floating point can take an unfeasible number of digits. Imagine how many digits you would need to store it in binary!

A binary floating point number may consist of 2, 3 or 4 bytes. However, the only ones you need to worry about are the 2-byte (16-bit) variety. The first 10 bits are the Mantissa and the last 6 bits are the exponent.

Just like the denary ('base 10', 'decimal') floating point representation, a binary floating point number has a mantissa and an exponent. However, because you are dealing with binary (base 2), you must remember that you multiply by $2^{n}$ instead of $10^{n}$.

Why use binary floating point numbers

Fixed point binary allows a computer to hold fractions, but by its nature it is very limited in its scope. Even using 4 bytes to hold each number, with 8 bits for the fractional part after the point, the largest number that can be held is just over 8 million. Another format is needed for holding very large numbers.
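You can check the "just over 8 million" figure with a few lines of Python. This sketch assumes that the 32-bit fixed-point number is stored in two's complement. One bit is therefore used for the sign, so the largest bit pattern is a 0 followed by 31 ones, with the point 8 bits from the right.

```python
TOTAL_BITS = 32      # 4 bytes
FRACTION_BITS = 8    # bits after the binary point

# Largest two's complement pattern: a 0 sign bit followed by 31 ones
largest_pattern = 2 ** (TOTAL_BITS - 1) - 1
# The binary point sits FRACTION_BITS from the right, so divide by 2^8
largest_value = largest_pattern / 2 ** FRACTION_BITS
print(largest_value)   # 8388607.99609375
```

If the same 32 bits were treated as unsigned instead, the limit would roughly double. Either way, it is tiny compared with what an exponent makes possible.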

In decimal, very large numbers can be shown with a mantissa and an exponent, e.g. $0.12 \times 10^{2}$. Here 0.12 is the mantissa and 2 is the exponent. The mantissa holds the main digits and the exponent defines where the decimal point should be placed.

The same technique can be used for binary numbers. For example, two bytes could be split so that 10 bits are used for the mantissa and the remaining 6 for the exponent. This allows a much greater scope of numbers to be used.
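Python's own floats work in the same way, although they are IEEE 754 double precision and much wider than the 16-bit format used here. The standard library function `math.frexp` exposes the split. It returns a mantissa m with 0.5 ≤ |m| < 1 and an integer exponent e such that x = m × 2^e. The sketch applies it to 39.75. The choice of number, and of printing the first 9 mantissa bits, is only for illustration.

```python
import math

x = 39.75
mantissa, exponent = math.frexp(x)     # x == mantissa * 2**exponent
print(mantissa, exponent)              # 0.62109375 6

# The first 9 bits after the binary point of the mantissa
print(format(int(mantissa * 2**9), "09b"))   # 100111110
```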

Converting binary floating point to decimal

There are several stages to go through when working out the value of a floating point number in binary. In fact it is much like a disco dance routine, known on this page as the Noorgat Dance, Kemp variation (you won't be tested on the name, but it should help you to remember).

  1. Sign - find the sign of the mantissa (make a note of this)
  2. Slide - find the value of the exponent and whether it is positive or negative
  3. Bounce - move the decimal point the number of places given by the exponent: left for a negative exponent, right for a positive one
    1. If moving left and the mantissa is positive, pad the front with zeros
    2. If moving left and the mantissa is negative, pad the front with ones
  4. Flip - if the mantissa is negative, perform two's complement on it
  5. Swim - starting at the decimal point, work out the place values of the mantissa, going left, then right. Finally, apply the sign you recorded in the Sign step.
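The five steps can be written as a short Python sketch that works on the bit strings directly. The function name `noorgat_dance` and its interface are hypothetical choices for illustration. It takes the mantissa and exponent as separate strings and assumes the point sits after the first mantissa bit. Exact `Fraction`s are used so that no rounding creeps into the swim.

```python
from fractions import Fraction


def twos_complement_value(bits: str) -> int:
    """Read a string of '0'/'1' characters as a two's complement integer."""
    value = int(bits, 2)
    return value - (1 << len(bits)) if bits[0] == "1" else value


def noorgat_dance(mantissa: str, exponent: str) -> Fraction:
    """Hypothetical helper: decode a mantissa (point after its first bit) and a
    two's complement exponent by following the five steps literally."""
    # 1. Sign: the first bit of the mantissa is the sign bit
    negative = mantissa[0] == "1"

    # 2. Slide: the exponent is a two's complement integer
    shift = twos_complement_value(exponent)

    # 3. Bounce: the point starts after the first bit
    digits, point = mantissa, 1
    if shift >= 0:
        point += shift                       # move the point right
        digits = digits.ljust(point, "0")    # add zeros at the end if we run out of bits
    else:
        # move the point left by padding the front with copies of the sign bit
        # (zeros for a positive mantissa, ones for a negative one)
        digits = mantissa[0] * -shift + digits

    # 4. Flip: two's complement of the whole string (invert and add one)
    if negative:
        width = len(digits)
        digits = format((1 << width) - int(digits, 2), f"0{width}b")

    # 5. Swim: add the place values left and right of the point, then apply the sign
    whole, fraction = digits[:point], digits[point:]
    value = int(whole, 2) + Fraction(int(fraction, 2) if fraction else 0, 2 ** len(fraction))
    return -value if negative else value


# The worked example that follows: 0.100000000 000001
print(noorgat_dance("0100000000", "000001"))   # 1
```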
Example: binary floating point worked example

Let's try it out. We are given the following 16-bit floating point number, with 10 bits for the mantissa and 6 bits for the exponent. Remember that the decimal point is between the first and second most significant bits.

0.100000000 000001

The first action we need to perform is the sign: find out the sign of the mantissa by looking at its first bit.

0.100000000

It is 0, so the mantissa is positive.

The second step in the Noorgat dance is the slide. We need to find the value of the exponent, which is the last 6 bits of the number.

000001 = +1

So we know that the exponent is positive one, and we will have to move the decimal point one place to the right.

The third step in the Noorgat dance is the bounce. This means moving the decimal point of the mantissa the number of positions given by the slide, which was one position to the right, like so:

0.100000000 -> 01.00000000

The fourth step is the optional flip. Check back to the sign stage and see whether the mantissa is negative. It isn't? Then you can skip this stage, as we only flip the number if the mantissa is negative.

The fifth and final step is the swim. Taking the mantissa on its own, we can now work out the value of the floating point number. Start at the point and label each bit to its left 1, 2, 4 and so on, then each bit to its right ½, ¼, ⅛ and so on.

01.00000000 -> only the 1s column holds a one

Voilà! The answer is 1.
Exercise: Simple binary floating point

Work out the denary for the following, using 10 bits for the mantissa and 6 bits for the exponent:

0.001101000 000110

Answer:

1. Sign: the mantissa starts with a zero, therefore it is a positive number.
2. Slide: work out the value of the exponent

000110 = +6

3. Bounce: we need to move the decimal point in the mantissa. In this case the exponent was positive so we need to move the decimal point 6 places to the right

0.001101000 -> 0001101.000

4. Flip: as the number isn't negative we don't need to do this
5. Swim: work out the value on the left hand side and right hand side of the decimal point

1+4+8 = +13 FINISHED!

0 101000000 111111

Answer:

1. Sign: the mantissa starts with a zero, therefore it is a positive number.
2. Slide: work out the value of the exponent

111111 It starts with a one, therefore it is a negative number. Flipping it (two's complement) gives:
000001 = 1, so the exponent is -1

3. Bounce: we need to move the decimal point in the mantissa. In this case the exponent was negative so we need to move the decimal point 1 place to the left

0.101000000 -> 0.0101000000

4. Flip: as the mantissa isn't negative, we don't need to do this
5. Swim: work out the value on the left hand side and right hand side of the decimal point

1/4 + 1/16 = +0.3125 FINISHED!

1 011111010 000101

Answer:

1. Sign: the mantissa starts with a one, therefore it is a negative number.
2. Slide: work out the value of the exponent

000101 = +5

3. Bounce: we need to move the decimal point in the mantissa. In this case the exponent was positive so we need to move the decimal point 5 places to the right

1.011111010 -> 101111.1010

4. Flip: the mantissa is negative as noted in step one so we need to convert this number

101111.1010 -> 010000.0110

5. Swim: work out the value on the left hand side and right hand side of the decimal point

16 + 1/4 + 1/8 = 16.375, so the answer is -16.375. Remember the number was negative! FINISHED!

1 101000000 111101

Answer:

1. Sign: the mantissa starts with a one, therefore it is a negative number.
2. Slide: work out the value of the exponent

111101 It starts with a one, therefore it is a negative number. Flipping it (two's complement) gives:
000011 = 3, so the exponent is -3

3. Bounce: we need to move the decimal point in the mantissa. In this case the exponent was negative so we need to move the decimal point 3 places to the left. Watch carefully!

1.101000000 -> 1.111101000000
Note that we placed extra ones on the front of the number.
If the exponent were negative and the mantissa positive, we would add extra zeros on the front: 0.01 × 2^-3 = 0.00001
If both are negative, placing zeros in front of the mantissa would make it positive!
Therefore, we need to add extra ones to keep the mantissa negative.
With the flip we'll lose these 'extra' ones.

4. Flip: the mantissa is negative as noted in step one so we need to convert this number

1.111101000000 -> 0.000011000000

5. Swim: work out the value on the left hand side and right hand side of the decimal point

1/32 + 1/64 = 0.046875, so the answer is -0.046875. Remember the number was negative! FINISHED!

1 111111010 000011

Answer:

1. Sign: the mantissa starts with a one, therefore it is a negative number.
2. Slide: work out the value of the exponent

000011 = +3

3. Bounce: we need to move the decimal point in the mantissa. In this case the exponent was positive so we need to move the decimal point 3 places to the right.

1.111111010 -> 1111.111010

4. Flip: the mantissa is negative as noted in step one so we need to convert this number

1111.111010 -> 0000.000110

5. Swim: work out the value on the left hand side and right hand side of the decimal point

1/16 + 1/32 = 0.09375, so the answer is -0.09375. Remember the number was negative! FINISHED!
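As a cross-check, the dance always gives the same result as a shorter calculation. Read the whole mantissa as a two's complement integer, and divide by 2⁹ because the point sits after the first of the 10 bits. Then multiply by 2 to the power of the exponent. The helper names below are hypothetical, and the sketch assumes the 10-bit mantissa and 6-bit exponent used in these exercises.

```python
from fractions import Fraction

MANTISSA_BITS = 10   # the point sits after the first of these bits


def signed(bits: str) -> int:
    """Two's complement value of a bit string, ignoring any binary point."""
    bits = bits.replace(".", "")
    value = int(bits, 2)
    return value - (1 << len(bits)) if bits[0] == "1" else value


def float_value(mantissa: str, exponent: str) -> Fraction:
    """Hypothetical helper: mantissa / 2^9, times 2 to the power of the exponent."""
    m = signed(mantissa)                        # e.g. 1.011111010 -> -262
    scaled = Fraction(m, 2 ** (MANTISSA_BITS - 1))
    return scaled * Fraction(2) ** signed(exponent)


exercises = [
    ("0.001101000", "000110"),
    ("0.101000000", "111111"),
    ("1.011111010", "000101"),
    ("1.101000000", "111101"),
    ("1.111111010", "000011"),
]
for mantissa, exponent in exercises:
    print(mantissa, exponent, "=", float(float_value(mantissa, exponent)))
```

Running it prints 13.0, 0.3125, -16.375, -0.046875 and -0.09375, the same values as the answers above.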

Converting denary (decimal) into binary floating point

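You might also be asked to convert a denary (decimal) number into its binary floating point equivalent.

  1. Work out the binary equivalent
  2. Work out how far to move the binary point (y places) so that the number is normalised
  3. Set the exponent to the opposite of that move: moving the point y places to the left gives an exponent of +y, and moving it y places to the right gives -y
  4. Pad the mantissa and the exponent with extra bits so that they fill their 10 and 6 bits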
Example: denary to binary floating point

If we are asked to convert the denary number 39.75 into binary floating point we first need to find out the binary equivalent:

128 64 32 16  8  4  2  1 . ½  ¼  ⅛
  0  0  1  0  0  1  1  1 . 1  1  0 

How far do we need to move the binary point to the left so that the number is normalised?

  0  0 . 1  0  0  1  1  1  1  1  0  (6 places to the left)

So to get the binary point back to where it started, we would need to move it 6 places to the right. 6 now becomes your exponent.

0.100111110 | 000110

If you want to check your answer, convert the number above into decimal. You get 39.75!
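The steps above can be sketched in Python. Rather than drawing a place-value table, the hypothetical `encode` function below halves or doubles the value until the mantissa lies in the normalised range. That range is 0.5 ≤ m < 1 for positive numbers and -1 ≤ m < -0.5 for negative ones. The function counts the moves to get the exponent, then keeps only the first 10 mantissa bits. Dropping the extra bits (rounding down) is an assumption chosen to match the worked answers on this page. It is not the only possible rounding rule.

```python
import math
from fractions import Fraction

MANTISSA_BITS = 10
EXPONENT_BITS = 6


def to_bits(value: int, width: int) -> str:
    """Two's complement bit pattern of value in the given width."""
    return format(value & ((1 << width) - 1), f"0{width}b")


def encode(x) -> tuple[str, str]:
    """Hypothetical helper: denary value -> (normalised mantissa, exponent)."""
    m = Fraction(x)          # exact arithmetic, so no rounding sneaks in
    if m == 0:
        raise ValueError("zero has no normalised form")
    e = 0
    if m > 0:
        # normalised positive mantissa 0.1xxxxxxxx, i.e. 0.5 <= m < 1
        while m >= 1:
            m, e = m / 2, e + 1              # point moves one place left
        while m < Fraction(1, 2):
            m, e = m * 2, e - 1              # point moves one place right
    else:
        # normalised negative mantissa 1.0xxxxxxxx, i.e. -1 <= m < -0.5
        while m < -1:
            m, e = m / 2, e + 1
        while m >= Fraction(-1, 2):
            m, e = m * 2, e - 1
    if not -(1 << (EXPONENT_BITS - 1)) <= e < (1 << (EXPONENT_BITS - 1)):
        raise OverflowError("exponent does not fit in the exponent field")
    # Keep the first 10 bits only: floor drops the extra bits of the pattern
    bits = to_bits(math.floor(m * 2 ** (MANTISSA_BITS - 1)), MANTISSA_BITS)
    return f"{bits[0]}.{bits[1:]}", to_bits(e, EXPONENT_BITS)


print(encode(39.75))   # ('0.100111110', '000110')
```

For example, `encode(128.25)` gives `('0.100000000', '001000')`, so the .25 is lost. `encode(-513)` gives `('1.011111111', '001010')`.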


Exercise: Denary to binary floating point

Work out the binary floating point for the following, using 10 bits for the mantissa and 6 bits for the exponent:

67

Answer:

128 64 32 16  8  4  2  1 . ½  ¼  ⅛
  0  1  0  0  0  0  1  1 . 0  0  0 

How far do we need to move the binary point to the left so that the number is normalised?

  0 . 1  0  0  0  0  1  1  0  0  0  (7 places to the left)

To get the front to be normalised we must move the decimal point 7 places (moving it only 6 places would have made the number negative!).

0.100001100 | 000111

23.25

Answer:

128 64 32 16  8  4  2  1 . ½  ¼  ⅛
  0  0  0  1  0  1  1  1 . 0  1  0 

How far do we need to move the binary point to the left so that the number is normalised?

  0  0  0 . 1  0  1  1  1  0  1  0   (5 places to the left)

To get the front to be normalised we must move the decimal point 5 places (moving it only 4 places would have made the number negative!).

0.101110100 | 000101

123.875

Answer:

128 64 32 16  8  4  2  1 . ½  ¼  ⅛
  0  1  1  1  1  0  1  1 . 1  1  1 

How far do we need to move the binary point to the left so that the number is normalised?

  0 . 1  1  1  1  0  1  1  1  1  1   (7 places to the left)

To get the front to be normalised we must move the decimal point 7 places.

0.1111011111 | 000111

But this uses 11 bits for the mantissa, so we have to drop the last one, losing accuracy!

0.111101111 | 000111

128.25

Answer:

128 64 32 16  8  4  2  1 . ½  ¼  ⅛
  1  0  0  0  0  0  0  0 . 0  1  0 

How far do we need to move the binary point to the left so that the number is normalised?

  0 . 1  0  0  0  0  0  0  0  0  1  0   (8 places to the left)

To get the front to be normalised we must move the decimal point 8 places (moving it only 7 places would have made the number negative!).

0.100000000 | 001000

Notice that we have had to drop the .25, as this would not have fitted into 10 bits for the mantissa.

-513

Answer:

1024 512 256 128 64 32 16  8  4  2  1 . ½  ¼  ⅛
   0   1   0   0  0  0  0  0  0  0  1 . 0  0  0 

Convert this into its negative form using the flipping rule:

1024 512 256 128 64 32 16  8  4  2  1 . ½  ¼  ⅛
   1   0   1   1  1  1  1  1  1  1  1 . 0  0  0 

How far do we need to move the binary point to the left so that the number is normalised?

   1 . 0  1  1  1  1  1  1  1  1  1  0  0  0   (10 places to the left)

To get the front to be normalised we must move the decimal point 10 places.

1.011111111 | 001010

Notice that we have had to drop the last one, as it would not have fitted into the 10 bits for the mantissa. This means that the number stored is only:

10111111110.0

Converting this into denary, flipping it gives:

01000000010.0 = 514, so the number stored is -514, not -513

You'll look at errors using floating point numbers very soon.

For a 16-bit number where the mantissa is 10 bits and the exponent is 6 bits:

The largest positive number will be:

Mantissa: 0.111111111
Exponent: 011111

The smallest positive number will be:

Mantissa: 0.000000001
Exponent: 100000

The largest negative number (furthest from zero) will be:

Mantissa: 1.000000000
Exponent: 011111

The smallest negative number (closest to zero) will be:

Mantissa: 1.111111111
Exponent: 100000
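These four patterns follow from the field widths, so they can be computed for any split. The sketch below is a hypothetical helper. It assumes a two's complement mantissa with the point after its first bit and a two's complement exponent. With 10 and 6 bits, it gives the values of the four patterns listed above.

```python
from fractions import Fraction


def extremes(mantissa_bits: int = 10, exponent_bits: int = 6) -> dict[str, Fraction]:
    """Hypothetical helper: the four extreme values for the given field widths."""
    step = Fraction(1, 2 ** (mantissa_bits - 1))   # value of the last mantissa bit
    max_exp = 2 ** (exponent_bits - 1) - 1         # 011111 -> +31
    min_exp = -(2 ** (exponent_bits - 1))          # 100000 -> -32
    big, tiny = Fraction(2) ** max_exp, Fraction(2) ** min_exp
    return {
        "largest positive": (1 - step) * big,      # 0.111111111 x 2^31
        "smallest positive": step * tiny,          # 0.000000001 x 2^-32
        "largest negative": -1 * big,              # 1.000000000 x 2^31
        "smallest negative": -step * tiny,         # 1.111111111 x 2^-32
    }


for name, value in extremes().items():
    print(f"{name}: {value} (about {float(value):.6g})")
```

With the defaults, the largest positive value is 2143289344 (just under 2^31). The smallest positive value is 1/2199023255552, that is, 2^-41.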