Floating Point/Scientific Notation

Scientific Notation

Scientific notation, as many people may remember, is a method of writing very large or very small numbers as a normalized fraction times a multiplier. Normalized, in this case, means that the magnitude (absolute value) of the fraction is at least 1 and less than 10. If the number lies outside this range, we multiply or divide it by successive powers of 10, as necessary, until it falls within that range.

This book assumes the reader has some prior knowledge of scientific notation, and includes this page simply as a refresher.


Let's say that we have a large number, 123,456,789, that we want to write in scientific notation. We divide it by a power of 10 so that the magnitude of the result is at least 1 and less than 10. To do this, we divide by 100,000,000 (one hundred million), and we get the final result:

$$\frac{123{,}456{,}789}{100{,}000{,}000} = 1.23456789$$

Now, to express our original number, we have to multiply this fraction by the amount we divided by originally:

$$123{,}456{,}789 = 1.23456789 \times 100{,}000{,}000$$

And for convenience, we usually write the last term as a power of 10:

$$123{,}456{,}789 = 1.23456789 \times 10^{8}$$
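
The normalization procedure described above (multiply or divide by 10 until the magnitude is at least 1 and less than 10) can be sketched in Python. This is only an illustration: the function name `to_scientific` is hypothetical, and `decimal.Decimal` is used here as an assumption so that the repeated divisions by 10 stay exact for values like these.

```python
from decimal import Decimal

def to_scientific(value):
    """Return (fraction, exponent) with 1 <= |fraction| < 10,
    such that value == fraction * 10**exponent."""
    fraction = Decimal(value)
    exponent = 0
    if fraction == 0:
        # Zero cannot be normalized; return it unchanged.
        return fraction, exponent
    # Too large: divide by 10 and count each division.
    while abs(fraction) >= 10:
        fraction /= 10
        exponent += 1
    # Too small: multiply by 10 and count downward.
    while abs(fraction) < 1:
        fraction *= 10
        exponent -= 1
    return fraction, exponent

fraction, exponent = to_scientific("123456789")
print(fraction, exponent)
```

For the number used in this section, the fraction comes out as 1.23456789 and the exponent as 8, matching the division by 100,000,000.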


In a binary number system, the idea of scientific notation is similar, but it uses powers of two instead of powers of 10. Let's say that we have the binary number 1001011 (75 in decimal). We divide this by 1000000 (64 in decimal) and get our result: 1.001011. Now, what does it mean when we have binary digits after the decimal point?

Since the point is called a "decimal point" in a decimal (base 10) number system, it is common to call the same point a "binary point" in a binary system. The terminology is not important, however, because a "decimal point" and a "binary point" look exactly the same and do the same thing. These terms are used interchangeably in this book.

To the left of the decimal point are increasing powers of two. It only makes sense, then, that to the right of the decimal point are decreasing powers of two. Here is a quick example:

$$0.1_2 = 2^{-1} = 0.5, \qquad 0.01_2 = 2^{-2} = 0.25, \qquad 0.001_2 = 2^{-3} = 0.125$$

And our normalized number:

$$1.001011_2 = 2^{0} + 2^{-3} + 2^{-5} + 2^{-6} = 1 + 0.125 + 0.03125 + 0.015625 = 1.171875$$
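
The place-value rule for digits to the right of the binary point can be checked with a short Python sketch. The helper name `binary_to_fraction` is hypothetical, and `fractions.Fraction` is used as an assumption so that each power of two is represented exactly.

```python
from fractions import Fraction

def binary_to_fraction(bits):
    """Evaluate a binary string such as '1.001011' as an exact fraction."""
    whole, _, frac = bits.partition(".")
    # Digits left of the point are an ordinary binary integer.
    value = Fraction(int(whole, 2)) if whole else Fraction(0)
    # The k-th digit right of the point is worth 2**-k.
    for position, digit in enumerate(frac, start=1):
        if digit == "1":
            value += Fraction(1, 2 ** position)
        elif digit != "0":
            raise ValueError("not a binary digit: " + digit)
    return value

normalized = binary_to_fraction("1.001011")
print(normalized, float(normalized))
```

The result is 75/64, which is the original number 75 divided by 64.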

Now, we can write the final binary product as:

$$1001011_2 = 1.001011_2 \times 2^{6}$$
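
Normalizing a positive binary integer in this way can also be sketched in Python. The function name `normalize_binary` is hypothetical, and the sketch assumes a positive integer input; it simply moves the binary point to just after the leading 1 and counts how many places it moved.

```python
def normalize_binary(n):
    """Return (mantissa_bits, exponent) so that n equals
    mantissa_bits (read as binary) times 2**exponent."""
    if n <= 0:
        raise ValueError("this sketch only handles positive integers")
    # The position of the leading 1 gives the power of two to divide by.
    exponent = n.bit_length() - 1
    digits = bin(n)[2:]  # drop the '0b' prefix
    # Place the binary point directly after the leading digit.
    mantissa_bits = digits[0] + "." + (digits[1:] or "0")
    return mantissa_bits, exponent

mantissa_bits, exponent = normalize_binary(0b1001011)
print(mantissa_bits, exponent)
```

For 1001011 (75 in decimal), the mantissa is 1.001011 and the exponent is 6, since 1000000 in binary is 2 to the 6th power (64).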