Quantity vs Numbers
An important distinction must be made between "Quantities" and "Numbers". A quantity is simply some amount of "stuff"; five apples, three pounds, and one automobile are all quantities of different things. A quantity can be represented in any number of ways. For example, tick-marks on a piece of paper, beads on a string, or stones in a pocket can all represent some quantity of something. One of the most familiar representations is the base-10 (or "decimal") number system, which uses 10 digits, from 0 to 9. When more than 9 objects need to be counted, we make a new column with a 1 in it (which represents a group of 10), and we continue counting from there.
Computers, however, do not count in decimal. Computer hardware uses a system where values are represented internally as a series of voltage differences. For example, in many computers, a voltage of +5V represents a "1" digit, and a voltage of 0V represents a "0" digit. There are no other digits possible! Thus, computers must use a numbering system that has only two digits (0 and 1): the "Binary", or "base-2", number system.
Understanding the binary number system is difficult for many students at first. It may help to start with a decimal number, since that is more familiar. It is possible to write a number like 1234 in "expanded notation," so that the value of each place is shown:
\(1234_{10} = 1 \times 10^3 + 2 \times 10^2 + 3 \times 10^1 + 4 \times 10^0\)
Notice that each digit is multiplied by successive powers of 10, since this is a decimal, or base 10, system. The "ones" digit ("4" in the example) is multiplied by \(10^0\), or "1". Each digit to the left of the "ones" digit is multiplied by the next higher power of 10, and the products are added together.
Now, do the same with a binary number; but since this is a "base 2" number, replace powers of 10 with powers of 2:
\(10011010010_2 = 1 \times 2^{10} + 0 \times 2^9 + 0 \times 2^8 + 1 \times 2^7 + 1 \times 2^6 + 0 \times 2^5 + 1 \times 2^4 + 0 \times 2^3 + 0 \times 2^2 + 1 \times 2^1 + 0 \times 2^0\)
The subscripts indicate the base. Note that in the above equations:
\(10011010010_2 = 1234_{10}\)
Binary numbers are the same as their equivalent decimal numbers; they are just a different way to represent a given quantity. To be very simplistic, it does not really matter whether you write your number of apples in binary or in decimal: you can still make a pie.
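The expanded notation described above can be sketched in a few lines of Python. This is only an illustration: the function names `expand` and `value` are made up for this example, and the binary string used is simply the base-2 spelling of 1234.

```python
def expand(digits, base):
    """Return the amount each digit contributes, most significant digit first."""
    top = len(digits) - 1  # exponent that belongs to the left-most digit
    return [int(d, base) * base ** (top - i) for i, d in enumerate(digits)]


def value(digits, base):
    """Add up the expanded terms to recover the quantity."""
    return sum(expand(digits, base))


print(expand("1234", 10))        # [1000, 200, 30, 4]
print(value("10011010010", 2))   # 1234
```

Both calls describe the same quantity; only the base used to write it down differs.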
The term Bits is short for the phrase Binary Digits. Each bit is a single binary value: 1 or zero. Computers generally represent a 1 as a positive voltage (5 volts or 3.3 volts are common values), and a zero as 0 volts.
Most Significant Bit and Least Significant Bit
In the decimal number 48723, the "4" digit represents the largest power of 10 (\(10^4\)), and the "3" digit represents the smallest power of 10 (\(10^0\)). Therefore, in this number, 4 is the most significant digit and 3 is the least significant digit. Consider a situation where a caterer needs to prepare 156 meals for a wedding. If the caterer makes an error in the least significant digit and accidentally makes 157 meals, it is not a big problem. However, if the caterer makes a mistake on the most significant digit, 1, and prepares 256 meals, that will be a big problem!
Now, consider a binary number: 101011. The Most Significant Bit (MSB) is the left-most bit, because it represents the greatest power of 2 (\(2^5\)). The Least Significant Bit (LSB) is the right-most bit and represents the least power of 2 (\(2^0\)).
Notice that MSB and LSB are not the same as the notion of "significant figures" that is used in other sciences. The decimal number 123000 has only 3 significant figures, but the most significant digit is 1 (the left-most digit), and the least significant digit is 0 (the right-most digit).
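The caterer's mistake has a direct binary analogue: an error in the LSB changes the value by 1, while an error in the MSB changes it by the largest power of 2 in the number. A minimal sketch, assuming the six-bit number 101011 from above (the helper name `flip_bit` is made up for this example):

```python
def flip_bit(value, position):
    """Flip the bit at `position`, where position 0 is the least significant bit."""
    return value ^ (1 << position)


WIDTH = 6
n = 0b101011                            # 43 in decimal

lsb_error = flip_bit(n, 0)              # 0b101010
msb_error = flip_bit(n, WIDTH - 1)      # 0b001011

print(n, lsb_error, msb_error)          # 43 42 11
```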
- a Nibble is 4 bits long. Nibbles can hold values from 0 to 15 (in decimal).
- a Byte is 8 bits long. Bytes can hold values from 0 to 255 (in decimal).
- a Word is 16 bits, or 2 bytes long. Words can hold values from 0 to 65535 (in decimal). There is occasionally some confusion between this definition and that of a "machine word". See Machine Word below.
- a Double-word is 2 words long, or 4 bytes long. These are also known simply as "DWords". DWords are also 32 bits long. 32-bit computers therefore manipulate data that is the size of DWords.
- a Quad-word is 2 DWords long, 4 words long, and 8 bytes long. They are known simply as "QWords". QWords are 64 bits long, and are therefore the default data size in 64-bit computers.
- Machine Word: a machine word is the length of the standard data size of a given machine. For instance, a 32-bit computer has a 32-bit machine word. Likewise, 64-bit computers have a 64-bit machine word. Occasionally the term "machine word" is shortened to simply "word", leaving some ambiguity as to whether we are talking about a regular "word" or a machine word.
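The ranges in the list above all follow from one rule: n bits can hold unsigned values from 0 to 2^n - 1. A short sketch, where the dictionary `SIZES` and the function `max_unsigned` are names chosen only for this illustration:

```python
SIZES = {"nibble": 4, "byte": 8, "word": 16, "dword": 32, "qword": 64}


def max_unsigned(bits):
    """Largest unsigned value that fits in the given number of bits."""
    return (1 << bits) - 1


for name, bits in SIZES.items():
    print(f"{name:>6}: {bits:2d} bits, 0 to {max_unsigned(bits)}")
```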
It would seem logical that to create a negative number in binary, the reader would only need to prefix the number with a "–" sign. For instance, the binary number 1101 can become negative simply by writing it as "–1101". This seems all fine and dandy until you realize that computers and digital circuits do not understand a minus sign. Digital circuits only have bits, and so bits must be used to distinguish between positive and negative numbers. With this in mind, there are a variety of schemes that are used to make binary numbers negative or positive: Sign and Magnitude, One's Complement, and Two's Complement.
Sign and Magnitude
Under a Sign and Magnitude scheme, the MSB of a given binary number is used as a "flag" to determine if the number is positive or negative. If the MSB = 0, the number is positive, and if the MSB = 1, the number is negative. This scheme seems awfully simple, except for one simple fact: arithmetic of numbers under this scheme is very hard. Let's say we have 2 nibbles: 1001 and 0111. Under sign and magnitude, we can translate them to read: -001 and +111. In decimal then, these are the numbers –1 and +7.
When we add them together, the sum of –1 + 7 = 6 should be the value that we get. However:
   001
  +111
  ----
   000
And that isn't right. What we need is a decision-making construct to determine if the MSB is set or not: if it is set, we subtract, and if it is not set, we add. This is a big pain, and therefore sign and magnitude is rarely used for integer arithmetic.
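The problem can be reproduced in code. The sketch below assumes 4-bit sign and magnitude values; the names `sm_decode`, `sm_encode`, and `sm_add` are invented for this example, and overflow of the 3-bit magnitude is deliberately not handled.

```python
WIDTH = 4
SIGN = 1 << (WIDTH - 1)     # 0b1000, the sign flag (MSB)
MAG_MASK = SIGN - 1         # 0b0111, the magnitude bits


def sm_decode(bits):
    """Read a sign and magnitude pattern as an ordinary integer."""
    magnitude = bits & MAG_MASK
    return -magnitude if bits & SIGN else magnitude


def sm_encode(number):
    """Write an integer as a sign and magnitude pattern (no overflow check)."""
    return (SIGN if number < 0 else 0) | (abs(number) & MAG_MASK)


def sm_add(a, b):
    """Add by inspecting the signs first, which is the extra decision step."""
    return sm_encode(sm_decode(a) + sm_decode(b))


a, b = 0b1001, 0b0111       # -1 and +7

# Adding the magnitude bits directly, as in the text, loses the answer.
naive = ((a & MAG_MASK) + (b & MAG_MASK)) & MAG_MASK

print(format(naive, "03b"))         # 000
print(format(sm_add(a, b), "04b"))  # 0110, which is +6
```

The correct result needs the decode and encode steps around the addition, which is the "decision-making construct" the text describes.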
Let's now examine one's complement, a scheme where we define a negative number as being the logical inverse of a positive number. We will use the "!" operator to express a logical inversion on multiple bits. For instance, !001100 = 110011. Read as unsigned numbers, 110011 is binary for 51, and 001100 is binary for 12. But in this case, we are saying that 001100 = –110011, or 110011 (binary) = -12 (decimal). Let's perform the addition again:
   001100   (12)
  +110011   (-12)
  -------
   111111
We can see that if we invert \(000000_2\) we get the value \(111111_2\), and therefore \(111111_2\) is negative zero! What exactly is negative zero? It turns out that in this scheme, positive zero and negative zero represent the same value.
However, one's complement notation suffers because it has two representations for zero: all 0 bits, or all 1 bits. As well as being clumsy, this will also cause problems when we want to check quickly to see if a number is zero. This is an extremely common operation, and we want it to be easy, so we create a new representation, two's complement.
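A small sketch makes the two zeros visible. It assumes a 6-bit width, as in the example above; the function names are illustrative only.

```python
WIDTH = 6
MASK = (1 << WIDTH) - 1     # 0b111111


def ones_negate(bits):
    """Invert every bit, keeping the result within WIDTH bits."""
    return ~bits & MASK


def ones_decode(bits):
    """Read a one's complement pattern as an ordinary integer."""
    if bits >> (WIDTH - 1):             # MSB set: the pattern is negative
        return -ones_negate(bits)
    return bits


def ones_is_zero(bits):
    """A zero test has to accept both all-zeros and all-ones."""
    return bits == 0 or bits == MASK


print(format(ones_negate(0b001100), "06b"))   # 110011
print(ones_decode(0b110011))                  # -12
print(format(0b001100 + 0b110011, "06b"))     # 111111
print(ones_is_zero(0b000000), ones_is_zero(0b111111))  # True True
```

The extra comparison in `ones_is_zero` is the cost the text refers to: a zero check can no longer be a single "are all bits clear?" test.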
Two's complement is a number representation that is very similar to one's complement. We find the negative of a number X using the following formula:
-X = !X + 1
Let's do an example. If we have the binary number 011001 (which is 25 in decimal; the leading 0 is needed so that the MSB marks the number as positive), and we want to find the representation for -25 in two's complement, we follow two steps:
- Invert the bits:
- 011001 → 100110
- Add 1:
- 100110 + 1 = 100111
Therefore –011001 = 100111. Let's do a little addition:
   011001
  +100111
  -------
   000000
Now, there is a carry from adding the two MSBs together, but this is digital logic, so we discard the carry. It is important to remember that digital circuits have capacity for a certain number of bits, and any extra bits are discarded.
Most modern computers use two's complement.
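The formula -X = !X + 1 and the discarded carry can both be sketched in Python by masking every result to a fixed width. The width of 6 bits and the function names are assumptions made for this example; 6 bits is the smallest width in which +25 fits together with its sign bit.

```python
def twos_negate(bits, width):
    """Apply -X = !X + 1, then drop any carry beyond `width` bits."""
    mask = (1 << width) - 1
    return (~bits + 1) & mask


def twos_decode(bits, width):
    """Read a two's complement pattern as an ordinary integer."""
    sign = 1 << (width - 1)
    return bits - (1 << width) if bits & sign else bits


WIDTH = 6
x = 0b011001                            # +25
neg_x = twos_negate(x, WIDTH)

print(format(neg_x, "06b"))             # 100111
print(twos_decode(neg_x, WIDTH))        # -25

# Adding the pair overflows into a 7th bit, which the mask discards.
total = (x + neg_x) & ((1 << WIDTH) - 1)
print(format(total, "06b"))             # 000000
```

Unlike one's complement, negating zero gives zero back, so there is only one pattern for zero.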
[Diagram: the value represented by each four-bit combination under sign and magnitude, one's complement, and two's complement.]
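The values for all sixteen four-bit combinations can be generated with a short script. This is a sketch rather than a reproduction of the original diagram; the function names are chosen for this example. Note that Python prints negative zero simply as 0.

```python
def sign_magnitude(b):
    """4-bit sign and magnitude: MSB is the sign, low 3 bits the magnitude."""
    magnitude = b & 0b0111
    return -magnitude if b & 0b1000 else magnitude


def ones_complement(b):
    """4-bit one's complement: negative patterns are the bitwise inverse."""
    return b if b < 8 else -(~b & 0b1111)


def twos_complement(b):
    """4-bit two's complement: negative patterns are offset by 16."""
    return b if b < 8 else b - 16


def table():
    """One row per pattern: (bits, unsigned, sign+mag, one's, two's)."""
    return [
        (format(b, "04b"), b, sign_magnitude(b), ones_complement(b), twos_complement(b))
        for b in range(16)
    ]


print("bits  unsigned  sign+mag  one's  two's")
for bits, u, sm, oc, tc in table():
    print(f"{bits}  {u:8d}  {sm:8d}  {oc:5d}  {tc:5d}")
```

Only two's complement uses all sixteen patterns for sixteen distinct values, which is why its range reaches -8 while the other two stop at -7.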
Signed vs Unsigned
One important fact to remember is that computers are dumb. A computer doesn't know whether or not a given set of bits represents a signed number or an unsigned number (or, for that matter, any number of other data objects). It is therefore important for the programmer (or the programmer's trusty compiler) to keep track of this data for us. Consider the bit pattern 100110:
- Unsigned: 38 (decimal)
- Sign+Magnitude: -6
- One's Complement: -25
- Two's Complement: -26
See how the representation we use changes the value of the number! It is important to understand that bits are bits, and the computer doesn't know what the bits represent. It is up to the circuit designer and the programmer to keep track of what the numbers mean.
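In a program, "keeping track" usually means telling the language which interpretation to apply. As one illustration, Python's standard `struct` module reads the same byte as unsigned with the format code "B" and as signed (two's complement) with "b". Because `struct` works on whole bytes, the sketch uses an arbitrary 8-bit pattern rather than the six-bit pattern above.

```python
import struct

raw = bytes([0b11100110])                  # a single byte; the bits never change

as_unsigned = struct.unpack("B", raw)[0]   # read as an unsigned 8-bit value
as_signed = struct.unpack("b", raw)[0]     # read as a signed 8-bit value

print(as_unsigned)   # 230
print(as_signed)     # -26
```

The stored byte is identical in both cases; only the format code, supplied by the programmer, decides what it means.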
We've seen how binary numbers can represent unsigned values, and how they can represent negative numbers using various schemes. But now we have to ask ourselves, how do binary numbers represent other forms of data, like text characters? The answer is that there exist different schemes for converting binary data to characters. Each scheme acts like a map to convert a certain bit pattern into a certain character. There are 3 popular schemes: ASCII, UNICODE and EBCDIC.
The ASCII code (American Standard Code for Information Interchange) is the most common code for mapping bits to characters. ASCII uses only 7 bits, but since computers usually store data in 8-bit bytes, ASCII characters are stored with an unused 8th bit as the MSB. ASCII codes 0-31 are "control codes", which are characters that are not printable to the screen and are used by the computer to handle certain operations. Code 32 is a single space (hit the space bar). The character code for the character '1' is 49, '2' is 50, and so on. Notice that in ASCII '2' = '1' + 1 (the character '1' plus the integer number 1). This is difficult for many people to grasp at first, so don't worry if you are confused.
Capital letters run from 'A' = 65 to 'Z' = 90. The lower-case letters run from 'a' = 97 to 'z' = 122.
Almost all the rest of the ASCII codes are different punctuation marks.
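The relationship between characters and their codes is easy to explore with Python's built-in `ord` (character to code) and `chr` (code to character). The helpers `digit_value` and `to_upper` below are made-up names that show two common tricks relying on the ASCII layout.

```python
print(ord("1"), ord("2"))                        # 49 50
print(chr(ord("1") + 1))                         # 2
print(ord("A"), ord("Z"), ord("a"), ord("z"))    # 65 90 97 122
print(ord(" "))                                  # 32


def digit_value(ch):
    """Turn the character '0'..'9' into the integer 0..9."""
    return ord(ch) - ord("0")


def to_upper(ch):
    """Upper-case an ASCII letter: the two cases sit exactly 32 codes apart."""
    return chr(ord(ch) - 32) if "a" <= ch <= "z" else ch


print(digit_value("7"), to_upper("q"))           # 7 Q
```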
Since computers use data that is the size of bytes, it made no sense to have ASCII only contain 7 bits of data (which is a maximum of 128 character codes). Many companies therefore incorporated the extra bit into an "Extended ASCII" code set. These extended sets have a maximum of 256 characters to use. The first 128 characters are the original ASCII characters, but the next 128 characters are platform-defined. Each computer maker could define their own characters to fill in the last 128 slots.
When computers began to spread around the world, they had to handle languages other than English. Before too long, each country had its own character code sets to represent its own letters. It is important to remember that some writing systems have more than 256 characters! Therefore, the UNICODE standard was proposed. There are many different encodings of UNICODE. Some of them use 2-byte characters, and others use different representations. The first 128 characters of the UNICODE set are the original ASCII characters.
For a more in-depth discussion of UNICODE, see this website.
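To see what "different representations" means in practice, the sketch below encodes the same three characters with three UNICODE encodings that ship with Python (UTF-8, UTF-16, and UTF-32, big-endian variants). The sample text and the function name `encoded_sizes` are arbitrary choices for this illustration.

```python
TEXT = "A\u00e9\u20ac"      # 'A', 'e' with acute accent, euro sign

CODECS = ("utf-8", "utf-16-be", "utf-32-be")


def encoded_sizes(text):
    """Number of bytes the text occupies under each encoding."""
    return {codec: len(text.encode(codec)) for codec in CODECS}


for codec in CODECS:
    print(f"{codec:>9}: {TEXT.encode(codec).hex(' ')}")

print(encoded_sizes(TEXT))
# The first 128 code points match ASCII, so 'A' is still 65.
print(ord("A"), "A".encode("utf-8") == "A".encode("ascii"))   # 65 True
```

UTF-8 spends one byte on 'A' but more on the other two characters, while UTF-32 spends four bytes on every character.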
EBCDIC (Extended Binary Coded Decimal Interchange Code) is a character code that was originally proposed by IBM, but was passed over in favor of ASCII. IBM, however, still uses EBCDIC in some of its mainframes and server systems.
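Python ships a codec named `cp037` for one common EBCDIC code page, which makes it easy to compare the two mappings. The sketch assumes that code page; other EBCDIC variants assign some characters differently.

```python
def compare(ch):
    """Return (ASCII code, EBCDIC cp037 code) for a single character."""
    return ord(ch), ch.encode("cp037")[0]


for ch in "A1 ":
    ascii_code, ebcdic_code = compare(ch)
    print(f"{ch!r}: ASCII {ascii_code:#04x}, EBCDIC {ebcdic_code:#04x}")
```

The same character maps to a different bit pattern in each scheme, so data must be translated when it moves between ASCII and EBCDIC systems.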
Octal is just like decimal and binary in that once one column is "full", you move on to the next. It uses the numbers 0−7 as digits, and because the number of digits is a power of two (\(8 = 2^3\)), it has the useful property that it is easy to convert between octal and binary numbers. Consider the binary number 101110000. To convert this number to octal, we must first break it up into groups of 3 bits: 101, 110, 000. Then we simply add up the values of each bit:
- \(101_2 = 4 + 0 + 1 = 5\)
- \(110_2 = 4 + 2 + 0 = 6\)
- \(000_2 = 0 + 0 + 0 = 0\)
And then we string all the octal digits together:
- \(101110000_2 = 560_8\).
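The grouping method can be sketched as a small function. The name `regroup` and its `group_size` parameter are invented for this example; a group size of 3 gives octal, and the same idea with groups of 4 gives base 16.

```python
def regroup(bits, group_size):
    """Convert a binary string by turning each group of bits into one digit."""
    # Pad on the left with zeros so the length is a multiple of group_size.
    groups_needed = -(-len(bits) // group_size)     # ceiling division
    padded = bits.zfill(groups_needed * group_size)
    groups = [padded[i:i + group_size] for i in range(0, len(padded), group_size)]
    # int(g, 2) adds up the bit values in one group; "x" prints digits 0-9, a-f.
    return "".join(format(int(g, 2), "x") for g in groups)


print(regroup("101110000", 3))   # 560
print(regroup("1111", 3))        # 17  (padded to 001 111)
print(regroup("101110000", 4))   # 170 (padded to 0001 0111 0000)
```

As a cross-check, Python's built-in `oct(int("101110000", 2))` returns `'0o560'`.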
Hexadecimal is a very common data representation. It is more common than octal, because it represents four binary digits per digit, and many digital circuits use multiples of four as their data widths.
Hexadecimal uses a base of 16. However, there is a difficulty in that it requires 16 digits, and the common decimal number system only has ten digits to play with (0 through 9). So, to have the necessary number of digits to play with, we use the letters A through F, in addition to the digits 0-9. After the unit column is full, we move onto the "16's" column, just as in binary and decimal.
Depending on the source code you are reading, hexadecimal may be notated in one of several ways:
- 0xaa11: ANSI C notation. The 0x prefix indicates that the remaining digits are to be interpreted as hexadecimal. For example, 0x1000 is equal to 4096 in decimal.
- \xaa11: "C string" notation.
- 0aa11h: Typical assembly language notation, indicated by the h suffix. The leading 0 (zero) ensures the assembler does not mistakenly interpret the number as a label or symbol.
- $aa11: Another common assembly language notation, widely used in 6502/65816 assembly language programming.
- #AA11: BASIC notation.
- $aa11$: Business BASIC notation.
- \(\mathrm{aa11}_{16}\): Mathematical notation, with the subscript indicating the number base.
- 16#AA11#: VHDL notation for a number.
- x"AA11": VHDL notation for a 16 bit array of bits.
- 16'hAA11: Verilog notation, where the "16" is the total length in bits.
Both uppercase and lowercase letters may be used. Lowercase is generally preferred in a Linux, UNIX or C environment, while uppercase is generally preferred in a mainframe or COBOL environment.
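Python follows the C-style 0x notation, so it is a convenient place to experiment with hexadecimal values. The sketch below uses only built-in functions; the variable names are arbitrary.

```python
n = 0xaa11                       # hexadecimal literal with the 0x prefix

print(n)                         # 43537
print(0x1000)                    # 4096

# Parsing text: case does not matter, and the 0x prefix is optional in base 16.
print(int("aa11", 16) == int("AA11", 16) == int("0xaa11", 16) == n)   # True

# Formatting: lowercase, uppercase, and with the prefix.
print(format(n, "x"), format(n, "X"), format(n, "#x"))   # aa11 AA11 0xaa11

# Each hexadecimal digit stands for exactly four bits.
print(format(n, "016b"))         # 1010101000010001
```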