Unit 1.4.1 Data Types

From Wikibooks, open books for an open world


Data Types

When we store data, we assign it a specific type. The type we use for a piece of data affects how we store and process it.

You need to have an understanding of the following types.

Type       Description                                                  Examples
Integer    A whole number (not a fraction and with no decimal point)    3, -12, 17, 0
Real       Numbers which have decimal parts or are fractions            17.4, 3.14, -4.8, 0.05
Character  A single letter, digit or other character                    4, y, J, %
String     A run of multiple characters                                 aj%kD, Matt, bar, 12three
Boolean    One of two states (on or off)                                true, false
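To make the table concrete, here is a small sketch in Python (the language used for all the examples in this section; the choice is ours, not the specification's). One assumption to flag: Python has no separate character type, so a "character" here is just a string of length 1.

```python
# Example values from the table above, grouped by type
examples = {
    "Integer": [3, -12, 17, 0],
    "Real": [17.4, 3.14, -4.8, 0.05],
    "Character": ["4", "y", "J", "%"],   # Python stores these as length-1 strings
    "String": ["aj%kD", "Matt", "bar", "12three"],
    "Boolean": [True, False],
}

for type_name, values in examples.items():
    # type(v).__name__ is Python's own name for the type of each value
    print(type_name, [type(v).__name__ for v in values])
```

Note that the character "4" and the integer 4 are different values: "4" + "4" joins the strings to give "44", whereas 4 + 4 gives 8.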

Computer Arithmetic

Representing Numbers in Binary

Representing Positive Integers in Binary

Numbers can be represented in many bases; it just so happens that humans use denary (or base 10) when working with numbers.

Let's take the number 184. We can break it into units, 10s and 100s columns like so:

100 10 1
1 8 4

The digit we place in each column is the number of times that column's value is added to make our number. Here, we want one 100, eight 10s and four 1s.

We can see that each column is worth 10 times the column to its right, and 10 is the base we are representing the number in.
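The column idea works for any base. The sketch below, using a hypothetical helper `from_digits`, multiplies each digit by its column value and adds the results; it assumes the digits are given as a list of integers, most significant first.

```python
def from_digits(digits, base):
    """Add up each digit multiplied by its column value (a power of the base)."""
    total = 0
    column_value = 1                  # the rightmost column is worth base**0 = 1
    for digit in reversed(digits):    # work from the rightmost column leftwards
        total += digit * column_value
        column_value *= base          # each column is worth `base` times the one to its right
    return total

print(from_digits([1, 8, 4], 10))     # 184: one 100, eight 10s, four 1s
print(from_digits([1, 1, 1, 0], 2))   # 14: one 8, one 4, one 2, no 1s
```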

Let's see how this works in binary (base 2) taking the number 14.

  • The highest power of 2 that fits into 14 is 8 ($2^3$), so $14 - 8 = 6$.
  • The highest power of 2 that fits into 6 is 4 ($2^2$), so $6 - 4 = 2$.
  • This leaves us with 2, which is one of our columns.

This gives us 14 in base 2.

14
8 4 2 1
1 1 1 0

In the example above we used 4 bits (i.e. we had 4 columns which we could fill to represent the number). As with denary, larger numbers require more columns, so in this next example we will use 8 bits.

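Starting with the number 215:

  • The highest power of 2 that fits into 215 is 128 ($2^7$), so $215 - 128 = 87$.
  • The highest power of 2 that fits into 87 is 64 ($2^6$), so $87 - 64 = 23$.
  • The highest power of 2 that fits into 23 is 16 ($2^4$), so $23 - 16 = 7$.
  • The highest power of 2 that fits into 7 is 4 ($2^2$), so $7 - 4 = 3$.
  • The highest power of 2 that fits into 3 is 2 ($2^1$), so $3 - 2 = 1$.
  • This leaves us with 1, which is one of our columns.

Therefore 215 can be written as $11010111_2$.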

215
128 64 32 16 8 4 2 1
1 1 0 1 0 1 1 1
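The "highest power that fits" method used for 14 and 215 can be sketched as a short Python function. `to_binary` is a hypothetical name, and the sketch assumes a non-negative integer and a fixed number of bits chosen by the caller.

```python
def to_binary(n, bits):
    """Convert a non-negative integer to a fixed-width binary string by
    repeatedly subtracting the largest column value that still fits."""
    if not 0 <= n < 2 ** bits:
        raise ValueError(f"{n} does not fit in {bits} bits")
    result = []
    for power in range(bits - 1, -1, -1):   # columns from 2**(bits-1) down to 1
        column = 2 ** power
        if column <= n:                      # this column fits into what is left
            result.append("1")
            n -= column
        else:
            result.append("0")
    return "".join(result)

print(to_binary(14, 4))    # 1110
print(to_binary(215, 8))   # 11010111
```

Python's built-in `format(215, "08b")` produces the same string and is what you would normally use; the function above just spells out the steps.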

Representing Negative Integers in Binary

Representing negative numbers in binary is more complicated, because we need some way of marking whether a number is negative using only 1s and 0s, without that marker being confused with the digits of the number itself. There are two main methods used to represent these numbers: sign and magnitude, and two's complement.

Sign and Magnitude

This is the simplest method of representing negative numbers in binary: the first (leftmost) bit of the number is used to show whether the number is negative. If the number is negative this bit is 1; if not, it is 0.

For example

Number Binary
67 01000011
-67 11000011

However, this causes problems when doing arithmetic. If we add 67 and -67 we should get 0, but as shown below, adding the two bit patterns directly does not give 0.

0 1 0 0 0 0 1 1
+ 1 1 0 0 0 0 1 1
= 0 0 0 0 0 1 1 0
Carry 1 1 1 1

This gives 6 (the carry out of the leftmost column is lost to overflow) instead of 0. This means that sign and magnitude numbers cannot simply be added by the hardware; they must be decoded before they are used in calculations.
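The problem can be reproduced with a small sketch that builds the sign and magnitude patterns and then adds them as if they were ordinary binary. The 8-bit width and the helper name `sign_and_magnitude` are assumptions made for illustration.

```python
BITS = 8

def sign_and_magnitude(n):
    """Encode n using 8-bit sign and magnitude: the first bit is the sign
    (1 = negative) and the remaining 7 bits hold the size of the number."""
    if abs(n) >= 2 ** (BITS - 1):
        raise ValueError("the magnitude does not fit in 7 bits")
    sign_bit = 1 if n < 0 else 0
    return sign_bit * 2 ** (BITS - 1) + abs(n)

a = sign_and_magnitude(67)
b = sign_and_magnitude(-67)
print(format(a, "08b"), format(b, "08b"))   # 01000011 11000011

# Add the raw bit patterns as if they were ordinary binary numbers,
# keeping only 8 bits (the carry out of the leftmost column is lost)
raw_sum = (a + b) % 2 ** BITS
print(format(raw_sum, "08b"), raw_sum)       # 00000110 6
```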

Two's Complement

Two's complement is another method of representing negative numbers, which aims to tackle this problem. The method for converting a number is a little more complex, but it is much more powerful. To represent -67, as above, we follow these steps:

  1. Write the positive number (67) in regular binary, using the full register width so that any unused bits on the left are 0s
128 64 32 16 8 4 2 1
0 1 0 0 0 0 1 1
  2. NOT the number (convert all 1s to 0s and all 0s to 1s)
128 64 32 16 8 4 2 1
0 1 0 0 0 0 1 1
NOT 1 0 1 1 1 1 0 0
  3. Add one to the number
128 64 32 16 8 4 2 1
1 0 1 1 1 1 0 0
+ 0 0 0 0 0 0 0 1
= 1 0 1 1 1 1 0 1

And that's all there is to it. That number now represents -67 in two's complement.
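The three steps can be written out almost literally. This is a minimal sketch assuming an 8-bit register; `negate` and `to_denary` are hypothetical helper names, and `to_denary` reads a pattern back by treating the leftmost column as worth -128.

```python
BITS = 8

def negate(n):
    """Return the 8-bit two's complement pattern for -n (0 <= n <= 128),
    following the three steps: binary, NOT, add one."""
    if not 0 <= n <= 2 ** (BITS - 1):
        raise ValueError("-n does not fit in 8-bit two's complement")
    positive = format(n, f"0{BITS}b")                                   # 1. plain binary
    inverted = "".join("1" if bit == "0" else "0" for bit in positive)  # 2. NOT every bit
    plus_one = (int(inverted, 2) + 1) % 2 ** BITS                       # 3. add one, drop any carry out
    return format(plus_one, f"0{BITS}b")

def to_denary(pattern):
    """Interpret an 8-bit two's complement pattern: the leftmost column is worth -128."""
    value = int(pattern, 2)
    return value - 2 ** BITS if pattern[0] == "1" else value

print(negate(67))               # 10111101
print(to_denary(negate(67)))    # -67
```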

One of the main advantages of using Two's Complement is the ability to do math without conversion. If we take the example from above with 67 added to -67:

128 64 32 16 8 4 2 1
0 1 0 0 0 0 1 1
+ 1 0 1 1 1 1 0 1
= 0 0 0 0 0 0 0 0
Carry 1 1 1 1 1 1 1 1

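This gives 0 once the carry out of the leftmost column is discarded as overflow, which is exactly the answer we expect. The same approach works for subtraction: to work out a - b, we add a to the two's complement of b. As a result, two's complement is used far more commonly than sign and magnitude, and it is easier to implement in hardware because one adder circuit handles both addition and subtraction.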
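A short sketch of why no decoding is needed: both numbers are turned into 8-bit patterns, added as plain binary, and any carry out of the top bit is discarded. It assumes 8 bits and uses the shortcut that the pattern for a negative n is 256 + n (Python's `%` operator gives exactly this), which matches the NOT-then-add-one method; the helper names are hypothetical.

```python
BITS = 8
MODULUS = 2 ** BITS    # 256: keeping values below this discards the carry out of the 8th bit

def encode(n):
    """8-bit two's complement pattern of n, for -128 <= n <= 127.
    For negative n this gives 256 - |n|, the same result as NOT-then-add-one."""
    if not -(MODULUS // 2) <= n < MODULUS // 2:
        raise ValueError("n does not fit in 8 bits")
    return n % MODULUS

def decode(pattern):
    """Interpret an 8-bit pattern as a signed two's complement number."""
    return pattern - MODULUS if pattern >= MODULUS // 2 else pattern

def add(a, b):
    # Ordinary binary addition of the two patterns, then drop the overflow bit
    return decode((encode(a) + encode(b)) % MODULUS)

def subtract(a, b):
    # a - b is worked out as a + (-b), so the same adder handles subtraction
    return add(a, -b)

print(format(encode(-67), "08b"))   # 10111101
print(add(67, -67))                 # 0
print(subtract(100, 30))            # 70
```

Results outside -128 to 127 wrap around (for example, `add(100, 100)` returns -56), which is the overflow problem in another form.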

Representing Decimals in Binary

Representing decimals in binary is even more complex than representing negative numbers, as we need to store where the decimal point is while still keeping the digits of the number and the accuracy that we were given. The method that you need to know for this is normalised floating point representation.

Normalised Floating Point

This method uses a standard format for all of the decimal numbers stored on a system. It works in a similar way to standard form (345.54 becomes $3.4554 \times 10^2$): the point is fixed between the first and second bits, and we store how many places the number has been shifted. The digits of the number are called the mantissa and have a fixed number of bits; the number of places shifted is called the exponent and also has a fixed size.

One thing to be careful of with this method is to be very clear about how many bits are used to store each part of the number. This becomes very important when dealing with either very large numbers or numbers with many decimal digits, as you will start to run into errors where accuracy is lost (floating point errors).

To represent a decimal number in ordinary binary, we use the same columns as before, but to the right of the 1s column we add a decimal point and continue the columns with negative powers of two: $2^{-1} = \frac{1}{2}$, $2^{-2} = \frac{1}{4}$, $2^{-3} = \frac{1}{8}$, and so on.

128 64 32 16 8 4 2 1 . 1/2 1/4 1/8

From here we can represent our number directly. For this example we will represent the number 47.625. The integer portion can be represented normally, as $47 = 32 + 8 + 4 + 2 + 1$ (00101111), and the fractional portion as $0.625 = \frac{1}{2} + \frac{1}{8}$ (.101).

128 64 32 16 8 4 2 1 . 1/2 1/4 1/8
0 0 1 0 1 1 1 1 . 1 0 1

So now we have our number, but we could not store it on a device like this because it has a decimal point in the middle. This is where the idea of the floating point comes in. We find the first position in the binary where the digits change, that is, the first point where a 0 turns into a 1 or vice versa. We then place the decimal point between these two digits and record how many places we moved it.

128 64 . 32 16 8 4 2 1
0 0 . 1 0 1 1 1 1 1 0 1

We place it between the 64 and the 32, as this is where the first change from 0 to 1 happens in the number; the number is now in normalised form. To get here, we moved the point six places to the left. In this example we will use a 10-bit mantissa and a 6-bit exponent (stored together in a 16-bit register). In normalised form we keep only one digit from before the point, and if the mantissa is shorter than 10 bits we pad it with 0s at the right-hand end. That means the number represented above becomes 0101111101 as a 10-bit mantissa. We shifted the point 6 places to the left, which means a positive move of 6. This is 000110 in 6-bit binary.

Then we can just append the numbers to each other:

0101111101000110

This successfully represents the decimal number 47.625 in normalised floating point. This method can also be used together with two's complement to represent negative decimals, and it can represent very small decimals such as 0.00005. However, you must remember that moving the point to the right gives a negative exponent and moving it to the left gives a positive exponent.
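The whole normalisation process can be sketched as follows, assuming the same 10-bit mantissa and 6-bit exponent as the example, a positive input, and that any mantissa bits which do not fit are simply cut off (truncated) rather than rounded. The exponent is stored in two's complement so that moving the point to the right gives a negative value, as the text describes. `normalise` and `denormalise` are hypothetical names.

```python
MANTISSA_BITS = 10
EXPONENT_BITS = 6

def normalise(value):
    """Encode a positive number as a 10-bit mantissa (0.1xxxxxxxx, with the
    point after the first bit) and a 6-bit two's complement exponent."""
    if value <= 0:
        raise ValueError("this sketch only handles positive numbers")
    mantissa, exponent = value, 0
    while mantissa >= 1:        # point moves left: exponent goes up
        mantissa /= 2
        exponent += 1
    while mantissa < 0.5:       # point moves right: exponent goes down
        mantissa *= 2
        exponent -= 1
    if not -2 ** (EXPONENT_BITS - 1) <= exponent < 2 ** (EXPONENT_BITS - 1):
        raise ValueError("exponent does not fit in 6 bits")
    fraction_bits = MANTISSA_BITS - 1               # bits after the point
    scaled = int(mantissa * 2 ** fraction_bits)     # bits that do not fit are truncated
    mantissa_bits = "0" + format(scaled, f"0{fraction_bits}b")
    exponent_bits = format(exponent % 2 ** EXPONENT_BITS, f"0{EXPONENT_BITS}b")
    return mantissa_bits, exponent_bits

def denormalise(mantissa_bits, exponent_bits):
    """Turn the two bit strings back into a number (positive mantissas only)."""
    fraction = int(mantissa_bits, 2) / 2 ** (MANTISSA_BITS - 1)
    exponent = int(exponent_bits, 2)
    if exponent_bits[0] == "1":                     # negative exponent in two's complement
        exponent -= 2 ** EXPONENT_BITS
    return fraction * 2 ** exponent

m, e = normalise(47.625)
print(m, e, m + e)           # 0101111101 000110 0101111101000110
print(denormalise(m, e))     # 47.625
```

For a small number such as 0.25 the loop moves the point to the right, so the exponent comes out as -1 and is stored as 111111.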

Adding Numbers in Binary

Just like denary addition, we align the columns of the two numbers and add the values starting from the right. For example, to add 215 (11010111) and 21 (00010101), working through the columns from right to left:

  • 1 add 1 is 2 (so zero 1s and one 2s) therefore 1s column is 0 and we carry 1 to the 2s column.
  • 1 add 0 add 1 is 2 so set as 0 and carry again
  • 1 add 1 add 1 is 3 so set as 1 and carry
  • 0 add 0 add 1 is 1
  • 1 add 1 is 2 so set 0, carry 1
  • 0 add 0 add 1 is 1
  • 1 add 0 is 1
  • 1 add 0 is 1

The above calculation process can be seen much more clearly in the table:

215 + 21 = 236
128 64 32 16 8 4 2 1
1 1 0 1 0 1 1 1
+ 0 0 0 1 0 1 0 1
= 1 1 1 0 1 1 0 0
Carry 1 1 1 1

Check the result in denary: $128 + 64 + 32 + 8 + 4 = 236$, which matches $215 + 21$.
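The column-by-column process, including the carries, can be sketched in Python. The function name `add_binary` is hypothetical; it assumes two binary strings of equal length and reports the carry out of the leftmost column separately, since a fixed-size register would lose it.

```python
def add_binary(a, b):
    """Add two equal-length binary strings column by column, right to left,
    returning the sum and the carry out of the leftmost column."""
    if len(a) != len(b):
        raise ValueError("both numbers must have the same number of bits")
    result = []
    carry = 0
    for bit_a, bit_b in zip(reversed(a), reversed(b)):
        column_total = int(bit_a) + int(bit_b) + carry   # 0, 1, 2 or 3
        result.append(str(column_total % 2))              # digit written in this column
        carry = column_total // 2                         # 1 is carried if the total was 2 or 3
    return "".join(reversed(result)), carry

total, overflow = add_binary("11010111", "00010101")
print(total, int(total, 2), overflow)   # 11101100 236 0
```

Calling `add_binary("01000011", "10111101")` (67 and -67 in two's complement) returns `("00000000", 1)`: the sum is 0 and the final carry is the overflow bit described in the earlier sections.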

Representing Numbers in Hexadecimal

Hexadecimal is another form of data representation that is used frequently within computing. It is used mainly because it can represent large numbers in far fewer digits than binary or denary (although it is always stored as binary within the computer's memory).

Hexadecimal is the name for base 16, which means we use 16 characters to represent numbers. As we only have 10 digits (0 to 9), it also uses the characters A, B, C, D, E and F, which represent 10, 11, 12, 13, 14 and 15 respectively.

Denary to Hexadecimal

When representing denary numbers in base 16, we use the exact same method as we do for binary; however, instead of writing the columns as powers of 2, we must write them as powers of 16. For example, to represent 7652 in hexadecimal:

  • 4096 ($16^3$) goes into 7652 once, so we put a 1 in that column; we now have 3556 left
  • 256 ($16^2$) goes into 3556 13 times, so we place a D in that column; we now have 228 left
  • 16 ($16^1$) goes into 228 14 times, so we place an E in that column; we now have 4 left
  • 1 ($16^0$) goes into 4 4 times, so we place a 4 in that column and we have 0 left.
4096 256 16 1
1 D E 4

So therefore 7652 is 1DE4 in hexadecimal.
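The same repeated-division steps can be sketched in Python; `to_hex` is a hypothetical name, and the caller is assumed to say how many hexadecimal digits (columns) to use.

```python
HEX_DIGITS = "0123456789ABCDEF"

def to_hex(n, digits):
    """Convert a non-negative integer to hexadecimal by working out how many
    times each power of 16 goes into what is left, largest column first."""
    if not 0 <= n < 16 ** digits:
        raise ValueError(f"{n} does not fit in {digits} hexadecimal digits")
    result = []
    for power in range(digits - 1, -1, -1):
        column = 16 ** power
        count = n // column               # how many times the column value fits
        result.append(HEX_DIGITS[count])
        n -= count * column               # what is left for the smaller columns
    return "".join(result)

print(to_hex(7652, 4))   # 1DE4
```

In practice Python's `format(7652, "X")` gives the same answer.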

Binary to Hexadecimal

Converting from binary directly into hexadecimal is actually a simple process that does not require converting the whole thing back into denary at once. Instead, when converting this way, we simply convert the binary to denary then to hexadecimal one nibble (4 bits) at a time. If we have the number 156 in binary (10011100), we can place it into our table and write up the correct headers as shown:

8 4 2 1 8 4 2 1
1 0 0 1 1 1 0 0
= 9 = 12
= 9 = C

Therefore, this number is 9C in hexadecimal.
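Because each hexadecimal digit corresponds to exactly four bits, the nibble method needs no overall conversion to denary. Here is a minimal sketch, with the hypothetical name `binary_to_hex`, which assumes its input is a string of 0s and 1s and pads it on the left to a whole number of nibbles.

```python
HEX_DIGITS = "0123456789ABCDEF"

def binary_to_hex(bits):
    """Convert a binary string to hexadecimal one nibble (4 bits) at a time."""
    # Pad on the left with 0s so the length is a multiple of 4
    bits = bits.zfill((len(bits) + 3) // 4 * 4)
    nibbles = [bits[i:i + 4] for i in range(0, len(bits), 4)]
    # Each nibble uses the column headings 8 4 2 1, giving a value from 0 to 15
    return "".join(HEX_DIGITS[int(nibble, 2)] for nibble in nibbles)

print(binary_to_hex("10011100"))   # 9C
```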

Bitwise Manipulations

Bitwise manipulations are a specific group of operations that can be applied to the bits of a binary number. The ones you need to be aware of for the specification are shifts, AND, OR and XOR. This is a quick and easy topic once you understand it.

Shifts

The contents of a binary register can be shifted either to the left or to the right; as the name suggests, a shift simply moves the binary digits in one direction. In most cases, any spaces created are filled with 0s.

Left Shift

A left shift moves the contents of the register to the left and discards any bits that no longer fit into the register. This operation is important because shifting left by one place is equivalent to multiplying the number by two (as long as no 1s are pushed out). In this example we will use the value 54, which is 00110110 in 8-bit binary.

If we left shift it (often written with the operator <<) while keeping a maximum size of 8 bits:

00110110 << 1 = 01101100 = 108

00110110 << 2 = 11011000 = 216

00110110 << 3 = 10110000 = 176

As you can see in the last example, when we shifted by 3, the leading 1 (from the 32s column) was 'pushed out' of the register and lost, so the result is no longer 54 multiplied by 8. This is an important consideration when using left shifts, as we must be aware of when we are going to lose a potentially important bit. In the first two examples, you can see that each left shift was equal to multiplying the original denary number by a power of two: left shifting by one multiplied by two, and left shifting by two multiplied by four. If we had used a bigger register, left shifting by 3 would have given us the original value multiplied by 8 (432).
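The 8-bit register can be imitated in Python, whose own integers never run out of bits, by keeping only the lowest 8 bits after each shift. This is a sketch; `left_shift` is a hypothetical helper, and the AND with a mask of eight 1s (the AND operation is covered later in this section) is what throws the extra bits away.

```python
REGISTER_BITS = 8
MASK = 2 ** REGISTER_BITS - 1          # 0b11111111: a 1 in every position the register has

def left_shift(value, places):
    """Shift left inside an 8-bit register; bits pushed past the left end are lost."""
    return (value << places) & MASK    # the AND throws away anything above bit 7

value = 0b00110110                     # 54
for places in (1, 2, 3):
    result = left_shift(value, places)
    print(f"{value:08b} << {places} = {result:08b} = {result}")

print(value << 3)                      # 432: without the mask nothing is lost
```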

This operation is often used in multiplication, as a multiplication can be broken down into multiplying by a power of two and then adding the original number a number of times. For example, multiplying a number by five is equal to left shifting by two (multiplying by 4) and then adding the original number once more.
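The multiply-by-five example is a one-line shift and add. The second function generalises the idea into what is usually called shift-and-add multiplication, which goes slightly beyond the text: for every 1 bit in the multiplier it adds a suitably shifted copy of the other number. Both names are hypothetical and both assume non-negative integers.

```python
def times_five(x):
    # x * 5 = x * 4 + x, and multiplying by 4 is a left shift by two places
    return (x << 2) + x

def shift_and_add_multiply(x, y):
    """Multiply non-negative integers with shifts and additions only:
    for each 1 bit in y, add x shifted left by that bit's position."""
    total = 0
    position = 0
    while y > 0:
        if y % 2 == 1:                 # the current lowest bit of y is 1
            total += x << position     # add x * 2**position
        y >>= 1                        # move on to the next bit of y
        position += 1
    return total

print(times_five(54))                  # 270
print(shift_and_add_multiply(54, 5))   # 270
```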

Right Shift

Right shifts, unsurprisingly, are the opposite of left shifting and instead move the contents of the registers to the right. This section will not go into as much detail as you should already understand the basics of how shifting works and how it can be used.

If we right shift (often represented by the character >>) while maintaining a maximum size of 8 bits:

00110110 >> 1 = 00011011 = 27

00110110 >> 2 = 00001101 = 13

00110110 >> 3 = 00000110 = 6

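The second and third examples show the cost of right shifting: any 1s shifted off the right-hand end are lost, so the result is rounded down. For example, 54 divided by 4 is 13.5, but 00110110 >> 2 gives 13.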
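A quick way to see the lost bits is to compare each right shift with the exact division. This sketch assumes a non-negative value (Python's `>>` on a negative number copies the sign bit instead of filling with 0); `right_shift` is a hypothetical helper.

```python
def right_shift(value, places):
    """Shift right; bits pushed off the right-hand end are simply lost."""
    return value >> places

value = 0b00110110                     # 54
for places in (1, 2, 3):
    result = right_shift(value, places)
    exact = value / 2 ** places        # the true quotient, for comparison
    print(f"{value:08b} >> {places} = {result:08b} = {result} (54 / {2 ** places} = {exact})")
```

Each result equals the exact quotient rounded down, which is why a right shift by n places behaves like integer division by $2^n$.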

AND

You may not have covered the AND gate fully yet, as it appears in unit 1.4.3, but you should be able to guess how it works. An AND gate returns 1 only if both of its inputs are 1.

AND can be used as a bitwise operator in order to AND two binary registers or binary values together. For example, if we have the two registers 01101000 and 01000010 we can do:

01101000 & 01000010 = 01000000

One application of this is to determine whether a number is divisible by 2. If we AND any value with the binary value 1, we are left with only the least significant bit (the last bit). If the result is 1 the number is odd; if it is 0 the number is even.
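In Python the bitwise AND operator is `&`. The sketch below repeats the example above and adds a hypothetical `is_odd` helper for the divisibility check.

```python
def is_odd(n):
    """AND with 1 keeps only the least significant bit: 1 means odd, 0 means even."""
    return (n & 1) == 1

print(format(0b01101000 & 0b01000010, "08b"))   # 01000000
print(is_odd(54), is_odd(27))                    # False True
```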

OR

An OR gate returns 1 if at least one of its two inputs is 1. This can be used as a bitwise operator to OR two values together. To use the example from above:

01101000 | 01000010 = 01101010

One application of this is to store a set of true or false values within a single binary number, for example in a very simple control program. If the user wanted to enable a setting that is not already enabled, they could OR the register storing the state with a binary number containing a 1 at the position corresponding to that setting. The result would contain the original settings, with the required bit set to 1 if it was not already.
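Here is a minimal sketch of the control-program idea, assuming a made-up 8-bit settings register where bit 0 is the rightmost setting; `enable` and the register contents are hypothetical. The expression `1 << position` builds a number with a single 1 in the required position.

```python
def enable(register, position):
    """Set the bit at `position` (0 = rightmost) to 1, leaving every other bit unchanged."""
    return register | (1 << position)

settings = 0b00000101                  # hypothetical register: settings 0 and 2 are on
settings = enable(settings, 1)
print(format(settings, "08b"))         # 00000111
settings = enable(settings, 1)         # already on, so nothing changes
print(format(settings, "08b"))         # 00000111

print(format(0b01101000 | 0b01000010, "08b"))   # 01101010
```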

XOR

XOR is a modified version of OR: it returns 1 if exactly one of the two inputs is 1, but returns 0 if both are. To use the example from above:

01101000 ^ 01000010 = 00101010

This can also be applied in a control system to toggle settings: XORing with a 1 in a given position switches that bit on if it is currently off, and off if it is currently on.
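The toggle works the same way with `^`, Python's XOR operator. Again the register and the `toggle` helper are hypothetical.

```python
def toggle(register, position):
    """Flip the bit at `position` (0 = rightmost): on becomes off, off becomes on."""
    return register ^ (1 << position)

settings = 0b00000101                  # hypothetical register: settings 0 and 2 are on
settings = toggle(settings, 2)
print(format(settings, "08b"))         # 00000001
settings = toggle(settings, 2)
print(format(settings, "08b"))         # 00000101

print(format(0b01101000 ^ 0b01000010, "08b"))   # 00101010
```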

Representing Text

Representing text is a relatively simple process which uses numbers to represent symbols. While the process itself is simple, the implementations can be more complex. For multiple machines to be able to communicate text, they must share a standard which defines which numbers are used for which characters. These standards are called character sets, and they vary quite a lot in how the characters are defined and what sizes are available.

There is a very wide range of character sets implemented in different places, and systems can even use more than one as long as they know which one is currently in use. You can see a list of common character sets here. However, you only need to be aware of two: ASCII and Unicode.

ASCII

[Image: ASCII table]

ASCII stands for the American Standard Code for Information Interchange. It was defined in 1963 and was one of the most common encodings. It uses 7 bits per character, which allows a maximum of 128 characters to be represented. There are also extensions to this standard, such as extended ASCII, which uses 8 bits per character and so allows up to 256 characters. A table showing all the ASCII codes along with their corresponding denary values is shown to the side. When text is encoded using ASCII and stored, the denary number for each character is stored in binary.
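Python can show the ASCII codes directly: encoding a string with the standard `"ascii"` codec gives one number per character, and each fits in 7 bits. This is a small sketch; the sample text is arbitrary.

```python
text = "Hi!"
codes = list(text.encode("ascii"))     # one number per character; fails for non-ASCII text
print(codes)                           # [72, 105, 33]
for character, code in zip(text, codes):
    print(character, code, format(code, "07b"))   # each code fits in 7 bits
print(bytes(codes).decode("ascii"))    # Hi!
```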

Unicode

[Image: a very small subsection of the Unicode standard]

Unicode works in the same way as ASCII, mapping each character to a number, but it can use more bits to store each character. Unicode text can be stored using several encodings, such as UTF-8, UTF-16 and UTF-32, which differ in how many bits they use: UTF-8 uses between 8 and 32 bits per character, UTF-16 uses 16 or 32, and UTF-32 always uses 32. The most recent standard of Unicode (at the time of writing) has 128,237 possible characters. It has become the new standard and is used when building new systems in order to accommodate a much wider range of characters and languages.
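The difference between the encodings can be seen by asking Python how many bytes each one needs for a few sample characters. The `-le` (little-endian) codec names are used so that no byte order mark is added to the UTF-16 and UTF-32 output; the sample characters are arbitrary choices.

```python
# Escapes are used so the source file itself stays plain ASCII
samples = ["A", "\u00e9", "\u20ac", "\U0001F600"]   # A, e-acute, euro sign, an emoji

for character in samples:
    sizes = {
        encoding: len(character.encode(encoding))    # number of bytes used
        for encoding in ("utf-8", "utf-16-le", "utf-32-le")
    }
    print(f"U+{ord(character):04X}", sizes)
```

This should report 1, 2, 3 and 4 bytes in UTF-8; 2, 2, 2 and 4 bytes in UTF-16; and 4 bytes every time in UTF-32.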

An image showing a fraction of the possible Unicode symbols is shown to the side. You can read more about Unicode on the Unicode page on Wikipedia.