Fundamentals of data representation: Information coding systems

From Wikibooks, open books for an open world

PAPER 2 - Fundamentals of data representation



Specification

Specification coverage
  • 3.5.5.1 - Character form of a decimal digit
  • 3.5.5.2 - ASCII and Unicode
  • 3.5.5.3 - Error checking and correction

ASCII and Unicode

Character code - a unique binary representation of a particular letter, number or special character.


ASCII (the American Standard Code for Information Interchange) is a standard method for representing all the keyboard characters, including the digits, along with other commonly used functions (control codes). Standard ASCII is a 7-bit code that allows for 128 characters. Extended versions use 8 bits and allow for 256 characters.
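The following minimal sketch looks up the 7-bit ASCII code of each character in a string. It shows that a digit is stored by its character code, not by its numeric value. For example, the character '7' is stored as 55, not 7. The helper name `ascii_codes` is hypothetical. It relies only on Python's built-in `ord()` and `format()`.

```python
def ascii_codes(text: str) -> list[tuple[str, int, str]]:
    """Return (character, decimal code, 7-bit binary code) for each character (hypothetical helper)."""
    result = []
    for ch in text:
        code = ord(ch)  # ord() gives the character's code
        if code > 127:
            raise ValueError(f"{ch!r} is not a 7-bit ASCII character")
        result.append((ch, code, format(code, "07b")))  # pad the binary form to 7 bits
    return result


for ch, code, bits in ascii_codes("Ab7"):
    print(ch, code, bits)
# A 65 1000001
# b 98 1100010
# 7 55 0110111
```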

The limitations of ASCII:

  • 256 characters are not enough to represent all of the possible characters, numbers and symbols.
  • It was initially developed for English, so it did not represent the many other languages and scripts in the world.
  • The widespread use of the web made it more important to have a universal, international coding system.
  • The range of platforms and programs has increased dramatically, with more developers from around the world using a much wider range of characters.

As a result, a new standard called Unicode has emerged. It follows the same basic principles as ASCII: in one of its forms (UTF-8), every character on a standard English keyboard has a unique 8-bit code.

ASCII codes have been subsumed within Unicode. This means that the ASCII code for capital letter A is 65, and so is the Unicode code for the same character. Unicode also includes international characters for over 29 countries and even includes classical and ancient characters.
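The claim that ASCII is contained within Unicode can be checked directly. The minimal sketch below tests every one of the 128 standard ASCII codes. For each code, the Unicode code point should match the ASCII code, and the UTF-8 encoding should be the same single byte. The function name `ascii_is_subset_of_unicode` is hypothetical.

```python
def ascii_is_subset_of_unicode() -> bool:
    """Check every 7-bit ASCII code against its Unicode code point and UTF-8 byte."""
    for code in range(128):
        ch = bytes([code]).decode("ascii")        # the character with this ASCII code
        if ord(ch) != code:                       # its Unicode code point must be identical
            return False
        if ch.encode("utf-8") != bytes([code]):   # UTF-8 must store it as the same single byte
            return False
    return True


print(ascii_is_subset_of_unicode())  # True
print(ord("A"))                      # 65 in both ASCII and Unicode
```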

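To represent these extra characters, more than 8 bits are needed for many characters. There are two common encodings of Unicode in use today: UTF-8 and UTF-16.

  • UTF-8 uses between one and four 8-bit bytes per character, so ASCII characters still take a single byte.
  • UTF-16, as the name suggests, uses 16-bit units. Most characters take 16 bits, and some rarer characters take two units (32 bits).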
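The difference between the two encodings can be seen by counting the bytes Python produces for a few sample characters. This is a minimal sketch. The helper `encoded_sizes` and the choice of characters are illustrative. The `"utf-16-le"` codec is used so that Python does not add a 2-byte byte-order mark to the output.

```python
def encoded_sizes(ch: str) -> tuple[int, int]:
    """Return the number of bytes ch needs in UTF-8 and in UTF-16 (hypothetical helper)."""
    utf8 = len(ch.encode("utf-8"))
    utf16 = len(ch.encode("utf-16-le"))  # little-endian, no byte-order mark
    return utf8, utf16


# 'A', 'e' with an acute accent, the euro sign, and an emoji
for ch in ["A", "\u00e9", "\u20ac", "\U0001F600"]:
    u8, u16 = encoded_sizes(ch)
    print(f"U+{ord(ch):04X}  UTF-8: {u8 * 8} bits  UTF-16: {u16 * 8} bits")
# U+0041  UTF-8: 8 bits  UTF-16: 16 bits
# U+00E9  UTF-8: 16 bits  UTF-16: 16 bits
# U+20AC  UTF-8: 24 bits  UTF-16: 16 bits
# U+1F600  UTF-8: 32 bits  UTF-16: 32 bits
```

ASCII characters such as 'A' stay at 8 bits in UTF-8. UTF-16 needs 32 bits only for characters outside its basic 16-bit range.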

Error checking and correction

A parity bit is an extra bit added to data so that errors introduced during transmission can be detected. When you send data, it is sent as a series of 0s and 1s.



In the figure above, a Unicode character is transmitted as the binary code 0111000110101011. This code could quite possibly be corrupted as it is passed around, either inside the computer or across a network.





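One method for detecting errors is to count the number of 1s in each byte before the data is sent. A parity bit is then set so that the total number of 1s is even (even parity) or odd (odd parity). In the top example the parity bit is set to 0 to maintain an even number of 1s. At the receiving end, the 1s are counted again to check whether the total is still even (or odd). If it is not, an error has occurred. A single flipped bit changes the count and is detected. Two flipped bits cancel each other out and go unnoticed.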
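The parity check described above can be sketched in a few lines. This version works on a string of '0' and '1' characters rather than on real transmitted bytes. It places the parity bit in the leftmost position, which is the convention used in the exercises below. It supports both even and odd parity. The function names `parity_bit` and `parity_ok` are hypothetical.

```python
def parity_bit(data: str, even: bool = True) -> str:
    """Return the parity bit for a string of '0'/'1' characters."""
    ones = data.count("1")
    if even:
        # Even parity: add a 1 only if the count of 1s is currently odd
        return "1" if ones % 2 == 1 else "0"
    # Odd parity: add a 1 only if the count of 1s is currently even
    return "0" if ones % 2 == 1 else "1"


def parity_ok(received: str, even: bool = True) -> bool:
    """Recount the 1s in the received bits (data plus parity bit)."""
    ones = received.count("1")
    return ones % 2 == 0 if even else ones % 2 == 1


data = "1000001"                # 7-bit ASCII code for 'A'
sent = parity_bit(data) + data  # parity bit placed in the leftmost position
print(sent)                     # 01000001
print(parity_ok(sent))          # True
print(parity_ok("01000011"))    # False: one bit flipped, error detected
print(parity_ok("01000111"))    # True: two bits flipped, error missed
```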



Majority voting

Majority voting is another method of identifying, and correcting, errors in transmitted data. Each bit is sent three times, so the binary code 1001 would be sent as:

111000000111.

When the data is checked, you would expect to see groups of three identical bits. In this case, the groups are 111 for the first bit, then 000, and so on. Where the bits in a group disagree, majority voting takes the value that occurs most frequently in that group. For example, suppose the same code 1001 were received as:

101010000111.

You can assume that the first bit should be 1, as two of its three bits are 1. Similarly, the second bit should be 0, as two of its three bits are 0. The last two bits are 0 and 1, as there appear to be no errors in this part of the code.
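The encoding and decoding steps above can be sketched as follows, assuming the bits are held as a string of '0' and '1' characters. The names `mv_encode` and `mv_decode` are hypothetical. The last line decodes the bits from the majority-voting exercise at the end of this page.

```python
def mv_encode(bits: str) -> str:
    """Repeat every bit three times, e.g. '1001' -> '111000000111'."""
    return "".join(bit * 3 for bit in bits)


def mv_decode(received: str) -> str:
    """Take each group of three bits and keep the value seen at least twice."""
    if len(received) % 3 != 0:
        raise ValueError("received length must be a multiple of 3")
    groups = [received[i:i + 3] for i in range(0, len(received), 3)]
    return "".join("1" if group.count("1") >= 2 else "0" for group in groups)


print(mv_encode("1001"))          # 111000000111
print(mv_decode("101010000111"))  # 1001
print(mv_decode("000 010 011 111 110 000 010 011".replace(" ", "")))  # 00111001
```

Each group of three can outvote one corrupted bit. If two bits in the same group are flipped, the wrong value wins.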




Check digits

Like a parity bit, a check digit is a value added to the end of a number so that any corruption of the number can be detected. The check digit is calculated from the digits that make up the number itself, which are processed in some way to produce a single digit. The simplest, but least reliable, method is to add the digits of the number together. The digits of the result are then added again, and so on, until only a single digit remains.

For example, the digits of 123456 add up to 21, and 2 and 1 in turn add up to 3, so the number with its check digit becomes 1234563. When the data is processed, the check digit is recalculated and compared with the digit that was transmitted. If the two match, the data is assumed to be correct. If there is a discrepancy, an error message is generated.
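The digit-sum method can be sketched as below, assuming the number is held as a string of decimal digits. The function names are hypothetical. The last example shows why this is the least reliable method: swapping two digits leaves the sum unchanged, so the error goes undetected.

```python
def check_digit(number: str) -> int:
    """Add the digits, then keep adding the digits of the result until one digit remains."""
    total = sum(int(d) for d in number)
    while total > 9:
        total = sum(int(d) for d in str(total))
    return total


def add_check_digit(number: str) -> str:
    """Append the check digit to the end of the number."""
    return number + str(check_digit(number))


def verify(with_check: str) -> bool:
    """Recalculate the check digit from the number and compare it with the last digit."""
    return check_digit(with_check[:-1]) == int(with_check[-1])


print(add_check_digit("123456"))  # 1234563
print(verify("1234563"))          # True
print(verify("1234573"))          # False: a changed digit is detected
print(verify("2134563"))          # True: swapped digits are NOT detected
```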

Exam tip - Check digits can identify that one or more errors have occurred. However, they need more processing than parity checks and, unlike majority voting, they cannot repair errors.


Summary

  • Binary codes can be used to represent text, characters, numbers, graphics, video and audio.
  • ASCII and Unicode are systems for representing characters.
  • It is possible that the data can get corrupted at any point when it is being either processed or transmitted.
  • Error detection and correction methods include parity bits, check digits and majority voting.


Exercises

Explain what is meant by a character code.

Answer:

A character code uses a unique number/code to represent each different character

The ASCII binary code for character 'a' is 1100001₂. How would the word "be" be encoded in the binary form of ASCII?

Answer:

b = 1100010, e = 1100101

A program has been developed to convert a string so that all of its characters are in upper case.

The computer does this by taking each character’s ASCII binary code and applying a bitwise AND operation to it, using the mask 1011111₂.

Convert the lower case character 'c', ASCII code 1100011₂, into the upper case character 'C' using the method described above.

Answer:

1000011
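The masking method in this exercise can be tried out with a minimal sketch. The mask 1011111₂ (95) clears the bit worth 32, which is the only difference between a lower case letter and its upper case form. One assumption is added here that the exercise does not state: only the characters 'a' to 'z' are masked. Masking every character would also change non-letters; for example, a space would become code 0. The function name `to_upper` is hypothetical.

```python
MASK = 0b1011111  # 1011111 in binary = 95; AND with it clears the bit worth 32


def to_upper(text: str) -> str:
    """Upper-case ASCII letters by ANDing their codes with MASK."""
    chars = []
    for ch in text:
        code = ord(ch)
        if "a" <= ch <= "z":  # assumption: only lower case letters are masked
            code &= MASK      # bitwise AND with the mask
        chars.append(chr(code))
    return "".join(chars)


print(format(ord("c") & MASK, "07b"))  # 1000011
print(to_upper("hello, world 42"))     # HELLO, WORLD 42
```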

Describe the differences between ASCII and Unicode.

Answer:

ASCII uses 7 or 8 bits per character and represents only Latin characters and extended symbols. Unicode uses between 8 and 32 bits per character, depending on the encoding (UTF-8 or UTF-16) and the character, and can represent characters from almost any language.

Describe the similarities between ASCII and Unicode.

Answer:

Unicode contains ASCII as a subset so every ASCII character can also be stored in Unicode. ASCII characters have the same character codes as they do in Unicode.

Compare the usefulness of parity checking, check digits and majority voting.

Answer:

Parity checks are quick and relatively cheap in terms of data transmission, but they only reliably detect single-bit errors and cannot repair data. Check digits require a lot of processing but can detect multiple errors; they also cannot repair data. Majority voting can catch many errors and requires little processing. It can repair errors but needs three times the amount of data to be transmitted.

The ASCII binary code for character 'a' is 1100001₂.

Suppose this character has been received during a transmission. The most significant (leftmost) bit is used as a parity bit, and the odd parity system is in use. Explain whether or not the character has been received correctly and how you have determined this.

Answer:

The character appears to have been received correctly. It contains an odd number of 1s (three), which is what odd parity requires.

A system uses majority voting to send ASCII characters from one device to another. The receiver obtains the following bits for the transmission of one ASCII character:

000 010 011 111 110 000 010 011

Determine the 8 bits that the receiver should use to represent the transmitted ASCII character.

Answer:

0 0 1 1 1 0 0 1.