GCSE Computer Science/Data storage

From Wikibooks, open books for an open world

Formats

Specification link

Show understanding that sound (music), pictures, video, text and numbers are stored in different formats - 2016 CIE Syllabus p10

Sound (music), pictures, video, text and numbers are stored in a variety of different formats depending on what is being stored.

Error Detection

Specification link

Identify and describe methods of error detection and correction, such as parity checks, check digits, checksums and Automatic Repeat reQuests (ARQ) - 2016 CIE Syllabus p10

Parity checking

Parity checking is one method used to check whether data has been changed or corrupted following transmission from one device or medium to another. The parity of a number is either odd or even, and in a parity check an extra bit is added so that the number of 1s is always odd or always even.

One way it works is by allocating one bit out of each byte of data to be a parity bit (also known as a check bit) before transmission occurs. Whether to use even or odd parity is agreed between the sender and receiver. A byte with even parity has an even number of 1s in it. Conversely, a byte with odd parity has an odd number of 1s. Take, for example, the following byte in which the first bit has been left blank.

_ 1 1 0 1 1 0 0

If the sender has agreed to even parity, we note that there is already an even number of 1s (four) in the byte, so the first bit should be 0.

0 1 1 0 1 1 0 0

If the sender had agreed to odd parity the first bit should have a 1 in it.

1 1 1 0 1 1 0 0
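The calculation of the parity bit can be sketched in Python. This is only an illustrative sketch: the function names `add_parity_bit` and `check_parity` are made up for this example, and each byte is represented as a list of 0s and 1s with the parity bit first, as in the example above.

```python
def add_parity_bit(data_bits, parity="even"):
    """Return the parity bit followed by the seven data bits."""
    ones = sum(data_bits)
    if parity == "even":
        parity_bit = ones % 2          # 1 only if the count of 1s is odd
    else:
        parity_bit = (ones + 1) % 2    # 1 only if the count of 1s is even
    return [parity_bit] + list(data_bits)


def check_parity(byte_bits, parity="even"):
    """Return True if the received byte has the agreed parity."""
    expected_remainder = 0 if parity == "even" else 1
    return sum(byte_bits) % 2 == expected_remainder


data = [1, 1, 0, 1, 1, 0, 0]             # the seven data bits from the example
print(add_parity_bit(data, "even"))      # [0, 1, 1, 0, 1, 1, 0, 0]
print(add_parity_bit(data, "odd"))       # [1, 1, 1, 0, 1, 1, 0, 0]
```

The receiver would call `check_parity` on each byte it receives and ask for the byte again if the result is `False`.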

If a byte is sent with one parity and received with another, an error must have occurred and the receiver can ask for the data to be resent. The problem with this method is that it is impossible to find out which bit has the error, i.e. any of the bits could have changed during transmission. Furthermore, if an even number of bits change, the byte keeps the same parity and the error goes undetected. For example, the following two bytes both have odd parity even though they clearly do not carry the same data (the last two bits differ).

1 1 1 0 1 1 0 0
1 1 1 0 1 1 1 1
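This limitation can be demonstrated with a short Python sketch. The helper name `has_odd_parity` is made up for this example, and the choice of which bits to flip is arbitrary.

```python
def has_odd_parity(bits):
    """Return True if the byte contains an odd number of 1s."""
    return sum(bits) % 2 == 1


sent = [1, 1, 1, 0, 1, 1, 0, 0]      # odd parity byte from the example

one_flip = sent[:]
one_flip[6] ^= 1                     # a single bit changes in transit

two_flips = sent[:]
two_flips[6] ^= 1                    # two bits change in transit
two_flips[7] ^= 1

print(has_odd_parity(one_flip))      # False: the error is detected
print(has_odd_parity(two_flips))     # True: the error goes unnoticed
```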

A parity block can be used to locate the error and correct it, assuming only one bit has changed in each row and column. Even parity has been agreed in the following example.

             parity bit  bit 2  bit 3  bit 4  bit 5  bit 6  bit 7  bit 8
byte 1       1           1      1      1      1      1      1      0
byte 2       0           0      1      1      0      0      1      1
byte 3       1           1      0      0      0      0      1      0
parity byte  0           0      0      0      1      1      0      1

Checking each row and column shows that the bit 7 column and the byte 3 row each contain an odd number of 1s, so both fail the even parity check. They intersect at (bit 7, byte 3). Therefore the receiver can correct that bit to a 0. However, if more than one bit changes, an error might not be flagged, so other methods are also needed when checking transmitted data.
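The idea of finding the failing row and the failing column can be sketched in Python. This is a minimal sketch, assuming even parity and a single changed bit. The function name `find_error` and the block of bits used here are made up for illustration and are not the values from the table above.

```python
def find_error(block):
    """Locate a single flipped bit in a block that uses even parity.

    `block` is a list of rows. The last row is the parity byte and the
    first column holds each row's parity bit. Returns the (row, column)
    indexes of the faulty bit, or None if every row and column passes.
    """
    bad_rows = [r for r, row in enumerate(block) if sum(row) % 2 != 0]
    bad_cols = [c for c in range(len(block[0]))
                if sum(row[c] for row in block) % 2 != 0]
    if not bad_rows and not bad_cols:
        return None
    if len(bad_rows) == 1 and len(bad_cols) == 1:
        return bad_rows[0], bad_cols[0]
    raise ValueError("more than one bit appears to have changed")


# Hypothetical block as sent: every row and column has an even number of 1s.
sent = [
    [0, 1, 0, 1, 1, 0, 0, 1],   # byte 1
    [1, 0, 1, 1, 0, 0, 1, 0],   # byte 2
    [0, 1, 1, 0, 0, 1, 0, 1],   # byte 3
    [1, 0, 0, 0, 1, 1, 1, 0],   # parity byte
]

received = [row[:] for row in sent]
received[2][6] ^= 1              # one bit is corrupted during transmission

row, col = find_error(received)
print(row, col)                  # 2 6
received[row][col] ^= 1          # flip the faulty bit back
print(received == sent)          # True
```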

Automatic Repeat Request (ARQ)

This uses an acknowledgment (a message sent by the receiver to indicate that the message has been received correctly) and a timeout (the time the sender waits for the acknowledgment to arrive). If the sender does not receive an acknowledgment from the receiver before the timeout expires, the message is automatically resent.
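The resend behaviour can be sketched in Python. This is a simplified simulation rather than a real network protocol: the names `send_with_arq` and `make_unreliable_channel` are made up for this example, no real clock is used, and the limit of five attempts is an assumption. The channel simply reports whether an acknowledgment arrived before the timeout.

```python
def send_with_arq(message, channel, max_attempts=5):
    """Resend `message` until the channel reports an acknowledgment.

    `channel` is a hypothetical function standing in for the link. It
    returns True when an acknowledgment arrives before the timeout and
    False when the timeout expires first.
    """
    for attempt in range(1, max_attempts + 1):
        if channel(message):
            return attempt        # acknowledged on this attempt
        # no acknowledgment before the timeout, so loop round and resend
    raise TimeoutError("no acknowledgment after %d attempts" % max_attempts)


def make_unreliable_channel(failures):
    """Build a channel that loses the first `failures` transmissions."""
    remaining = [failures]

    def channel(message):
        if remaining[0] > 0:
            remaining[0] -= 1
            return False          # simulated timeout
        return True               # simulated acknowledgment

    return channel


attempts = send_with_arq("HELLO", make_unreliable_channel(2))
print(attempts)   # 3
```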

Checksum

  • Data is sent in blocks and an additional value called the checksum is also sent.
  • The maximum value that can be stored in one byte is 255. The value 0000 0000 is omitted in the following process.

If the sum of all the bytes is <= 255, the checksum is this value; otherwise the following process is carried out.

  1. The sum is divided by 256 and the quotient is found (divide and truncate).
  2. This whole number is multiplied by 256.
  3. The difference between the original sum of all the bytes and the value found in step 2 is calculated. This is the checksum.
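The steps above can be sketched in Python. The function name `checksum` and the sample byte values are made up for this example.

```python
def checksum(block):
    """Calculate the checksum of a block of byte values (0 to 255)."""
    total = sum(block)
    if total <= 255:
        return total
    quotient = total // 256           # step 1: divide and truncate
    multiple = quotient * 256         # step 2: multiply by 256
    return total - multiple           # step 3: the difference


print(checksum([10, 20, 30]))         # 60  (the sum is below 256)
print(checksum([200, 100, 50]))       # 94  (350 - 1 * 256)
```

For sums above 255 the three steps give the remainder after dividing by 256, so in Python the result matches `total % 256`.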

Concepts of different file types

Text files

Text and number file formats

  • Uses lossless file compression.
  • Text is usually stored as ASCII.

Numbers can be stored as integers, or in one of the following formats:

format    example
real      2.5454545
date      12/08/2122
time      19:00:20
currency  $15.50

Picture files

Joint Photographic Experts Group (JPEG)

Picture resolution is the level of detail a picture has - it is often measured in pixels per centimetre.

  • JPEG is a lossy file format for images.
  • The reduced file size leads to reduced quality. JPEG relies on properties of the human eye (for example, the human eye cannot tell two colours apart once they are sufficiently similar) so, up to a point, no real loss of quality is observed.

An uncompressed image is called a raw bitmap. When a bitmap image is converted to JPEG, its file size is reduced by a factor of 5 to 15, depending on its original quality.

  • A 3-megapixel photo is an image that is 2048 pixels wide and 1536 pixels tall, i.e. 3145728 pixels (hence it is slightly larger than 3 million pixels). Since each pixel stores 3 colours (red, green and blue), at one byte per colour the total file size is 2048 × 1536 × 3 = 9437184 bytes = 9 megabytes.
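The arithmetic can be checked with a short Python sketch. The function name `raw_image_size_bytes` is made up for this example, and it assumes one byte for each of the three colours and 1 megabyte = 1024 × 1024 bytes. The JPEG figures are rough estimates based only on the factor of 5 to 15 quoted above.

```python
def raw_image_size_bytes(width, height, bytes_per_pixel=3):
    """Size of an uncompressed bitmap, assuming one byte per colour."""
    return width * height * bytes_per_pixel


BYTES_PER_MEGABYTE = 1024 * 1024

size = raw_image_size_bytes(2048, 1536)
print(2048 * 1536)                     # 3145728 pixels
print(size)                            # 9437184 bytes
print(size / BYTES_PER_MEGABYTE)       # 9.0 megabytes

# Rough JPEG size range for the quoted reduction factors
for factor in (5, 15):
    print(factor, round(size / BYTES_PER_MEGABYTE / factor, 2))
```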

Sound files

Specification link

Show understanding of the concept of Musical Instrument Digital Interface (MIDI) files, JPEG files, MP3 and MP4 files - 2016 CIE Syllabus p10


Musical Instrument Digital Interface (MIDI)

MIDI is a communication protocol that allows electronic musical instruments to communicate with each other. It consists of a list of commands that instruct a device on how to produce a particular sound or note.

  • The MIDI files are not music and do not contain any 'sounds' hence they are very different to MP3 files.
  • It uses 8-bit serial transmission and is asynchronous.
  • MIDI operates on 16 different channels (numbered 0 to 15) which can all be used at the same time, e.g. 16 devices all playing a different line in a song using a music sequencer.
  • MIDI files are much smaller than MP3 files, which makes them ideal for storing music where storage is an issue, e.g. storing ringtones on a mobile phone.
  • Each MIDI command has a specific sequence of bytes:
    1. The first byte is the status byte, which tells the MIDI device what function to perform. The MIDI channel is also encoded in this byte.
    2. The pitch byte specifies the note to be played.
    3. The velocity byte specifies how loud to play the note.
  • This is all saved in a MIDI file with the file extension .mid.
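The three-byte sequence can be sketched in Python. The function name `note_on_message` is made up for this example. The status value 0x90 for a "note on" command and the 0 to 127 range for pitch and velocity come from the MIDI standard rather than from the text above.

```python
NOTE_ON = 0x90   # upper four bits of the status byte for "note on"


def note_on_message(channel, pitch, velocity):
    """Build the three bytes that tell a device to start playing a note."""
    if not 0 <= channel <= 15:
        raise ValueError("channel must be between 0 and 15")
    if not 0 <= pitch <= 127 or not 0 <= velocity <= 127:
        raise ValueError("pitch and velocity must be between 0 and 127")
    status = NOTE_ON | channel     # the channel sits in the lower four bits
    return bytes([status, pitch, velocity])


message = note_on_message(channel=0, pitch=60, velocity=100)
print(message.hex())   # 903c64
```

Here pitch 60 is the note number normally used for middle C, and the whole instruction takes only three bytes, which is why MIDI files are so small.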

MP3 (MPEG Audio Layer 3)

  • It is a lossy format.
  • Audio compression software is used to convert music and other sounds, such as CD tracks, into the MP3 file format.
  • This greatly reduces file size (by about 90%).
  • This works through perceptual music shaping, which removes sounds that the human ear cannot hear properly, e.g. it removes the quieter sound if a louder sound is played at the same time, because the human ear can only hear the louder sound.
  • Bit rate is the number of bits used per second of audio when creating a file. Bit rates are usually 80-320 kilobits per second; anything above 200 gives a sound quality close to that of a normal CD.
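The link between bit rate and file size can be sketched in Python. The function name `audio_size_bytes` is made up for this example. The CD figures (44,100 samples per second, 16 bits per sample, 2 channels) are the standard audio CD values and are not taken from the text above. The 128 kilobits per second rate and the 3-minute track length are arbitrary choices.

```python
def audio_size_bytes(bits_per_second, seconds):
    """Approximate size of an audio stream, ignoring any file headers."""
    return bits_per_second * seconds // 8


CD_BITS_PER_SECOND = 44100 * 16 * 2      # 1,411,200 bits per second
MP3_BITS_PER_SECOND = 128 * 1000         # hypothetical 128 kbps setting
TRACK_SECONDS = 180                      # a 3-minute track

cd_size = audio_size_bytes(CD_BITS_PER_SECOND, TRACK_SECONDS)
mp3_size = audio_size_bytes(MP3_BITS_PER_SECOND, TRACK_SECONDS)

print(cd_size)                                   # 31752000
print(mp3_size)                                  # 2880000
print(round(100 * (1 - mp3_size / cd_size)))     # 91 (percent smaller)
```

Under these assumptions the MP3 version is roughly 90% smaller, which agrees with the figure given above.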

MPEG-4 (MP4)

  • It is a lossy format.
  • Can store music, videos, photos and animation (multimedia file format).


Lossless and Lossy file compression

Specification link

Show understanding of the principles of data compression (lossless and lossy compression algorithms) applied to music/video, photos and text files - 2016 CIE Syllabus p10

Lossless data compression

All the data bits from the original file are reconstructed when the file is uncompressed. This is important for files where any data loss would be a disaster, for example, a spreadsheet.

An example is in text files, where repeated words or sections of words can be replaced with something shorter, e.g. 'the' can be replaced with '1'.
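The word replacement idea can be sketched in Python. This is a toy illustration, not a real compression format: the function names `compress` and `decompress` and the sample sentence are made up, and it assumes the replacement codes (such as '1') never appear as words in the original text.

```python
def compress(text, dictionary):
    """Replace each word found in `dictionary` with its shorter code."""
    return " ".join(dictionary.get(word, word) for word in text.split(" "))


def decompress(text, dictionary):
    """Reverse `compress` by swapping each code back to its word."""
    reverse = {code: word for word, code in dictionary.items()}
    return " ".join(reverse.get(word, word) for word in text.split(" "))


dictionary = {"the": "1"}
original = "the cat sat on the mat near the door"

packed = compress(original, dictionary)
print(packed)                                       # 1 cat sat on 1 mat near 1 door
print(len(original), len(packed))                   # 36 30
print(decompress(packed, dictionary) == original)   # True
```

The last line shows what makes the method lossless: uncompressing gives back exactly the original text.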

Lossy data compression

The algorithm eliminates unnecessary bits of data, so the original file cannot be reconstructed exactly.

In MP3 files this is in the form of perceptual music shaping. In JPEG files this is dependent on the limitations of human vision. In MP4 files a variety of methods are used, such as removing some of the colour information in each frame.
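The general idea of throwing information away can be sketched in Python. This is not how JPEG, MP3 or MP4 actually work. It is a deliberately simple illustration in which each 8-bit colour value keeps only its top 4 bits, and the function names `quantise` and `restore` and the sample values are made up.

```python
def quantise(values, keep_bits=4):
    """Keep only the top `keep_bits` bits of each 8-bit value."""
    shift = 8 - keep_bits
    return [value >> shift for value in values]


def restore(codes, keep_bits=4):
    """Rebuild approximate 8-bit values from the shortened codes."""
    shift = 8 - keep_bits
    return [code << shift for code in codes]


original = [200, 203, 198, 64, 70]
restored = restore(quantise(original))

print(restored)               # [192, 192, 192, 64, 64]
print(restored == original)   # False: some detail is gone for good
```

Each value now needs half as many bits, but similar values have merged and the exact originals cannot be recovered, which is the trade-off every lossy method makes.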