A-level Computing/CIE/Advanced Theory/Data representation

From Wikibooks, open books for an open world
Jump to navigation Jump to search
A-level Computing
Data Representation Communication and Internet Technologies
Specification link

User-defined data types

  • show understanding of why user-defined types are necessary
  • define and use non-composite types: enumerated, pointer
  • define and use composite data types: set, record and class/object
  • choose and design an appropriate user-defined data type for a given problem

File organisation and access

  • show understanding of methods of file organisation: serial, sequential (using a key field) and random (using a record key)
  • show understanding of methods of file access:
    • sequential access for serial and sequential files
    • direct access for sequential and random files.
  • select an appropriate method of file organisation and file access for a given problem

Real numbers and normalised floating-point representation

  • describe the format of binary floating-point real numbers
  • convert binary floating-point real numbers into denary and vice versa
  • normalise floating-point numbers
  • show understanding of the reasons for normalisation
  • show understanding of the effects of changing the allocation of bits to mantissa and exponent in a floating-point representation
  • show understanding of how underflow and overflow can occur
  • show understanding of why computers cannot represent mathematical real numbers such as √2 or π, only approximations
  • show understanding of the consequences of a binary representation only being an approximation to the real number it represents
  • show understanding that binary representations can give rise to rounding errors

User-defined Data types[edit | edit source]

A user-defined data type is a data type which the programmer has designed for use within a program, as opposed to a built-in data type. A non-composite type is defined without reference to another data type, whereas a composite data type is built from other data types.

Enumerated Data type[edit | edit source]

An enumerated data type is a non-composite data type defined from an ordered list of values. Variables can be declared by this data type, and assigned one of the values in the list

e.g. The following pseudocode declares an enumerated data type for months,

TYPE TMonth = (Jan,Feb,Mar,Apr,May,Jun,Jul,Aug,Sep,Oct,Nov,Dec) // The TMonth data type is declared

DECLARE ThisMonth : TMonth // A variable is declared with the TMonth enumerated data type.

ThisMonth ← May // The variable ThisMonth

ThisMonth > Feb // This is True, as May is later in the list than Feb

Note: Values in an enumerated data type are not string values; do not enclose them with quotation marks.

Pointer Data type[edit | edit source]

A pointer data type references memory locations. Thus, it has to relate to the type of data that it is pointing to.

e.g. This pseudocode declares a pointer data type, a pointer, and uses the pointer.

TYPE TMyPointer = ^Integer // This declares TMyPointer as a pointer data type which points to Integers.

DECLARE IntegerPointer : TMyPointer // This declares a variable with the TMyPointer data.

ValuePointedTo  IntegerPointer^ // This accesses the data stored at the address which IntegerPointer points to. This is known as dereferencing.

Pointers are typically used to construct complex data structures, such as linked lists and binary trees. These data structures are discussed later on.

Record Data type[edit | edit source]

A record data type stores a collection of information regarding a common subject, similar to a record in a database. It is constructed from several fields, which each have their own data types; thus, the record data type is a composite data type.

e.g. The following pseudocode defines a record type for a student record:

TYPE
TStudentRecord                    // The record consists of several fields
  DECLARE FirstName : STRING      // Each field has its own data type
  DECLARE LastName : STRING
  DECLARE Absences : INTEGER
  DECLARE Class : STRING
ENDTYPE

DECLARE Student1 : TStudentRecord // The variable is declared as a record

Student1.FirstName ← "John"       // Fields can be accessed using dot notation
Student1.LastName ← "Doe"
Student1.Absences ← 0
Student1.Class ← "A2 Level"

Set Data type[edit | edit source]

A set data type is a composite data type that allows a programmer to apply set theory operations to data in a program.

These operations typically include:

  • Union
  • Difference
  • Intersection
  • Including an element
  • Excluding an element
  • Checking whether an element is in a set

Object Data type[edit | edit source]

An object data type is a composite data type used in object-oriented programming to define classes.

Essentially, objects are just records with functions that act on the data that they contain.

A data object is a region of storage that contains a value or group of values. Each value can be accessed using its identifier or a more complex expression that refers to the object. In addition, each object has a unique data type.

File Organisation[edit | edit source]

A file is a collection of records. Each record is a collection of fields. Every field consists of a value.

Serial Files[edit | edit source]

A serial file is a collection of records with no defined order. Records enter the file in chronological order. All records have a defined format so that they can be input and output correctly.

A text file can be considered an example of a serial file: a series of characters are input, in chronological order, to produce a file.

A common use of serial files is for real-time processing. Records can be entered in real time, as quickly as possible, because they do not need to be sorted. This makes serial files efficient.

Sequential Files[edit | edit source]

A sequential file stores records in order of a key field. In order for it to be possible to sort records by key field, this field needs to be unique and sequential but does not need to be consecutive.

In a sequential file, a particular record can be found by reading all of the key fields until you reach the one you are looking for.

Direct-Access Files[edit | edit source]

A direct-access file is a collection of records that can be directly accessed without having to check every record. This is acheived using a hash table.

A hash table is a table of data which is ordered not by the key field, but by the hash value of the key field. Thus, data can be directly accessed by hashing the key field, rather than having to look through each record one by one.

Accessing Files[edit | edit source]

There are two ways to access a specific record within a file: sequential access and direct access. Serial and sequential files can be accessed using sequential access and direct-access files can be accessed using direct access.

Sequential access is where each record in the file is read, one by one, until the desired record is found.

Direct access is where a hashing algorithm is used to jump to a specific record in the file.

Deleting & Editing Data[edit | edit source]

In a sequentially accessed file, deleting and editing data requires the creation of a new file. Data is moved from the old file to the new file until the part where the record needs editing is reached.

However, in a direct-access file, data can be deleted or edited in place: there is no need for a new file.

Floating-point Numbers[edit | edit source]

Floating-point notation is a way of representing very small or very large numbers with the same amount of bits. It is similar to scientific notation.

A floating-point number consists of three parts: the sign bit, the mantissa, and the exponent. To find the value of the number, we use where ± is determined by the sign bit, M is the mantissa, and E is the exponent.

A representation of the binary value of 0.15625 in a 32-bit system with 8 bits for the exponent and 23 bits for the mantissa
An example of a floating point format in IEEE 754

The Mantissa-Exponent Tradeoff[edit | edit source]

When making a floating-point format, the designer must choose how many bits to allocate to the mantissa and to the exponent. If more bits are allocated to the mantissa, the floating-point value is more precise; whereas if more bits were allocated to the exponent, the floating-point system could represent a greater range of values.

Normalisation[edit | edit source]

Normalisation is the process of choosing the floating-point representation of a number such that every number that can be represented in the floating-point system has one and only one valid representation.

Without normalisation, there could be multiple valid representations for the same number. For example, the number 2.0 can be represented as 0 0010000 0100, 0 0100000 0011, or 0 1000000 0010. Thus, we need a standard way of referring to a given number.

This is where normalisation comes in: normalisation means that the only correct form of the number is where the sign bit and mantissa most significant bit are different. So for 2.0, 0 1000000 0010 would be the valid representation.

Floating-point Errors[edit | edit source]

Underflow[edit | edit source]

Underflow is where the number is too small to be represented using the floating-point system.

e.g. In a system with 8 bits for the mantissa and 4 bits for the exponent, the lowest possible exponent is 1000, or -8 in denary. If the system is normalised, the smallest positive mantissa value is 0 1000000. Thus, the smallest positive number in this system is 0 1000000 1000, which is equal to 1/512. If a calculation in this system resulted in a number which was lower than 1/512, there would be an underflow error, because the number is too small to be stored.

Overflow[edit | edit source]

Overflow is similar to underflow, but it occurs when a number is too large to be stored in the system.

e.g. In a system with an 8-bit mantissa and a 4-bit exponent, the largest possible number that can be represented is 0 1111111 0111, which is equal to 127. If a calculation produced a number higher than 127, there would be an overflow error and the number could not be stored.

Overflow and underflow can both occur with negative values that are too large or too small.

Rounding Errors[edit | edit source]

A rounding error is where a number cannot be represented exactly, and needs to be approximated.

e.g. The number 1/3 can only be represented in binary using recurring bits (0.0101). The floating-point format does not allow for recurring bits as there is only a finite amount of memory in the system. Thus, it needs to be rounded, so it will be represented as 0 1010101 1111.

Questions[edit | edit source]

Questions

What user-defined data type is defined by a list of values?

Answer:

Enumerated data type

What user-defined data type is defined by a set of fields?

Answer:

Record data type

What user-defined data type is used for making dynamic data structures?

Answer:

Pointer data type

What type of file should be used for real-time processing?

Answer:

Serial file

What type of file stores data in a hash table?

Answer:

Direct-access file

What type of file stores records according to their key field?

Answer:

Sequential file

What is the difference between direct access and sequential access?

Answer:

With direct access, you can jump directly to the record you want, whereas with sequential access, you need to go through each record one by one.

Find the normalised floating-point value of +1.625 in a system with 8 bits for the mantissa and 4 bits for the exponent.

Answer:

Mantissa: 01101000 Exponent: 0001

What value is represented by the normalised floating point number with mantissa 10011100 and exponent 0100?

Answer:

-12.5