A-level Computing/CIE/Advanced Theory/Data representation
User-defined data types
File organisation and access
Real numbers and normalised floating-point representation
User-defined Data types
A user-defined data type is a data type which the programmer has designed for use within a program, as opposed to a built-in data type.
User-defined data types can be composite or non-composite. A non-composite data type is defined without reference to another data type, whereas a composite data type is built from other data types.
Enumerated Data type
An enumerated data type is a non-composite data type defined from an ordered list of values. Variables can be declared by this data type, and assigned one of the values in the list
e.g. The following pseudocode declares an enumerated data type for months,
TYPE TMonth = (Jan,Feb,Mar,Apr,May,Jun,Jul,Aug,Sep,Oct,Nov,Dec) // The TMonth data type is declared DECLARE ThisMonth : TMonth // A variable is declared with the TMonth enumerated data type. ThisMonth ← May // The variable ThisMonth ThisMonth > Feb // This is True, as May is later in the list than Feb
Note: Values in an enumerated data type are not string values; do not enclose them with quotation marks.
Pointer Data type
A pointer data type references memory locations. Thus, it has to relate to the type of data that it is pointing to.
e.g. This pseudocode declares a pointer data type, a pointer, and uses the pointer.
TYPE TMyPointer = ^Integer // This declares TMyPointer as a pointer data type which points to Integers. DECLARE IntegerPointer : TMyPointer // This declares a variable with the TMyPointer data type. ValuePointedTo ← IntegerPointer^ // This accesses the data stored at the address which IntegerPointer points to. This is known as dereferencing.
Pointers are typically used to construct complex data structures, such as linked lists and binary trees. These data structures are discussed later on.
Record Data type
A record data type stores a collection of information regarding a common subject, similar to a record in a database. It is constructed from several fields, which each have their own data types; thus, the record data type is a composite data type.
e.g. The following pseudocode defines a record type for a student record:
TYPE TStudentRecord // The record consists of several fields DECLARE FirstName : STRING // Each field has its own data type DECLARE LastName : STRING DECLARE Absences : INTEGER DECLARE Class : STRING ENDTYPE DECLARE Student1 : TStudentRecord // The variable is declared as a record Student1.FirstName ← "John" // Fields can be accessed using dot notation Student1.LastName ← "Doe" Student1.Absences ← 0 Student1.Class ← "A2 Level"
Set Data type
A set data type is a composite data type that allows a programmer to apply set theory operations to data in a program.
These operations typically include:
- Including an element
- Excluding an element
- Checking whether an element is in a set
Object Data type
An object data type is a composite data type used in object-oriented programming to define classes.
Essentially, objects are just records with functions that act on the data that they contain.
A file is a collection of records. Each record is a collection of fields. Every field consists of a value.
A serial file is a collection of records with no defined order. Records enter the file in chronological order. All records have a defined format so that they can be input and output correctly.
A text file can be considered an example of a serial file: a series of characters are input, in chronological order, to produce a file.
A common use of serial files is for real-time processing. Records can be entered in real time, as quickly as possible, because they do not need to be sorted. This makes serial files efficient.
A sequential file stores records in order of a key field. In order for it to be possible to sort records by key field, this field needs to be unique and sequential but does not need to be consecutive.
In a sequential file, a particular record can be found by reading all of the key fields until you reach the one you are looking for.
A direct-access file is a collection of records that can be directly accessed without having to check every record. This is acheived using a hash table.
A hash table is a table of data which is ordered not by the key field, but by the hash value of the key field. Thus, data can be directly accessed by hashing the key field, rather than having to look through each record one by one.
There are two ways to access a specific record within a file: sequential access and direct access. Serial and sequential files can be accessed using sequential access and direct-access files can be accessed using direct access.
Sequential access is where each record in the file is read, one by one, until the desired record is found.
Direct access is where a hashing algorithm is used to jump to a specific record in the file.
Deleting & Editing Data
In a sequentially accessed file, deleting and editing data requires the creation of a new file. Data is moved from the old file to the new file until the part where the record needs editing is reached.
However, in a direct-access file, data can be deleted or edited in place: there is no need for a new file.
Floating-point notation is a way of representing very small or very large numbers with the same amount of bits. It is similar to scientific notation.
A floating-point number consists of three parts: the sign bit, the mantissa, and the exponent. To find the value of the number, we use where ± is determined by the sign bit, M is the mantissa, and E is the exponent.
The Mantissa-Exponent Tradeoff
When making a floating-point format, the designer must choose how many bits to allocate to the mantissa and to the exponent. If more bits are allocated to the mantissa, the floating-point value is more precise; whereas if more bits were allocated to the exponent, the floating-point system could represent a greater range of values.
Normalisation is the process of choosing the floating-point representation of a number such that every number that can be represented in the floating-point system has one and only one valid representation.
Without normalisation, there could be multiple valid representations for the same number. For example, the number 2.0 can be represented as
0 0010000 0100,
0 0100000 0011, or
0 1000000 0010. Thus, we need a standard way of referring to a given number.
This is where normalisation comes in: normalisation means that the only correct form of the number is where the sign bit and mantissa most significant bit are different. So for 2.0,
0 1000000 0010 would be the valid representation.
Underflow is where the number is too small to be represented using the floating-point system.
e.g. In a system with 8 bits for the mantissa and 4 bits for the exponent, the lowest possible exponent is
1000, or -8 in denary. If the system is normalised, the smallest positive mantissa value is
0 1000000. Thus, the smallest positive number in this system is
0 1000000 1000, which is equal to 1/512. If a calculation in this system resulted in a number which was lower than 1/512, there would be an underflow error, because the number is too small to be stored.
Overflow is similar to underflow, but it occurs when a number is too large to be stored in the system.
e.g. In a system with an 8-bit mantissa and a 4-bit exponent, the largest possible number that can be represented is
0 1111111 0111, which is equal to 127. If a calculation produced a number higher than 127, there would be an overflow error and the number could not be stored.
Overflow and underflow can both occur with negative values that are too large or too small.
A rounding error is where a number cannot be represented exactly, and needs to be approximated.
e.g. The number 1/3 can only be represented in binary using recurring bits (0.0101). The floating-point format does not allow for recurring bits as there is only a finite amount of memory in the system. Thus, it needs to be rounded, so it will be represented as
0 1010101 1111.
Enumerated data type
Record data type
Pointer data type
With direct access, you can jump directly to the record you want, whereas with sequential access, you need to go through each record one by one.
Mantissa: 01101000 Exponent: 0001