A-level Computing/WJEC (Eduqas)/Component 1/Algorithms and programs

From Wikibooks, open books for an open world
Jump to navigation Jump to search

Algorithms:

An algorithm is a finite set of instructions to solve a specific problem. It can be represented using pseudocode, a flowchart or structured English.

Representation[edit | edit source]

Pseudocode[edit | edit source]

Pseudocodes are algorithms written in a 'generic' language. They are written with no one programming language in mind, but rather with the aim of being in a form closer to plain English. This allows the programmer to understand the logical and algorithmic side of a solution without having to worry about the nuances of a coding language. Providing pseudocode is unambiguous and the basic rules of any programming language are adhered to (such as variable declaration). For the exam, this is the pseudocode you will come across:

Standard Representation
Variable declaration Num Is Integer
Variable initialisation Set Num = 0
User input Input Num
Outputting a message Output "Type in next number."
Comment/annotation {price with VAT added}
Concatenation Output "Highest Mark", Total
Selection (IF) If Mean > 75 Then

Output "Further treatment required."

Else

Output "No further treatment required."

End If

While loop While Num < 0 Do

Input Num

End While

For loop For Count = 1 To 20

If Value > Max Then Max = Value

End For

While loop Repeat

If Number MOD Divisor = 0 Then

Prime = FALSE

End If


Set Divisor = Divisor +1

Until (Prime = FALSE) OR (Divisor = Number)

Flowcharts[edit | edit source]

Shows a decision in a flowchart. A combination of these and different shapes make a decision flowchart.

Flowcharts are a method of visually representing an algorithm (see Appendix D on the specification for the shapes). This is done by use of a diagram, beginning at a start label and following lines between various statements and questions that are enacted and adhered to until an endpoint is reached.

Each different instruction type is represented with a different shape: the start and end points are usually represented with an oval or a rounded rectangle, a process (such as a calculation) takes place within a rectangle, input is taken in by a trapezium, a parallelogram represents output, and a diamond (rhombus) is used to represent a decision. A decision-diamond will be where the path through a flowchart forks, with two lines coming out from it, which are usually labelled YES or NO. When following a flowchart, the correct line is selected based on the answer to the question written on the decision-diamond itself.

As you can probably tell, a flowchart quickly becomes complex for larger programs, can take a while to create with sufficient complexity, and is often laborious to follow. As a result, flowcharts are generally only used for smaller, less complex problems.

Contents[edit | edit source]

Variables and Constants[edit | edit source]

Constants[edit | edit source]

Algorithm CalculateVAT

RateVAT = 0.2   {this is a constant to store the VAT rate}

These hold a fixed value within a program, unlike a variable, which can be changed at any time. A constant is instead defined at the top of a program, and will hold the same value throughout, going as far as throwing an error if any part of the program tries to change it. This prevents accidental changes to a constant. Constants do not change often but when they do it is easier to change them at the top of the program rather than where they occur in the body of the program (e.g. multiple instances of a variable in a program).

Identifiers[edit | edit source]

Self-documenting Identifiers[edit | edit source]

Other people will inevitably read your code, it is rather rare to be working on your own. Variables should be named in a way where you can understand what their contents will contain, e.g. MemberID will contain the ID of a member. It is self-documenting, you do not need to read anything extra or consult an explanation to understand what contents it will hold. An example of a variable that is not self-documenting would be "value" or "x", as the meaning is very unclear.

Annotation[edit | edit source]

{comments help other programmers to understand code}

Annotation is pieces of English that are inserted into code. They help other programmers understand the code, but are not needed in any way for the program to run. Annotation is used to make programs easier to read by a different programmer or the same programmer at a later date, for example when improving the efficiency. When the program is being compiled, the comments are discarded.

Program Layout[edit | edit source]

Some languages explicitly require you to lay your code out in a certain way in order for it to work. However, others are rather lax in their approach. Good program layout requires you to indent your code and conform to certain standards such as putting spaces between operators. Some companies prefer writing in camelCase and others in Snake_Case, but whichever you use, you should stick to it throughout your entire coding project. See below for a idea of what is meant by program layout.

If CodeLayoutCheckPassed = 1 Then
       codeIndented = 1
       selfDocumentingIdentifiersUtilised = 1
Else
       codeIndented = 0
       selfDocumentingIdentifiersUtilised = 0
End If

This would be less preferable, as you can't clearly see the different paths the program could take:

If CodeLayoutCheckPassed=1 Then code_Indented=1 self_Documenting_IdentifiersUtilised=1 Else code_Indented=0 self_DocumentingIdentifiersUtilised=0 End If

Scope of Variables[edit | edit source]

Variables have a scope. A scope is where they can be used in a program. Local variables can only be used in the subprogram where they are defined, whereas global variables can be used throughout the entire program. Global variables only exist for the lifetime of the program (when it is running) and can be accessed at any time. Local variables only exist for the time the program is in the scope they were declared in, for example a "Item" variable in a For loop that is incremented by 1 until the loop finishes.

Subroutines[edit | edit source]

Large programs can also be broken into named blocks of code that carry out specific tasks. These are subroutines, and are important as, once defined, a subroutine can be called upon to perform its task at any point within a program. This makes them ideal for solving problems where the same sequence of instructions will need to be repeated throughout. Most languages use two types of subroutine - functions and procedures. Both act very much in the same way, except for the crucial detail that a function must return a single value to the main body of the program, while a procedure doesn't. This distinction is normally made clear in the way functions and procedures are defined.

Parameters[edit | edit source]

Both types of subroutine rely on parameters to pass values to be utilised. There are two ways to pass parameters, by value or by reference. Passing by value is where a local copy of the data is created which is discarded after the procedure has been run. Passing by reference is when the address is passed, rather than the actual data. A benefit of passing by value is that it avoids the unintended side effects where the parameter has its value changed in the main program and the procedure. Passing by reference is beneficial as it avoids the larger amount of processing and storage associated with creating copies of the data and allows desirable changes to be passed back into the main program.

Recursion[edit | edit source]

Function Fvalue (Num: Integer) : Integer
If Num = 1 Then
    Set Fvalue = 0
Else
    If Num = 2 Then
        Set Fvalue = 1
    Else
        Set Fvalue = Fvalue(Num-2) + Fvalue
    End If
End If
End Function

A recursive algorithm is when an algorithm calls itself within itself, until a terminating base case is met. Multiple 'copies' of the program are opened until the copies terminate (end). A common example of a use of this is a factorial which is a product of all positive integers less than or equal to the value. For example,

Mathematical Operations[edit | edit source]

Operator Symbol Description Example
Addition + The total amount of those values combined.
Subtraction - Removing one value from another.
Multiplication * The repeated addition of two numbers.
Division / Calculating the number of times a number is contained within another.
Integer Division DIV Returns the amount of times a number is contained within another.
Modulo MOD Returns the remainder after working out how many times a number is contained within another.
Negation - Flips the sign on the number.
Exponential ^ Raise the first number to the power of the second.

The second type is selection. These are statements that determine the output (or what code to be run) based on the given inputs and the logical rules set by the programmer. This is the basic IF statement - IF this is true, then do this ELSE do that. The final part of an algorithm is iteration - where code is intended to be run multiple times, until a terminating condition is met. Examples include the FOR loop, which will repeat for a set number of time, and the WHILE loop, which repeats while a certain condition is true.

Each part of any algorithm depends on variables for data storage. A variable is a named location in a computer's memory that can be used to store data while a program is running, which can then be called upon later. This is stored with a computer's RAM, which is labelled with memory addresses starting from 0. Variables are usually assigned names within a program, with specific details of storage location being left to the translator. Along with the information it stores, a variable has two main qualities - its scope and its lifetime. Simply put, a variable's scope is where in the program it is accessible from, whilst a variable's lifetime is the length of time a variable is accessible for. A local variable has a small scope and a limited lifetime, as it is only accessible within the subroutine that defines it, whilst a global variable, which exists for the entire length of the program, will have a large scope and a long lifetime.

Validation and Verification[edit | edit source]

Types and Examples
Suitable checks Example of invalid data
Presence check Nothing in the box
Format check to ensure that a data item matches a previously determined pattern, e.g. only accept 7 digits. 12345X
Length check to ensure that data entered is of a reasonable length, e.g. between 8 and 13. 12345678901234567890
Type check to ensure that data item is of a particular type, e.g. all entries must be digits. Micheal

Validation[edit | edit source]

Validation occurs when data is being entered. The purpose of validation checks is to detect errors and ensure data is reasonable. Some common examples are: presence check where the input cannot be NULL (nothing), a length check to ensure input meets a character limit/requirement (enforced on password fields), format check to ensure data is entered a certain way (an email must have an @ sign) or a character check to check only the accepted characters are input (wouldn't want text in a credit card number).

Verification[edit | edit source]

Verification occurs when data is being entered or when data is being transferred from one place to another. The purpose of verification is to ensure that data is consistent (mis-typing) and ensuring data has not been corrupted. There are two common ones: double entry (for passwords usually) and proof reading, for example when a page is presented with the summary before ordering a pizza. Double entry is when two values are entered and compared with each other. If they match, there are no input errors, if they do not match there is an input error.

Searching[edit | edit source]

Searching algorithms allow for the location of a certain piece of data - for example a specific user or their group permissions.

Linear Search[edit | edit source]

Starting at the beginning of an array, SearchValue is compared to every consecutive item in the array. This happens until an item matches (SearchValue has been found) or the end of the array has been reached (SearchValue is not present in the array).

myArray(6) Is Integer
SearchValue Is Integer
Found Is Boolean
Set Found = False
    
Input SearchValue
    
For i = 0 to 6
    If SearchValue = myArray(i)then
        Set Found = True
        Output "SearchValue found at position ", i
    End If
End For

If Found = False
    Output "SearchValue not found."
End if

Binary Search[edit | edit source]

Firstly, a binary search requires all elements to be sorted in ascending order. This search is much better than a linear search as if an element greater than SearchValue is discovered, the search can terminate as all other items will be greater than SearchValue. It works by:

  1. Calculating a midpoint and comparing SearchValue to the middle element.
  2. If this is not the SearchValue, the lower or upper half is searched until the SearchValue is found or the middle element is greater.
myArray(6) Is Integer
Start Is Integer
End Is Integer
Found Is Integer
Mid Is Integer

Set Start = 0
Set End = 6
Set Found = False

Input SearchValue

    Repeat
        Set Mid = (Start + End) DIV 2

        If SearchValue = myArray(Mid) Then
            Set Found = True
            Output "SearchValue found at position ", Mid
        End If

        If SearchValue > myArray(Mid) Then
            Set Start = Mid + 1
        End If

        If SearchValue < myArray(Mid) Then
            Set End = Mid – 1
        End If

    Until (Found = True) OR (End < Start)

If Found = False
    Output "SearchValue not found."
endif

Sorting[edit | edit source]

Sorting algorithms allow an unordered set of data elements to be organised, for example from ascending/descending or alphabetically.

Bubble Sort[edit | edit source]

In a bubble sort, items are moved based on whether they are higher or lower than the next item in the list. This results in the largest item repeatedly "bubbling" to the top of the list, which is where the algorithm gets its name.

  1. A pass is made through the data, comparing each value with the subsequent one, swapping them if necessary.
  2. This is repeated until the data is in order, no swaps are made.
Start Procedure SortMyArray
    n Is Integer
    temp Is Integer
    swapped Is Boolean

    set n = length(myArray) {returns the length of myArray}
    repeat
        set swapped = FALSE
            for i = 0 to (n – 1)
                if myArray(i) < myArray(i + 1) then
                temp = myArray(i + 1)
                myArray(i + 1) = myArray(i)
                myArray(i) = temp
                swapped = TRUE
                end if
            end for
   until (swapped = FALSE)

End Procedure

Insertion Sort[edit | edit source]

  1. Items are copied into an new sorted array one by one.
  2. Each new item is inserted into the correct place.
  3. All items in the new array are moved up a place.

Quicksort[edit | edit source]

  1. An item/pivot selected (which item is unimportant)
  2. Produce two new lists of smaller and larger numbers
  3. Repeat above points on new sublists (recursively) until sorted
Declare subprocedure QuickSort (myArray is string, indexLow is integer, indexHi is integer)

Pivot is string
tmpSwap is string
tmpLow is integer
tmpHi is integer
tmpLow = indexLow
tmpHi =indexHi
pivot = myArray (int(indexLow + indexHi)/2))

while (tmpLow <= tmpHi)
    while (myArray(tmpLow) < pivot and tmpLow < indexHi)
        tmpLow = tmpLow + 1
    wend

    while (myArray(tmpHi) > pivot and tmpHi > indexLow)
        tmpHi = tmpHi – 1
    wend

    if (tmpLow <= tmpHi) then
        tmpSwap = myArray(tmpLow)
        myArray(tmpLow) = myArray(tmpHi)
        myArray(tmpHi) = tmpSwap
        tmpLow = tmpLow + 1
        tmpHi = tmpHi -1
    end if
wend

if (indexLow < tmpHi) then
    QuickSort (myArray , indexLow, tmpHi)
Elseif (indexHi > tmpLow) then
    QuickSort (my Array, tmpLow, indexHi)
end if

end sub

Programming Constructs[edit | edit source]

Sequence[edit | edit source]

Sequence means executing instructions line by line, one after another. Below is a fragment of code written in Assembly language, each line will be executed in turn to run the code fragment.

LOAD 1,d1
LOAD 2,d2
XORR 1,2
STOR 1,d3
XORR 1,2

Selection and Repetition[edit | edit source]

The purpose of selection is to execute code, Biggest = Num2, if a specified condition is met, Num2> Biggest in this case is True.

The purpose of repetition (also called loops) is to repeatedly execute code until a certain condition is satisfied, Num1 is an Integer in this case.

Biggest Is Integer
Num1 Is Integer
Num2 Is Integer

Repeat
    Input Num1
Until (Num1 is an Integer)

Repeat
    Input Num2
Until (Num1 is an Integer)

Set Biggest = Num1

If Num2 > Biggest Then Biggest = Num2

Output Biggest

Counts[edit | edit source]


Rogue Values[edit | edit source]

Rogue values are seen by lookup tables as a termination signal.

Modular Programming[edit | edit source]

Standard Functions[edit | edit source]

A standard function is one some common it is provided by the interpreter or compiler, examples include square root, random number generators and output message boxes like those seen in Visual Basic. Languages that include these standard functions are much better than those that do not as they save the programmer time, as they do not need to write as much code. The function has been tested before and is unlikely to contain errors and it has been written by experts in the field and so will be succinct and efficient.

User-defined Subprograms[edit | edit source]

User defined subprograms like modules and functions are easier to write than one large program, make the program much more readable, are easy to test in isolation and can be written by an individual and implemented in a large program. They may also have been previously written and can be copied from another program or they may already exist for the specific problem.

Compression[edit | edit source]

When a file is compressed, the file is made smaller (possibly by reducing the amount of data). Compressing a file saves storage space and speeds up transfer when sending the file over the network. There are two main types of compression: Lossy and Lossless.

Lossy and Lossless[edit | edit source]

Lossy compression (as you may have guessed by the name) is when a file is made smaller but some information is lost in the process. Lossless is when a file is made smaller but no information is lost with this type. Most of the time, lossless is preferred but lossy is used for images on websites especially when they are a small size as a small amount of lost information (quality) does not make much of a difference, yet speeds up the load time of the page.

Examples of Lossless[edit | edit source]

Run length encoding is used when a character is repeated multiple times, it can be used in images which repeat a certain pixel colour. It is used by a flag character denoting the start ($), the letter that is repeated and then the number of times it is repeated, so AAAAAAAAA would become $A9. You get the idea.

Huffman encoding is an example of dictionary encoding (a dictionary of common characters each with their own reference). In English, some letters are used much more than others, for example 'A' is more common that seeing a 'Z'. Huffman encoding works by giving letters that appear more frequently shorter codes and letters that appear less often longer codes, resulting in a smaller compressed file.

Comparing Compression[edit | edit source]

You may be asked to compare lossy and lossless, or even a given compression algorithm in the exam. The first pointer to remember is that lossy is always better than lossless as much more data can easily be discarded as quality is lost. It will take a certain amount of time to compress the file and subsequently decompress it, these are called the compression time and decompression time. To work out how efficient a compression algorithm is we can calculate the 'compression ratio' which is

Testing[edit | edit source]

There are three types of test data that you must account for, consult the table below (1-100 is the allowed data).

Test Meaning Example
Normal/valid This is type of data we want users to enter, it is within our allowed values. 50
Invalid This is data is outside of our accepted range, or is a different data type entirely. 200 and "twenty"
Borderline/extreme This data is on the edge of our accepted values. 1 or 100