Pascal Programming/Sets

From Wikibooks, open books for an open world
Jump to navigation Jump to search

This chapter introduces you to a new custom data type. Sets are one of the basic structured data types. When programming you will frequently find that some logic can be modeled with sets. Learning and mastering usage of sets is a key skill, since you will encounter them a lot in Pascal.

Basics[edit]

Notion[edit]

Sets are (possibly empty) aggregations of distinguishable objects. Either a set contains an object, or it does not. An object being part of a set is also referred to as element of that set.

Let's say we know the objects “apple”, “banana” and “pencil”. The set Fruit ≔ {“apple”, “banana”} contains the objects “apple” and “banana”. “Pencil” is not a member of the set Fruit.

Digitization[edit]

When a computer is supposed to store and process a set, it actually handles a series of Boolean values.[fn 1] Every one of those Boolean values tells us whether a certain element is part of a set.

Note A computer does also store a Boolean value for every object that is not part of a certain set.

Sets in Pascal[edit]

The computer needs to know how many Boolean values it needs to set aside. In order to achieve this, a set in Pascal requires an ordinal type as a set’s base type. An ordinal type always has a finite range of permissible discrete values, thus the computer knows beforehand how many Boolean values to reserve, how many elements we can expect a set contain at most. In consequence, a valid set type declaration is:

type
	characters = set of char;

A variable of the data type characters can only contain char values. This set cannot contain, for instance, 42, that is an integer value, nor is this information stored in any way.

Red x.svg
A data type declaration for a set of real is illegal, since the set’s base type, the data type real, is not an ordinal data type.

Remember, in order to qualify as an ordinal data type there must be a means to assign every legal value an integer value.[fn 2]

Sets are particularly useful in conjunction with enumeration data types, which you just learned in previous chapter. Let’s consider an example in Pascal:

program setDemo;

type
	skill = (cooking, cleaning, driving, videogames, eating);
	skills = set of skill;

var
	slob: skills;
begin
	slob := [videogames, eating];
end.

Here, we have declared a variable slob, which represents a set of the skill enumeration data type values. In the penultimate line we populate our set slob with two objects, videogames and eating. The brackets indicate a set literal. [videogames, eating] is a set expression which we are assigning to the slob variable.

The set variable slob contains no other objects. However, the computer still stores five Boolean values for every potential member of that set. The number five is number of elements in skill, the set’s base type. The information that cooking, cleaning and driving are not part of the set slob is stored explicitly (by the proper Boolean value false).

Inspecting a set[edit]

If we want to learn, whether a certain object is part of a set, the set operator in yields the corresponding Boolean value the computer uses to store that information.

program setInspection(output);
type
	skill = (cooking, cleaning, driving, videogames, eating);
	skills = set of skill;
var
	slob: skills;
begin
	slob := [videogames, eating];
	
	if videogames in slob then
	begin
		writeLn('Is she a level 45 dungeon master?');
	end;
end.

The in operator is one of Pascal’s non-commutative operators. This means, you cannot swap the operands. On the RHS you always need to write an expression evaluating to a set value, whereas the LHS has to be an expression evaluating to this set’s base type.

Even though we, as humans, can say that 42 in slob is wrong, i. e. false, such a comparison is illegal. Per definition, the slob set can only contain skill values.

Operations[edit]

So far, sets probably seemed like a really complicated way for using Boolean values. The true power of sets lies in a number of distinct operations, making sets an easier, and thus better alternative to handling two or more individual (but related) Boolean values directly.

Combinations[edit]

In Pascal, two sets of the same kind, the same data type, can be combined forming a new set of the respective data type. Following operators are available:

name mathematical symbol source code symbol
union +
difference -
intersection *
symmetric difference ><
set operators in Pascal

 The symmetric difference operator is only defined in EP.

Union[edit]

In a Venn diagram both circles represent a (positive) number of objects belonging to either or both sets. The red area represents the result of a union.

The result of unifying two sets into one is called union. Let’s say, recently our slob has learned how to drive and does that now too. This can be written as:

slob := slob + [driving];

Now, slob contains all objects it previously held, plus all objects from the other set, [driving].


Difference[edit]

difference as a Venn diagram: right circle “minus” left circle

Of course sets can be deprived of a set of elements by using the difference operator, in source code written as -.

slob := slob - [];

This removes all objects present in the second set from the first set. Here, the empty set ([]) does not contain any objects, thus removing no objects has virtually no effect on slob.


Intersection[edit]

intersection as a Venn diagram: the overlapping area, if any, here colored in red, represents the intersection

Furthermore you can intersect sets. The intersection of two sets is defined as the set of elements both operands contain.

program intersectionDemo;
type
	skill = (cooking, cleaning, driving, videogames, eating);
	skills = set of skill;
var
	slob, blueCollar, common: skills;
begin
	blueCollar := [cooking, cleaning, driving, eating];
	slob := [driving, videogames, eating];
	
	common := blueCollar * slob;
end.

The set common now (only) contains driving and eating, because those are the objects member of both operands, of both given sets.


Symmetric difference[edit]

symmetric difference as a Venn diagram

A disjunct result to the intersection gives the symmetric difference. It is the union of the operands without the elements contained in both sets.

unique := blueCollar >< slob;

Now unique is [cooking, cleaning, videogames], because those are the values from either set, but not both.

Comparisons[edit]

Two sets of the same kind, the same data type, can be compared by looking at each element in both sets.

name         mathematical symbol source code symbol
equality = =
inequality <>
inclusion <=
inclusion >=
element in
comparison operators for sets in Pascal

All comparison operators, as before, evaluate to a Boolean expression.

Inclusion[edit]

The inclusion of a two sets means that all objects one set contains are present in another set. If the expression A <= B evaluates to true, all objects present in the set A are also present B. In a Venn diagram you will notice that one circle’s area is completely surrounded by another circle, if not identical to the other circle.

Note Expressions about empty sets are always true, despite not having any objects to check for. The expression [] <= someSet always evaluates to true, regardless of someSet’s value (it may even be another empty set).

Equality and inequality[edit]

The equality of two sets is defined as A <= B and B <= A. All objects contained in the left-hand set are present in the right-hand set and vice versa. In other words, there is not a single object that is present in just one of the sets. The inequality is just the negation thereof.

Element of[edit]

The in operator is the only set operator that does not act on two sets but on one potential set member candidate and a set. It has been introduced above. With respect to Venn diagrams, though, you can say that the in operator is “like” pointing with your index finger to a point inside a circle, or outside of it.

Pre-defined set routines[edit]

Cardinality[edit]

(After initialization) at any time a set contains a certain number of elements. In mathematics the number of objects being part of a set is called cardinality. The cardinality of a set can be retrieved using the function card, an EP extension.

emptySet := [];
writeLn(card(emptySet));

This will print 0 as there are no elements in an empty set.

Unfortunately, not all compilers implement the card function. The FPC does not have none. The GPC does supply one, though.

Universe[edit]

Originally, Wirth proposed a function all:

all(T) is the set of all values of type T
[1]

For example:

superwoman := all(skill);

The set superwoman would contain all available skill values, cooking, cleaning, driving, videogames, eating.

Unfortunately, this proposal never made it into the ISO standards, nor do the FPC or GPC support that function, or provide an equivalent. The only alternative is to use an appropriate set constructor (an EP extension): [cooking..eating] is equivalent to all(skill), provided that cooking is the first skill and eating the last skill value (referring to the order these items were listed during the data type declaration of skill).

Inclusion and exclusion[edit]

Not standardized, but convenient is BP’s definition of include and exclude procedures. These are shorthand for very frequent set manipulations.

The procedures allow you to quickly add or remove one object from one set.

include(recognizedLetters, 'Z');

is identical to

recognizedLetters := recognizedLetters + ['Z'];

but you do not need to type out the set name twice and everything, thus reducing the chance of typing mistakes. Likewise,

exclude(primeSieve, 4);

will do the same as

primeSieve := primeSieve - [4];

Both, the FPC and GPC, support these handy routines, which are in fact in all cases implemented as compiler intrinsics, not actual procedures.

Intermediate usage[edit]

Set literals[edit]

Effectively stating sets is a required skill when handling sets. It is important to understand that sets merely store the information that an object is a member of a set, or not. The set ['A', 'A', 'A'] is identical to ['A']. Specifying 'A' multiple times does not make it “more” part of that set.

Also, it is not necessary to list all members in any particular order. ['X', 'Z', 'Y'] is just as acceptable as ['X', 'Y', 'Z'] is. Mathematically speaking, sets are not ordered. Pascal’s requirement that a set’s base data type has to be an ordinal type is purely a technical requirement. For readability reasons it is usually sensible, though, to list elements in ascending order.

The EP standard gives you nice short notation for set literals containing a continuous series of values. Instead of writing [7, 8, 9, 11, 12, 13] you can also write ranges like [7..9, 11..13] evaluating to the very same value. Of course, all numbers could also be variables, or expressions in general.

Set literals are always a positive statement which objects are in a set. If we wanted a set of integer values between 0 and 10 without 3, 5 and 7, but do not want to write this set out entirely (i. e. as [0, 1, 2, 4, 6, 8, 9, 10]), you can either write [0..2, 4, 6, 8..10] or the expression [0..10] - [3, 5, 7]. The latter is probably a little easier to grasp what objects are and which are not in the final set.

Memory restrictions[edit]

Although a set of integer is legal and complies with all Pascal standards, many compilers do not support such large sets. Per definition, a set of integer can contain (at most) all values in the range -maxInt..maxInt. That is a lot (try writeLn(maxInt) or read your compiler’s documentation to find out this value). On a 64‑bit platform this value (usually) is 263—1, i. e. 9,223,372,036,854,775,808. As of the year 2020 many computers will quickly run out of main memory if they attempted to hold that many Boolean values.

As a consequence, BP restricts permissible set’s base types. In BP the base type’s largest and smallest values’ ordinal values have to be in the range 0..255. The value 255 is 28−1. As of version 3.2.0, the FPC sets the same limitations.

The GPC allows set definitions beyond 28 elements, although some configuration is required: You need to specify the ‑‑setlimit command-line parameter or a specially crafted comment in your source code:

program largeSetDemo;
{$setLimit=2147483647} // this is 2³¹ − 1
type
	// 1073741823 is 2³⁰ − 1
	characteristicInteger = -1073741823..1073741823;
	integers = set of characteristicInteger;
var
	manyManyIntegers: integers;
begin
	manyManyIntegers := [];
	include(manyManyIntegers, 1073741823);
end.

This will instruct the GPC that a set of characteristicInteger can only store up to this many characteristicInteger values.

Loops[edit]

Now that you have made the acquaintance of enumeration data types and sets, you see yourself faced with dealing a growing number of data. Pascal, like many other programming languages, support a language construct called loops.

Characteristics[edit]

Loops are (possibly empty) sequences of statements that are repeated over and over again, or even never, based on a Boolean value. The sequence of statements is termed loop body. The loop head contains (possibly implicitly) a Boolean expression determining whether the loop body is executed. Every time the loop body is run, an iteration is in progress.

The term loop originates from the circumstance that some early models of computers required programs to be fed (“loaded”) via punched paper tape. If a portion of that paper tape was meant to be processed multiple times, that piece of paper tape was cut, bent and temporarily fixated so it formed a physical loop. Thankfully, advancements in computer technology has made it far more convenient to handle repeating code.

Pascal (and many other programming languages) differentiate between two groups of loops:

  • counting loops, presented here, and
  • conditional loops, presented in a chapter to come.

Counting loops have in common that, before running the first iteration it can already be determined how many times the loop body will be executed just by evaluating the loop head.[fn 3] Conditional loops on the other hand are based on an abort condition, i. e. a Boolean expression. Except for infinite loops, there is no way to tell in advance how many times, how many iterations a conditional loop will have without thoroughly (mathematically) analyzing the loop body and loop head, and possibly even considering circumstances beyond the loop.

Counting loops[edit]

Counting loops do not necessarily count a quantity. They are named after the fact that they employ a variable, a counting variable. This variable of any ordinal data type (de facto) assigns every iteration a number.

A counting loop is introduced by the reserved word for:

program forLoopDemo(output);
var
	i: integer;
begin
	for i := 1 to 10 do // loop head
	begin               // ┐
		writeLn(i:3);   // ┝ loop body
	end;                // ┘
end.

After for follows a specially crafted assignment to the counting variable.

Range of counting variable[edit]

1 to 10 (with the auxiliary reserved word to) denotes a range of values the counting variable i will assume while executing the loop body. 1 and 10 are both expressions possessing the counting variable’s data type, that means there could also appear variables or more complex expressions, not just constant literals as shown.

This range is like a set. It may possibly be empty: The range 5 to 4 is an empty range, since there are no values between 5 up to and including 4. In consequence, the counting variable will not be assigned any value out of this empty range, as there simply are none available, and the loop body is never executed. Nevertheless, the range 8 to 8 contains exactly one value, i. e. 8.

During the first iteration the corresponding counting variable, here i, will have the first value out of the given range, the start value, in the example above this is the value 1. In the successive iteration the variable i has the value 2, and so forth up to and including the final value of the given range, here 10.

Immutability of counting variable[edit]

It is not necessary to actually utilize the counting variable inside the loop body, but you can use it if you are just obtaining its current value: Inside the loop body of for‑loops it is forbidden to assign any values to the counting variable. Forbidden assignments include, but are not limited to putting the counting the variable on the LHS of :=, but also read/readLn may not use the counting variable. Tampering with the counting variable is forbidden, because the loop head will effectively employ succ(i) to obtain the next iteration’s value. The loop head implicitly contains the Boolean expression counting variablefinal value. If the counting variable was manipulated this condition might never be met, thus destroying the characteristics of a for‑loop. Preventing the programmer to do any assignments preemptively ensures such an infinite loop is not, accidentally as well as deliberately, created.

Reverse direction[edit]

Pascal also allows for‑loops in a reversed direction using the reserved word downto instead of to:

program downtoDemo(output);
var
	c: char;
begin
	for c := 'Z' downto 'B' do
	begin
		write(c, ' ');
	end;
	writeLn('A');
end.

Here, the range is 'Z' down and including to 'B'. The loop’s terminating condition is still counting variablefinal value, but in this case the counting variable c becomes pred(c) (not succ) at the end of each iteration, after the loop body has been executed.

Loops on collections[edit]

EP allows to iterate over discrete aggregations, such as sets. This is particularly useful if you have a routine that needs to be applied to every value of an aggregation. Here is an example to demonstrate the principle:

 1 program forInDemo(output);
 2 var
 3 	c: char;
 4 	vowels: set of char;
 5 begin
 6 	vowels := ['A', 'E', 'I', 'O', 'U'];
 7 	
 8 	for c in vowels do
 9 	begin
10 		writeLn(c);
11 	end;
12 end.

Now you see the word in again, but in this context c in vowels is not an expression. The data type restrictions for in are still in effect: On the RHS an aggregation expression is given, whereas the LHS is in this case a variable that has the aggregation’s data type. This variable will be assigned every value out of the given aggregation every time an iteration is processed.

Since the RHS just needs an expression, not necessarily a variable, so you can shorten the example even further to:

8     for c in ['A', 'E', 'I', 'O', 'U'] do

Note, unlike the counting loops above, you are not supposed to make any assumptions about the order the loop variable is assigned values to. It may be in ascending, descending, or completely mixed up “order”, but the specific order is “implementation defined”, i. e. it depends on the used compiler. Accompanying documents of the compiler explain in which order the for in loop is processed.

Tasks[edit]

What value does the set expression fruit - fruit evaluate to?
This expression will evaluate to an empty set, because the difference operator “removes” all objects the latter set contains from the former. Here, we are referring to the same set twice, so both operands are equal, effectively rendering the expression to “remove all objects in this very same set”. Needless to say, but it is very counter-productive to ever write such an expression, since we already know its result, but it is allowed anyway.
This expression will evaluate to an empty set, because the difference operator “removes” all objects the latter set contains from the former. Here, we are referring to the same set twice, so both operands are equal, effectively rendering the expression to “remove all objects in this very same set”. Needless to say, but it is very counter-productive to ever write such an expression, since we already know its result, but it is allowed anyway.

Sources:

  1. Wirth, Niklaus. The Programming Language Pascal (Revised Report ed.). p. 38. 

Notes:

  1. This is an analogy for explanation. The ISO standards do not set this requirement. Compiler developers are free to choose whatever implementation of the set data type they think is suitable. It would be perfectly OK to just store a list of elements that are present in a set and nothing else. Nevertheless, all, Delphi, FPC, as well as the GPC do store sets as a series of bits (“Boolean values”) as it is explained here.
  2. The data type real does not qualify as an ordinal data type. Although it stores a finite subset of ℚ, the set of rational numbers, so there is a map T ↦ ℕ, this depends on the real data type’s precision (T), thus there is no one standard way of defining ord for real values, but many.
  3. This statement ignores many dialects’ extensions break/leave/continue/cycle that neither the ISO standards 7185 (Standard Pascal) or 10206 (Extended Pascal) define, or the goto statement.

Next Page: Arrays | Previous Page: Enumerations

Home: Pascal Programming