Parallel Computing and Computer Clusters/Software

Software

Under old-fashioned designs programs are often cited as being akin to a document format: they begin with a header section; they then go into a loop perform the same task or series of tasks over and over until some condition is met and finally they perform some form of exit cleanup. During the loop condition, the program performs a single task to completion prior to performing any subsequent condition and thus each task is performed in serial to its prior and subsequent tasks.

Sometimes in a serial design the individual tasks have no bearing on each other and as such could be executed in any order. Under these circumstances each task is ripe for being parallelised. Many programs, however, are staged - tasks are interdependent and the amount of dependency determines how simply a set of serial tasks may be redesigned to work in parallel.

Designing parallel tasks

Before work can begin in earnest on the implementation of parallelising a suite of tasks, it is necessary to identify the inter-dependencies of the tasks. To illustrate the principal problems faced by parallelising processes some fairly basic mathematical problems can be used as examples.

The sum of all numbers

The well-known problem faced by Carl Friedrich Gauss and his fellow class mates: add up all the integers between 1 and 100. Supposing there was no mathematical formula [(1 + N)*(N/2)] to calculate this particular problem and that the individual numbers really did have to be added up individually to each other:

The mathematical properties of adding any group of integers together allows the additional to take place in any order. For example, 1 + 2 + 3 + 4 and 4 + 3 + 2 + 1 result in the same summation of 10. By the same token (1 + 2) + (3 + 4) results in the equivalent calculation of 3 + 7 and in the end the same summation of 10. The problem of adding just four numbers can be described in alternative ways also:1 + (2 + 3) + 4 or (1 + (2 + (3 + 4))). In order to provide efficient parallelisation, it is important to design how best to split the problem at hand. These latter two examples show more dependency on previous calculations than that of dividing the problem down the middle:

Breakdown	# Calculations	# Independent calculations
(1 + 2) + (3 + 4)	3: 1+2=3; 3+4=7; 3+7=10	2: 1+2=3; 3+4=7
1 + (2 + 3) + 4	3: 2+3=5; 5+1=6; 6+4=10	0: all calculations are dependent
(1 + (2 + (3 + 4)))	3: 3+4=7; 7+2=9; 9+1=10	0: all calculations are dependent

This approach for dividing down the middle works for any series of numbers where the count of numbers is a factor of two: 8, 16, 32. However, taking this initial approach to a longer series which utilises an even count of numbers provides another problem in the quest for parallelisation: 1 + 2 + 3 + 4 + 5 + 6. This calculation can no longer be simply divided down the middle: 1 + (2 + 3) + (4 + 5) + 6. It is already known that the calculation of 1 + 6 could be done, yet in this approach it is left out. To combat this deficiency it is therefore necessary to alter the approach to work for all even counts of numbers: pairing. Using pairing it is possible to reduce the calculations: (1 + 2) + (3 + 4) + (5 + 6). The parallelisation has now reduced the original problem to that of 3 + 7 + 11. The problem now enters a new phase entirely...

The problem still has not been solved for all counts of numbers - so far only even counts have been solved. Due to mathematical properties it is not possible to break down the summation of 3 + 7 + 11 into the separate parts of 3 + 7 and 7 + 11 and thus knowing your subject matter becomes an ever-increasingly important aspect of the parallisation: ensuring that individual parts to do repeat the whole or part of another part. For the problem at hand, then, it is a logical assumption that the tasks at hand can be partitioned into pairs and pairs of pairs until the problem as a whole is consumed with the possible exception of a single number to be added as the last step. Thus, 1 + 2 + 3 + 4 + 5 + 6 can be reduced into the parallelised form of ((1 + 2) + (3 + 4)) + (5 + 6) as it's most efficient.

The final step

The final step in the quest for the design of the parallelised problem may come as a surprising reversal: can the individual tasks be better implemented as a serial process? Knowing the architecture the problem is to be implemented on becomes an additional design principal. In the case of adding a small quantity of any numbers (not necessarily sequential as above), the majority of MPUs are very capable of performing on-die calculations in the form ((a + b) + c) + d faster and more efficiently than of the form (a + b) + (c + d) due to the need to load and store the data into registers. A command sequence for both tasks might appear as follows:

Pseudo-computing cycle	((a + b) + c) + d	(a + b) + (c + d)
1	Load a into accumulator	Load a into accumulator
2	Load b into second register	Load b into second register
3	Add second register to accumulator	Add second register to accumulator
4	Load c into second register	Store accumulator's part 1 result to memory or alternative register
5	Add second register to accumulator	Load c into accumulator
6	Load d into second register	Load d into second register
7	Add second register to accumulator	Add second register to accumulator
8	Store result to main memory	Load part 1 result to second register
9		Add second register to accumulator
10		Store result to main memory

Specific Theories

...?

Compilers

...?

External links