User:Kayau/Computer Performance

From Wikibooks, open books for an open world
Jump to navigation Jump to search
    • Note: This page is being temporarily developed in my userspace. I will move it to the mainspace later. **

Clock speed[edit | edit source]

  • Each operation of the processor begins with the pulse of the system clock – this is clock cycle or clock tick.
  • The speed of the processor can be measured by the pulse frequency in cycles per second (Hz) $ndash; this is clock rate.
  • The time between two pulses is the cycle time.

Disadvantages[edit | edit source]

Clock speeds measures how fast a computer can process operations. However, each instruction takes a different number of operations, so clock speed is not a reliable measure of performance. Thus we have to look at instructions per second.

Instructions per second[edit | edit source]

To make a more accurate assessment of the processor's performance, we first need to figure out how many cycles an instruction takes.

Average cycles per instruction

Average cycles per instruction , where is the total number of instructions in a program or in a certain time interval, is the number of cycles per instruction and is the instruction count for a certain instruction .

Using the CPI, we can figure out the approximate execution time"

Execution time (Version I)

The execution time , where is the constant clock rate, constant cycle time , is the total number of instructions in a program or in a certain time interval and is the average number of cycles per instruction.

However, moving data to and from the main memory usually takes more time than performing operations in the processor. In other words the processor cycle time may be longer than the memory cycle time. Let be the number of clock cycles for decoding and execution, be the number of memory references, and be the memory cycle time. We find that the new processor time . Let . Then the equation can be simplified:

Execution time (Version II)

Let be the processor cycle time, be the total number of instructions in a program or in a certain time interval, be the average number of cycles per instruction, is the constant clock rate such that constant cycle time , be the number of clock cycles for decoding and execution, be the number of memory references, and be the ratio between the memory and processor cycle times.

Then the execution time

MIPS and MFLOPS[edit | edit source]

MIPS rate is one way of expressing the instruction execution rate, in terms of millions of instructions per second, i.e. . Since , . Substituting this into our original MIPS rate formula gives the following:

MIPS rate

An alternative is MFLOPS, which measures only floating-point instructions:

MFLOPS rate

Disadvantages[edit | edit source]

Instruction execution rate measures how fast a computer can handle instructions. However, the number of assembly instructions compiled from the same high-language instruction may be different in different systems. Two computers may have the same MIPS, but one may be significantly faster because a high-language instruction compiles to three assembly instructions on one and eight in the other. Thus we need to look at benchmarks.

Benchmarks[edit | edit source]

Picking a mean[edit | edit source]

A benchmark program is a program written in a high-level programming language used to test the performance of a system. A collection of benchmark programs, or a benchmark suite, can be run on a computer to test its speed. These programs are usually written with a specific purpose in mind. For example, the SPECweb99 suite was written to assess the performance of web servers.

After obtaining the instruction execution rate of each program, the results should be averaged using a formula. There are three ways to obtain this average:

Pythagorean means

Arithmetic mean:

Geometric mean:

Harmonic mean:

The result of an arithmetic mean is the average execution rate. However, this is not helpful for the users, who are more concerned about execution time. Harmonic means are more useful for this purpose.

SPEC benchmarks[edit | edit source]

SPEC benchmark suites measure performance in a different way. They measure two different metrics. The speed metric measures the computer's ability to perform a single task, whereas the rate metric measures its ability to perform multiple tasks.

To obtain the speed metric, we need to obtain the execution time of the benchmark programs on the system under test (SUT). Then we need to find the ratio of the reference execution time to the actual execution time. Finally, we evaluate the geometric mean of these figures. The larger the final result, the higher the speed compared to the reference time. Let be the reference execution time and the be the execution time of the system under test. Then:

Speed metric

For a finite number of benchmark programs, speed metric =

To obtain the rate metric, we need to run multiple copies of the benchmark program at the same time, usually the number of CPUs on the computer (but not necessarily so). The method for finding the rate metric is similar to that of the speed metric, except we need to multiply the reference execution time by the number of copies run simultaneously, say :

Rate metric

For a finite number of benchmark programs, speed metric =

Amdahl's Law[edit | edit source]

Some code must be executed serially, i.e. it can only be executed by one processor at a time. The speed gain from using several processors instead of one is given by Amdahl's Law:

Amdahl's Law

Given:

  • , the number of threads of execution,
  • , the fraction of the algorithm that is strictly serial,

The time an algorithm takes to finish when being executed on thread(s) of execution corresponds to:

Therefore, the theoretical speedup that can be had by executing a given algorithm on a system capable of executing threads of execution is:

A consequence of this is that, holding constant above 0, there is cap on the speed gain by adding multiple cores, namely

We can even generalise Amdahl's law to speedup features in general if we let be the fraction of the code affected by that feature.