Optimizing C++/Optimization life cycle

From Wikibooks, open books for an open world
Jump to: navigation, search

The construction of an efficient application should perform the following development process:

  1. Design. At first, the algorithms and data structures are designed in such a way as makes sense for the application logic, and that is reasonably efficient, but without considering optimization. Where a data structure of wide usage is to be defined, whose optimal implementation is not obvious (for example, it is arguable if is better to use an array or a linked list), an abstract structure is defined, whose implementation may be changed at the optimization stage.
  2. Implementation. Then the code that implements the designed algorithms is written, following the guidelines to avoid some inefficient operations, and to encapsulate the operations that will probably require optimization.
  3. Functional testing. Then the resulting software is tested, to increase the probability that it doesn't have crippling defects.
  4. Optimization (aka Tuning). After having completed the development of a correctly working application, the optimization stage begins, having the following sub-stages:
    1. Performance testing. The commands with inadequate performance are spotted, that is the commands that, when processing typical data, require more time or more storage space than what it is specified by requirements are singled out.
    2. Profiling (aka Performance analysis). For every command with inadequate performance, a profiler is used to determine which portions of code are the so-called bottlenecks for that command. Bottlenecks are the portions of code where a disproportionately large amount of time is spent or memory space allocated.
    3. Algorithmic optimization. In bottlenecks, optimization techniques substantially independent from the programming language and totally independent from the platform are applied. They are the techniques that can be found in algorithm theory textbooks. This optimization consists in attempting to decrease the number of the executed machine instructions, and, in particular, the number of the calls to costly routines, or to transform the expensive instructions to equivalent but less costly instructions. For example, the quick sort sorting algorithm is chosen instead of the selection sort algorithm. If this optimization makes the program fast enough, the optimization stage is complete.
    4. Platform independent optimization. In bottlenecks, optimization techniques that are dependent on the programming language and its standard library, but independent both on the software platform and on the hardware platform are applied. For example, integer operations are used instead of floating point operations, or the more appropriate container class is chosen among the ones available in the standard library. If this makes the program fast enough, the optimization stage is complete.
    5. Software platform dependent optimization. In bottlenecks, optimization techniques that are dependent both on the programming language and on the software platform, but independent on the hardware platform are applied. For example, compiler options, pragma compiler directives, language extensions, non-standard libraries, direct calls to the operating system are exploited. If this makes the program fast enough, the optimization stage is complete.
    6. Hardware platform dependent optimization. In bottlenecks, optimization techniques that are dependent on the hardware platform are applied, like machine instructions existing only on a specific processor family or high level features that, even being allowed for every processor architecture, come out to be advantageous only for some processor types.

This development process follows two criteria:

  • Principle of diminishing returns. The optimizations giving big results with little effort should be applied first, as this minimizes the time needed to reach the performance goals.
  • Principle of diminishing portability. It is better to apply first the optimizations applicable to several platforms, as they remain applicable even when changing platform, and as they are more understandable by other programmers.

In the rare cases of software that will have to be used with several compilers and several operating systems but just one processor architecture, the stages 4.5 and 4.6 should be swapped.

This stage sequence is not meant to be a one-way sequence, that is such that when a stage is reached, the preceding stage is no more reached. In fact, every stage may succeed or fail. If it succeeds, the next stage is reached, while if it fails, the previous stage is reached, in a sort of backtracking algorithm.

In addition, a partial performance test should be performed after every optimization attempt, just to check whether the attempt comes out to be useful, and, in the positive case, to check if it comes out to be ultimate, that is whether some more optimizations are needed or not.

At last, after having completed the optimization, both the functional testing and the complete performance testing are to be repeated, to guarantee that the new optimized version of the software hasn't worsened either its correctness nor its general performance.

This book is about only three of the above stages:

  • Stage 2, specifically to the usage of the C++ language, in chapter "Writing efficient code".
  • Some general techniques for the stage 4.3, with examples in C++, in chapter "General optimization techniques".
  • Stage 4.4, specifically to the usage of the C++ language, in chapter "Code optimization".

Conventions [edit]

By object, it is meant an allocated region of memory. In particular, a piece of data associated to a variable of a fundamental type (like bool, double, unsigned long, or a pointer) is an object, as it is such the data structure associated to an instance of a class. With every variable an object is associated, whose size is given by the sizeof C++ operator, but an object could have no variable associated with it, or several variables associated with it. For example, a pointer is an object, but it can point to another object; this pointed object is not associated with any variable. On the other hand, in the following code, both the variable a and the variable b are associated with the same object:

int a;
int& b = a;

Arrays, structs, and class instances are objects which, if not empty, contain sub-objects. Therefore, they will be called aggregate objects.

We say that an object owns another object when the destruction of the former object causes the destruction of the latter. For example, a non-empty vector object typically contains a pointer to a buffer containing all the elements; the destruction of the vector causes the destruction of such buffer, and therefore we say that this buffer is owned by the vector object.

Some optimizations are useful only for short data sequences, others for longer sequences. Later on, the following classification will be used for objects sizes:

  • Tiny: No more than 8 bytes. It fits in one or two 32-bit registers or in one 64-bit register.
  • Small: More than 8 bytes, but no more than 64 bytes. It doesn't fit in a processor register, but it fits in a processor data cache line, and it can be wholly referenced by very compact machine instructions using an offset from the start address.
  • Medium: More than 64 bytes, but no more than 4096 bytes. It doesn't fit in a processor data cache line, and it cannot be wholly referenced by compact machine instructions, but it fits in the processor first-level data cache, it fits in a virtual memory page, and it fits in a mass storage cluster of blocks.
  • Large: More than 4096 bytes. It doesn't fit in the processor first-level data cache, it doesn't fit in a single virtual memory page, and it doesn't fits in a single mass storage cluster of blocks.

For example, an array of doubles is considered tiny only if it contains exactly one element, small if it has 2 to 8 elements, medium if it has 9 to 512 numbers, large if it has more than 512 of them.

Because there are very different hardware architectures, the given numbers are only an indication. Though, such numbers are rather realistic, and can be taken as serious criteria to develop software covering the main architectures in a rather efficient way.