Optimizing C++/General optimization techniques/Memoization
From Wikibooks, the open-content textbooks collection
Memoization techniques (aka caching techniques) are based on the principle that if you must compute many times a pure function, that is a referentially transparent function (aka mathematical function), for the same argument, and if such computation requires significant time, you can save time by saving the result of the first evaluation, and retrieve such result the other times.
[edit] Look-up table
If you have to call often a pure function that has a small integer interval as domain, pre-compute (at compile time or at program start-up time) all the values of the function for every value of the domain, and put them in a static array, called lookup table. When you need the value of the function for a particular value of the domain, read the corresponding value of such array.
For example, having to compute the square root of an integer number between 0 and 9, the following function is faster:
double sqrt10(int i) { static double lookup_table[] = { 0, 1, sqrt(2.), sqrt(3.), 2, sqrt(5.), sqrt(6.), sqrt(7.), sqrt(8.), 3, }; assert(0 <= i && i < 10); return lookup_table[i]; }
An array access is very fast, mainly if the accessed cell is in processor data cache. Therefore, if the lookup table is not large, almost surely its access is faster than the function to evaluate.
If the lookup table comes out to be rather large, it may be no more convenient, for the memory footprint, for the time to pre-compute all the values, but mainly for the fact that it doesn't fit the processor data cache any more.
But if the function to evaluate is quite slow, it is called many times, and you can afford to use much memory for that, you should consider using a lookup table large up to several hundreds of KB. Though, it is rarely convenient to exceed one megabyte.
[edit] One-place cache
If you have to call often a pure function with the same arguments, the first time the function is called save the arguments and the result in static variables; when the function is called again, compare the new arguments with the saved ones; if they match, return the saved result, otherwise compute the result and store the new arguments and the new result.
For example, instead of the following function:
double f(double x, double y) { return sqrt(x * x + y * y); }
you can write the following function:
double f(double x, double y) { static double prev_x = 0; static double prev_y = 0; static double result = 0; if (x == prev_x && y == prev_y) { return result; } prev_x = x; prev_y = y; result = sqrt(x * x + y * y); return result; }
Notice that, to have a speed up, it isn't necessary that the function be called with the same arguments for all the program session. It is enough that it is called some times with the same arguments, then some other times with other arguments, and so on. In such a case, the computation is performed only when the arguments change.
Obviously, instead of increasing the speed, this technique may decrease it, in case the function is called with almost always changing arguments, or if the comparison between the new arguments and the old ones requires more time than the computation of the function itself.
Notice that the use of static variables causes that this function is no more thread-safe, and cannot be recursive. If this function must be called concurrently by several threads, it is necessary to replace the static variables with variables that are distinct for every thread.
Notice also that in the example it is assumed that the function has zero value when both arguments are zero. Failing this, you should choose another solution, like one of the following:
- Initialize the variable result with the value that corresponds to all-zero arguments.
- Initialize the variables prev_x and prev_y with values that will never be passed as arguments.
- Add a static flag indicating whether the static variables keep valid values, and check such flag at every call.
[edit] N-places cache
If you have to call many times a pure function with arguments that in most cases belong to a small domain, use a static map (aka dictionary), initially empty. When the function is called, search the map for the function argument. If you find it, returns the associated value, otherwise compute the result and insert into the map the pair argument-result.
Here is an example, in which the map is implemented using an array, for the same function used for the example of the guideline "One-place cache" in this section:
double f(double x, double y) { static const int n_buckets = 8; // should be a power of 2 static struct { double x; double y; double result; } cache[n_buckets]; static int last_read_i = 0; static int last_written_i = 0; int i = last_read_i; do { if (cache[i].x == x && cache[i].y == y) { return cache[i].result; } i = (i + 1) % n_buckets; } while (i != last_read_i); last_read_i = last_written_i = (last_written_i + 1) % n_buckets; cache[last_written_i].x = x; cache[last_written_i].y = y; cache[last_written_i].result = sqrt(x * x + y * y); return cache[last_written_i].result; }
Some functions, although they have a theoretically large domain, are called most often with few distinct arguments.
For example, a word processor may have many installed fonts, but in a typical document only very few fonts are used for most characters. A rendering function that has to handle the font of every character of the document will be called typically with very few distinct values.
For such cases, an N-places cache is preferable to a one-place cache, as in the example.
The remarks about static variables, in guideline "One-place cache" in this section, apply also to this case.
For small caches (in the example, having 8 places) the most efficient algorithm is a sequential scan on an array. Though, to implement a larger cache, a search tree or a hash table could be more efficient.
In addition, the cache of the example has fixed size, but it could be expedient to have a variable-size cache.
Usually, the last read element is the most likely for the next call. Therefore, like in the example, it may be expedient to save its position, and to begin the search from that position.
If the cache do not expand itself indefinitely, there is the problem to choose the element to replace. Obviously, it would be better to replace the element that is the least likely to be requested by the next call.
In the example, it is assumed that, among the elements in the cache, the first inserted element is the least probable for the next call. Therefore, the writes scan cyclically the array.
Often, a better criterion is to replace the least recently read element, instead of the least recently written element. To implement such criterion, a somewhat more complex algorithm is required.