Optimizing C++/General optimization techniques/Sorting

From Wikibooks, open books for an open world
< Optimizing C++‎ | General optimization techniques
Jump to: navigation, search

The C++ Standard Template Library (STL) provides the template function sort that implements a comparison sort algorithm. Because sort is templatized, it can be used for various types of sequences holding any type of key, as long as the keys are comparable (implement the < operator). A good compiler can generate code optimized for the various kinds of sequence/key combinations.

The reference implementation of the STL uses the introsort algorithm (since the 2000 release; the GNU C++ library uses the reference implementation). This algorithm is a very fast combination of quicksort and heapsort with a specially designed selection algorithm.

The sort template function is not guaranteed to be stable. When a stable sort is required, use the stable_sort template function instead.

This section suggests alternatives to the sort and stable_sort template functions that may be faster in specific cases.

Sorting with small ranges of keys[edit]

To sort a data set according an integer key having a small range, use the counting sort algorithm.

The counting sort algorithm has O(N+M) complexity, where N is the number of elements to sort and M is the range of the sort keys, that is the difference between the highest key and the lowest key. In case N elements are to be sorted whose key is an integer number belonging to an interval containing no more than two times N values (i.e when M <= 2 * N holds), this algorithm may be quite faster than sort. In some cases it is faster even with larger ranges.

This algorithm may be used also for a partial ordering; for example, if the keys are integers between zero and one billion, you can still sort them using only the most significant byte, so to get an order for which the formula a_n < a_{n + 1} + 256*256*256 holds.

Example: sorting 8-bit integers[edit]

Say you want to sort an array of arbitrary char elements. These take on values in the range 0..CHAR_MAX (inclusive), accounting for CHAR_MAX+1 different values. CHAR_MAX is defined in the header <climits>.

#include <climits>
 
void count_sort(char *a, char *const end)
{
	size_t freq[CHAR_MAX+1] = {0};
	char *p;
 
	for (p = a; p < end; ++p)
		freq[*p] += 1;
 
	char c;
	for (c = 0, p = a; c < UINT8_MAX; ++c)
		while (freq[c]-- > 0)
			*p++ = c;
}

The counting_sort function implements the pigeonhole sort algorithm. It takes a pointer to the first element of the input array and a pointer that points one element beyond the end of the array. Why? Because we don't have to stop here.

We can generalize counting_sort to a template function that also works for string, vector<char> and other sequence types, without loss of efficiency. When doing so, we need to work with iterators rather than pointers.

#include <climits>
 
template <typename OutputIter>
void counting_sort(OutputIter const &begin, OutputIter const &end)
{
	size_t freq[CHAR_MAX+1] = {0};
	OutputIter it;
 
	for (it = begin; it < end; ++it)
		freq[*it] += 1;
 
	char c;
	for (c = 0, it = begin; c < CHAR_MAX; ++c)
		while (freq[c]-- > 0)
			*it++ = c;
}

Partial sorting[edit]

Partitioning[edit]

If you have to split a sequence according a criterion, use a partitioning algorithm, instead of a sorting one.

In STL there is the std::partition algorithm, that is faster than the std::sort algorithm, as it has O(N) complexity, instead of O(N log(N)).

Stable partitioning and sorting[edit]

If you have to partition or sort a sequence for which equivalent entities may be swapped, don't use a stable algorithm.

In STL there is the std::stable_partition partitioning algorithm, that is slightly slower than the std::partition algorithm; and there is the std::stable_sort sorting algorithm, that is slightly slower than the std::sort algorithm.

Order partitioning[edit]

If you have to pick out the first N elements from a sequence, or the Nth element in a sequence, use an order partitioning algorithm, instead of a sorting one.

In STL there is the std::nth_element algorithm, that, although slightly slower than the std::stable_partition algorithm, is quite faster then the std::sort algorithm, as it has O(N) complexity, instead of O(N log(N)).

Sorting only the first N elements[edit]

If you have to sort the first N elements of a much longer sequence, use an order statistic algorithm, instead of a sorting one.

In STL there are the std::partial_sort and std::partial_sort_copy algorithms, that, although slower than the std::nth_element algorithm, are so much faster than the std::sort algorithm as the partial sequence to sort is shorter than the whole sequence.