C++ Programming/Chapter Advanced Features Print version

From Wikibooks, open books for an open world
Jump to navigation Jump to search


Copyright Notice

Authors

The following people are authors to this book:

Panic, Thenub314

You can verify who has contributed to this book by examining the history logs at Wikibooks (http://en.wikibooks.org/).

Acknowledgment is given for using some contents from other works like Wikipedia, the wikibooks Java Programming and C Programming and the C++ Reference, as from the authors Scott Wheeler, Stephen Ferg and Ivor Horton.

The above authors release their work under the following license:
Unless otherwise noted media and source code and source code usable in stand alone form have their own copyrights and different licenses (but compatible with the Copyleft nature of the work). Media may be under Fair use, that is not recognized in some jurisdictions. Source code or scripts not listing copyrights or license information shall be considered in the public domain.

Advanced Features

Templates

Templates are a way to make code more reusable. Trivial examples include creating generic data structures which can store arbitrary data types. Templates are of great utility to programmers, especially when combined with multiple inheritance and operator overloading. The Standard Template Library (STL) provides many useful functions within a framework of connected templates.

As templates are very expressive they may be used for things other than generic programming. One such use is called template metaprogramming, which is a way of pre-evaluating some of the code at compile-time rather than run-time. Further discussion here only relates to templates as a method of generic programming.

By now you should have noticed that functions that perform the same tasks tend to look similar. For example, if you wrote a function that prints an int, you would have to have the int declared first. This way, the possibility of error in your code is reduced, however, it gets somewhat annoying to have to create different versions of functions just to handle all the different data types you use. For example, you may want the function to simply print the input variable, regardless of what type that variable is. Writing a different function for every possible input type (double, char *, etc. ...) would be extremely cumbersome. That is where templates come in.

Templates solve some of the same problems as macros, generate "optimized" code at compile time, but are subject to C++'s strict type checking.

Parameterized types, better known as templates, allow the programmer to create one function that can handle many different types. Instead of having to take into account every data type, you have one arbitrary parameter name that the compiler then replaces with the different data types that you wish the function to use, manipulate, etc.

  • Templates are instantiated at compile-time with the source code.
  • Templates are type safe.
  • Templates allow user-defined specialization.
  • Templates allow non-type parameters.
  • Templates use “lazy structural constraints”.
  • Templates support mix-ins.
Syntax for Templates

Templates are pretty easy to use, just look at the syntax:

 template <class TYPEPARAMETER>

(or, equivalently, and preferred by some)

 template <typename TYPEPARAMETER>

Function template

There are two kinds of templates. A function template behaves like a function that can accept arguments of many different types. For example, the Standard Template Library contains the function template max(x, y) which returns either x or y, whichever is larger. max() could be defined like this:

    template <typename TYPEPARAMETER>
    TYPEPARAMETER max(TYPEPARAMETER x, TYPEPARAMETER y)
    {
        if (x < y)
            return y;
        else
            return x;
    }

This template can be called just like a function:

    std::cout << max(3, 7);   // outputs 7

The compiler determines by examining the arguments that this is a call to max(int, int) and instantiates a version of the function where the type TYPEPARAMETER is int.

This works whether the arguments x and y are integers, strings, or any other type for which it makes sense to say "x < y". If you have defined your own data type, you can use operator overloading to define the meaning of < for your type, thus allowing you to use the max() function. While this may seem a minor benefit in this isolated example, in the context of a comprehensive library like the STL it allows the programmer to get extensive functionality for a new data type, just by defining a few operators for it. Merely defining < allows a type to be used with the standard sort(), stable_sort(), and binary_search() algorithms; data structures such as sets, heaps, and associative arrays; and more.

As a counterexample, the standard type complex does not define the < operator, because there is no strict order on complex numbers. Therefore max(x, y) will fail with a compile error if x and y are complex values. Likewise, other templates that rely on < cannot be applied to complex data. Unfortunately, compilers historically generate somewhat esoteric and unhelpful error messages for this sort of error. Ensuring that a certain object adheres to a method protocol can alleviate this issue.

{TYPEPARAMETER} is just the arbitrary TYPEPARAMETER name that you want to use in your function. Some programmers prefer using just T in place of TYPEPARAMETER.

Let us say you want to create a swap function that can handle more than one data type... something that looks like this:

 template <class SOMETYPE> 
 void swap (SOMETYPE &x, SOMETYPE &y) 
 { 
   SOMETYPE temp = x; 
   x = y; 
   y = temp; 
 }

The function you see above looks really similar to any other swap function, with the differences being the template <class SOMETYPE> line before the function definition and the instances of SOMETYPE in the code. Everywhere you would normally need to have the name or class of the datatype that you're using, you now replace with the arbitrary name that you used in the template <class SOMETYPE>. For example, if you had SUPERDUPERTYPE instead of SOMETYPE, the code would look something like this:

 template <class SUPERDUPERTYPE> 
 void swap (SUPERDUPERTYPE &x, SUPERDUPERTYPE &y) 
 { 
   SUPERDUPERTYPE temp = x; 
   x = y; 
   y = temp; 
 }

As you can see, you can use whatever label you wish for the template TYPEPARAMETER, as long as it is not a reserved word.

Class template

A class template extends the same concept to classes. Class templates are often used to make generic containers. For example, the STL has a linked list container. To make a linked list of integers, one writes list<int>. A list of strings is denoted list<string>. A list has a set of standard functions associated with it, which work no matter what you put between the brackets.

If you want to have more than one template TYPEPARAMETER, then the syntax would be:

 template <class SOMETYPE1, class SOMETYPE2, ...>
Templates and Classes

Let us say that rather than create a simple templated function, you would like to use templates for a class, so that the class may handle more than one datatype. You may have noticed that some classes are able to accept a type as a parameter and create variations of an object based on that type (for example the classes of the STL container class hierarchy). This is because they are declared as templates using syntax not unlike the one presented below:

 template <class T> class Foo
 {
 public:
   Foo();
   void some_function();
   T some_other_function();

 private:
   int member_variable;
   T parametrized_variable;
 };

Defining member functions of a template class is somewhat like defining a function template, except for the fact, that you use the scope resolution operator to indicate that this is the template classes' member function. The one important and non-obvious detail is the requirement of using the template operator containing the parametrized type name after the class name.

The following example describes the required syntax by defining functions from the example class above.

 template <class T> Foo<T>::Foo()
 {
   member_variable = 0;
 }

 template <class T> void Foo<T>::some_function()
 {
   cout << "member_variable = " << member_variable << endl;
 }

 template <class T> T Foo<T>::some_other_function()
 {
   return parametrized_variable;
 }

As you may have noticed, if you want to declare a function that will return an object of the parametrized type, you just have to use the name of that parameter as the function's return type.

Note:
A class template can be used to avoid the overhead of virtual member functions in inheritance. Since the type of class is known at compile-time, the class template will not need the virtual pointer table that is required by a class with virtual member functions. This distinction also permits the inlining of the function members of a class template.

Advantages and disadvantages

Some uses of templates, such as the max() function, were previously filled by function-like preprocessor macros.

// a max() macro
#define max(a,b)   ((a) < (b) ? (b) : (a))

Both macros and templates are expanded at compile time. Macros are always expanded inline; templates can also be expanded as inline functions when the compiler deems it appropriate. Thus both function-like macros and function templates have no run-time overhead.

However, templates are generally considered an improvement over macros for these purposes. Templates are type-safe. Templates avoid some of the common errors found in code that makes heavy use of function-like macros. Perhaps most importantly, templates were designed to be applicable to much larger problems than macros. The definition of a function-like macro must fit on a single logical line of code.

There are three primary drawbacks to the use of templates. First, many compilers historically have very poor support for templates, so the use of templates can make code somewhat less portable. Second, almost all compilers produce confusing, unhelpful error messages when errors are detected in template code. This can make templates difficult to develop. Third, each use of a template may cause the compiler to generate extra code (an instantiation of the template), so the indiscriminate use of templates can lead to code bloat, resulting in excessively large executables.

The other big disadvantage of templates is that to replace a #define like max which acts identically with dissimilar types or function calls is impossible. Templates have replaced using #defines for complex functions but not for simple stuff like max(a,b). For a full discussion on trying to create a template for the #define max, see the paper "Min, Max and More" that Scott Meyer wrote for C++ Report in January 1995.

The biggest advantage of using templates, is that a complex algorithm can have a simple interface that the compiler then uses to choose the correct implementation based on the type of the arguments. For instance, a searching algorithm can take advantage of the properties of the container being searched. This technique is used throughout the C++ standard library.

Linkage problems

While linking a template-based program consisting over several modules spread over a couple files, it is a frequent and mystifying situation to find that the object code of the modules won't link due to 'unresolved reference to (insert template member function name here) in (...)'. The offending function's implementation is there, so why is it missing from the object code? Let us stop a moment and consider how can this be possible.

Assume you have created a template based class called Foo and put its declaration in the file Util.hpp along with some other regular class called Bar:

 template <class T> Foo
 {
 public: 
   Foo();
   T some_function();
   T some_other_function();
   T some_yet_other_function();
   T member;
 };

 class Bar
 {
   Bar();
   void do_something();
 };

Now, to adhere to all the rules of the art, you create a file called Util.cc, where you put all the function definitions, template or otherwise:

 #include "Util.hpp"

 template <class T> T Foo<T>::some_function()
 {
  ...
 }

 template <class T> T Foo<T>::some_other_function()
 {
  ...
 }

 template <class T> T Foo<T>::some_yet_other_function()
 {
  ...
 }

and, finally:

 void Bar::do_something()
 {
   Foo<int> my_foo;
   int x = my_foo.some_function();
   int y = my_foo.some_other_function();
 }

Next, you compile the module, there are no errors, you are happy. But suppose there is another (main) module in the program, which resides in MyProg.cc:

 #include "Util.hpp"	// imports our utility classes' declarations, including the template

 int main()
 {
   Foo<int> main_foo;
   int z = main_foo.some_yet_other_function();
   return 0;
 }

This also compiles clean to the object code. Yet when you try to link the two modules together, you get an error saying there is an undefined reference to Foo<int>::some_yet_other function() in MyProg.cc. You defined the template member function correctly, so what is the problem?

As you remember, templates are instantiated at compile-time. This helps avoid code bloat, which would be the result of generating all the template class and function variants for all possible types as its parameters. So, when the compiler processed the Util.cc code, it saw that the only variant of the Foo class was Foo<int>, and the only needed functions were:

 int Foo<int>::some_function();
 int Foo<int>::some_other_function();

No code in Util.cc required any other variants of Foo or its methods to exist, so the compiler generated no code other than that. There is no implementation of some_yet_other_function() in the object code, just as there is no implementation for

 double Foo<double>::some_function();

or

 string Foo<string>::some_function();

The MyProg.cc code compiled without errors, because the member function of Foo it uses is correctly declared in the Util.hpp header, and it is expected that it will be available upon linking. But it is not and hence the error, and a lot of nuisance if you are new to templates and start looking for errors in your code, which ironically is perfectly correct.

The solution is somewhat compiler dependent. For the GNU compiler, try experimenting with the -frepo flag, and also reading the template-related section of 'info gcc' (node "Template Instantiation": "Where is the Template?") may prove enlightening. In Borland, supposedly, there is a selection in the linker options, which activates 'smart' templates just for this kind of problem.

The other thing you may try is called explicit instantiation. What you do is create some dummy code in the module with the templates, which creates all variants of the template class and calls all variants of its member functions, which you know are needed elsewhere. Obviously, this requires you to know a lot about what variants you need throughout your code. In our simple example this would go like this:

1. Add the following class declaration to Util.hpp:

 class Instantiations
 {
 private:
   void Instantiate();
 };

2. Add the following member function definition to Util.cc:

 void Instantiations::Instantiate()
 {
   Foo<int> my_foo;
   my_foo.some_yet_other_function();
   // other explicit instantiations may follow
 }

Of course, you never need to actually instantiate the Instantiations class, or call any of its methods. The fact that they just exist in the code makes the compiler generate all the template variations which are required. Now the object code will link without problems.

There is still one, if not elegant, solution. Just move all the template functions' definition code to the Util.hpp header file. This is not pretty, because header files are for declarations, and the implementation is supposed to be defined elsewhere, but it does the trick in this situation. While compiling the MyProg.cc (and any other modules which include Util.hpp) code, the compiler will generate all the template variants which are needed, because the definitions are readily available.

Template Meta-programming overview

Template meta-programming (TMP) refers to uses of the C++ template system to perform computation at compile-time within the code. It can, for the most part, be considered to be "programming with types" — in that, largely, the "values" that TMP works with are specific C++ types. Using types as the basic objects of calculation allows the full power of the type-inference rules to be used for general-purpose computing.

Compile-time programming

The preprocessor allows certain calculations to be carried out at compile time, meaning that by the time the code has finished compiling the decision has already been taken, and can be left out of the compiled executable. The following is a very contrived example:

#define myvar 17

#if myvar % 2
   cout << "Constant is odd" << endl;
#else
   cout << "Constant is even" << endl;
#endif

This kind of construction does not have much application beyond conditional inclusion of platform-specific code. In particular there's no way to iterate, so it can not be used for general computing. Compile-time programming with templates works in a similar way but is much more powerful, indeed it is actually Turing complete.

Trait classes are a familiar example of a simple form of template meta-programming: given input of a type, they compute properties associated with that type as output (for example, std::iterator_traits<> takes an iterator type as input, and computes properties such as the iterator's difference_type, value_type and so on).

The nature of template Meta-programming

Template meta-programming is much closer to functional programming than ordinary idiomatic C++ is. This is because 'variables' are all immutable, and hence it is necessary to use recursion rather than iteration to process elements of a set. This adds another layer of challenge for imperative programmers learning TMP: as well as learning the mechanics of it, they must learn to think in a different way.

Limitations of Template Meta-programming

Because template meta-programming evolved from an unintended use of the template system, it is frequently cumbersome. Often it is very hard to make the intent of the code clear to a maintainer, since the natural meaning of the code being used is very different from the purpose to which it is being put. The most effective way to deal with this is through reliance on idiom; if you want to be a productive template meta-programmer you will have to learn to recognize the common idioms.

It also challenges the capabilities of older compilers; generally speaking, compilers from around the year 2000 and later are able to deal with much practical TMP code. Even when the compiler supports it, the compile times can be extremely large and in the case of a compile failure the error messages are frequently impenetrable. For a template instantiation debugger, see TempLight.

Some coding standards may even forbid template meta-programming, at least outside of third-party libraries like Boost.

History of TMP

Historically TMP is something of an accident; it was discovered during the process of standardizing the C++ language that its template system happens to be Turing-complete, i.e., capable in principle of computing anything that is computable. The first concrete demonstration of this was a program written by Erwin Unruh, which computed prime numbers although it did not actually finish compiling: the list of prime numbers was part of an error message generated by the compiler on attempting to compile the code.[1] TMP has since advanced considerably, and is now a practical tool for library builders in C++, though its complexities mean that it is not generally appropriate for the majority of applications or systems programming contexts.

#include <iostream>

template <int p, int i>
class is_prime {
public:
	enum { prim = ( (p % i) && is_prime<p, i - 1>::prim ) }; 
};

template <int p>
class is_prime<p, 1> {
public:
	enum { prim = 1 };
};

template <int i>
class Prime_print {      // primary template for loop to print prime numbers
public:
	Prime_print<i - 1> a; 
	enum { prim = is_prime<i, i - 1>::prim };
	void f() {
		a.f();
		if (prim)
		{
			std::cout << "prime number:" << i << std::endl;
		}
	} 
};

template<>
class Prime_print<1> {   // full specialization to end the loop
public:
	enum { prim = 0 };
	void f() {}
};

#ifndef LAST 
#define LAST 18 
#endif

int main()
{
   Prime_print<LAST> a; 
   a.f(); 
}

Building blocks

Values

The 'variables' in TMP are not really variables since their values cannot be altered, but you can have named values that you use rather like you would variables in ordinary programming. When programming with types, named values are typedefs:

struct ValueHolder
{
   typedef int value;
};

You can think of this as 'storing' the int type so that it can be accessed under the value name. Integer values are usually stored as members in an enum:

struct ValueHolder
{
   enum { value = 2 };
};

This again stores the value so that it can be accessed under the name value. Neither of these examples is any use on its own, but they form the basis of most other TMP, so they are vital patterns to be aware of.

Functions

A function maps one or more input parameters into an output value. The TMP analogue to this is a template class:

template<int X, int Y>
struct Adder
{
   enum { result = X + Y };
};

This is a function that adds its two parameters and stores the result in the result enum member. You can call this at compile time with something like Adder<1, 2>::result, which will be expanded at compile time and act exactly like a literal 3 in your program.

Branching

A conditional branch can be constructed by writing two alternative specialisations of a template class. The compiler will choose the one that fits the types provided, and a value defined in the instantiated class can then be accessed. For example, consider the following partial specialisation:

template<typename X, typename Y>
struct SameType
{
   enum { result = 0 };
};

template<typename T>
struct SameType<T, T>
{
   enum { result = 1 };
};

This tells us if the two types it is instantiated with are the same. This might not seem very useful, but it can see through typedefs that might otherwise obscure whether types are the same, and it can be used on template arguments in template code. You can use it like this:

if (SameType<SomeThirdPartyType, int>::result)
{
   // ... Use some optimised code that can assume the type is an int
}
else
{
   // ... Use defensive code that doesn't make any assumptions about the type
}

The above code isn't very idiomatic: since the types can be identified at compile-time, the if() block will always have a trivial condition (it'll always resolve to either if (1) { ... } or if (0) { ... }). However, this does illustrate the kind of thing that can be achieved.

Recursion

Since you don't have mutable variables available when you're programming with templates, it's impossible to iterate over a sequence of values. Tasks that might be achieved with iteration in standard C++ have to be redefined in terms of recursion, i.e. a function that calls itself. This usually takes the shape of a template class whose output value recursively refers to itself, and one or more specialisations that give fixed values to prevent infinite recursion. You can think of this as a combination of the function and conditional branch ideas described above.

Calculating factorials is naturally done recursively: , and for , . In TMP, this corresponds to a class template "factorial" whose general form uses the recurrence relation, and a specialization of which terminates the recursion.

First, the general (unspecialized) template says that factorial<n>::value is given by n*factorial<n-1>::value:

template <unsigned n>
struct factorial
{
  enum { value = n * factorial<n-1>::value };
};

Next, the specialization for zero says that factorial<0>::value evaluates to 1:

template <>
struct factorial<0>
{
  enum { value = 1 };
};

And now some code that "calls" the factorial template at compile-time:

 
int main() {
  // Because calculations are done at compile-time, they can be
  // used for things such as array sizes.
  int array[ factorial<7>::value ];
}

Observe that the factorial<N>::value member is expressed in terms of the factorial<N> template, but this can't continue infinitely: each time it is evaluated, it calls itself with a progressively smaller (but non-negative) number. This must eventually hit zero, at which point the specialisation kicks in and evaluation doesn't recurse any further.

Example: Compile-time "If"

The following code defines a meta-function called "if_"; this is a class template that can be used to choose between two types based on a compile-time constant, as demonstrated in main below:

template <bool Condition, typename TrueResult, typename FalseResult>
class if_;

template <typename TrueResult, typename FalseResult>
struct if_<true, TrueResult, FalseResult>
{
  typedef TrueResult result;
};

template <typename TrueResult, typename FalseResult>
struct if_<false, TrueResult, FalseResult>
{
  typedef FalseResult result;
};

int main()
{
  typename if_<true, int, void*>::result number(3);
  typename if_<false, int, void*>::result pointer(&number);

   typedef typename if_<(sizeof(void *) > sizeof(uint32_t)), uint64_t, uint32_t>::result
      integral_ptr_t;
	  
   integral_ptr_t converted_pointer = reinterpret_cast<integral_ptr_t>(pointer);
}

On line 18, we evaluate the if_ template with a true value, so the type used is the first of the provided values. Thus the entire expression if_<true, int, void*>::result evaluates to int. Similarly, on line 19 the template code evaluates to void *. These expressions act exactly the same as if the types had been written as literal values in the source code.

Line 21 is where it starts to get clever: we define a type that depends on the value of a platform-dependent sizeof expression. On platforms where pointers are either 32 or 64 bits, this will choose the correct type at compile time without any modification, and without preprocessor macros. Once the type has been chosen, it can then be used like any other type.

Note:
This code is just an illustration of the power of template meta-programming, it is not meant to illustrate good cross-platform practice with pointers.

For comparison, this problem is best attacked in C90 as follows

# include <stddef.h>
typedef size_t integral_ptr_t;
typedef int the_correct_size_was_chosen [sizeof (integral_ptr_t) >= sizeof (void *)? 1: -1];

As it happens, the library-defined type size_t should be the correct choice for this particular problem on any platform. To ensure this, line 3 is used as a compile time check to see if the selected type is actually large enough; if not, the array type the_correct_size_was_chosen will be defined with a negative length, causing a compile-time error. In C99, <stdint.h> may define the types intptr_t and uintptr_t.

Debugging TMP

As of 2017, this cannot be done in any meaningful way. Generally it is easier to throw out the templates and start over than try to decipher the byzantine maze of compiler output that results from a single-byte typo in a template metaprogram.

Consider these observations from Herb Sutter, secretary for the C++ standardization committee:

   Herb: Boost.Lambda, is a marvel of engineering… and it worked very well if … if you spelled it exactly right the first time, and didn’t mind a 4-page error spew that told you almost nothing about what you did wrong if you spelled it a little wrong. …
   Charles: talk about giant error messages… you could have templates inside templates, you could have these error messages that make absolutely no sense at all.
   Herb: oh, they are baroque.

Source: https://ofekshilon.com/2012/09/01/meta-programming-is-still-evil/

Conventions for "Structured" TMP

Clipboard

To do:
Describe some conventions for "structured" TMP.

Standard Template Library (STL)

The Standard Template Library (STL), part of the C++ Standard Library, offers collections of algorithms, containers, iterators, and other fundamental components, implemented as templates, classes, and functions essential to extend functionality and standardization to C++. STL main focus is to provide improvements implementation standardization with emphasis in performance and correctness.

Instead of wondering if your array would ever need to hold 257 records or having nightmares of string buffer overflows, you can enjoy vector and string that automatically extend to contain more records or characters. For example, vector is just like an array, except that vector's size can expand to hold more cells or shrink when fewer will suffice. One must keep in mind that the STL does not conflict with OOP but in itself is not object oriented; In particular it makes no use of runtime polymorphism (i.e., has no virtual functions).

The true power of the STL lies not in its container classes, but in the fact that it is a framework, combining algorithms with data structures using indirection through iterators to allow generic implementations of higher order algorithms to work efficiently on varied forms of data. To give a simple example, the same std::copy function can be used to copy elements from one array to another, or to copy the bytes of a file, or to copy the whitespace-separated words in "text like this" into a container such as std::vector<std::string>.

 // std::copy from array a to array b
 int a[10] = { 3,1,4,1,5,9,2,6,5,4 };
 int b[10];
 std::copy(&a[0], &a[9], b);

 // std::copy from input stream a to an arbitrary OutputIterator
 template <typename OutputIterator>
 void f(std::istream &a, OutputIterator destination) {
   std::copy(std::istreambuf_iterator<char>(a),
             std::istreambuf_iterator<char>(),
             destination);
 }

 // std::copy from a buffer containing text, inserting items in
 // order at the back of the container called words.
 std::istringstream buffer("text like this");
 std::vector<std::string> words;
 std::copy(std::istream_iterator<std::string>(buffer),
           std::istream_iterator<std::string>(),
           std::back_inserter(words));
 assert(words[0] == "text");
 assert(words[1] == "like");
 assert(words[2] == "this");

History

Alexander Stepanov
Alexander Stepanov

The C++ Standard Library incorporated part of the STL (published as a software library by SGI/Hewlett-Packard Company). The primary implementer of the C++ Standard Template Library was Alexander Stepanov.

Today we call STL to what was adopted into the C++ Standard. The ISO C++ does not specify header content, and allows implementation of the STL either in the headers, or in a true library.

Note:
In an interview Alexander Stepanov, stated that he originally, wanted all auxiliary functions in STL to be visible but it was not politically possible, especially the heap functions. That Bjarne did reduce the number of components in STL by a factor of two as to permit the adoption into the standard.

Compilers will already have one implementation included as part of the C++ Standard (i.e., MS Visual Studio uses the Dinkum STL). All implementations will have to comply to the standard's requirements regarding functionality and behavior, but consistency of programs across all major hardware implementations, operating systems, and compilers will also depends on the portability of the STL implementation. They may also offer extended features or be optimized to distinct setups.

There are many different implementations of the STL, all based on the language standard but nevertheless differing from each other, making it transparent for the programmer, enabling specialization and rapid evolution of the code base. Several open source implementations are available, which can be useful to consult.

List of STL implementations.

Note:
There are advantages on having compartmentalized functionalities, some developers actively avoid using some of the language features, for a multitude of reasons. C++ permits the programmer to chose how to express themself, have control over the development paradigms and not be constricted by an higher level of abstraction.

Containers

The containers we will discuss in this section of the book are part of the standard namespace (std::). They all originated in the original SGI implementation of the STL.

Note:
When choosing a container, you should have in mind what makes them different, this will help you produce more efficient code. See also the Optimization Section of the book, about using the right data in the right container.

Sequence Containers

Sequences - easier than arrays

Sequences are similar to C arrays, but they are easier to use. Vector is usually the first sequence to be learned. Other sequences, list and double-ended queues, are similar to vector but more efficient in some special cases. (Their behavior is also different in important ways concerning validity of iterators when the container is changed; iterator validity is an important, though somewhat advanced, concept when using containers in C++.)

  • vector - "an easy-to-use array"
  • list - in effect, a doubly-linked list
  • deque - double-ended queue (properly pronounced "deck", often mispronounced as "dee-queue")
vector

The vector is a template class in itself, it is a Sequence Container and allows you to easily create a dynamic array of elements (one type per instance) of almost any data-type or object within a programs when using it. The vector class handles most of the memory management for you.

Since a vector contain contiguous elements it is an ideal choice to replace the old C style array, in a situation where you need to store data, and ideal in a situation where you need to store dynamic data as an array that changes in size during the program's execution (old C style arrays can't do it). However, vectors do incur a very small overhead compared to static arrays (depending on the quality of your compiler), and cannot be initialized through an initialization list.

Note:

Vector is known to be slow when using the MSVC compiler due to the SECURE_SCL flag, that, by default, forces bounds checking even in optimized builds.

Accessing members of a vector or appending elements takes a fixed amount of time, no matter how large the vector is, whereas locating a specific value in a vector element or inserting elements into the vector takes an amount of time directly proportional to its location in it (size dependent).

Note:

If you create a vector you can access its data using consecutive pointers:

  std::vector<type> myvector(8);
  type * ptr = &myvector[0];
  ptr[0], ptr[7]; // access the first and last objects in myvector

this information is present in INCITS/ISO/IEC 14882-2003 but was not properly documented in the 1998 version of the C++ standard.
Be aware that ptr[i] is faster than myvector.at(i) because no error checking is performed. Watch out for how long that pointer is valid. The contiguous nature of vectors is most often important when interfacing to C code.

You should also keep in mind that std::vector<T>::iterator may not be a pointer; using an iterator is the safest mode to access a container but safety has always a cost in performance.

Example
Clipboard

To do:
Should this be split into 2 examples, a "old C style array" example and a "new C++ STL vector" example?


/*
David Cary 2009-03-04
quick demo for wikibooks
*/

#include <iostream>
#include <vector>
using namespace std;

vector<int> pick_vector_with_biggest_fifth_element(vector<int> left,vector<int> right)
{
    if(left[5] < right[5])
    {
        return( right );
    }
    // else
    return left ;
}

int* pick_array_with_biggest_fifth_element(int * left,int * right)
{
    if(left[5] < right[5])
    {
        return( right );
    }
    // else 
    return left ;
}

int vector_demo(void)
{
    cout << "vector demo" << endl;
    vector<int> left(7);
    vector<int> right(7);

    left[5] = 7;
    right[5] = 8;
    cout << left[5] << endl;
    cout << right[5] << endl;
    vector<int> biggest(pick_vector_with_biggest_fifth_element( left, right ) );
    cout << biggest[5] << endl;

    return 0;
}

int array_demo(void)
{
    cout << "array demo" << endl;
    int left[7];
    int right[7];

    left[5] = 7;
    right[5] = 8;
    cout << left[5] << endl;
    cout << right[5] << endl;
    int * biggest =
        pick_array_with_biggest_fifth_element( left, right );
    cout << biggest[5] << endl;

    return 0;
}

int main(void)
{
    vector_demo();
    array_demo();
}
Member Functions

The vector class models the Container concept, which means it has begin(), end(), size(), max_size(), empty(), and swap() methods.

Note:
Since most vector (or deque) implementations typically reserves some extra internal storage for future growth. Prefer the swap() method when altering a standard vector size (or freeing the memory used) when memory resources becomes a factor.

  • informative
    • vector::front - Returns reference to first element of vector.
    • vector::back - Returns reference to last element of vector.
    • vector::size - Returns number of elements in the vector.
    • vector::empty - Returns true if vector has no elements.
  • standard operations
    • vector::insert - Inserts elements into a vector (single & range), shifts later elements up. Inefficient.
    • vector::push_back - Appends (inserts) an element to the end of a vector, allocating memory for it if necessary. Amortized O(1) time.
    • vector::erase - Deletes elements from a vector (single & range), shifts later elements down. Inefficient.
    • vector::pop_back - Erases the last element of the vector, (possibly reducing capacity - usually it isn't reduced, but this depends on particular STL implementation). Amortized O(1) time.
    • vector::clear - Erases all of the elements. Note however that if the data elements are pointers to memory that was created dynamically (e.g., the new operator was used), the memory will not be freed.
  • allocation/size modification
    • vector::assign - Used to delete a origin vector and copies the specified elements to an empty target vector.
    • vector::reserve - Changes capacity (allocates more memory) of vector, if needed. In many STL implementations capacity can only grow, and is never reduced.
    • vector::capacity - Returns current capacity (allocated memory) of vector.
    • vector::resize - Changes the vector size.
  • iteration
    • vector::begin - Returns an iterator to start traversal of the vector.
    • vector::end - Returns an iterator that points just beyond the end of the vector.
    • vector::at - Returns a reference to the data element at the specified location in the vector, with bounds checking.

Note:

It is important to remember the distinctions of capacity(), size() and empty() when dealing with containers.

vector<int> v;
for (vector<int>::iterator it = v.begin(); it!=v.end(); ++it/* increment operand is used to move to next element*/) {
    cout << *it << endl;
}
vector::Iterators

std::vector<T> provides Random Access Iterators; as with all containers, the primary access to iterators is via begin() and end() member functions. These are overloaded for const- and non-const containers, returning iterators of types std::vector<T>::const_iterator and std::vector<T>::iterator respectively.


Clipboard

To do:
Add missing data


vector examples
 /* Vector sort example */
 #include <iostream>
 #include <vector>
 
 int main()
 {
         using namespace std;
  
         cout << "Sorting STL vector, \"the easier array\"... " << endl;
         cout << "Enter numbers, one per line.  Press ctrl-D to quit." << endl;
 
         vector<int> vec; 
         int tmp;
         while (cin>>tmp) {
                 vec.push_back(tmp);
         }
 
         cout << "Sorted: " << endl;
         sort(vec.begin(), vec.end());   
         int i = 0;
         for (i=0; i<vec.size(); i++) {
                 cout << vec[i] << endl;;
         }
 
         return 0;
 }

The call to sort above actually calls an instantiation of the function template std::sort, which will work on any half-open range specified by two random access iterators.

If you like to make the code above more "STLish" you can write this program in the following way:

 #include <iostream>
 #include <vector>
 #include <algorithm>
 #include <iterator>
 
 int main()
 {
        using namespace std;
 
        cout << "Sorting STL vector, \"the easier array\"... " << endl;
        cout << "Enter numbers, one per line.  Press ctrl-D to quit." << endl;
 
        istream_iterator<int> first(cin);
        istream_iterator<int> last;
        vector<int> vec(first, last);
 
        sort(vec.begin(), vec.end());
 
        cout << "Sorted: " << endl;
 
        copy(vec.begin(), vec.end(), ostream_iterator<int>(cout, "\n"));
 
        return 0;
 }
Linked lists

The STL provides a class template called list (part of the standard namespace (std::)) which implements a non-intrusive doubly-linked list. Linked lists can insert or remove elements in the middle in constant time, but do not have random access. One useful feature of std::list is that references, pointers and iterators to items inserted into a list remain valid so long as that item remains in the list.

Note:
Consider using vector instead of list for better cache coherency and avoid "death by swapping", see the Optimization Section, about using the right data in the right container.

list examples
 /* List example - insertion in a list */
 #include <iostream>
 #include <algorithm>
 #include <iterator>
 #include <list>

 void print_list(std::list<int> const& a_filled_list)
 {
        using namespace std;

        ostream_iterator<int> out(cout, "  ");
        copy(a_filled_list.begin(), a_filled_list.end(), out);
 }

 int main()
 {
	 std::list<int> my_list;

	 my_list.push_back(1);
	 my_list.push_back(10);
	 print_list(my_list); //print : 1 10

	 std::cout << std::endl;

	 my_list.push_front(45);
	 print_list(my_list); //print : 45 1 10

	 return 0;
 }
Clipboard

To do:
Add missing data


Associative Containers (key and value)

This type of container point to each element in the container with a key value, thus simplifying searching containers for the programmer. Instead of iterating through an array or vector element by element to find a specific one, you can simply ask for people["tero"]. Just like vectors and other containers, associative containers can expand to hold any number of elements.

Maps and Multimaps

map and multimap are associative containers that manage key/value pairs as elements as seen above. The elements of each container will sort automatically using the actual key for sorting criterion. The difference between the two is that maps do not allow duplicates, whereas, multimaps does.

  • map - unique keys
  • multimap - same key can be used many times
  • set - unique key is the value
  • multiset - key is the value, same key can be used many times
  /* Map example - character distribution  */
  #include <iostream>
  #include <map>
  #include <string>
  #include <cctype>
 
  using namespace std;
 
  int main()
  {
          /* Character counts are stored in a map, so that 
           * character is the key.
           * Count of char a is chars['a']. */
          map<char, long> chars;
 
          cout << "chardist - Count character distributions" << endl;
          cout << "Type some text. Press ctrl-D to quit." << endl;
          char c;
          while (cin.get(c)) {
                  // Upper A and lower a are considered the same 
                  c=tolower(static_cast<unsigned char>(c));
                  chars[c]=chars[c]+1; // Could be written as ++chars[c];
          }
 
          cout << "Character distribution: " << endl;
            
          string alphabet("abcdefghijklmnopqrstuvwxyz");
          for (string::iterator letter_index=alphabet.begin(); letter_index != alphabet.end(); letter_index++) {
                  if (chars[*letter_index] != 0) {
                          cout << char(toupper(*letter_index))
                               << ":" << chars[*letter_index]
                               << "\t" << endl; 
                  }
          }
          return 0;
  }

Container Adapters

  • stack - last in, first out (LIFO)
  • queue - first in, first out (FIFO)
  • priority queue

Iterators

C++'s iterators are one of the foundation of the STL. Iterators exist in languages other than C++, but C++ uses an unusual form of iterators, with pros and cons.

In C++, an iterator is a concept rather than a specific type, they are a generalization of the pointers as an abstraction for the use of containers. Iterators are further divided based on properties such as traversal properties.

The basic idea of an iterator is to provide a way to navigate over some collection of objects concept.

Some (overlapping) categories of iterators are:

  • Singular iterators
  • Invalid iterators
  • Random access iterators
  • Bidirectional iterators
  • Forward iterators
  • Input iterators
  • Output iterators
  • Mutable iterators

A pair of iterators [begin, end) is used to define a half open range, which includes the element identified from begin to end, except for the element identified by end. As a special case, the half open range [x, x) is empty, for any valid iterator x.

Note:
The range notation may vary, the meaning is to express the inclusion or exclusion of the range limits. An also common notation is [begin, end[ (meaning begin is part of the range and end is not).

The most primitive examples of iterators in C++ (and likely the inspiration for their syntax) are the built-in pointers, which are commonly used to iterate over elements within arrays.

Iteration over a Container

Accessing (but not modifying) each element of a container group of type C<T> using an iterator.

 for (
      typename C<T>::const_iterator iter = group.begin();
      iter != group.end();
      ++iter
     )
 {
     T const &element = *iter;
 
     // access element here
 }

Note the usage of typename. It informs the compiler that 'const_iterator' is a type as opposed to a static member variable. (It is only necessary inside templated code, and indeed in C++98 is invalid in regular, non-template, code. This may change in the next revision of the C++ standard so that the typename above is always permitted.)

Modifying each element of a container group of type C<T> using an iterator.

 for (
      typename C<T>::iterator iter = group.begin();
      iter != group.end();
      ++iter
     )
 {
     T &element = *iter;
 
     // modify element here
 }

When modifying the container itself while iterating over it, some containers (such as vector) require care that the iterator doesn't become invalidated, and end up pointing to an invalid element. For example, instead of:

  for (i = v.begin(); i != v.end(); ++i) {
    ...
    if (erase_required) {
      v.erase(i);
    }
  }

Do:

  for (i = v.begin(); i != v.end(); ) {
    ...
    if (erase_required) {
        i = v.erase(i);
    } else {
        ++i;
    }
  }

The erase() member function returns the next valid iterator, or end(), thus ending the loop. Note that ++i is not executed when erase() has been called on an element.

Functors

A functor or function object, is an object that has an operator (). The importance of functors is that they can be used in many contexts in which C++ functions can be used, whilst also having the ability to maintain state information. Next to iterators, functors are one of the most fundamental ideas exploited by the STL.

The STL provides a number of pre-built functor classes; std::less, for example, is often used to specify a default comparison function for algorithms that need to determine which of two objects comes "before" the other.

 #include <vector>
 #include <algorithm>
 #include <iostream>

 // Define the Functor for AccumulateSquareValues
 template<typename T>
 struct AccumulateSquareValues
 {
     AccumulateSquareValues() : sumOfSquares()
     {
     }
     void operator()(const T& value)
     {
         sumOfSquares += value*value;
     }
     T Result() const
     {
         return sumOfSquares;
     }
     T sumOfSquares;
 };

 std::vector<int> intVec;
 intVec.reserve(10);
 for( int idx = 0; idx < 10; ++idx )
 {
     intVec.push_back(idx);
 }
 AccumulateSquareValues<int> sumOfSquare = std::for_each(intVec.begin(), 
                                                         intVec.end(), 
                                                         AccumulateSquareValues<int>() );
 std::cout << "The sum of squares for 1-10 is " << sumOfSquare.Result() << std::endl;

 // note: this problem can be solved in another, more clear way:
 // int sum_of_squares = std::inner_product(intVec.begin(), intVec.end(), intVec.begin(), 0);

Algorithms

The STL also provides several useful algorithms, in the form of template functions, that are provided to, with the help of the iterator concept, manipulate the STL containers (or derivations).

The STL algorithms aren't restricted to STL containers, for instance:

#include <algorithm>

  int array[10] = { 2,3,4,5,6,7,1,9,8,0 };

  int* begin = &array[0];
  int* end = &array[0] + 10;

  std::sort(begin, end);// the sort algorithm will work on a C style array
The _if suffix
The _copy suffix
  • Non-modifying algorithms
  • Modifying algorithms
  • Removing algorithms
  • Mutating algorithms
  • Sorting algorithms
  • Sorted range algorithms
  • Numeric algorithms

Permutations

Clipboard

To do:
Complete


Sorting and related operations

sort
stable_sort
partial_sort
Minimum and maximum

The standard library provides function templates min and max, which return the minimum and maximum of their two arguments respectively. Each has an overload available that allows you to customize the way the values are compared.

template<class T>
const T& min(const T& a, const T& b);

template<class T, class Compare>
const T& min(const T& a, const T& b, Compare c);

template<class T>
const T& max(const T& a, const T& b);

template<class T, class Compare>
const T& max(const T& a, const T& b, Compare c);

An example of how to use the Compare type parameter :

 #include <iostream>
 #include <algorithm>
 #include <string>

 class Account
 {
	 private :
         std::string owner_name;
	 int credit;
	 int potential_credit_transfer;

	 public :
	 Account(){}
	 Account(std::string name, int initial_credit, int initial_credit_transfer) :
	 	 owner_name(name),
	         credit(initial_credit),
	         potential_credit_transfer(initial_credit_transfer)
	 {}

	 bool operator<(Account const& account) const { return credit < account.credit; }

	 int potential_credit() const { return credit + potential_credit_transfer; }

	 std::string const& owner() const { return owner_name; }
 };

 struct CompareAccountCredit
 {
	 bool operator()(Account const& account1, Account const& account2) const 
         { return account1 < account2; }
 };

 struct CompareAccountPotentialCredit
 {
	 bool operator()(Account const& account1, Account const& account2) const 
         { return account1.potential_credit() < account2.potential_credit(); }
 };

 int main()
 {
	 Account account1("Dennis Ritchie", 1000, 250), account2("Steeve Jobs", 500, 10000), 
         result_comparison;

	 result_comparison = std::min(account1, account2, CompareAccountCredit());
	 std::cout << "min credit of account is : " + result_comparison.owner() << std::endl;

	 result_comparison = std::min(account1, account2, CompareAccountPotentialCredit());
	 std::cout << "min potential credit of account is : " + result_comparison.owner() << std::endl;

	 return 0;
 }
Clipboard

To do:
This needs an example of how to use the Compare type parameter

Allocators

Allocators are used by the Standard C++ Library (and particularly by the STL) to allow parameterization of memory allocation strategies.

The subject of allocators is somewhat obscure, and can safely be ignored by most application software developers. All standard library constructs that allow for specification of an allocator have a default allocator which is used if none is given by the user.

Custom allocators can be useful if the memory use of a piece of code is unusual in a way that leads to performance problems if used with the general-purpose default allocator. There are also other cases in which the default allocator is inappropriate, such as when using standard containers within an implementation of replacements for global operators new and delete.

Smart Pointers

Using raw pointers to store allocated data and then cleaning them up in the destructor can generally be considered a very bad idea since it is error-prone. Even temporarily storing allocated data in a raw pointer and then deleting it when done with it should be avoided for this reason. For example, if your code throws an exception, it can be cumbersome to properly catch the exception and delete all allocated objects.

Smart pointers can alleviate this headache by using the compiler and language semantics to ensure the pointer content is automatically released when the pointer itself goes out of scope.

#include <memory>
class A
{
public:
        virtual ~A() {}
	virtual char val() = 0;
};

class B : public A
{
public:
	virtual char val() { return 'B'; }
};

A* get_a_new_b()
{
	return new B();
}

bool some_func()
{
	bool rval = true;
	std::auto_ptr<A> a( get_a_new_b() );
	try {
		std::cout << a->val();
	} catch(...) {
		if( !a.get() ) {
			throw "Memory allocation failure!";
		}
		rval = false;
	}
	return rval;
}


Clipboard

To do:
Could note that the rebind pattern used by allocators is an alternative to using a template template parameter. Historically the STL was largely developed before C++ compilers offered support for template template parameters. Interestingly, modern template metaprogramming style has promoted a rebind-like approach instead of using template template parameters.


Semantics

The auto_ptr has semantics of strict ownership, meaning that the auto_ptr instance is the sole entity responsible for the object's lifetime. If an auto_ptr is copied, the source loses the reference. For example:

#include <iostream>
#include <memory>
using namespace std;
 
int main(int argc, char **arv)
{
    int *i = new int;
    auto_ptr<int> x(i);
    auto_ptr<int> y;
    
    y = x;
    
    cout << x.get() << endl;
    cout << y.get() << endl;
}

This code will print a NULL address for the first auto_ptr object and some non-NULL address for the second, showing that the source object lost the reference during the assignment (=). The raw pointer i in the example should not be deleted, as it will be deleted by the auto_ptr that owns the reference. In fact, new int could be passed directly into x, eliminating the need for i.

Notice that the object pointed by an auto_ptr is destructed using operator delete; this means that you should only use auto_ptr for pointers obtained with operator new. This excludes pointers returned by malloc(), calloc() or realloc() and operator new[].

Exception Handling

Exception handling is a construct designed to handle the occurrence of exceptions, that is special conditions that changes the normal flow of program execution. Since when designing a programming task (a class or even a function), one cannot always assume that application/task will run or be completed correctly (exit with the result it was intended to). It may be the case that it will be just inappropriate for that given task to report an error message (return an error code) or just exit. To handle these types of cases, C++ supports the use of language constructs to separate error handling and reporting code from ordinary code, that is, constructs that can deal with these exceptions (errors and abnormalities) and so we call this global approach that adds uniformity to program design the exception handling.

An exception is said to be thrown at the place where some error or abnormal condition is detected. The throwing will cause the normal program flow to be aborted, in a raised exception. An exception is thrown programmatic, the programmer specifies the conditions of a throw.

In handled exceptions, execution of the program will resume at a designated block of code, called a catch block, which encloses the point of throwing in terms of program execution. The catch block can be, and usually is, located in a different function/method than the point of throwing. In this way, C++ supports non-local error handling. Along with altering the program flow, throwing of an exception passes an object to the catch block. This object can provide data that is necessary for the handling code to decide in which way it should react on the exception.

Consider this next code example of a try and catch block combination for clarification:

void AFunction()
{
    // This function does not return normally, 
    // instead execution will resume at a catch block.
    // The thrown object is in this case of the type char const*,
    // i.e. it is a C-style string. More usually, exception
    // objects are of class type.
    throw "This is an exception!"; 
}

void AnotherFunction()
{
    // To catch exceptions, you first have to introduce
    // a try block via " try { ... } ". Then multiple catch
    // blocks can follow the try block.
    // " try { ... } catch(type 1) { ... } catch(type 2) { ... }"
    try 
    {
        AFunction();
       // Because the function throws an exception,
       // the rest of the code in this block will not
       // be executed
    }
    catch(char const* pch)  // This catch block 
                            // will react on exceptions
                            // of type char const*
    {
        // Execution will resume here.
        // You can handle the exception here.
    }
               // As can be seen
    catch(...) // The ellipsis indicates that this
               // block will catch exceptions of any type. 
    {
       // In this example, this block will not be executed,
       // because the preceding catch block is chosen to 
       // handle the exception.
    }
}

Unhandled exceptions on the other hand will result in a function termination and the stack will be unwound (stack allocated objects will have destructors called) as it looks for an exception handler. If none is found it will ultimately result in the termination of the program.

From the point of view of a programmer, raising an exception is a useful way to signal that a routine could not execute normally. For example, when an input argument is invalid (e.g. a zero denominator in division) or when a resource it relies on is unavailable (like a missing file, or a hard disk error). In systems without exceptions, routines would need to return some special error code. However, this is sometimes complicated by the semi-predicate problem, in which users of the routine need to write extra code to distinguish normal return values from erroneous ones.

Because it is hard to write exception safe code, you should only use an exception when you have to—when an error has occurred that you can not handle. Do not use exceptions for the normal flow of the program.

This example is wrong, it is a demonstration on what to avoid:

void sum(int iA, int iB)
{
    throw iA + iB;
}

int main()
{
    int iResult;

    try 
    {
        sum(2, 3);
    }
    catch(int iTmpResult)  
    {
        // Here the exception is used instead of a return value!
        // This is  wrong!
        iResult = iTmpResult;
    }

    return 0;
}

Stack unwinding

Consider the following code

void g()
{ 
    throw std::exception();
}
 
void f()
{
    std::string str = "Hello"; // This string is newly allocated
    g();
}
 
int main()
{
    try
    {
        f();
    }
    catch(...) 
    { }
}

The flow of the program:

  • main() calls f()
  • f() creates a local variable named str
  • str constructor allocates a memory chunk to hold the string "Hello"
  • f() calls g()
  • g()throws an exception
  • f() does not catch the exception.
Because the exception was not caught, we now need to exit f() in a clean fashion.
At this point, all the destructors of local variables previous to the throw
are called—This is called 'stack unwinding'.
  • The destructor of str is called, which releases the memory occupied by it.
As you can see, the mechanism of 'stack unwinding' is essential to prevent resource leaks—without it, str would never be destroyed, and the memory it used would be lost until the end of the program (even until the next loss of power, or cold boot depending on the Operative System memory management).
  • main() catches the exception
  • The program continues.

The 'stack unwinding' guarantees destructors of local variables (stack variables) will be called when we leave its scope.

Throwing objects

There are several ways to throw an exception object.

Throw a pointer to the object:

void foo()
{
    throw new MyApplicationException();
}

void bar()
{
    try 
    {
        foo();
    }
    catch(MyApplicationException* e)
    {
        // Handle exception
    }
}

But now, who is responsible to delete the exception? The handler? This makes code uglier. There must be a better way!

How about this:

void foo()
{
    throw MyApplicationException();
}

void bar()
{
    try 
    {
        foo();
    }
    catch(MyApplicationException e)
    {
        // Handle exception
    }
}

Looks better! But now, the catch handler that catches the exception, does it by value, meaning that a copy constructor is called. This can cause the program to crash if the exception caught was a bad_alloc caused by insufficient memory. In such a situation, seemingly safe code that is assumed to handle memory allocation problems results in the program crashing with a failure of the exception handler. Moreover, catching by value may cause the copy to have different behavior because of object slicing.

The correct approach is:

void foo()
{
    throw MyApplicationException();
}

void bar()
{
    try 
    {
        foo();
    }
    catch(MyApplicationException const& e)
    {
        // Handle exception
    }
}

This method has all the advantages—the compiler is responsible for destroying the object, and no copying is done at catch time!

The conclusion is that exceptions should be thrown by value, and caught by (usually const) reference.

The finally keyword

Consider the code snippet:

try
{
void x()
{
throw m();
}
}
catch n();
{
std:cout << "Exception caught\n";
}

When this code is executed, the program will look for a catchblock for the exception, but it does not exist. Hence it will crash and not proceed.
The finallykeyword allows some final code to execute before crashing.

finally
{
// residual code that will execute in any case
}

Note that if the exception is caught, the lines after the try-catch block will execute anyway. Hence the finallyblock will be of effect only when there is no matching catch block.

Constructors and destructors

When an exception is thrown from a constructor, the object is not considered instantiated, and therefore its destructor will not be called. But all destructors of already successfully constructed base and member objects of the same master object will be called. Destructors of not yet constructed base or member objects of the same master object will not be executed. Example:

class A : public B, public C
{
public:
    D sD;
    E sE;
    A(void)
    :B(), C(), sD(), sE()
    {
    }
};

Let's assume the constructor of base class C throws. Then the order of execution is:

  • B
  • C (throws)
  • ~B

Let's assume the constructor of member object sE throws. Then the order of execution is:

  • B
  • C
  • sD
  • sE (throws)
  • ~sD
  • ~C
  • ~B

Thus if some constructor is executed, one can rely on that all other constructors of the same master object executed before, were successful. This enables one, to use an already constructed member or base object as an argument for the constructor of one of the following member or base objects of the same master object.

What happens when we allocate this object with new?

  • Memory for the object is allocated
  • The object's constructor throws an exception
    • The object was not instantiated due to the exception
  • The memory occupied by the object is deleted
  • The exception is propagated, until it is caught

The main purpose of throwing an exception from a constructor is to inform the program/user that the creation and initialization of the object did not finish correctly. This is a very clean way of providing this important information, as constructors do not return a separate value containing some error code (as an initialization function might).

In contrast, it is strongly recommended not to throw exceptions inside a destructor. It is important to note when a destructor is called:

  • as part of a normal deallocation (exit from a scope, delete)
  • as part of a stack unwinding that handles a previously thrown exception.

In the former case, throwing an exception inside a destructor can simply cause memory leaks due to incorrectly deallocated object. In the latter, the code must be more clever. If an exception was thrown as part of the stack unwinding caused by another exception, there is no way to choose which exception to handle first. This is interpreted as a failure of the exception handling mechanism and that causes the program to call the function terminate.

To address this problem, it is possible to test if the destructor was called as part of an exception handling process. To this end, one should use the standard library function uncaught_exception, which returns true if an exception has been thrown, but hasn't been caught yet. All code executed in such a situation must not throw another exception.

Situations where such careful coding is necessary are extremely rare. It is far safer and easier to debug if the code was written in such a way that destructors did not throw exceptions at all.

Writing exception safe code

Exception safety

A piece of code is said to be exception-safe, if run-time failures within the code will not produce ill effects, such as memory leaks, garbled stored data, or invalid output. Exception-safe code must satisfy invariants placed on the code even if exceptions occur. There are several levels of exception safety:

  1. Failure transparency, also known as the no throw guarantee: Operations are guaranteed to succeed and satisfy all requirements even in presence of exceptional situations. If an exception occurs, it will not throw the exception further up. (Best level of exception safety.)
  2. Commit or rollback semantics, also known as strong exception safety or no-change guarantee: Operations can fail, but failed operations are guaranteed to have no side effects so all data retain original values.
  3. Basic exception safety: Partial execution of failed operations can cause side effects, but invariants on the state are preserved. Any stored data will contain valid values even if data has different values now from before the exception.
  4. Minimal exception safety also known as no-leak guarantee: Partial execution of failed operations may store invalid data but will not cause a crash, and no resources get leaked.
  5. No exception safety: No guarantees are made. (Worst level of exception safety)

Partial handling

Consider the following case:

void g()
{
    throw "Exception";
}
  
void f()
{
    int* pI = new int(0);
    g();
    delete pI;
}

int main()
{
    f();
    return 0;
}

Can you see the problem in this code? If g() throws an exception, the variable pI is never deleted and we have a memory leak.

To prevent the memory leak, f() must catch the exception, and delete pI. But f() can't handle the exception, it doesn't know how!

What is the solution then? f() shall catch the exception, and then re-throw it:

void g()
{
    throw "Exception";
}
  
void f()
{
    int* pI = new int(0);

    try
    {
        g();
    }
    catch (...)
    {
        delete pI;
        throw; // This empty throw re-throws the exception we caught
               // An empty throw can only exist in a catch block
    }

    delete pI;
}

int main()
{
    f();
    return 0;
}

There's a better way though; using RAII classes to avoid the need to use exception handling.

Guards

If you plan to use exceptions in your code, you must always try to write your code in an exception safe manner. Let's see some of the problems that can occur:

Consider the following code:

void g()
{ 
    throw std::exception();
}
 
void f()
{
    int* pI = new int(2);

    *pI = 3;
    g();
    // Oops, if an exception is thrown, pI is never deleted
    // and we have a memory leak
    delete pI;
}
 
int main()
{
    try
    {
        f();
    } 
    catch(...) 
    { }

    return 0;
}

Can you see the problem in this code? When an exception is thrown, we will never run the line that deletes pI!

What's the solution to this? Earlier we saw a solution based on f() ability to catch and re-throw. But there is a neater solution using the 'stack unwinding' mechanism. But 'stack unwinding' only applies to destructors for objects, so how can we use it?

We can write a simple wrapper class:

 // Note: This type of class is best implemented using templates, discussed in the next chapter.
 class IntDeleter {
 public:
    IntDeleter(int* piValue)
    {
        m_piValue = piValue;
    }
    
    ~IntDeleter() 
    {
        delete m_piValue;
    }
  
    // operator *, enables us to dereference the object and use it
    // like a regular pointer.
    int&  operator *() 
    {
        return *m_piValue;
    }

 private:
     int* m_piValue;
 };

The new version of f():

 void f()
 {
   IntDeleter pI(new int(2));

   *pI = 3;
   g();
   // No need to delete pI, this will be done in destruction.
   // This code is also exception safe.
 }

The pattern presented here is called a guard. A guard is very useful in other cases, and it can also help us make our code more exception safe. The guard pattern is similar to a finally block in other languages.

Note that the C++ Standard Library provides a templated guard by the name of unique_ptr.

Exception hierarchy

You may throw as exception an object (like a class or string), a pointer (like char*), or a primitive (like int). So, which should you choose? You should throw objects, as they ease the handling of exceptions for the programmer. It is common to create a class hierarchy of exception classes:

  • class MyApplicationException {};
    • class MathematicalException : public MyApplicationException {};
      • class DivisionByZeroException : public MathematicalException {};
    • class InvalidArgumentException : public MyApplicationException {};

An example:

float divide(float fNumerator, float fDenominator)
{
    if (fDenominator == 0.0)
    {
        throw DivisionByZeroException();
    }

    return fNumerator/fDenominator;
}

enum MathOperators {DIVISION, PRODUCT};

float operate(int iAction, float fArgLeft, float fArgRight)
{ 
    if (iAction == DIVISION)
    {
        return divide(fArgLeft, fArgRight);
    }
    else if (iAction == PRODUCT))
    {
        // call the product function
        // ... 
    }

    // No match for the action! iAction is an invalid agument
    throw InvalidArgumentException(); 
}
 
int main(int iArgc, char* a_pchArgv[])
{
    try
    {
        operate(atoi(a_pchArgv[0]), atof(a_pchArgv[1]), atof(a_pchArgv[2]));
    } 
    catch(MathematicalException& )
    {
        // Handle Error
    }
    catch(MyApplicationException& )
    {
        // This will catch in InvalidArgumentException too.
        // Display help to the user, and explain about the arguments.
    }

    return 0;
}

Note:
The order of the catch blocks is important. A thrown object (say, InvalidArgumentException) can be caught in a catch block of one of its super-classes. (e.g. catch (MyApplicationException& ) will catch it too). This is why it is important to place the catch blocks of derived classes before the catch block of their super classes.

Exception specifications

Note:
The use of exception specifications has been deprecated in the new standard C++11. It is recommended that nobody use them. They are included here for historical reasons (not everybody is using C++11 yet)

The range of exceptions that can be thrown by a function are an important part of that function's public interface. Without this information, you would have to assume that any exception could occur when calling any function, and consequently write code that was extremely defensive. Knowing the list of exceptions that can be thrown, you can simplify your code since it doesn't need to handle every case.

This exception information is specifically part of the public interface. Users of a class don't need to know anything about the way it is implemented, but they do need to know about the exceptions that can be thrown, just as they need to know the number and type of parameters to a member function. One way of providing this information to clients of a library is via code documentation, but this needs to be manually updated very carefully. Incorrect exception information is worse than none at all, since you may end up writing code that is less exception-safe than you intended to.

C++ provides another way of recording the exception interface, by means of exception specifications. An exception specification is parsed by the compiler, which provides a measure of automated checking. An exception specification can be applied to any function, and looks like this:

double divide(double dNumerator, double dDenominator) throw (DivideByZeroException);

You can specify that a function cannot throw any exceptions by using an empty exception specification:

void safeFunction(int iFoo) throw();

Shortcomings of exception specifications

C++ does not programmatically enforce exception specifications at compile time. For example, the following code is legal:

void DubiousFunction(int iFoo) throw()
{
    if (iFoo < 0)
    {
        throw RangeException();
    }
}

Rather than checking exception specifications at compile time, C++ checks them at run time, which means that you might not realize that you have an inaccurate exception specification until testing or, if you are unlucky, when the code is already in production.

If an exception is thrown at run time that propagates out of a function that doesn't allow the exception in its exception specification, the exception will not propagate any further and instead, the function RangeException() will be called. The RangeException() function doesn't return, but can throw a different type of exception that may (or may not) satisfy the exception specification and allow exception handling to carry on normally. If this still doesn't recover the situation, the program will be terminated.

Many people regard the behavior of attempting to translate exceptions at run time to be worse than simply allowing the exception to propagate up the stack to a caller who may be able to handle it. The fact that the exception specification has been violated does not mean that the caller can't handle the situation, only that the author of the code didn't expect it. Often there will be a catch (...) block somewhere on the stack that can deal with any exception.

Note:
Some coding standards require that exception specifications are not used. In the C++ language standard C++11(C++0x), the use of exception specifications as specified in the current version of the standard (C++03), is deprecated. The use of exception specifications has been deprecated entirely, meaning that it is highly recommended that nobody use them.

Noexcept specifiers

Clipboard

To do:
Add C++11 noexcept specifiers including the noexcept operator.

Run-Time Type Information (RTTI)

RTTI refers to the ability of the system to report on the dynamic type of an object and to provide information about that type at runtime (as opposed to at compile time), and when utilized consistently can be a powerful tool to ease the work of the programmer in managing resources.

dynamic_cast

Consider what you have already learned about the dynamic_cast keyword, and let's say that we have the following class hierarchy:

class Interface
{
public:
        virtual void GenericOp() = 0;// pure virtual function
};

class SpecificClass : public Interface
{
public:
        virtual void GenericOp();
        virtual void SpecificOp();
};

Let's say that we also have a pointer of type Interface*, like so:

Interface* ptr_interface;

Supposing that a situation emerges that we are forced to presume but have no guarantee that the pointer points to an object of type SpecificClass and we would like to call the member SpecificOp() of that class. To dynamically convert to a derived type we can use dynamic_cast, like so:

SpecificClass* ptr_specific = dynamic_cast<SpecificClass*>(ptr_interface);
if( ptr_specific ){
    // our suspicions are confirmed -- it really was a SpecificClass
    ptr_specific->SpecificOp();
}else{
    // our suspicions were incorrect -- it is definitely not a SpecificClass.
    // The ptr_interface points to an instance of some other child class of the base InterfaceClass.
    ptr_interface->GenericOp();
};

With dynamic_cast, the program converts the base class pointer to a derived class pointer and allows the derived class members to be called. Be very careful, however: if the pointer that you are trying to cast is not of the correct type, then dynamic_cast will return a null pointer.

We can also use dynamic_cast with references.

SpecificClass& ref_specific = dynamic_cast<SpecificClass&>(ref_interface);

This works almost in the same way as pointers. However, if the real type of the object being cast is not correct then dynamic_cast will not return null (there's no such thing as a null reference). Instead, it will throw a std::bad_cast exception.

typeid

Syntax
    typeid( object );

The typeid operator is used to determine the class of an object at runtime. It returns a reference to a std::type_info object, which exists until the end of the program, that describes the "object". If the "object" is a dereferenced null pointer, then the operation will throw a std::bad_typeid exception.

Objects of class std::bad_typeid are derived from std::exception, and thrown by typeid and others.

Note:
The C++98 standard requires that header file <typeinfo> be included before operator typeid is used within a compilation unit. Otherwise, the program is considered ill-formed.

The use of typeid is often preferred over dynamic_cast<class_type> in situations where just the class information is needed, because typeid, applied on a type or non de-referenced value is a constant-time procedure, whereas dynamic_cast must traverse the class derivation lattice of its argument at runtime. However, you should never rely on the exact content, like for example returned by std::type_info::name(), as this is implementation specific with respect to the compile.

It is generally only useful to use typeid on the dereference of a pointer or reference (i.e. typeid(*ptr) or typeid(ref)) to an object of polymorphic class type (a class with at least one virtual member function). This is because these are the only expressions that are associated with run-time type information. The type of any other expression is statically known at compile time.

Example
#include <iostream>
#include <typeinfo>  //for 'typeid' to work

class Person {
public:
   // ... Person members ...
   virtual ~Person() {}
};

class Employee : public Person {
   // ... Employee members ...
};

int main () {
   Person person;
   Employee employee;
   Person *ptr = &employee;
   // The string returned by typeid::name is implementation-defined
   std::cout << typeid(person).name() << std::endl;   // Person (statically known at compile-time)
   std::cout << typeid(employee).name() << std::endl; // Employee (statically known at compile-time)
   std::cout << typeid(ptr).name() << std::endl;      // Person * (statically known at compile-time)
   std::cout << typeid(*ptr).name() << std::endl;     // Employee (looked up dynamically at run-time
                                                      //           because it is the dereference of a
                                                      //           pointer to a polymorphic class)
}

Output (exact output varies by system):

 Person
 Employee
 Person*
 Employee


In RTTI it is used in this setup:

const std::type_info& info = typeid(object_expression);

Sometimes we need to know the exact type of an object. The typeid operator returns a reference to a standard class std::type_info that contains information about the type. This class provides some useful members including the == and != operators. The most interesting method is probably:

const char* std::type_info::name() const;

This member function returns a pointer to a C-style string with the name of the object type. For example, using the classes from our earlier example:

const std::type_info &info = typeid(*ptr_interface);
std::cout << info.name() << std::endl;

This program would print something like[1] SpecificClass because that is the dynamic type of the pointer ptr_interface.

typeid is actually an operator rather than a function, as it can also act on types:

const std::type_info& info = typeid(type);

for example (and somewhat circularly)

const std::type_info& info = typeid(std::type_info);

will give a type_info object which describes type_info objects. This latter use is not RTTI, but rather CTTI (compile-time type identification).

Limitations

There are some limitations to RTTI. First, RTTI can only be used with polymorphic types. That means that your classes must have at least one virtual function, either directly or through inheritance. Second, because of the additional information required to store types some compilers require a special switch to enable RTTI.

Note that references to pointers will not work under RTTI:

void example( int*& refptrTest )
{
        std::cout << "What type is *&refptrTest : " << typeid( refptrTest ).name() << std::endl;
}

Will report int*, as typeid() does not support reference types.

Misuses of RTTI

RTTI should only be used sparingly in C++ programs. There are several reasons for this. Most importantly, other language mechanisms such as polymorphism and templates are almost always superior to RTTI. As with everything, there are exceptions, but the usual rule concerning RTTI is more or less the same as with goto statements. Do not use it as a shortcut around proper, more robust design. Only use RTTI if you have a very good reason to do so and only use it if you know what you are doing.


  1. (The exact string returned by std::type_info::name() is compiler-dependent).

Chapter Summary

  1. Templates Development stage: 80%
    1. Template Meta-Programming (TMP) Development stage: 60%
  2. Standard Template Library (STL) Development stage: 60%
  3. Smart Pointers Development stage: 50%
  4. Exception Handling Development stage: 60%
  5. Run-Time Type Information (RTTI) Development stage: 60%