C++ Programming/Chapter Fundamentals Print version

From Wikibooks, the open-content textbooks collection




Authors

The following people are authors of this book:
Panic

There are many other contributors/editors to the book; a verifiable list of all contributions exists as History Logs at Wikibooks (http://en.wikibooks.org/).

Acknowledgment is given for using some content from other works such as Programming C-/- -/-, Wikipedia, the Wikibooks Java Programming and C Programming, C++ Exercises for beginners, the C/C++ Reference Web Site, and Wikisource, as well as from authors such as Scott Wheeler, Stephen Ferg and Ivor Horton.

Contents

Fundamentals

The Compiler

A compiler is a program that translates a computer program written in one computer language (the source code) into an equivalent program written in the computer's native machine language. This process of translation is called compilation.

Where to get a compiler

When you select your compiler you must take into consideration your operating system, your personal preferences and the documentation that you can get on using it.

One of the most up-to-date and compatible compilers is GCC. The next section shows how to get a copy and install it on Windows; the GCC website explains how to do so under other operating systems. GCC is a decent choice and can be obtained for free. Many open source platforms include a recent GCC version. Version 4.0 or later gives fairly good conformance to the C++ standard. Various IDEs are available to support GCC. For Windows, Microsoft Visual Studio Express is currently available free of charge (but not free as in non-proprietary) with a C++ compiler that can be used from the command line or from the supplied IDE. An IDE, or Integrated Development Environment, is generally a graphical environment that integrates functionality such as editing, compiling and linking, and usually a help system.

NOTE:
In Appendix B:External References you will find references to other freely available compilers and even full IDEs you can use.

GCC

GCC (the GNU Compiler Collection) is a free set of compilers developed by the Free Software Foundation, with Richard Stallman as one of the main architects.

There are many different pre-compiled GCC packages on the Internet; some popular choices are shown below, with detailed installation steps.

On Windows

Cygwin:

  1. Go to http://www.cygwin.com and click on the "Install Cygwin Now" button in the upper right corner of the page.
  2. Click "run" in the window that pops up, and click "next" several times, accepting all the default settings.
  3. Choose any of the Download sites ("ftp.easynet.be", etc.) when that window comes up; press "next" and the Cygwin installer should start downloading.
  4. When the "Select Packages" window appears, scroll down to the heading "Devel" and click on the "+" by it. In the list of packages that now displays, scroll down and find the "gcc-core" package; this is the compiler. Click once on the word "Skip", and it should change to some number like "3.4" etc. (the version number), and an "X" will appear next to "gcc-core" and several other related packages that will now be downloaded.
  5. Click "next" and the compiler as well as the Cygwin tools should start downloading; this could take a while. While you're waiting, go to http://www.crimsoneditor.com and download that free programmer's editor; it's powerful yet easy to use for beginners.
  6. Once the Cygwin downloads are finished and you have clicked "next", etc. to finish the installation, double-click the Cygwin icon on your desktop to begin the Cygwin "command prompt". Your home directory will automatically be set up in the Cygwin folder, which now should be at "C:\cygwin" (the Cygwin folder is in some ways like a small Unix/Linux computer on your Windows machine -- not technically of course, but it may be helpful to think of it that way).
  7. Type "gcc" at the Cygwin prompt and press "enter"; if "gcc: no input files" or something like it appears you have succeeded and now have the gcc compiler on your computer (and congratulations -- you have also just received your first error message!).

MinGW + DevCpp-IDE

  1. Go to http://www.bloodshed.net/devcpp.html, choose the version you want (scrolling down if necessary) and click on the appropriate download link. For the most current version, you will be redirected to http://www.bloodshed.net/dev/devcpp.html
  2. Scroll down to read the license and then to the download links. Download a version with Mingw/GCC. This is much easier than assembling the pieces yourself. With a very short delay (only a few days) you will always get the most current version of mingw packaged with the devcpp IDE. The result is exactly the same as with a manual download of the required modules.
  3. You get an executable that can be run at user level under any WinNT version. If you want it to be set up for all users, however, you need admin rights. It will install devcpp and mingw in the folders of your choice.
  4. Start the IDE and experience your first project!
    You will find something very similar to MSVC, including menu and button placement. Of course, many things will be somewhat different if you are familiar with the former, but it takes only a handful of clicks to get your first program running.

For DOS

DJGPP:

  • Go to Delorie Software, click on Zip Picker and select the packages you need.
  • Use unzip32 to inflate the files into the directory of your choice (e.g. C:\DJGPP).

TODO

  • Complete setup instructions for DJGPP.
  • Add more examples of compilers with detailed installation steps, such as MinGW.

For Linux
  • For Redhat, get a gcc-c++ RPM, e.g. using Rpmfind and then install (as root) using rpm -ivh gcc-c++-version-release.arch.rpm
  • For Fedora Core, install the GCC C++ compiler (as root) by using yum install gcc-c++
  • For Mandrake, install the GCC C++ compiler (as root) by using urpmi gcc-c++
  • For Debian, install the GCC C++ compiler (as root) by using apt-get install g++
  • For Ubuntu, install the GCC C++ compiler by using sudo apt-get install g++
  • If you cannot become root, get the tarball from ftp://ftp.gnu.org/ and follow the instructions in it to compile and install in your home directory.

Compilation

The output that a compiler produces when translating (compiling) a program is saved to a file called an object file. As we have seen before in The Code section of the book, compilation consists of the transformation of source files into object files.

NOTE:
Some files may be created or needed for a successful compilation. These files are not part of the C++ language: they may result from the compilation of external code (a library, for example), they may depend on the specific compiler you use (MS Visual Studio, for example, adds several extra files to a project), in which case you should check its documentation, or they may be part of a specific framework that needs to be accessed. Be aware that some of these constructs may limit the portability of the code.

The instructions of this compiled program can then be run (executed) by the computer if the object file is in an executable format. Often, however, there are additional steps that may be required to create an executable program: preprocessing and linking.

Compile Time

Compile time refers to the time and operations performed by a compiler (i.e., compile-time operations) during a build (creation) of a program (executable or not).

The operations performed at compile time usually include lexical analysis, syntax analysis, various kinds of semantic analysis (e.g., type checks and instantiation of templates) and code generation.

The definition of a programming language will specify compile time requirements that source code must meet to be successfully compiled.

Compile time occurs before link time (when the output of one or more compiled files is joined together) and runtime (when a program is executed). In some programming languages it may be necessary for some compilation and linking to occur at runtime. The concept of runtime will be introduced later.

TODO
Add run time concept, and mention it here (probably on Debugging)

Lexical Analysis

This happens before syntax analysis and converts the code into tokens, which are the parts of the code that the compiler will actually use, with special tokens for each reserved keyword and tokens for data types, identifiers and values. The lexical analyzer is the part of the compiler that removes whitespace: it uses whitespace to separate tokens and then discards it. For example,

int main()
{
    std::cout << "hello world" << std::endl;
    return 0;
}

might be tokenized as

1 = string "int"
2 = string "main"
3 = opening parenthesis
4 = closing parenthesis
5 = opening brace
6 = string "std"
7 = scope resolution operator (::)
8 = string "cout"
9 = << operator
10 = string ""hello world""
11 = string "endl"
12 = semicolon
13 = string "return"
14 = number 0
15 = closing brace

and so for this program the lexical analyzer might send something like

1 2 3 4 5 6 7 8 9 10 9 6 7 11 12 13 14 12 15

to the syntactical analyzer, which is discussed next, to be parsed. It is easier for the syntactical analyzer to apply the rules of the language when it can work with numerical values, can distinguish between language syntax (such as the semicolon) and everything else, and knows what data type each thing has.

TODO

Make this closer to what actually happens. This is a very simple and probably wrong example.

Syntax Analysis

This step (sometimes also called syntax checking) ensures that the code is valid and will sequence into an executable program. The syntactical analyzer applies rules to the code, checking that each opening brace has a corresponding closing brace, that each declaration has a type, that the type exists, and so on. Syntax analysis is more complicated than lexical analysis. As an example:

int main()
{
    std::cout << "hello world" << std::endl;
    return 0;
}

The syntax analyzer would first look at the string "int", check it against the defined keywords, and find that it is the type for integers. It would then treat the next token as an identifier and check that it is a valid identifier name. Next comes an opening parenthesis, so the analyzer treats "main" as a function; had it found a semicolon instead, this would have been the declaration of a variable, and had it found an equals sign, the initialization of an integer variable. After the opening parenthesis it finds a closing parenthesis, meaning that the function has 0 parameters. The next token is an opening brace, so this is the implementation of the function main rather than a declaration of main, which is what a semicolon would have meant (even though you can't declare main in C++). The analyzer would probably also create a counter to keep track of the nesting level of the statement blocks, to make sure the braces come in pairs.

After that it would look at the next token, "std", and probably not do anything with it until it saw the :: operator, at which point it would check that "std" is a valid namespace. It would then see "cout" as the name of an identifier in the namespace "std", and see that it is an object whose type is an instantiation of a template. The analyzer would see the << operator next, and so would check that the << operator can be used with cout and that the next token can be used with the << operator. The same would happen with the token after ""hello world"". Then it would reach the "std" token again, look past it to the :: operator token, check again that the namespace exists, and check that "endl" is in that namespace. Then it would see the semicolon, which ends the statement.

Next it would see the keyword "return" and expect an integer value as the next token, because main returns an int; it would find 0, which is an integer. The next symbol is a semicolon, so that is the end of the statement. The next token is a closing brace, so that is the end of the function. There are no more tokens, so if the syntax analyzer found no errors in the code, it would pass the tokens on to the later stages of the compiler so that the program can be converted to machine language. This is a simplified view of syntax analysis, and real syntax analyzers don't really work this way, but the idea is the same.

The following are the keywords that the syntax analyzer looks for, both to make sure you are not using any of them as identifier names and to recognize the types of your variables and the built-in operations of the C++ language that you are using.

ISO C++ (C++98) Keywords

  • and
  • and_eq
  • asm
  • auto
  • bitand
  • bitor
  • bool
  • break
  • case
  • catch
  • char
  • class
  • compl
  • const
  • const_cast
  • continue
  • default
  • delete
  • do
  • double
  • dynamic_cast
  • else
  • enum
  • explicit
  • export
  • extern
  • false
  • float
  • for
  • friend
  • goto
  • if
  • inline
  • int
  • long
  • mutable
  • namespace
  • new
  • not
  • not_eq
  • operator
  • or
  • or_eq
  • private
  • protected
  • public
  • register
  • reinterpret_cast
  • return
  • short
  • signed
  • sizeof
  • static
  • static_cast
  • struct
  • switch
  • template
  • this
  • throw
  • true
  • try
  • typedef
  • typeid
  • typename
  • union
  • unsigned
  • using
  • virtual
  • void
  • volatile
  • wchar_t
  • while
  • xor
  • xor_eq

Specific compilers may (in a non-standard compliant mode) also treat some other words as keywords, including cdecl, far, fortran, huge, interrupt, near, pascal, typeof. Old compilers may recognize the overload keyword, an anachronism that has been removed from the language.

The next revision of C++, informally known as C++0x for now, is likely to add some keywords, probably including at least:

  • static_assert
  • decltype
  • nullptr

(These are being considered carefully to minimize breakage to existing code; see http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2006/n2105.html for some details.)

Old compilers may not recognize some or all of the following keywords:

  • and
  • and_eq
  • bitand
  • bitor
  • bool
  • catch
  • compl
  • const_cast
  • dynamic_cast
  • explicit
  • export
  • false
  • mutable
  • namespace
  • not
  • not_eq
  • or
  • or_eq
  • reinterpret_cast
  • static_cast
  • template
  • throw
  • true
  • try
  • typeid
  • typename
  • using
  • wchar_t
  • xor
  • xor_eq

C++ Reserved Identifiers

Some "nonstandard" identifiers are reserved for distinct uses, to avoid conflicts on the naming of identifiers by vendors, library creators and users in general.

Reserved identifiers include names that contain two consecutive underscores (__), names that start with an underscore followed by an uppercase letter, and some other categories of reserved identifiers carried over from the C library specification.
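The first two rules are mechanical enough to be written as a small check. The sketch below uses a hypothetical function name, looks_reserved, and covers only those two rules; it does not know about the categories inherited from the C library, nor about names that are reserved only in the global namespace.

```cpp
#include <cctype>
#include <string>

// Hypothetical helper: true if the name matches one of the two
// underscore rules for reserved identifiers.
bool looks_reserved(const std::string& name)
{
    // Rule 1: two consecutive underscores anywhere in the name.
    if (name.find("__") != std::string::npos)
        return true;
    // Rule 2: a leading underscore followed by an uppercase letter.
    return name.size() >= 2 && name[0] == '_' &&
           std::isupper(static_cast<unsigned char>(name[1])) != 0;
}
```

With this sketch, names such as my__value and _Buffer are flagged, while max_value is not.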

A list of C reserved identifiers can be found at the Internet Wayback Machine archived page: http://web.archive.org/web/20040209031039/http://oakroadsystems.com/tech/c-predef.htm#ReservedIdentifiers

TODO
It would be nice to list those C reserved identifiers

Compiler Keywords

A limited set of keywords exists to directly control the compiler's behavior. These keywords are very powerful and must be used with care, as they may make a huge difference to the program's compile time and running speed.

In the C++ Standard, these keywords are called specifiers.

auto

The auto keyword used to have a different behavior, but in C++0x it will allow one to omit the type of a variable and let the compiler decide. This is particularly useful for generic programming in which the return type of a function may depend on the type of its arguments. Thus, rather than this:

int x = 42;
std::vector<double> numbers;
numbers.push_back(1.0);
numbers.push_back(2.0);
for(std::vector<double>::iterator i = numbers.begin();
    i != numbers.end(); ++i) {
  std::cout << *i << " ";
}

we could write this:

auto x = 42; // We can use auto on base types...
std::vector<double> numbers;
numbers.push_back(1.0);
numbers.push_back(2.0);
// But auto is most useful for complicated types.
for(auto i = numbers.begin(); i != numbers.end(); ++i) {
  std::cout << *i << " ";
}

Note: This functionality is not yet available.

inline

A function declaration with the inline keyword declares an inline function. The keyword suggests to the compiler that the function be subjected to in-line expansion; that is, that the compiler insert the complete body of the function wherever the function is used. This avoids the overhead of making the CPU jump from one place in the code to another and back again to execute a subroutine, as is done in naive implementations of subroutines.

Example:

inline void swap( int& a, int& b) { int const tmp(b); b=a; a=tmp; }

Marking a function as inline (possibly implicitly, by defining a member function inside a class/struct definition) is a (non-binding) request to the compiler to consider inlining the function, i.e., expanding its code at the call site; it is legal, but redundant, to add the inline keyword in that context, and good style is to omit it.

Example:

struct length
{
  explicit length(int metres) : m_metres(metres) {}
  operator int&() { return m_metres; }
  private:
  int m_metres;
};

Inlining can be an optimization, or a pessimization. It can increase code size (by duplicating the code for a function at multiple call sites) or can decrease it (if the code for the function, after optimization, is less than the size of the code needed to call a non-inline function). It can increase speed (by allowing for more optimization and by avoiding jumps) or can decrease speed (by increasing code size and hence cache misses).

One important side-effect of inlining is that more code is then accessible to the optimizer.

Marking a function as inline also has an effect on linking: multiple definitions of an inline function are permitted (so long as each is in a different translation unit) so long as they are identical. This allows inline function definitions to appear in header files; defining non-inline functions in header files is almost always an error (though function templates can also be defined in header files, and often are).
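As a minimal sketch of the header-file case, assume a hypothetical header named area.h that is included from several source files. Because rectangle_area is marked inline, each translation unit may contain its own identical copy of the definition without causing a link error.

```cpp
// area.h -- hypothetical header, included from several source files
#ifndef EXAMPLE_AREA_H
#define EXAMPLE_AREA_H

// Defined in the header: legal because the function is inline.
// Without the inline keyword, including this header from two source
// files would typically produce a "multiple definition" link error.
inline int rectangle_area(int width, int height)
{
    return width * height;
}

#endif // EXAMPLE_AREA_H
```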

Mainstream C++ compilers like Microsoft Visual C++ and GCC support an option that lets the compilers automatically inline any suitable function, even those that are not marked as inline functions. A compiler is often in a better position than a human to decide whether a particular function should be inlined; in particular, the compiler may not be willing or able to inline many functions that the human asks it to.

Excessive use of inline functions can greatly increase coupling/dependencies and compilation time, as well as making header files less useful as documentation of interfaces.

extern

The extern keyword tells the compiler that a variable is defined in another source module. The linker then finds the actual definition and sets up the extern variable to refer to the correct location. If a variable is declared extern and the linker finds no definition of it, the linker reports an error such as "Unresolved external symbol".

Examples:

extern int i;
declares that there is a variable named i of type int, defined somewhere in the program.
extern int j = 0;
defines a variable j with external linkage; the extern keyword is redundant here.
extern void f();
declares that there is a function f taking no arguments and with no return value defined somewhere in the program; extern is redundant, but sometimes considered good style.
extern void f() {;}
defines the function f() declared above; again, the extern keyword is technically redundant here as external linkage is default.
extern const int k = 1;
defines a constant int k with value 1 and external linkage; extern is required because const variables have internal linkage by default.
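The difference between declaring and defining can be seen even within a single file. In the sketch below, which uses the hypothetical names visit_count and record_visit, the extern declaration lets the function use the variable before its definition appears; in a real project the declaration would normally live in a header and the definition in exactly one source file.

```cpp
extern int visit_count;     // declaration only: no storage is reserved here

// Uses the variable although its definition has not been seen yet.
int record_visit()
{
    return ++visit_count;
}

int visit_count = 10;       // the one definition the linker resolves to
```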

Storage Class Specifiers

  • register - A hint to the compiler that the specified variable will be heavily used; therefore the compiler should consider allocating a CPU register to the variable. The compiler may ignore this hint.
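  • static - Gives a variable a single memory location that lasts for the whole run of the program. A static local variable keeps its value between calls of its function, and a static data member is shared by all instances of its class.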
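Both uses of static can be sketched briefly. The names next_ticket and Session below are hypothetical and chosen only for illustration.

```cpp
// A static local variable is initialized once and keeps its value
// between calls of the function.
int next_ticket()
{
    static int last_issued = 0;
    return ++last_issued;
}

// A static data member exists once per class, not once per object.
struct Session
{
    static int open_sessions;          // declaration inside the class

    Session()  { ++open_sessions; }
    ~Session() { --open_sessions; }
};

int Session::open_sessions = 0;        // definition, outside the class
```

Successive calls to next_ticket() return 1, 2, 3 and so on, and Session::open_sessions reflects how many Session objects created this way are currently alive.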

Compile Speed

Most problems one has with a slow compilation are due to:

  • Hardware
Resources (Slow CPU, low memory and even a slow HD can have an influence)
  • Software
The compiler itself (new is probably better), the design used on the program (structure of object dependencies, includes)

Experience shows that if you are suffering from slow compile times, the program you are trying to compile is most likely poorly designed. Take the time to structure your code so as to minimize recompilation after changes.

Use pre-compiled headers and external header guards.

The Preprocessor

The preprocessor is either a separate program invoked by the compiler or part of the compiler itself. It performs intermediate operations that modify the original source code and internal compiler options before the compiler tries to compile the resulting source code.

The instructions that the preprocessor parses are called directives and come in two forms, preprocessor and compiler directives. Preprocessor directives direct the preprocessor on how it should process the source code and compiler directives direct the compiler on how it should modify internal compiler options. Directives are used to make writing source code easier (more portable for instance) and to make the source code more understandable. They are also the only valid way to make use of facilities (classes, functions, templates, etc.) provided by the C++ Standard Library.

NOTE:
Check the documentation of your compiler/preprocessor for information on how it implements the preprocessing phase and for any additional features not covered by the standard that may be available. For in depth information on the subject of parsing you can read "Compiler Construction" (http://en.wikibooks.org/wiki/Compiler_Construction)

All directives start with '#' at the beginning of a line. The standard directives are:

  • #define
  • #elif
  • #else
  • #endif
  • #error
  • #if
  • #ifdef
  • #ifndef
  • #include
  • #line
  • #pragma
  • #undef

Inclusion of Header Files (#include)

The #include directive allows a programmer to include contents of one file inside another one. This is commonly used to separate information needed by more than one part of a program into its own file so that it can be included again and again without having to repeatedly type out all the information.

C++ generally requires you to declare what will be used before using it. So, files called headers usually include declarations of what will be used in order for the compiler to successfully compile source code. The standard library (a repository of code that is available alongside every standard-compliant C++ compiler) and 3rd party libraries make use of headers in order to allow the inclusion of the needed declarations in your source code to make use of features/resources that are not part of the language itself.

The first lines in any source file should usually look something like this:

#include <iostream>
#include "other.h"

The above lines cause the contents of iostream and other.h to be included for use in your program. Usually this is implemented by simply inserting the contents of iostream and other.h into your program. When angle brackets (<>) are used, the preprocessor is instructed to search for the file to include in a compiler-dependent location. When quotation marks (" ") are used, the preprocessor is expected to search in some additional, usually user-defined, locations for the header file, and to fall back to the standard include paths only if it is not found in those additional locations. It is common for this form to include searching in the same directory as the file containing the #include directive.

The iostream header contains various declarations for input/output (I/O) using an abstraction of I/O mechanisms called streams. For example, there is an output stream object called std::cout (where "cout" is short for "character output") which is used to output text to the standard output, which usually displays the text on the computer screen.

NOTE:
Compilers are allowed to make an exception in the case of standard library as to whether a header file by a given name actually exists or just has the same effect as if the header file did exist. Check the documentation of your preprocessor/compiler for any vendor specific implementation of the #include directive and for specific search locations of standard and user-defined headers. This can lead to portability problems and confusion.

A list of standard C++ header files is listed below:


Standard Template Library

and the

Standard C Library

Everything inside C++'s standard library is kept in the std:: namespace. Old compilers may provide headers that date from before C++ was standardized, named <X.h> and <cX.h>, in addition to or instead of the standard headers. Often these headers have non-templatized classes and pollute the global namespace. Some have the SGI STL, on which much of the standard template library is based.

Non-standard but somewhat common C++ libraries
  1. Streams based on FILE* from stdio.h.
  2. Precursor to iostream. Old stream library mostly included for backwards compatibility even with old compilers.
  3. Uses char* whereas sstream uses string. Prefer the standard library sstream.

#pragma

The pragma (pragmatic information) directive is part of the standard, but the meaning of any pragma depends on the software implementation of the standard that is used.

Pragmas are used within the source program.

#pragma token(s)

You should check the software implementation of the C++ standard you intend on using for a list of the supported tokens.

For instance, one of the most widely implemented pragmas, #pragma once, when placed at the beginning of a header file, indicates that the file where it resides will be skipped by the preprocessor if it is included more than once.

NOTE:
Another way to achieve the same effect is the technique commonly referred to as include guards.


NOTE:
In gcc documentation, #pragma once has been described as an obsolete preprocessor directive.
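An include guard can be sketched as follows. To keep the example in one file, the contents of a hypothetical header point.h are simply written out twice, which is what the preprocessor sees when the header is included twice; the macro name POINT_H_INCLUDED is an assumption, and any name unique to the header would do.

```cpp
// --- first inclusion of the hypothetical point.h ---
#ifndef POINT_H_INCLUDED       // not defined yet, so the body is kept
#define POINT_H_INCLUDED
struct Point { int x; int y; };
#endif

// --- second inclusion of the same header ---
#ifndef POINT_H_INCLUDED       // already defined, so the body is skipped
#define POINT_H_INCLUDED
struct Point { int x; int y; };   // never seen by the compiler
#endif

// Point is defined exactly once, so this compiles without a
// "redefinition of struct Point" error.
inline int manhattan_length(const Point& p)
{
    return (p.x < 0 ? -p.x : p.x) + (p.y < 0 ? -p.y : p.y);
}
```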

Macros

The C++ preprocessor includes facilities for defining "macros", which roughly means the ability to replace a use of a named macro with one or more tokens. This has various uses from defining simple constants (though const is more often used for this in C++), conditional compilation, code generation and more -- macros are a powerful facility, but if used carelessly can also lead to code that is hard to read and harder to debug!

NOTE:

Macros don't depend only on the C++ Standard or on your own actions. They may exist due to the use of external frameworks or libraries, or even due to the compiler you are using and the specific OS. We will not cover that information in this book, but you can find more in the Pre-defined C/C++ Compiler Macros page ( http://predef.sourceforge.net/ ); the project maintains a list of the macros that are pre-defined by various compilers and operating systems.

#define and #undef

The #define directive is used to define values or macros that are used by the preprocessor to manipulate the program source code before it is compiled:

#define USER_MAX (1000)

The #undef directive deletes a current macro definition:

#undef USER_MAX

It is an error to use #define to change the definition of a macro, but it is not an error to use #undef to try to undefine a macro name that is not currently defined. Therefore, if you need to override a previous macro definition, first #undef it, and then use #define to set the new definition.
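The override sequence described above can be sketched like this. The constants first_limit and second_limit and the macro name NEVER_DEFINED_NAME are hypothetical and exist only to make the effect visible.

```cpp
#define USER_MAX (1000)
const int first_limit = USER_MAX;    // expands to (1000)

#undef USER_MAX                      // remove the old definition first...
#define USER_MAX (2000)              // ...then it is safe to define it again
const int second_limit = USER_MAX;   // expands to (2000)

#undef NEVER_DEFINED_NAME            // harmless: undefining an unknown name is allowed
```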

NOTE:
Because preprocessor definitions are substituted before the compiler acts on the source code, any errors that are introduced by #define are difficult to trace. For example using value or macro names that are the same as some existing identifier can create subtle errors, since the preprocessor will substitute the identifier names in the source code.

Today, for this reason, #define is primarily used to handle compiler and platform differences. For example, a define might hold a constant which is the appropriate error code for a system call. The use of #define should thus be limited to cases where it is absolutely necessary; typedef statements, constant variables, enums, templates and inline functions can often accomplish the same goal more efficiently and safely.

By convention, values defined using #define are named in uppercase with "_" separators. This makes it clear to readers that the value is not alterable and, in the case of macros, that the construct requires care. Although doing so is not a requirement, it is considered very bad practice to do otherwise. The convention allows the values to be easily identified when reading the source code.

Try to use const and inline instead of #define.
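As a minimal sketch of that advice, the USER_MAX macro from earlier could be replaced by a typed constant, and a small helper by an inline function. The names user_max and within_user_limit are hypothetical.

```cpp
// Instead of: #define USER_MAX (1000)
const int user_max = 1000;           // has a type and obeys scope rules

// Instead of a function-like macro: arguments are type-checked and
// evaluated exactly once.
inline bool within_user_limit(int users)
{
    return users <= user_max;
}
```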

\ (line continuation)

If for some reason it is needed to break a given statement into more than one line, use the \ (backslash) symbol to "escape" the line ends. For example,

#define MULTIPLELINEMACRO \
 will use what you write here \
 and here etc...

is equivalent to

#define MULTIPLELINEMACRO will use what you write here and here etc...

because the preprocessor joins lines ending in a backslash ("\") to the line after them. That happens even before directives (such as #define) are processed, so it works for just about all purposes, not just for macro definitions. The backslash is sometimes said to act as an "escape" character for the newline, changing its interpretation.

In some (fairly rare) cases macros can be more readable when split across multiple lines. Good modern C++ code will use macros only sparingly, so the need for multi-line macro definitions won't arise often.
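One such case might be a macro containing a nested conditional expression. The sketch below uses the hypothetical names CLAMP and clamp_percent; note that every line of the definition except the last ends in a backslash, with nothing after it.

```cpp
// Limits value to the range [low, high]; split over several lines
// so that each branch of the nested conditional is visible.
#define CLAMP(value, low, high)      \
    ((value) < (low)  ? (low)  :     \
     (value) > (high) ? (high) :     \
                        (value))

inline int clamp_percent(int value)
{
    return CLAMP(value, 0, 100);
}
```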

It's certainly possible to overuse this feature. It's quite legal but entirely indefensible, for example, to write

 int ma\
 in//ma/
 ()/*ma/
 in/*/{}

That's an abuse of the feature though: while an escaped newline can appear in the middle of a token, there should never be any reason to use it there. Don't try to write code that looks like it belongs in the International Obfuscated C Code Competition.

Warning: there is one occasional "gotcha" with using escaped newlines: if there are any invisible characters after the backslash, the lines will not be joined, and there will almost certainly be an error message produced later on, though it might not be at all obvious what caused it.

Function-like Macros

Another feature of the #define command is that it can take arguments, making it rather useful as a pseudo-function creator. Consider the following code:

#define ABSOLUTE_VALUE( x ) ( ((x) < 0) ? -(x) : (x) )
...
int x = -1;
while( ABSOLUTE_VALUE( x ) ) {
...
}

It's generally a good idea to use extra parentheses when using complex macros. Notice that in the above example, the variable "x" is always within its own set of parentheses. This way, it will be evaluated in whole, before being compared to 0 or multiplied by -1. Also, the entire macro is surrounded by parentheses, to prevent it from being contaminated by other code. If you're not careful, you run the risk of having the compiler misinterpret your code.

Macros replace each occurrence of the macro parameter used in the text with the literal contents of the macro parameter without any validation checking. Badly written macros can result in code which won't compile or create hard to discover bugs. Because of side-effects it is considered a very bad idea to use macro functions as described above. However as with any rule, there may be cases where macros are the most efficient means to accomplish a particular goal.

int z = -10;
int y = ABSOLUTE_VALUE( z++ );

If ABSOLUTE_VALUE() were a real function, z would now have the value -9. Because z++ was an argument to a macro, however, it appears three times in the expansion and (in this case) is executed twice, setting z to -8 and y to 9. In similar cases it is very easy to write code which has "undefined behavior", meaning that what it does is completely unpredictable in the eyes of the C++ Standard.

  • ABSOLUTE_VALUE( z++ ); expanded:
( ((z++) < 0 ) ? -(z++) : (z++) );
  • An example on how to use a macro correctly:
#include <iostream>

#define SLICES 8
#define PART(x) ( (x) / SLICES ) // Note the extra parentheses around x

int main() {
  int b = 10, c = 6;
  
  int a = PART(b + c);
  std::cout << a;
  
  return 0;
}

-- the result of "a" should be "2" (b + c is passed to PART -> ((b + c) / SLICES) -> 16 / 8 -> result is "2")

Example:

To illustrate the dangers of macros, consider this naive macro

#define MAX(a,b) a>b?a:b

and the code

i = MAX(2,3)+5;
j = MAX(3,2)+5;

Take a look at this and consider what the values after execution might be. The statements are turned into

i = 2>3?2:3+5;
j = 3>2?3:2+5;

Thus, after execution i=8 and j=3 instead of the expected result of i=j=8! This is why you were cautioned to use an extra set of parentheses above, but even with these, the road is fraught with dangers. The alert reader might quickly realize that if a or b is an expression, the definition must parenthesize every use of a and b, like this:

#define MAX(a,b) ((a)>(b)?(a):(b))

This works, provided a,b have no side effects. Indeed,

i = 2;
j = 3;
k = MAX(i++, j++);

would result in k=4, i=3 and j=5. This would be highly surprising to anyone expecting MAX() to behave like a function.
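
Both pitfalls can be checked in a few lines. The following is only a sketch: the names NAIVE_MAX, naive_result and side_effect_demo are made up for this illustration, and the assertions simply restate the values worked out above.

```cpp
#include <cassert>

#define NAIVE_MAX(a, b) a > b ? a : b          // no parentheses at all
#define MAX(a, b) ((a) > (b) ? (a) : (b))      // fully parenthesized

// Precedence pitfall: "+ 5" attaches to the last operand of ?: only.
int naive_result() {
    int j = NAIVE_MAX(3, 2) + 5;   // expands to 3 > 2 ? 3 : 2 + 5
    return j;                      // 3, not 8
}

// Side-effect pitfall: the argument that "wins" is evaluated twice.
void side_effect_demo() {
    int i = 2;
    int j = 3;
    int k = MAX(i++, j++);   // ((i++) > (j++) ? (i++) : (j++))
    assert(k == 4);          // j had already been incremented once
    assert(i == 3);          // i was incremented once (comparison only)
    assert(j == 5);          // j was incremented twice
}
```

Calling side_effect_demo() from main() runs the checks; if any of the values differed, assert would abort the program.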

So what is the correct solution? The solution is not to use a macro at all. A global, inline function, like this

inline int max(int a, int b) { return a>b?a:b; }

has none of the pitfalls above, but will not work with all types. A template (see below) takes care of this

template<typename T> inline const T& max(const T& a, const T& b) { return a>b?a:b; }

Indeed, this is (a variation of) the definition used in the C++ Standard Library for std::max(), which is declared in the <algorithm> header. The Standard Library is included with all conforming C++ compilers, so the ideal solution would be to use this.

std::max(3,4);
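
To contrast with the macro version, here is a small sketch using a function template. The name larger_of is hypothetical and was chosen only to avoid clashing with std::max; function_demo is likewise just an illustration.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical stand-in for the template shown above.
template <typename T>
inline const T& larger_of(const T& a, const T& b) {
    return a > b ? a : b;
}

void function_demo() {
    int i = 2;
    int j = 3;
    int k = larger_of(i++, j++);   // each argument is evaluated exactly once
    assert(k == 3);
    assert(i == 3);
    assert(j == 4);

    // No precedence surprises: the call is a single operand of "+".
    assert(larger_of(2, 3) + 5 == 8);
    assert(larger_of(3, 2) + 5 == 8);

    // The Standard Library version behaves the same way and works for other types too.
    assert(std::max(3, 4) == 4);
    assert(std::max(2.5, 1.5) == 2.5);
}
```

Because arguments are evaluated before the function is entered, the surprises seen with MAX(i++, j++) do not occur here.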

# and ##

The # and ## operators can be used in the replacement text of a #define directive. The # operator turns the macro parameter that follows it into a string in quotes. For example

#define as_string( s ) # s

will make the preprocessor turn

std::cout << as_string( Hello  World! ) << std::endl;

into

std::cout << "Hello World!" << std::endl;

NOTE:
Observe the leading and trailing whitespace from the argument to # is removed, and consecutive sequences of whitespace between tokens are converted to single spaces.
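
One use of # that comes up in practice is printing an expression next to its value. The sketch below is an illustration only: AS_STRING mirrors the as_string macro above, while DESCRIBE, describe and stringize_demo are hypothetical names invented for this example.

```cpp
#include <cassert>
#include <sstream>
#include <string>

#define AS_STRING(s) #s

// Hypothetical helper macro: pairs the source text of an expression with its value.
#define DESCRIBE(expr) describe(#expr, (expr))

std::string describe(const char* text, int value) {
    std::ostringstream out;
    out << text << " = " << value;
    return out.str();
}

void stringize_demo() {
    // Leading/trailing whitespace is dropped, inner runs collapse to one space.
    assert(std::string(AS_STRING(   Hello     World!   )) == "Hello World!");

    int width = 4;
    // #expr yields the text "width * 2"; (expr) yields the value 8.
    assert(DESCRIBE(width * 2) == "width * 2 = 8");
}
```

A real function could not do this, because by the time a function runs, the text of the expression that produced its argument is no longer available.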

Using ## concatenates what's before the ## with what's after it; the result must be a well-formed preprocessing token. For example

#define concatenate( x, y ) x ## y
...
int xy = 10;
...

will make the preprocessor turn

std::cout << concatenate( x, y ) << std::endl;

into

std::cout << xy << std::endl;

which will, of course, display 10 to standard output.
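
Token pasting is often used to generate families of similar names. The following sketch assumes nothing beyond the ## operator itself; CONCATENATE mirrors the macro above, and DEFINE_SQUARE, square_int, square_double and paste_demo are hypothetical names used only here.

```cpp
#include <cassert>

#define CONCATENATE(x, y) x ## y

// Hypothetical helper macro: defines a function whose name ends in the type name.
#define DEFINE_SQUARE(type) \
    type square_ ## type(type v) { return v * v; }

DEFINE_SQUARE(int)      // defines: int square_int(int v)
DEFINE_SQUARE(double)   // defines: double square_double(double v)

void paste_demo() {
    int xy = 10;
    assert(CONCATENATE(x, y) == 10);      // the tokens x and y become the identifier xy
    assert(CONCATENATE(12, 34) == 1234);  // two number tokens paste into one number
    assert(square_int(3) == 9);
    assert(square_double(1.5) == 2.25);
}
```

In each case the pasted result is a single well-formed token (an identifier or a number), which is what the rule above requires.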

String literals cannot be concatenated using ##, but the good news is that this isn't a problem: just writing two adjacent string literals is enough to make the preprocessor concatenate them.

String Literal Concatenation

One minor function of the preprocessor is in joining strings together, "string literal concatenation" -- turning code like

 std::cout << "Hello " "World!\n";

into

 std::cout << "Hello World!\n";

Apart from obscure uses, this is most often useful when writing long messages, as an ordinary string literal in C++ may not span multiple lines in your source code (i.e., it may not have a newline character inside it; C++11 raw string literals are the exception). It also helps to keep program lines down to a reasonable length; we can write

 function_name("This is a very long string literal, which would not fit "
               "onto a single line very nicely -- but with string literal "
               "concatenation, we can split it across multiple lines and "
               "the preprocessor will glue the pieces together");

Note that this joining happens before compilation; the compiler sees only one string literal here, and there's no work done at runtime, i.e., your program won't run any slower at all because of this joining together of strings.

Concatenation also applies to wide string literals (which are prefixed by an L):

 L"this " L"and " L"that"

is converted by the preprocessor into

 L"this and that".

NOTE:
For completeness, note that C99 has different rules for this than C++98: C99 allows a narrow string literal to be joined to a wide string literal, something which was not valid in C++98. C++11 (known as C++0x during its development) adopted C99's more tolerant rules.
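
The claim that the pieces really form one literal can be checked directly. This is a minimal sketch; long_message and concatenation_demo are names invented for the illustration.

```cpp
#include <cassert>
#include <cstring>
#include <cwchar>

const char* long_message() {
    return "This message is split "
           "across two source lines.";
}

void concatenation_demo() {
    // Adjacent literals are joined into one literal before the program runs.
    assert(std::strcmp("Hello " "World!\n", "Hello World!\n") == 0);

    // sizeof sees a single array: 13 characters plus the terminating '\0'.
    assert(sizeof("Hello " "World!\n") == 14);

    // The same joining applies to wide string literals.
    assert(std::wcscmp(L"this " L"and " L"that", L"this and that") == 0);

    assert(std::strcmp(long_message(),
                       "This message is split across two source lines.") == 0);
}
```

Note that no separator is inserted between the pieces, so any space that should appear at a join has to be written inside one of the literals.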

Conditional compilation

Conditional compilation is useful for two main purposes:

  • To allow certain functionality to be enabled/disabled when compiling a program
  • To allow functionality to be implemented in different ways, such as when compiling on different platforms

It is also used sometimes to temporarily "comment-out" code, though using a version control system is often a more effective way to do so.

  • Syntax:
#if condition
  statement(s)
#elif condition2
  statement(s)
...
#elif conditionN
  statement(s)
#else
  statement(s)
#endif

#ifdef defined-value
  statement(s)
#else
  statement(s)
#endif

#ifndef defined-value
  statement(s)
#else
  statement(s)
#endif

#if

The #if directive allows compile-time conditional checking of preprocessor values, such as those created with #define. If condition is non-zero, the preprocessor will include all statement(s) up to the next #else, #elif or #endif directive in the output for processing. Otherwise, any #elif directives will be checked in order, and the first one whose condition is true will have its statement(s) included in the output. Finally, if the conditions of the #if directive and of all #elif directives are false, the statement(s) of the #else directive will be included in the output if present; otherwise, nothing gets included.

The expression used after #if can include boolean and integral constants and arithmetic operations as well as macro names. The allowable expressions are a subset of the full range of C++ expressions (with one exception), but are sufficient for many purposes. The one extra operator available to #if is the defined operator, which can be used to test whether a macro of a given name is currently defined.
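
As a sketch of how these pieces combine, the example below selects one of three definitions of a function at compile time. All macro and function names (APP_VERSION_MAJOR, APP_VERSION_MINOR, APP_LEGACY_STORAGE, storage_backend, conditional_demo) are hypothetical and exist only for this illustration.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical configuration macros; in a real project they would usually
// come from a header or from the compiler command line.
#define APP_VERSION_MAJOR 2
#define APP_VERSION_MINOR 5

#if APP_VERSION_MAJOR > 2 || (APP_VERSION_MAJOR == 2 && APP_VERSION_MINOR >= 4)
const char* storage_backend() { return "new"; }
#elif defined(APP_LEGACY_STORAGE)
const char* storage_backend() { return "legacy"; }
#else
const char* storage_backend() { return "basic"; }
#endif

void conditional_demo() {
    // With version 2.5 the first branch is chosen; the other two
    // definitions are never seen by the compiler.
    assert(std::strcmp(storage_backend(), "new") == 0);
}
```

Only one definition of storage_backend() survives preprocessing, which is why the three identical signatures do not conflict.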

#ifdef and #ifndef

The #ifdef and #ifndef directives are short forms of '#if defined(defined-value)' and '#if !defined(defined-value)' respectively. defined(identifier) is valid in any expression evaluated by the preprocessor, and returns true (in this context, equivalent to 1) if a macro named identifier was defined with #define and false (in this context, equivalent to 0) otherwise. In fact, the parentheses are optional, and it is also valid to write defined identifier without them.

(Possibly the most common use of #ifndef is in creating "include guards" for header files, to ensure that the header files can safely be included multiple times. This is explained in the section on header files.)
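
Another frequent pattern is supplying a default value that a build can override. The sketch below assumes two hypothetical macros, DEMO_BUFFER_SIZE and DEMO_VERBOSE, neither of which is defined anywhere else; with GCC or Clang they could be set from the command line with the -D option.

```cpp
#include <cassert>

// Use the value supplied by the build if there is one, otherwise fall back to 256.
#ifndef DEMO_BUFFER_SIZE
#define DEMO_BUFFER_SIZE 256
#endif

#ifdef DEMO_VERBOSE
const bool verbose_build = true;
#else
const bool verbose_build = false;
#endif

int buffer_size() { return DEMO_BUFFER_SIZE; }

void defaults_demo() {
    // These hold as long as nothing defines the two macros beforehand.
    assert(buffer_size() == 256);
    assert(!verbose_build);
}
```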

#endif

The #endif directive ends a conditional block started by #if, #ifdef or #ifndef, including any #elif and #else branches it contains.

  • Example:
#if defined(__BSD__) || defined(__LINUX__)
#include <unistd.h>
#endif

This can be used, for example, to provide multiple platform support or to have one common set of source files for different program versions. Another common use is the include guard, a portable alternative to the (non-standard) #pragma once.

  • Example:

foo.hpp:

#ifndef FOO_HPP
# define FOO_HPP

 // code here...

#endif // FOO_HPP

bar.hpp:

#include "foo.hpp"

 // code here...

foo.cpp:

#include "foo.hpp"
#include "bar.hpp"

 // code here

When we compile foo.cpp, only one copy of the contents of foo.hpp will be included, thanks to the include guard. When the preprocessor reads the line #include "foo.hpp", the content of foo.hpp is expanded. Since this is the first time foo.hpp is read (and assuming that the macro FOO_HPP has not been defined elsewhere), FOO_HPP is not yet defined, and so the code is included normally. When the preprocessor reads the line #include "bar.hpp" in foo.cpp, the content of bar.hpp is expanded as usual, and the file foo.hpp is read again. Owing to the earlier definition of FOO_HPP, no code from foo.hpp is inserted this time. This achieves our goal: the content of the file is never included more than once.
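
The effect can be imitated in a single file by pasting a guarded header body twice, which is roughly what the preprocessor ends up with after two inclusions. This is only a sketch; POINT_HPP, Point, coordinate_sum and guard_demo are hypothetical names.

```cpp
#include <cassert>

// ---- first copy, as if pasted in by: #include "point.hpp" ----
#ifndef POINT_HPP
#define POINT_HPP
struct Point { int x; int y; };
#endif // POINT_HPP

// ---- second copy, as if pasted in again through another header ----
#ifndef POINT_HPP
#define POINT_HPP
struct Point { int x; int y; };   // skipped: POINT_HPP is already defined
#endif // POINT_HPP

int coordinate_sum(const Point& p) { return p.x + p.y; }

void guard_demo() {
    Point p = {1, 2};
    assert(coordinate_sum(p) == 3);
}
```

Without the guards, the second definition of Point would be a redefinition error.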

Compile-time warnings and errors

  • Syntax:
#warning message
#error message

#error and #warning

The #error directive causes the compiler to stop and spit out the line number and a message given when it is encountered. The #warning directive causes the compiler to spit out a warning with the line number and a message given when it is encountered. These directives are mostly used for debugging.

NOTE:
#error is part of Standard C++, whereas #warning is not (though it is widely supported).

  • Example:
#if defined(__BSD__)
#warning Support for BSD is new and may not be stable yet
#endif

#if defined(__WIN95__)
#error Windows 95 is not supported
#endif
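
Besides platform checks, #error is handy for rejecting configurations the code was not written for. The sketch below uses the standard CHAR_BIT macro from <climits>; DEMO_USE_FAST_PATH and DEMO_USE_SAFE_PATH are hypothetical option macros, and bits_in_int and checks_demo are names made up for the example.

```cpp
#include <cassert>
#include <climits>

// Stop the build early on platforms this sketch does not handle.
#if CHAR_BIT != 8
#error This sketch assumes 8-bit bytes
#endif

// Reject a contradictory combination of (hypothetical) build options.
#if defined(DEMO_USE_FAST_PATH) && defined(DEMO_USE_SAFE_PATH)
#error DEMO_USE_FAST_PATH and DEMO_USE_SAFE_PATH cannot both be defined
#endif

int bits_in_int() { return static_cast<int>(sizeof(int)) * CHAR_BIT; }

void checks_demo() {
    // If this code compiled at all, CHAR_BIT is known to be 8.
    assert(bits_in_int() == static_cast<int>(sizeof(int)) * 8);
}
```

Failing at compile time with a clear message is usually easier to diagnose than a program that builds but misbehaves.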

Source File Names and Line Numbering

The current filename and line number where the preprocessing is being performed can be retrieved using the predefined macros __FILE__ and __LINE__. Line numbers are measured before any escaped newlines are removed. The current values of __FILE__ and __LINE__ can be overridden using the #line directive. It is very rarely appropriate to do this in hand-written code, but it can be useful for code generators which create C++ code based on other input files, so that (for example) error messages will refer back to the original input files rather than to the generated C++ code.
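
A short sketch of both features follows. The macro WHERE, the functions where, reported_location and line_demo, and the file name "grammar.input" are all invented for this illustration.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical helper macro: records the file and line where it is used.
#define WHERE() where(__FILE__, __LINE__)

std::string where(const char* file, int line) {
    std::ostringstream out;
    out << file << ":" << line;
    return out.str();
}

// From here on, pretend the code came from line 100 of "grammar.input".
#line 100 "grammar.input"
std::string reported_location() { return WHERE(); }   // this line is now numbered 100

void line_demo() {
    assert(reported_location() == "grammar.input:100");
}
```

WHERE has to be a macro rather than a function: __FILE__ and __LINE__ are replaced at the point where they appear in the source, so a function would always report its own location.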

Linker

The linker is a program that is responsible for linking and resolving linkage issues, such as the use of symbols or identifiers which are defined in one translation unit and are needed by other translation units. The information it works from is created by the compiler. Symbols or identifiers which are needed outside a single translation unit must have external linkage. In short, the linker's job is to resolve references to undefined symbols by finding out which other object file defines the symbol in question, and replacing placeholders with the symbol's address. Of course, the process is more complicated than this, but the basic ideas apply.

Linkers can take objects from a collection called a library. Depending on the library (system, language or external) and the options passed, the linker may include only those symbols that are referenced from other object files or libraries. Libraries for diverse purposes exist, and one or more system libraries are usually linked in by default. We will take a closer look at libraries in the Libraries section of this book.
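
The difference between declaring a symbol and defining it can be sketched in one file, even though in a real project the two halves would normally sit in different translation units. Everything below (shared_base, base_plus, private_helper, use_before_definition, linkage_demo) is hypothetical and serves only as an illustration.

```cpp
#include <cassert>

// Declarations only: they promise that definitions with external linkage
// exist somewhere. Typically these lines would live in a header.
extern int shared_base;
int base_plus(int n);

// Internal linkage: visible only inside this translation unit, so the
// linker never has to resolve it on behalf of other files.
static int private_helper() { return 41; }

int use_before_definition() {
    return base_plus(private_helper());
}

// Definitions. In a multi-file project these could be in another .cpp file;
// the linker would then connect the uses above to these symbols.
int shared_base = 1;
int base_plus(int n) { return shared_base + n; }

void linkage_demo() {
    assert(use_before_definition() == 42);
}
```

If the two definitions at the bottom were removed, the file would still compile, but linking would fail with an "undefined reference" (or similarly worded) error.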

Linking

The process of connecting or combining object files produced by a compiler with the libraries necessary to make a working executable program (or a library) is called linking. Linkage refers to the way in which a program is built out of a number of translation units.

C++ programs can be compiled and linked with programs written in other languages, such as C, Fortran, and Pascal. When a program consists of two or more source modules written in different languages, you should do the following:

  • Compile each program module separately with the appropriate compiler.
  • Link them together in a separate step.
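
For the common case of calling C code from C++, the C++ side usually declares the foreign function with extern "C" so that the linker can match the names. The sketch below keeps everything in one file so that it can be compiled as it stands; the function names c_add, add_three and mixed_language_demo, and the file names mentioned in the comments, are hypothetical.

```cpp
#include <cassert>

// Tells the C++ compiler that c_add uses C linkage, so the name the linker
// looks for matches the one a C compiler would produce.
extern "C" int c_add(int a, int b);

int add_three(int a, int b, int c) {
    return c_add(c_add(a, b), c);
}

// Stand-in for the C module. In a real mixed-language build this definition
// would live in its own file (say adder.c), be compiled separately, for
// example with "gcc -c adder.c" and "g++ -c main.cpp", and the resulting
// object files would be linked in a separate step:
// "g++ main.o adder.o -o program".
extern "C" int c_add(int a, int b) { return a + b; }

void mixed_language_demo() {
    assert(add_three(1, 2, 3) == 6);
}
```

Other languages need their own conventions for names and argument passing, so the details for Fortran or Pascal depend on the compilers involved.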

Internal storage of data types

Bits and Bytes

The byte is the smallest individual piece of data that we can access or modify on a computer. The computer only works on bytes or groups of bytes, never on bits. If you want to modify individual bits, you have to use binary operations on the whole byte that tell the computer how to modify individual bits, but the operation is still done on whole bytes. Before getting too far ahead of ourselves, we'll look at the internal representation of a byte.
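
As a brief preview of what "binary operations on the whole byte" can look like, here is a small sketch. The alias byte_t and the functions set_bit, clear_bit, test_bit and bit_demo are hypothetical names; bit numbers are assumed to run from 0 (lowest) to 7 (highest).

```cpp
#include <cassert>

typedef unsigned char byte_t;   // hypothetical alias for one byte

// Each function reads the whole byte, combines it with a mask, and
// returns (or inspects) the whole byte again.
byte_t set_bit(byte_t value, int bit) {
    return static_cast<byte_t>(value | (1u << bit));
}

byte_t clear_bit(byte_t value, int bit) {
    return static_cast<byte_t>(value & ~(1u << bit));
}

bool test_bit(byte_t value, int bit) {
    return ((value >> bit) & 1u) != 0;
}

void bit_demo() {
    byte_t b = 0;            // 00000000
    b = set_bit(b, 0);       // 00000001
    b = set_bit(b, 3);       // 00001001
    assert(b == 9);
    assert(test_bit(b, 3));
    b = clear_bit(b, 0);     // 00001000
    assert(b == 8);
    assert(!test_bit(b, 0));
}
```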

Here's a look at a byte as the computer stores it.

Image:byte45.png

There is actually quite a lot of information here. A byte (usually) contains 8 bits. A bit can only have a value of 0 or 1. The bit number is used to label each bit in the byte (so that we can tell which bit we are talking about). You may be wondering why the bits are labeled from 7 to 0 instead of 0 to 7 or even 1 to 8. The reason 0 is used is because computers always start counting at 0. Technically, we COULD start counting at 1, but this would go against the counting nature of the com