C++ Programming/Chapter Fundamentals Print version
From Wikibooks, the open-content textbooks collection
Authors
- The following people are authors to this book
- Panic
There are many other contributors/editors to the book; a verifiable list of all contributions exists as History Logs at Wikibooks (http://en.wikibooks.org/).
- Acknowledgment is given for using some contents from other works like Programming C-/- -/-, Wikipedia, the Wikibooks Java Programming and C Programming, C++ Exercises for beginners, C/C++ Reference Web Site, and from Wikisource as from authors such as Scott Wheeler, Stephen Ferg and Ivor Horton.
| Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation; with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts. A copy of the license is included in the section entitled "GNU Free Documentation License." |
Fundamentals
The Compiler
A compiler is a program that translates a computer program written in one computer language (the source code) into an equivalent program written in the computer's native machine language. This process of translation is called compilation.
Where to get a compiler
When you select a compiler, take into consideration your operating system, your personal preferences and the documentation available for it.
One of the most up-to-date and widely compatible compilers is GCC. The next section shows how to get a copy and install it on Windows; information on installing it under other operating systems is easy to find on the GCC website. GCC is a decent choice and can be obtained for free. Many Open Source platforms include a recent GCC version. Version 4.0 or later gives fairly good conformance to the C++ standard. Various IDEs are available to support GCC. For Windows, Microsoft Visual Studio Express is currently available free of charge (but not free as in non-proprietary) with a C++ compiler that can be used from the command line or from the supplied IDE. An IDE, or Integrated Development Environment, is generally a graphical environment that integrates functionality such as editing, compiling, linking, and usually a help system.
GCC
The GCC is a free set of compilers developed by the Free Software Foundation, with Richard Stallman as one of the main architects.
Many different pre-compiled GCC distributions are available on the Internet. Some popular choices are listed below, with detailed installation steps.
On Windows
Cygwin:
- Go to http://www.cygwin.com and click on the "Install Cygwin Now" button in the upper right corner of the page.
- Click "run" in the window that pops up, and click "next" several times, accepting all the default settings.
- Choose any of the Download sites ("ftp.easynet.be", etc.) when that window comes up; press "next" and the Cygwin installer should start downloading.
- When the "Select Packages" window appears, scroll down to the heading "Devel" and click on the "+" by it. In the list of packages that now displays, scroll down and find the "gcc-core" package; this is the compiler. Click once on the word "Skip", and it should change to some number like "3.4" etc. (the version number), and an "X" will appear next to "gcc-core" and several other related packages that will now be downloaded.
- Click "next" and the compiler as well as the Cygwin tools should start downloading; this could take a while. While you're waiting, go to http://www.crimsoneditor.com and download that free programmer's editor; it's powerful yet easy to use for beginners.
- Once the Cygwin downloads are finished and you have clicked "next", etc. to finish the installation, double-click the Cygwin icon on your desktop to begin the Cygwin "command prompt". Your home directory will automatically be set up in the Cygwin folder, which now should be at "C:\cygwin" (the Cygwin folder is in some ways like a small Unix/Linux computer on your Windows machine -- not technically of course, but it may be helpful to think of it that way).
- Type "gcc" at the Cygwin prompt and press "enter"; if "gcc: no input files" or something like it appears you have succeeded and now have the gcc compiler on your computer (and congratulations -- you have also just received your first error message!).
MinGW + DevCpp-IDE
- Go to http://www.bloodshed.net/devcpp.html, choose the version you want (scrolling down if necessary) and click on the appropriate download link. For the most current version, you will be redirected to http://www.bloodshed.net/dev/devcpp.html
- Scroll down to read the license and then to the download links. Download a version that includes Mingw/GCC; this is much easier than assembling the pieces yourself. The bundled mingw lags the most current release by only a few days, and the result is the same as downloading the required modules manually.
- You get an installer that can be run at user level under any WinNT version. If you want it to be set up for all users, however, you need admin rights. It will install devcpp and mingw in folders of your choice.
- Start the IDE and experience your first project!
You will find something quite similar to MSVC, including menu and button placement. Of course, many things are somewhat different if you are familiar with MSVC, but it takes only a handful of clicks to get your first program running.
For DOS
DJGPP:
- Go to Delorie Software, click on Zip Picker and select the packages you need.
- Use unzip32 to inflate the files into the directory of your choice (e.g. C:\DJGPP).
For Linux
- For Redhat, get a gcc-c++ RPM, e.g. using Rpmfind and then install (as root) using rpm -ivh gcc-c++-version-release.arch.rpm
- For Fedora Core, install the GCC C++ compiler (as root) by using yum install gcc-c++
- For Mandrake, install the GCC C++ compiler (as root) by using urpmi gcc-c++
- For Debian, install the GCC C++ compiler (as root) by using apt-get install g++
- For Ubuntu, install the GCC C++ compiler by using sudo apt-get install g++
- If you cannot become root, get the tarball from ftp://ftp.gnu.org/ and follow the instructions in it to compile and install in your home directory.
Compilation
The output a compiler produces from translating (compiling) a program is saved to a file called an object file. As we have seen before in The Code section of the book, compilation consists of the transformation of source files into object files.
The instructions of this compiled program can then be run (executed) by the computer if the object file is in an executable format. Often, however, there are additional steps that may be required to create an executable program: preprocessing and linking.
Compile Time
Compile time is the period during which a compiler works on a program during a build (creation) of that program (executable or not); the operations it performs then are called compile-time operations.
The operations performed at compile time usually include lexical analysis, syntax analysis, various kinds of semantic analysis (e.g., type checks and instantiation of templates) and code generation.
The definition of a programming language will specify compile time requirements that source code must meet to be successfully compiled.
Compile time occurs before link time (when the output of one or more compiled files are joined together) and runtime (when a program is executed). In some programming languages it may be necessary for some compilation and linking to occur at runtime. The concept of runtime will be introduced later.
Lexical Analysis
This happens before syntax analysis and converts the code into tokens, which are the parts of the code that the compiler will actually use, with special tokens for each reserved keyword, and tokens for data types, identifiers and values. The lexical analyzer is also the part of the compiler that discards whitespace: it uses whitespace to separate different tokens and then ignores it. To give an example
int main()
{
    std::cout << "hello world" << std::endl;
    return 0;
}
might be tokenized as
1 = string "int"
2 = string "main"
3 = opening parenthesis
4 = closing parenthesis
5 = opening brace
6 = string "std"
7 = namespace operator
8 = string "cout"
9 = << operator
10 = string ""hello world""
11 = string "endl"
12 = semicolon
13 = string "return"
14 = number 0
15 = closing brace
and so for this program the lexical analyzer might send something like
1 2 3 4 5 6 7 8 9 10 9 6 7 11 12 13 14 12 15
to the syntactical analyzer, which is talked about next, to be parsed. It is easier for the syntactical analyzer to apply the rules of the language when it can work with numerical values, can distinguish between language syntax (such as the semicolon) and everything else, and knows what kind of token each item is.
Syntax Analysis
This step (sometimes also called syntax checking) ensures that the code is well-formed and can be turned into an executable program. The syntactical analyzer applies the rules of the language to the code, checking that each opening brace has a corresponding closing brace, that each declaration has a type, that the type exists, and so on. Syntax analysis is considerably more complicated than lexical analysis. As an example
int main()
{
    std::cout << "hello world" << std::endl;
    return 0;
}
The syntax analyzer would first look at the string "int", check it against the defined keywords, and find that it is a type for integers. It would then treat the next token as an identifier and check that it is a valid identifier name. It would then look at the next token. Because that token is an opening parenthesis, it will treat "main" as a function; had it found a semicolon, this would have been the declaration of a variable, and had it found an equals sign, the initialization of an integer variable. After the opening parenthesis it would find a closing parenthesis, meaning that the function has 0 parameters. Then it would look at the next token and see an opening brace, so it would conclude that this is the implementation of the function main rather than a declaration, which would have ended in a semicolon (even though main is not normally declared separately in C++). It would probably also create a counter to keep track of the nesting level of the statement blocks, to make sure the braces come in pairs.
After that it would look at the next token, "std", and probably not do anything with it until it saw the :: operator, at which point it would check that "std" is a valid namespace. It would then take the next token, "cout", as the name of an identifier in the namespace "std" and see that it names a stream object. The analyzer would see the << operator next, and so would check that the << operator can be used with cout and that the following token can be used with the << operator. The same would happen for the token after ""hello world"". Then it would reach the "std" token again, look past it to the :: operator token, check again that the namespace exists, and then check that "endl" is in the namespace. The semicolon that follows marks the end of the statement. Next it would see the keyword "return" and expect an integer value as the next token, because main returns an integer; it would find 0, which is an integer.
The next symbol is a semicolon, so that is the end of the statement. The next token is a closing brace, so that is the end of the function. There are no more tokens, so if the syntax analyzer found no errors in the code, it would pass its result on to the later stages of the compiler so that the program can be converted to machine language. This is a simplified view of syntax analysis, and real syntax analyzers don't work exactly this way, but the idea is the same.
Here are the keywords that the syntax analyzer looks for, both to make sure you aren't using any of them as identifier names and to recognize the types and built-in language facilities you are using in C++.
ISO C++ (C++98) Keywords
|
|
|
|
Specific compilers may (in a non-standard compliant mode) also treat some other words as keywords, including cdecl, far, fortran, huge, interrupt, near, pascal, typeof. Old compilers may recognize the overload keyword, an anachronism that has been removed from the language.
The next revision of C++, informally known as C++0x for now, is likely to add some keywords, probably including at least:
- static_assert
- decltype
- nullptr
(These are being considered carefully to minimize breakage to existing code; see http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2006/n2105.html for some details.)
Old compilers may not recognize some or all of the following keywords:
|
|
|
|
C++ Reserved Identifiers
Some "nonstandard" identifiers are reserved for distinct uses, to avoid conflicts on the naming of identifiers by vendors, library creators and users in general.
Reserved identifiers include names containing two consecutive underscores (__), all names that start with an underscore followed by an uppercase letter, and some other categories of reserved identifiers carried over from the C library specification.
A list of C reserved identifiers can be found at the Internet Wayback Machine archived page: http://web.archive.org/web/20040209031039/http://oakroadsystems.com/tech/c-predef.htm#ReservedIdentifiers
Compiler Keywords
A limited set of keywords exists to directly control the compiler's behavior. These keywords are very powerful and must be used with care, as they may make a huge difference to the program's compile time and running speed.
In the C++ Standard, these keywords are called specifiers.
auto
The auto keyword used to have a different behavior, but in C++0x it will allow one to omit the type of a variable and let the compiler decide. This is particularly useful for generic programming in which the return type of a function may depend on the type of its arguments. Thus, rather than this:
int x = 42;
std::vector<double> numbers;
numbers.push_back(1.0);
numbers.push_back(2.0);
for(std::vector<double>::iterator i = numbers.begin(); i != numbers.end(); ++i) {
    std::cout << *i << " ";
}
we could write this:
auto x = 42; // We can use auto on base types...
std::vector<double> numbers;
numbers.push_back(1.0);
numbers.push_back(2.0);
// But auto is most useful for complicated types.
for(auto i = numbers.begin(); i != numbers.end(); ++i) {
    std::cout << *i << " ";
}
Note: This functionality is not yet available.
inline
A function declaration with an inline keyword declares an inline function. The inline keyword suggests to the compiler that a particular function be subjected to in-line expansion; that is, that the compiler insert the complete body of the function in every context where the function is used. This avoids the overhead of making the CPU jump from one place in code to another and back again to execute a subroutine, as is done in naive implementations of subroutines.
Example:
inline void swap( int& a, int& b) { int const tmp(b); b=a; a=tmp; }
Marking a function as inline (possibly implicitly, by defining a member function inside a class/struct definition) is a (non-binding) request to the compiler to consider inlining the function, i.e., expanding its code at the call site; it is legal, but redundant, to add the inline keyword in that context, and good style is to omit it.
Example:
struct length
{
    explicit length(int metres) : m_metres(metres) {}
    operator int&() { return m_metres; }
private:
    int m_metres;
};
Inlining can be an optimization, or a pessimization. It can increase code size (by duplicating the code for a function at multiple call sites) or can decrease it (if the code for the function, after optimization, is less than the size of the code needed to call a non-inline function). It can increase speed (by allowing for more optimization and by avoiding jumps) or can decrease speed (by increasing code size and hence cache misses).
One important side-effect of inlining is that more code is then accessible to the optimizer.
Marking a function as inline also has an effect on linking: multiple definitions of an inline function are permitted (so long as each is in a different translation unit) so long as they are identical. This allows inline function definitions to appear in header files; defining non-inline functions in header files is almost always an error (though function templates can also be defined in header files, and often are).
Mainstream C++ compilers like Microsoft Visual C++ and GCC support an option that lets the compilers automatically inline any suitable function, even those that are not marked as inline functions. A compiler is often in a better position than a human to decide whether a particular function should be inlined; in particular, the compiler may not be willing or able to inline many functions that the human asks it to.
Excessive use of inline functions can greatly increase coupling/dependencies and compilation time, as well as making header files less useful as documentation of interfaces.
extern
The extern keyword tells the compiler that a variable is defined in another source module. The linker then finds the actual definition and sets up the extern variable to point to the correct location. If a variable is declared extern and the linker finds no definition of it, the linker will report an error such as "Unresolved external symbol".
Examples:
extern int i;
- declares that there is a variable named i of type int, defined somewhere in the program.
extern int j = 0;
- defines a variable j with external linkage; the extern keyword is redundant here.
extern void f();
- declares that there is a function f taking no arguments and with no return value defined somewhere in the program; extern is redundant, but sometimes considered good style.
extern void f() {;}
- defines the function f() declared above; again, the extern keyword is technically redundant here as external linkage is default.
extern const int k = 1;
- defines a constant int k with value 1 and external linkage; extern is required because const variables have internal linkage by default.
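The difference between an extern declaration and the definition it refers to can be sketched in a single file. The names shared_counter and next_id are hypothetical; in a real project the definition would normally live in a different source file and the linker would connect the two.

```cpp
// Declaration only: promises that an int named shared_counter is defined
// somewhere in the program. No storage is reserved here.
extern int shared_counter;

// This function compiles with only the declaration in view.
int next_id() { return ++shared_counter; }

// The one definition. It is placed in the same file only to keep the
// sketch self-contained.
int shared_counter = 100;
```

If the definition were removed, the file would still compile, but linking would fail with an unresolved symbol error for shared_counter.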
Storage Class Specifiers
- register - A hint to the compiler that the specified variable will be heavily used; therefore the compiler should consider allocating a CPU register to the variable. The compiler may ignore this hint.
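- static - Gives a variable static storage duration, so that it keeps its memory location and value for the whole run of the program; applied to a class data member, it makes a single variable shared by all objects of that class.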
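Two common uses of static can be sketched as follows: a local variable that survives between calls, and a class data member shared by every object. The names count_calls and Widget are hypothetical and used only for illustration.

```cpp
// Static local variable: initialized once, keeps its value between calls.
int count_calls() {
    static int calls = 0;
    return ++calls;
}

// Static data member: one variable shared by all Widget objects.
struct Widget {
    static int live;          // declaration inside the class
    Widget()  { ++live; }
    ~Widget() { --live; }
};

int Widget::live = 0;         // the single definition, outside the class
```

Each call to count_calls returns a value one higher than the previous call, and Widget::live always holds the number of Widget objects currently alive.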
Compile Speed
Most problems one has with a slow compilation are due to:
- Hardware
- Resources (Slow CPU, low memory and even a slow HD can have an influence)
- Software
- The compiler itself (new is probably better), the design used on the program (structure of object dependencies, includes)
Experience shows that if you are suffering from slow compile times, the most likely cause is that the program you are trying to compile is poorly designed. Take the time to structure your own code to minimize re-compilation after changes.
Use pre-compiled headers and external header guards.
The Preprocessor
The preprocessor is either a separate program invoked by the compiler or part of the compiler itself. It performs intermediate operations that modify the original source code and internal compiler options before the compiler tries to compile the resulting source code.
The instructions that the preprocessor parses are called directives and come in two forms, preprocessor and compiler directives. Preprocessor directives direct the preprocessor on how it should process the source code and compiler directives direct the compiler on how it should modify internal compiler options. Directives are used to make writing source code easier (more portable for instance) and to make the source code more understandable. They are also the only valid way to make use of facilities (classes, functions, templates, etc.) provided by the C++ Standard Library.
All directives start with '#' at the beginning of a line. The standard directives are:
|
|
|
Inclusion of Header Files (#include)
The #include directive allows a programmer to include contents of one file inside another one. This is commonly used to separate information needed by more than one part of a program into its own file so that it can be included again and again without having to repeatedly type out all the information.
C++ generally requires you to declare what will be used before using it. So, files called headers usually include declarations of what will be used in order for the compiler to successfully compile source code. The standard library (a repository of code that is available alongside every standard-compliant C++ compiler) and 3rd party libraries make use of headers in order to allow the inclusion of the needed declarations in your source code to make use of features/resources that are not part of the language itself.
The first lines in any source file should usually look something like this:
#include <iostream>
#include "other.h"
The above lines cause the contents of iostream and other.h to be included for use in your program. Usually this is implemented by just inserting into your program the contents of iostream and other.h. When using angle brackets (<>), the preprocessor is instructed to search for the file to include in a compiler-dependent location. When you use quotation marks (" "), the preprocessor is expected to search in some additional, usually user-defined, locations for the header file, and to fall back to the standard include paths only if it is not found in those additional locations. It is common for this form to include searching in the same directory as the file containing the #include directive.
The iostream header contains various declarations for input/output (I/O) using an abstraction of I/O mechanisms called streams. For example there is an output stream object called std::cout (where the "c" in "cout" stands for "character") which is used to output text to the standard output, which usually displays the text on the computer screen.
A list of standard C++ header files is listed below:
| Standard Template Library | ||||||||
|---|---|---|---|---|---|---|---|---|
and the
| Standard C Library | |||||
|---|---|---|---|---|---|
Everything inside C++'s standard library is kept in the std:: namespace. Old compilers may include headers from before C++ was standardized, named <X.h> and <cX.h>, in addition to or instead of the standard headers. Often these headers have non-templatized classes and pollute the global namespace. Some have the SGI STL on which much of the standard template library is based.
| Non-standard but somewhat common C++ libraries | ||
|---|---|---|
- ↑ Streams based on FILE* from stdio.h.
- ↑ Precursor to iostream. Old stream library mostly included for backwards compatibility even with old compilers.
- ↑ Uses char* whereas sstream uses string. Prefer the standard library sstream.
#pragma
The pragma (pragmatic information) directive is part of the standard, but the meaning of any pragma depends on the software implementation of the standard that is used.
Pragmas are used within the source program.
#pragma token(s)
You should check the software implementation of the C++ standard you intend on using for a list of the supported tokens.
For instance one of the most implemented preprocessor directives, #pragma once, when placed at the beginning of a header file, indicates that the file where it resides will be skipped if included several times by the preprocessor.
Macros
The C++ preprocessor includes facilities for defining "macros", which roughly means the ability to replace a use of a named macro with one or more tokens. This has various uses from defining simple constants (though const is more often used for this in C++), conditional compilation, code generation and more -- macros are a powerful facility, but if used carelessly can also lead to code that is hard to read and harder to debug!
|
NOTE: Macros don't depend only on the C++ Standard or your actions. They may exist due to the use of external frameworks, libraries or even due to the compiler you are using and the specific OS. We will not cover that information in this book, but you may find more in the Pre-defined C/C++ Compiler Macros page ( http://predef.sourceforge.net/ ); the project maintains a complete list of macros that are compiler and OS agnostic. |
#define and #undef
The #define directive is used to define values or macros that are used by the preprocessor to manipulate the program source code before it is compiled:
#define USER_MAX (1000)
The #undef directive deletes a current macro definition:
#undef USER_MAX
It is an error to use #define to change the definition of a macro, but it is not an error to use #undef to try to undefine a macro name that is not currently defined. Therefore, if you need to override a previous macro definition, first #undef it, and then use #define to set the new definition.
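The #undef-then-#define sequence described above can be sketched like this. The function names first_limit and second_limit, and the macro name NEVER_DEFINED, are hypothetical and exist only to make the effect observable.

```cpp
#define USER_MAX (1000)

int first_limit() { return USER_MAX; }    // expands to (1000)

// Redefining USER_MAX with a different value at this point would be
// diagnosed, so the old definition is removed first.
#undef USER_MAX
#define USER_MAX (2000)

int second_limit() { return USER_MAX; }   // expands to (2000)

#undef NEVER_DEFINED   // not an error: the name was never defined
```

Because macro expansion happens where the name is used, each function sees whichever definition was current at that point in the file.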
|
NOTE: Today, for this reason, #define is primarily used to handle compiler and platform differences. E.g., a define might hold a constant which is the appropriate error code for a system call. The use of #define should thus be limited unless absolutely necessary; typedef statements, constant variables, enums, templates and inline functions can often accomplish the same goal more efficiently and safely. By convention, values defined using #define are named in uppercase with "_" separators; this makes it clear to readers that the value is not alterable and, in the case of macros, that the construct requires care. Although doing so is not a requirement, it is considered very bad practice to do otherwise. This allows the values to be easily identified when reading the source code. Try to use |
\ (line continuation)
If for some reason you need to break a given statement into more than one line, use the \ (backslash) symbol to "escape" the line ends. For example,
#define MULTIPLELINEMACRO \
will use what you write here \
and here etc...
is equivalent to
#define MULTIPLELINEMACRO will use what you write here and here etc...
because the preprocessor joins lines ending in a backslash ("\") to the line after them. That happens even before directives (such as #define) are processed, so it works for just about all purposes, not just for macro definitions. The backslash is sometimes said to act as an "escape" character for the newline, changing its interpretation.
In some (fairly rare) cases macros can be more readable when split across multiple lines. Good modern C++ code will use macros only sparingly, so the need for multi-line macro definitions won't arise often.
It's certainly possible to overuse this feature. It's quite legal but entirely indefensible, for example, to write
int ma\
in//ma/
()/*ma/
in/*/{}
That's an abuse of the feature though: while an escaped newline can appear in the middle of a token, there should never be any reason to use it there. Don't try to write code that looks like it belongs in the International Obfuscated C Code Competition.
Warning: there is one occasional "gotcha" with using escaped newlines: if there are any invisible characters after the backslash, the lines will not be joined, and there will almost certainly be an error message produced later on, though it might not be at all obvious what caused it.
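A compilable instance of a macro split across lines might look like the following sketch. The macro name GREETING and the function greeting are hypothetical; the only point is that the four physical lines are joined into one #define before the directive is processed.

```cpp
#include <string>

// One logical line split across four physical lines. Nothing, not even a
// space, may follow a backslash that ends a line.
#define GREETING ( \
    std::string("Hello") + \
    std::string(", ") + \
    std::string("world") )

// After line splicing the preprocessor sees a single #define directive.
std::string greeting() { return GREETING; }
```

Adding a trailing space after any of the backslashes is an easy way to reproduce the "gotcha" described in the warning above.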
Function-like Macros
Another feature of the #define directive is that it can take arguments, making it rather useful as a pseudo-function creator. Consider the following code:
#define ABSOLUTE_VALUE( x ) ( ((x) < 0) ? -(x) : (x) )
...
int x = -1;
while( ABSOLUTE_VALUE( x ) ) {
...
}
It's generally a good idea to use extra parentheses when using complex macros. Notice that in the above example, the variable "x" is always within its own set of parentheses. This way, it will be evaluated in whole, before being compared to 0 or multiplied by -1. Also, the entire macro is surrounded by parentheses, to prevent it from being contaminated by other code. If you're not careful, you run the risk of having the compiler misinterpret your code.
Macros replace each occurrence of the macro parameter used in the text with the literal contents of the macro parameter without any validation checking. Badly written macros can result in code which won't compile or create hard to discover bugs. Because of side-effects it is considered a very bad idea to use macro functions as described above. However as with any rule, there may be cases where macros are the most efficient means to accomplish a particular goal.
int z = -10; int y = ABSOLUTE_VALUE( z++ );
If ABSOLUTE_VALUE() was a real function 'z' would now have the value of '-9', but because it was an argument in a macro z++ was expanded 3 times (in this case) and thus (in this situation) executed twice, setting z to -8, and y to 9. In similar cases it is very easy to write code which has "undefined behavior", meaning that what it does is completely unpredictable in the eyes of the C++ Standard.
- ABSOLUTE_VALUE( z++ ); expanded:
( ((z++) < 0 ) ? -(z++) : (z++) );
- An example on how to use a macro correctly:
#include <iostream>
#define SLICES 8
#define PART(x) ( (x) / SLICES ) // Note the extra parentheses around x
int main() {
int b = 10, c = 6;
int a = PART(b + c);
std::cout << a;
return 0;
}
-- the result of "a" should be "2" (b + c passed to PART -> ((b + c) / SLICES) -> result is "2")
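The double evaluation of z++ discussed earlier in this section can be checked with a small sketch that runs the same call through the macro and through an ordinary inline function. The names absolute_value, Result, with_macro and with_function are hypothetical.

```cpp
#define ABSOLUTE_VALUE( x ) ( ((x) < 0) ? -(x) : (x) )

// Function alternative: its argument is evaluated exactly once.
inline int absolute_value(int x) { return x < 0 ? -x : x; }

struct Result { int y; int z; };

Result with_macro() {
    int z = -10;
    int y = ABSOLUTE_VALUE( z++ );   // z++ appears twice on the path taken
    Result r = { y, z };
    return r;
}

Result with_function() {
    int z = -10;
    int y = absolute_value( z++ );   // z++ is evaluated once, before the call
    Result r = { y, z };
    return r;
}
```

With the macro, y ends up as 9 and z as -8; with the function, y is 10 and z is -9, which is what a reader of the call site would expect.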
|
Example: To illustrate the dangers of macros, consider this naive macro
#define MAX(a,b) a>b?a:b
and the code
i = MAX(2,3)+5;
j = MAX(3,2)+5;
Take a look at this and consider what the values after execution might be. The statements are turned into
int i = 2>3?2:3+5;
int j = 3>2?3:2+5;
Thus, after execution i=8 and j=3 instead of the expected result of i=j=8! This is why you were cautioned to use an extra set of parentheses above, but even with these, the road is fraught with dangers. The alert reader might quickly realize that if a or b contains expressions, the definition must parenthesize every use of a and b in the macro definition, like this:
#define MAX(a,b) ((a)>(b)?(a):(b))
This works, provided a and b have no side effects. Indeed,
i = 2; j = 3; k = MAX(i++, j++);
would result in k=4, i=3 and j=5. This would be highly surprising to anyone expecting MAX() to behave like a function. So what is the correct solution? The solution is not to use a macro at all. A global, inline function, like this
inline int max(int a, int b) { return a>b?a:b; }
has none of the pitfalls above, but will not work with all types. A template (see below) takes care of this
template<typename T> inline const T& max(const T& a, const T& b) { return a>b?a:b; }
Indeed, this is (a variation of) the definition used in the standard library for std::max(). The standard library is included with all conforming C++ compilers, so the ideal solution would be to use it: std::max(3,4); |
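The precedence problem in the example above can be reproduced with a short sketch. The macro is named NAIVE_MAX here (a hypothetical name) to avoid clashing with any MAX a platform header might define, and the two wrapper functions exist only so the results can be compared.

```cpp
#include <algorithm>

// Deliberately unparenthesized, as in the naive macro discussed above.
#define NAIVE_MAX(a, b) a > b ? a : b

// Expands to: 3 > 2 ? 3 : 2 + 5, so the "+ 5" binds to the last operand only.
int naive_result() { return NAIVE_MAX(3, 2) + 5; }

// std::max is a real function template, so "+ 5" applies to its result.
int standard_result() { return std::max(3, 2) + 5; }
```

Here naive_result returns 3 while standard_result returns 8. Some compilers warn about the macro expansion, which is a useful hint that something is off.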
# and ##
The # and ## operators are used inside #define macro definitions. Using # causes the macro argument that follows it to be replaced by a string literal containing the text of that argument. For example
#define as_string( s ) # s
will make the preprocessor turn
std::cout << as_string( Hello World! ) << std::endl;
into
std::cout << "Hello World!" << std::endl;
Using ## concatenates what's before the ## with what's after it; the result must be a well-formed preprocessing token. For example
#define concatenate( x, y ) x ## y
...
int xy = 10;
...
will make the preprocessor turn
std::cout << concatenate( x, y ) << std::endl;
into
std::cout << xy << std::endl;
which will, of course, display 10 to standard output.
String literals cannot be concatenated using ##, but the good news is that this isn't a problem: just writing two adjacent string literals is enough to make the preprocessor concatenate them.
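Both operators can be tried out in a compilable sketch. The macro names are written in uppercase here, and the functions stringized and pasted are hypothetical wrappers added so that the results can be inspected.

```cpp
#include <string>

#define AS_STRING(s) #s            // stringizing: argument text becomes a literal
#define CONCATENATE(x, y) x ## y   // token pasting: two tokens become one

// AS_STRING(Hello World!) is replaced by the literal "Hello World!".
std::string stringized() { return AS_STRING(Hello World!); }

int pasted() {
    int xy = 10;
    return CONCATENATE(x, y);      // becomes the single identifier xy
}
```

Note that x and y are never declared on their own; CONCATENATE(x, y) compiles only because the pasted token xy names an existing variable.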
String Literal Concatenation
One minor function of the preprocessor is in joining strings together, "string literal concatenation" -- turning code like
std::cout << "Hello " "World!\n";
into
std::cout << "Hello World!\n";
Apart from obscure uses, this is most often useful when writing long messages, as it's not legal in C++ (at this time) to have a string literal which spans multiple lines in your source code (i.e., one which has a newline character inside it). It also helps to keep program lines down to a reasonable length; we can write
function_name("This is a very long string literal, which would not fit "
"onto a single line very nicely -- but with string literal "
"concatenation, we can split it across multiple lines and "
"the preprocessor will glue the pieces together");
Note that this joining happens before compilation; the compiler sees only one string literal here, and there's no work done at runtime, i.e., your program won't run any slower at all because of this joining together of strings.
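That the joining really happens before run time can be checked at compile time. The sketch below assumes a C++11 compiler (for static_assert); long_message, message and is_joined are hypothetical names.

```cpp
#include <cstring>  // std::strcmp

// Adjacent literals are glued into one literal of the same size
// as the hand-joined version; this is verified during compilation.
static_assert(sizeof("Hello " "World!\n") == sizeof("Hello World!\n"),
              "adjacent literals form a single literal");

// A long message split across source lines for readability.
const char* const long_message =
    "This is a very long string literal, which would not fit "
    "onto a single line very nicely";

const char* message() { return long_message; }

// The joined literal compares equal to the literal written in one piece.
bool is_joined() {
    return std::strcmp("Hello " "World!\n", "Hello World!\n") == 0;
}
```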
Concatenation also applies to wide string literals (which are prefixed by an L):
L"this " L"and " L"that"
is converted by the preprocessor into
L"this and that".
Conditional compilation
Conditional compilation is useful for two main purposes:
- To allow certain functionality to be enabled/disabled when compiling a program
- To allow functionality to be implemented in different ways, such as when compiling on different platforms
It is also used sometimes to temporarily "comment-out" code, though using a version control system is often a more effective way to do so.
- Syntax:
#if condition
statement(s)
#elif condition2
statement(s)
...
#elif conditionN
statement(s)
#else
statement(s)
#endif

#ifdef defined-value
statement(s)
#else
statement(s)
#endif

#ifndef defined-value
statement(s)
#else
statement(s)
#endif
#if
The #if directive allows compile-time conditional checking of preprocessor values, such as those created with #define. If condition is non-zero, the preprocessor includes all statement(s) up to the next #else, #elif or #endif directive in the output for processing. Otherwise, if the #if condition is false, any #elif directives are checked in order and the first one whose condition is true has its statement(s) included in the output. Finally, if the conditions of the #if directive and of all #elif directives present are false, the statement(s) of the #else directive are included in the output if there is one; otherwise, nothing gets included.
The expression used after #if can include boolean and integral constants and arithmetic operations as well as macro names. The allowable expressions are a subset of the full range of C++ expressions (with one exception), but are sufficient for many purposes. The one extra operator available to #if is the defined operator, which can be used to test whether a macro of a given name is currently defined.
#ifdef and #ifndef
The #ifdef and #ifndef directives are short forms of '#if defined(defined-value)' and '#if !defined(defined-value)' respectively. defined(identifier) is valid in any expression evaluated by the preprocessor, and returns true (in this context, equivalent to 1) if a preprocessor variable by the name identifier was defined with #define and false (in this context, equivalent to 0) otherwise. In fact, the parentheses are optional, and it is also valid to write defined identifier without them.
(Possibly the most common use of #ifndef is in creating "include guards" for header files, to ensure that the header files can safely be included multiple times. This is explained in the section on header files.)
#endif
The #endif directive ends #if, #ifdef, #ifndef, #elif and #else directives.
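The directives described above can be combined as in the following sketch. FEATURE_LEVEL, VERBOSE_LOGGING and QUIET are hypothetical configuration macros invented for this illustration; in practice such macros are often supplied on the compiler command line, and here FEATURE_LEVEL gets a default when nothing was supplied.

```cpp
#include <string>

// Provide a default only if the macro was not defined elsewhere.
#ifndef FEATURE_LEVEL
#define FEATURE_LEVEL 2
#endif

// Exactly one of the three return statements survives preprocessing.
std::string feature_description() {
#if FEATURE_LEVEL >= 3
    return "full";
#elif FEATURE_LEVEL == 2
    return "standard";
#else
    return "minimal";
#endif
}

// The defined operator can be combined with logical operators.
bool verbose_enabled() {
#if defined(VERBOSE_LOGGING) && !defined(QUIET)
    return true;
#else
    return false;
#endif
}
```

With none of the macros defined beforehand, feature_description() yields "standard" and verbose_enabled() yields false.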
- Example:
#if defined(__BSD__) || defined(__LINUX__)
#include <unistd.h>
#endif
This can be used for example to provide multiple platform support or to have one common source file set for different program versions. Another example of use is using this instead of the (non-standard) #pragma once.
- Example:
foo.hpp:
#ifndef FOO_HPP
# define FOO_HPP
// code here...
#endif // FOO_HPP
bar.hpp:
#include "foo.hpp"
// code here...
foo.cpp:
#include "foo.hpp"
#include "bar.hpp"
// code here
When we compile foo.cpp, only one copy of foo.hpp will be included, thanks to the include guard. When the preprocessor reads the line #include "foo.hpp", the content of foo.hpp is expanded. Since this is the first time that foo.hpp is read (and assuming that the macro FOO_HPP has not been defined elsewhere), FOO_HPP is not yet defined, and so the code is included normally. When the preprocessor reads the line #include "bar.hpp" in foo.cpp, the content of bar.hpp is expanded as usual, and the file foo.hpp is read again. Owing to the previous definition of FOO_HPP, no code from foo.hpp is inserted this time. This achieves our goal: the content of the file is not included more than once.
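The effect of the guard can be simulated inside a single file by writing the guarded region twice, which is roughly what the preprocessor sees after both #include lines have been expanded. The struct Foo and the functions foo_value and use_foo are hypothetical stand-ins for the "code here" of foo.hpp.

```cpp
// First "inclusion" of foo.hpp: FOO_HPP is not yet defined,
// so the content is kept and the macro becomes defined.
#ifndef FOO_HPP
#define FOO_HPP
struct Foo { int value; };
inline int foo_value(const Foo& f) { return f.value; }
#endif  // FOO_HPP

// Second "inclusion" (as would arrive through bar.hpp): FOO_HPP is
// already defined, so everything up to #endif is skipped. Without the
// guard, struct Foo would be defined twice and compilation would fail.
#ifndef FOO_HPP
#define FOO_HPP
struct Foo { int value; };
inline int foo_value(const Foo& f) { return f.value; }
#endif  // FOO_HPP

int use_foo() {
    Foo f = {7};
    return foo_value(f);
}
```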
Compile-time warnings and errors
- Syntax:
#warning message
#error message
#error and #warning
The #error directive causes the compiler to stop and report the line number together with the given message when it is encountered. The #warning directive causes the compiler to emit a warning with the line number and the given message, and then continue. Note that #warning only became part of the C++ standard with C++23; before that it was a widely supported compiler extension. These directives are mostly used for debugging.
- Example:
#if defined(__BSD__)
#warning Support for BSD is new and may not be stable yet
#endif

#if defined(__WIN95__)
#error Windows 95 is not supported
#endif
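Besides platform checks, #error is often used to reject invalid build configurations early. The following is only a sketch: BUFFER_SIZE and buffer_size are hypothetical names, and the default value is chosen so that the #error does not fire.

```cpp
// Use a default when the build did not supply a value.
#ifndef BUFFER_SIZE
#define BUFFER_SIZE 256
#endif

// Stop compilation with a clear message if the value is unusable.
#if BUFFER_SIZE < 16
#error BUFFER_SIZE must be at least 16
#endif

int buffer_size() { return BUFFER_SIZE; }
```

Defining BUFFER_SIZE as, say, 8 before this code would abort compilation with the message above instead of producing a misbehaving program.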
Source File Names and Line Numbering
The current filename and line number where the preprocessing is being performed can be retrieved using the predefined macros __FILE__ and __LINE__. Line numbers are measured before any escaped newlines are removed. The current values of __FILE__ and __LINE__ can be overridden using the #line directive; it is very rarely appropriate to do this in hand-written code, but it can be useful for code generators which create C++ code based on other input files, so that (for example) error messages will refer back to the original input files rather than to the generated C++ code.
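A minimal sketch of these macros and of #line follows. The file name "generated_from.template" and the function names are hypothetical; a code generator would put the name of its real input file there.

```cpp
#include <string>

// __LINE__ expands to the number of the source line it appears on.
int line_here() { return __LINE__; }

// From here on, the preprocessor pretends that the next line is
// line 100 of a file called "generated_from.template".
#line 100 "generated_from.template"
int line_after_directive() { return __LINE__; }
std::string file_after_directive() { return __FILE__; }
```

Here line_after_directive() returns 100, because the function sits on the line immediately following the #line directive.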
Linker
The linker is a program that resolves linkage issues, such as the use of symbols or identifiers that are defined in one translation unit and needed by other translation units; the information it works from is created by the compiler. Symbols or identifiers that are needed outside a single translation unit must have external linkage. In short, the linker's job is to resolve references to undefined symbols by finding out which other object file defines the symbol in question, and replacing placeholders with the symbol's address. Of course, the process is more complicated than this, but the basic ideas apply.
Linkers can take objects from a collection called a library. Depending on the library (system, language or external library) and the options passed, the linker may include only those symbols that are referenced from other object files or libraries. Libraries for diverse purposes exist, and one or more system libraries are usually linked in by default. We will take a closer look at libraries in the Libraries section of this book.
Linking
The process of connecting or combining object files produced by a compiler with the libraries necessary to make a working executable program (or a library) is called linking. Linkage refers to the way in which a program is built out of a number of translation units.
C++ programs can be compiled and linked with programs written in other languages, such as C, Fortran, and Pascal. When a program consists of two or more source modules written in different languages, you should do the following:
- Compile each program module separately with the appropriate compiler.
- Link them together in a separate step.
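For the common case of C, the C++ side usually declares the foreign functions with extern "C" so that the linker looks for the name the C compiler produced. The sketch below is an assumption-laden illustration: c_add is a hypothetical function that would normally live in a separately compiled C source file, and a stand-in definition is given here only so that the example links on its own.

```cpp
// Declaration as the C++ code would see it, typically via a header.
// extern "C" asks for C linkage, i.e. no C++ name mangling.
extern "C" int c_add(int a, int b);

// C++ code calls the function like any other.
int call_c_code() { return c_add(2, 3); }

// Stand-in for the definition that a C compiler would normally
// provide in its own object file, to be combined by the linker.
extern "C" int c_add(int a, int b) { return a + b; }
```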
Internal storage of data types
Bits and Bytes
The byte is the smallest individual piece of data that we can access or modify on a computer. The computer only works on bytes or groups of bytes, never on bits. If you want to modify individual bits, you have to use binary operations on the whole byte that tell the computer how to modify individual bits, but the operation is still done on whole bytes. Before getting too far ahead of ourselves, we'll look at the internal representation of a byte.
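As a brief sketch of what "binary operations on the whole byte" means in C++, the helpers below change or inspect a single bit by operating on the complete byte. The alias byte_t and the function names are hypothetical, and bit 0 is taken to be the least significant bit.

```cpp
typedef unsigned char byte_t;  // one byte of storage

// Turn one bit on: OR the whole byte with a mask that has only that bit set.
byte_t set_bit(byte_t value, unsigned bit) {
    return static_cast<byte_t>(value | (1u << bit));
}

// Turn one bit off: AND the whole byte with the inverted mask.
byte_t clear_bit(byte_t value, unsigned bit) {
    return static_cast<byte_t>(value & ~(1u << bit));
}

// Check one bit: AND with the mask and compare against zero.
bool test_bit(byte_t value, unsigned bit) {
    return (value & (1u << bit)) != 0;
}
```

In each case the processor reads, combines and writes a whole byte; the mask merely selects which bit is affected.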
Here's a look at a byte as the computer stores it.
There is actually quite a lot of information here. A byte (usually) contains 8 bits. A bit can only have a value of 0 or 1. The bit number is used to label each bit in the byte (so that we can tell which bit we are talking about). You may be wondering why the bits are labeled from 7 to 0 instead of 0 to 7 or even 1 to 8. The reason 0 is used is because computers always start counting at 0. Technically, we COULD start counting at 1, but this would go against the counting nature of the com
