C++ Programming/Print version

From Wikibooks, the open-content textbooks collection

About the book

Foreword

This book covers the C++ programming language, its interactions with software design, and real-life use of the language. It is presented as an introductory-to-advanced course but can also be used as a reference book. If you are already familiar with programming in other languages, you can skip most of the Getting Started chapter (it introduces the language and the first steps needed to get you started). You should not skip the Programming Paradigms introduction, since C++ has some particulars on that topic that should be useful even if you already know an object-oriented programming language. The Language Comparisons section, which compares C++ with languages you may already know, is important for veterans. If this is your first contact with programming, simply read on, and keep in mind that the Programming Paradigms section can be hard to digest if you lack experience. Do not despair: the relevant points will be expanded as other concepts are introduced. That section is there to give you a mental framework that helps you not only to understand C++ but also to adapt easily to (and from) other languages that share those concepts.

Guide to Readers

This is a wikibook (en.wikibooks.org); as such, you should learn a bit about what that is and how it does its magic.

The book is organized into different parts, but because it is a work that is always evolving, things may be missing or not where they should be. You are free to become a writer and contribute to fix things up.

Reader Comments

If you have comments about the technical accuracy, content, or organization of this document, please tell us (e.g. by using the "discussion" pages or by email). Be sure to include the section or part title of the document with your comments, and the date of your copy of the book. If you are really convinced of your point, information or correction, then become a writer (at Wikibooks) and make the change yourself; it can always be rolled back if someone disagrees.

Guide to Writers

Authors and contributors should register if they intend to make non-anonymous contributions to the book (this gives more value and relevance to your opinions and views on the evolution of the work, and enables others to talk to you) and should try to follow the existing structure. If you have major ideas or big changes, use the discussion area; as a rule, just go with the flow.

Conventions 
A set of conventions has been adopted for the creation of this book; please read about them on the book's talk page before you contribute any content.

Authors

The following people are authors of this book:
Panic
The above authors release their work under the following license; this page shall be included in any copy of the C++ Programming book.
Any source code included if not bearing a different statement shall be considered under the public domain.
Images used have their own copyright status, specified in their respective repositories (en.wikibooks.org or at commons.wikimedia.org).
Acknowledgment is given for using some contents from other works like Wikipedia, the Wikibooks Java Programming, C Programming and C++ Exercises for beginners, the C/C++ Reference Web Site, and from Wikisource, as well as from the authors Scott Wheeler, Stephen Ferg and Ivor Horton.

There are many other contributors/editors to the book; a verifiable list of all contributions exists as History Logs at Wikibooks (http://en.wikibooks.org/).

Getting Started

Introducing C++

C++ (pronounced "see plus plus") is a general-purpose, object-oriented, statically typed, free-form, multi-paradigm programming language supporting procedural programming, data abstraction, and generic programming. During the 1990s, C++ became one of the most popular computer programming languages.

History

Bjarne Stroustrup, a computer scientist at Bell Labs, was the designer and original implementer of C++ (originally named "C with Classes"), which he developed during the 1980s as an enhancement to the C programming language. Enhancements started with the addition of classes, followed by, among many other features, virtual functions, operator overloading, multiple inheritance, templates, and exception handling. These and other features are covered in detail throughout this book.

The C++ programming language is a standard recognized by ANSI (the American National Standards Institute), BSI (the British Standards Institution), DIN (the German national standards organization), and several other national standards bodies. It was ratified in 1998 by ISO (the International Organization for Standardization) as ISO/IEC 14882:1998. The standard consists of two parts: the Core Language and the Standard Library; the latter includes the Standard Template Library and the Standard C Library (ANSI C 89).

Features introduced in C++ include declarations as statements, function-like casts, new/delete, bool, reference types, const, inline functions, default arguments, function overloading, namespaces, classes (including all class-related features such as inheritance, member functions, virtual functions, abstract classes, and constructors), operator overloading, templates, the :: operator, exception handling, run-time type identification, and more type checking in several cases. Comments starting with two slashes ("//") were originally part of BCPL and were reintroduced in C++. Several features of C++ were later adopted by C, including const, inline, declarations in for loops, and C++-style comments (using the // symbol).

The current version, ISO/IEC 14882:2003, redefines the standard language as a single item. The STL, which pre-dated the standardization of C++ and grew out of generic programming work first carried out in Ada, is now an integral part of the standard and a requirement for a compliant implementation. Many other C++ libraries exist which are not part of the Standard, such as Boost. Also, non-Standard libraries written in C can generally be used by C++ programs.

Since 2004, the standards committee (which includes Bjarne Stroustrup) has been working out the details of a new revision of the standard, temporarily titled C++0x and due for publication at the end of 2011. Some implementations already support some of the proposed changes.

C++ source code example
// 'Hello World!' program 
 
#include <iostream>
 
int main()
{
  std::cout << "Hello World!" << std::endl;
  return 0;
}

Traditionally, the first program people write in a new language is called "Hello World" because all it does is print the words Hello World. Hello World Explained offers a detailed explanation of this code; the source code is included here to give you an idea of what a simple C++ program looks like.

Overview

Before you begin your journey to understand how to write programs using C++, it is important to understand a few key concepts that you may encounter. These concepts are not unique to C++, but are helpful to understanding computer programming in general. Readers who have experience in another programming language may wish to skim through or skip this section entirely.

There are many different kinds of programs in use today. From the operating system you use that makes sure everything works as it should, to the video games and music applications you use for fun, programs can fulfill many different purposes. What all programs (also called software or applications) have in common is that they all are made up of a sequence of instructions written in some form of programming language. These instructions tell a computer what to do, and generally how to do it. Programs can contain anything from instructions to solve math problems or send emails, to how to behave when a video game character is shot in a game. The computer will follow the instructions of a program one instruction at a time from start to finish.

Why learn C++?

Why not? This is the most clarifying approach to the decision to learn anything. Although learning is always good, selecting what you learn matters more, because it is how you prioritize tasks. Another side of the question is that you will be investing time in acquiring a new skill set, so you must decide how this will benefit you. Check your objectives and compare similar projects, or see what the programming market needs. In any case, the more programming languages you know, the better.

If you are approaching the learning process only to add another notch to your belt, that is, if you are willing to dedicate only enough effort to understand its major quirks and learn something about its dark corners, then you would be best served by first learning two other languages; this will clarify what makes C++ special in its approach to programming problems. The first should be an imperative language; C will probably have better market value and has a direct relation to C++ (a good substitute would be assembly language). The second should be an object-oriented language such as Java, for the same reasons, as there is a close relation between the three languages.

If you have more than a passing interest in C++, you can even learn it as your first language, but dedicate some time to understanding the different paradigms and why C++ is a multi-paradigm language or, as some like to call it, a hybrid language.

Learning C is not a requirement for understanding C++, but knowing how to use an imperative language is. C++ will not make it easy for you to understand and distinguish some of these deeper concepts, since in C++ you are free to implement solutions with a greater range of freedom. Understanding which options to choose will become the cornerstone of mastering the language.

You should not learn C++ if you are interested only in applying or learning about object-oriented programming, since the nomenclature used and some of the approaches C++ takes will probably make those concepts harder to learn and master. If you are truly interested in object-oriented programming, a language such as Smalltalk is often recommended as a better starting point.

As with all languages, C++ has a specific scope of application where it can truly shine. In a quick comparison with the previously mentioned languages, C++ is harder to learn than C and Java but more powerful than both. C++ enables you to abstract away from the little things you have to deal with in C or other lower-level languages, and it grants you more control and responsibility than Java, but it will not provide the default features you get in similar higher-level languages. You will have to search for and examine external implementations of these features and freely select those that best serve your purposes, or you may even have to implement your own solution.

Where to get a compiler

When you select your compiler, you must take into consideration your operating system, your personal preferences and the documentation you can get on using it.

One of the most up-to-date and compatible compilers is GCC. The next section shows how to get a copy and install it on Windows; the GCC website has information on how to do so under other operating systems. GCC is a decent choice and can be obtained for free. Many open source platforms include a recent GCC version. Version 4.0 or later gives fairly good conformance to the C++ standard. Various IDEs are available to support GCC. For Windows, Microsoft Visual Studio Express is currently available free of charge (but not free as in non-proprietary) with a C++ compiler that can be used from the command line or from the supplied IDE. An IDE, or Integrated Development Environment, is generally a graphical environment that integrates functionality such as editing, compiling, linking and, usually, a help system.

NOTE:
In Appendix B:External References you will find references to other freely available compilers and even full IDEs you can use.

GCC

The GNU Compiler Collection is a free set of compilers developed by the Free Software Foundation, with Richard Stallman as one of the main architects.

There are many different pre-compiled GCC binaries on the Internet; some popular choices are listed below (with detailed steps for installation).

On Windows

Cygwin:

  1. Go to http://www.cygwin.com and click on the "Install Cygwin Now" button in the upper right corner of the page.
  2. Click "run" in the window that pops up, and click "next" several times, accepting all the default settings.
  3. Choose any of the Download sites ("ftp.easynet.be", etc.) when that window comes up; press "next" and the Cygwin installer should start downloading.
  4. When the "Select Packages" window appears, scroll down to the heading "Devel" and click on the "+" by it. In the list of packages that now displays, scroll down and find the "gcc-c++" package; this is the compiler. Click once on the word "Skip", and it should change to some number like "3.4" etc. (the version number), and an "X" will appear next to "gcc-core" and several other required packages that will now be downloaded.
  5. Click "next" and the compiler as well as the Cygwin tools should start downloading; this could take a while. While you're waiting, go to http://www.crimsoneditor.com and download that free programmer's editor; it's powerful yet easy to use for beginners.
  6. Once the Cygwin downloads are finished and you have clicked "next", etc. to finish the installation, double-click the Cygwin icon on your desktop to begin the Cygwin "command prompt". Your home directory will automatically be set up in the Cygwin folder, which now should be at "C:\cygwin" (the Cygwin folder is in some ways like a small Unix/Linux computer on your Windows machine -- not technically of course, but it may be helpful to think of it that way).
  7. Type "g++" at the Cygwin prompt and press "enter"; if "g++: no input files" or something like it appears you have succeeded and now have the gcc C++ compiler on your computer (and congratulations -- you have also just received your first error message!).

MinGW + DevCpp-IDE

  1. Go to http://www.bloodshed.net/devcpp.html, choose the version you want (scrolling down if necessary), and click on the appropriate download link. For the most current version, you will be redirected to http://www.bloodshed.net/dev/devcpp.html
  2. Scroll down to read the license and then to the download links. Download a version with Mingw/GCC. It's much easier than assembling this yourself. With a very short delay (only a few days) you will always get the most current version of mingw packaged with the devcpp IDE. The result is exactly the same as a manual download of the required modules.
  3. You get an installer that can be run at user level under any WinNT version. If you want it to be set up for all users, however, you need admin rights. It will install devcpp and mingw in folders of your choice.
  4. Start the IDE and experience your first project!
    You will find something mostly similar to MSVC, including menu and button placement. Of course, many things are somewhat different if you are familiar with the former, but it takes only a handful of clicks to get your first program running.

For DOS

DJGPP:

  • Go to Delorie Software and download the GNU C++ compiler and other necessary tools. The site provides a Zip Picker in order to help identify which files you need, which is available from the main page.
  • Use unzip32 or another extraction utility to place the files into the directory of your choice (e.g. C:\DJGPP).
  • Set the environment variables to configure DJGPP for compilation, by adding lines either to autoexec.bat or to a custom batch file:
    set PATH=C:\DJGPP\BIN;%PATH%
    set DJGPP=C:\DJGPP\DJGPP.ENV
  • If you are running MS-DOS or Windows 3.1, you need to add a few lines to config.sys if they are not already present:
    shell=c:\dos\command.com c:\dos /e:2048 /p
    files=40
    fcbs=40,0

Note: The GNU C++ compiler under DJGPP is named gpp.

For Linux

  • For Redhat, get a gcc-c++ RPM, e.g. using Rpmfind and then install (as root) using rpm -ivh gcc-c++-version-release.arch.rpm
  • For Fedora Core, install the GCC C++ compiler (as root) by using yum install gcc-c++
  • For Mandrake, install the GCC C++ compiler (as root) by using urpmi gcc-c++
  • For Debian, install the GCC C++ compiler (as root) by using apt-get install g++
  • For Ubuntu, install the GCC C++ compiler by using sudo apt-get install g++
  • If you cannot become root, get the tarball from ftp://ftp.gnu.org/ and follow the instructions in it to compile and install in your home directory.

For Mac OS X

Xcode comes with the GCC C++ compiler bundled. The compiler can be invoked from the Terminal in the same way as on Linux, and programs can also be compiled from within an Xcode project.

What is a Programming Language?

In the most basic terms, a "programming language" is a means of communication between a human being (programmer) and a computer. A programmer uses this means of communication in order to give the computer instructions. These instructions are called "programs".

Like the many languages we use to communicate with each other, there are many languages that a programmer can use to communicate with a computer. Each language has its own set of words and rules for combining them, called its syntax, and its own rules for what those combinations mean, called its semantics. If you're going to write a program, you have to follow the syntax and semantics of the language you're writing in, or you won't be understood.

Programming languages can basically be divided into two categories: low-level and high-level. Next we will introduce you to these concepts and their relevance to C++.

Low-level Languages

There are two general types of low level "languages".

Machine code (also called binary) is the lowest form of a low-level language. Machine code consists of a string of 0s and 1s, which combine to form meaningful instructions that computers can act on. If you look at a page of binary, it becomes apparent why binary is never a practical choice for writing programs: what kind of person would actually be able to remember what a bunch of strings of 1s and 0s mean?

Assembly language (also called ASM) is just above machine code on the scale from low level to high level. It is a human-readable translation of the machine language instructions the computer executes. For example, instead of referring to processor instructions by their binary representation (0s and 1s), the programmer refers to those instructions using a more memorable (mnemonic) form. These mnemonics are usually short collections of letters that symbolize the action of the respective instruction, such as "ADD" for addition, and "MOV" for moving values from one place to another.

NOTE:
Assembly language is processor specific. This means that a program written in assembly language will not work on computers with different processor architectures.

You do not have to understand assembly language to program in C++, but it does help to have an idea of what's going on "behind-the-scenes". Learning about assembly language will also allow you to have more control as a programmer and help you in debugging and understanding code.

High-level Languages

Higher-level languages partially solve the problem of dependence on the hardware (CPU, co-processors, number of registers, etc.) by abstracting away from it, which makes code portable. High-level languages do more with less code, although there is sometimes a loss in performance and less freedom for the programmer. They also attempt to use English words in a form that can be read and generally interpreted by the average person with little experience in them. A program written in one of these languages is sometimes referred to as "human-readable code". In general, more abstraction makes it easier for a language to be learned. No programming language is written in what one might call "plain English", though (although BASIC comes close). Because of this, the text of a program is sometimes referred to as "code", or more specifically as "source code". This is discussed in more detail in the Code Section of the book.

Keep in mind that this classification scheme is evolving. C++ is still considered a high-level language, but with the appearance of newer languages (Java, C#, Ruby, etc.), C++ is beginning to be grouped with lower-level languages like C.

Translating Programming Languages

Since a computer is only capable of understanding machine code, human-readable code must be either interpreted or translated into machine code.

An interpreter is a program (often written in a lower-level language) that reads the instructions of a program one at a time and carries each one out as it is read. Typically each instruction consists of one line of text, or there is some other clear means of telling instructions apart, and the program must be interpreted again each time it is run.

A compiler is a program that translates the instructions of a program into machine code. The translation may involve splitting one instruction understood by the compiler into multiple machine instructions. The instructions are translated only once; after that, the machine can understand and follow them directly whenever it is told to do so. A complete examination is given in the Compiler Section of the book.

The words and statements used to instruct the computer may differ, but no matter what words and statements are used, just about every programming language will include statements that will accomplish the following:

Input
Input is the act of getting information from a device such as a keyboard or mouse, or sometimes another program.
Output
Output is the opposite of input; it gives information to the computer monitor or another device or program.
Math/Algorithm
All computer processors (the brain of the computer) have the ability to perform basic mathematical computation, and every programming language has some way of telling them to do so.
Testing
Testing involves telling the computer to check for a certain condition and to do something when that condition is true or false. Conditionals are one of the most important concepts in programming, and all languages have some method of testing conditions.
Repetition
Perform some action repeatedly, usually with some variation.

A further examination is provided in the Statements Section of the book.

Believe it or not, that's pretty much all there is to it. Every program you've ever used, no matter how complicated, is made up of statements that look more or less like these. Thus, one way to describe programming is as the process of breaking a large, complex task up into smaller and smaller subtasks until eventually the subtasks are simple enough to be performed with one of these simple statements.

C++ programs are mostly compiled rather than interpreted (although some C++ interpreters exist) and then "executed" later. As complicated as this may seem, later you will see how easy it really is.

As we have seen in the Introducing C++ Section, C++ evolved from C by adding some levels of abstraction (so we can correctly state that C++ is of a higher level than C). We will learn the particulars of those differences in the Programming Paradigms Section of the book; those of you who already know other languages should also look into the Programming Languages Comparisons Section.

Programming Paradigms

A programming paradigm is a style or model of programming that affects the way programmers can design, organize and write programs. A multi-paradigm programming language allows programmers to choose from a number of different programming paradigms. C++ is a multi-paradigm programming language.

Procedural Programming

Procedural programming can be defined as a subtype of imperative programming: a programming paradigm based upon the concept of procedure calls, in which statements are structured into procedures (also known as subroutines or functions). Procedure calls are modular and are bound by scope. A procedural program is composed of one or more modules. Each module is composed of one or more subprograms. Modules may consist of procedures, functions, subroutines or methods, depending on the programming language. Procedural programs may have multiple levels or scopes, with subprograms defined inside other subprograms. Each scope can contain names which cannot be seen in outer scopes.

Procedural programming offers many benefits over simple sequential programming since procedural code:

  • is easier to read and more maintainable
  • is more flexible
  • facilitates the practice of good program design
  • allows modules to be reused in the form of code libraries.

Object-Oriented Programming

Object-oriented programming can be seen as an extension of procedural programming in which programs are made up of collections of individual units called objects, each with a distinct purpose and function and with limited or no dependence on how it is implemented. For example, a car is like an object; it gets you from point A to point B with no need to know what type of engine the car uses or how the engine works. Object-oriented languages usually provide a means of documenting what an object can and cannot do, like instructions for driving a car.

Objects and Classes

An object is composed of members and methods. The members (also called data members, characteristics, attributes, or properties) describe the object. The methods generally describe the actions associated with a particular object. Think of an object as a noun, its members as adjectives describing that noun, and its methods as the verbs that can be performed by or on that noun.

For example, a sports car is an object. Some of its members might be its height, weight, acceleration, and speed. An object's members just hold data about that object. Some of the methods of the sports car could be "drive", "park", "race", etc. The methods really don't mean much unless associated with the sports car, and the same goes for the members.

The blueprint that lets us build our sports car object is called a class. A class doesn't tell us how fast our sports car goes, or what color it is, but it does tell us that our sports car will have members representing speed and color, and that they will be, say, a number and a word, respectively. The class also lays out the methods for us, telling the car how to park and drive, but these methods can't take any action with just the blueprint; they need an object to have an effect.

Encapsulation

Encapsulation, the principle of information hiding (from the user), is the process of hiding the data structures of a class and allowing changes to the data only through a public interface, where incoming values are checked for validity. It thus permits hiding not only the data in an object but also its behavior. This prevents clients of an interface from depending on those parts of the implementation that are likely to change in the future, thereby allowing those changes to be made more easily, that is, without changes to clients. In modern programming languages, the principle of information hiding manifests itself in a number of ways, including encapsulation and polymorphism.

Inheritance

This concept describes a relationship between two (or more) types, or classes, of objects in which one is said to be a "subtype" or "child" of the other. As a result, the "child" object is said to inherit features of the parent, allowing for shared functionality. This lets programmers re-use or reduce code and simplifies the development and maintenance of software.

Inheritance is also commonly held to include subtyping, whereby one type of object is defined to be a more specialized version of another type (see Liskov substitution principle), though non sub-typing inheritance is also possible.

Inheritance is typically expressed by describing classes of objects arranged in an inheritance hierarchy reflecting common behavior.

For example, one might create a class "Mammal" with features such as eating, reproducing, etc.; then define a subtype "Cat" that inherits those features without having to explicitly program them, while adding new features like "chasing mice". This allows commonalities among different kinds of objects to be expressed once and reused multiple times.

In C++ we can then have classes that are related to other classes (a class can be defined by means of an older, pre-existing class). This leads to a situation in which a new class has all the functionality of the older class and additionally introduces its own specific functionality. We mean here derivation, where a given class is another class, rather than composition, where a given class contains another class.

This OOP property will be explained further when we talk about Classes (and Structures) inheritance in the Classes Inheritance Section of the book.

If one wants to use more than one totally orthogonal hierarchy simultaneously, such as allowing "Cat" to inherit from "Cartoon character" and "Pet" as well as "Mammal", one uses multiple inheritance.

Multiple Inheritance

Multiple inheritance is the process by which one class can inherit the properties of two or more classes (variously known as its base classes, or parent classes, or ancestor classes, or super classes).

In some similar languages, multiple inheritance is restricted in various ways to keep the language simple, such as by allowing inheritance from only one real class and a number of "interfaces", or by completely disallowing multiple inheritance. C++ places the full power of multiple inheritance in the hands of programmers, but it is needed only rarely, and (as with most techniques) can complicate code if used inappropriately. Because of C++'s approach to multiple inheritance, C++ has no need of separate language facilities for "interfaces"; C++'s classes can do everything that interfaces do in some related languages.

Polymorphism

Polymorphism allows a single name to be reused for several related but different purposes. Its purpose is to allow one name to be used for a general class of actions; depending on the type of data, a specific instance of the general case is executed.

The concept of polymorphism is wide: it exists whenever we use two functions that have the same name but differ in implementation. They may also differ in their interface, e.g., by taking different arguments. In that case the choice of which function to call is made by overload resolution and is performed at compile time, so we refer to it as static polymorphism.

Dynamic polymorphism will be covered in depth in the Classes Section, where we will address its use in redefining a method in a derived class.

Generic Programming

Generic programming, or parametric polymorphism, is a programming style that emphasizes techniques that allow one piece of code to work with values of different types, as long as certain contracts, such as subtypes and signatures, are kept. In simpler terms, generic programming is based on finding the most abstract representations of efficient algorithms. Templates popularized the notion of generics. Templates allow code to be written without consideration of the type with which it will eventually be used. Templates are a feature of the core language, and the Standard Template Library (STL), through which generic programming was introduced into C++, is built on them.

Statically Typed

Typing refers to how a computer language handles the types of its variables. Variables are named values that the program uses during execution. These values can change; they are variable, hence the name. Static typing usually results in compiled code that executes more quickly: when the compiler knows the exact types in use, it can more easily produce machine code that does the right thing. In C++, variables need to be declared before they are used so that compilers know what type they are; hence C++ is statically typed. Languages that are not statically typed are called dynamically typed.

Static typing usually finds type errors more reliably at compile time, increasing the reliability of compiled programs. Simply put, it means that "A round peg won't fit in a square hole", so the compiler will report it when a type leads to ambiguity or incompatible usage. However, programmers disagree over how common type errors are and what proportion of bugs that are written would be caught by static typing. Static typing advocates believe programs are more reliable when they have been type checked, while dynamic typing advocates point to distributed code that has proved reliable and to small bug databases. The value of static typing, then, presumably increases as the strength of the type system is increased.

A statically typed system constrains the use of powerful language constructs more than it constrains less powerful ones. This makes powerful constructs harder to use, and thus places the burden of choosing the "right tool for the problem" on the shoulders of the programmer, who might otherwise be inclined to use the most powerful tool available. Choosing overly powerful tools may cause additional performance, reliability or correctness problems, because there are theoretical limits on the properties that can be expected from powerful language constructs. For example, indiscriminate use of recursion or global variables may cause well-documented adverse effects.

Static typing allows construction of libraries which are less likely to be accidentally misused by their users. This can be used as an additional mechanism for communicating the intentions of the library developer.
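
As a rough illustration of that last point, a library can give each kind of quantity its own type so that the compiler rejects swapped arguments. Meters, Seconds and speed are hypothetical names for this sketch, not part of any real library.

```cpp
// Two distinct types that both wrap a double.
struct Meters  { double value; };
struct Seconds { double value; };

// The parameter types document the intended argument order and enforce it.
double speed(Meters distance, Seconds time) {
    return distance.value / time.value;
}

// Calling speed with a Seconds first and a Meters second does not compile,
// because there is no conversion between the two types.
```

Had both parameters been plain double, the swapped call would compile and silently compute the wrong result.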

Free-form

Free-form refers to how the programmer crafts the code. Basically, there are no rules on how you choose to write your program, save for the semantic rules of C++. Any C++ program should compile as long as it is legal C++.
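
To illustrate, the two hypothetical functions below consist of the same kind of tokens and differ only in whitespace and line breaks. The compiler treats both layouts alike.

```cpp
// Everything on one line.
int sum_compact(int a,int b){return a+b;}

// The same code spread over several lines.
int
sum_spread ( int a ,
             int b )
{
    return a
         + b ;
}
```

Layout is therefore a matter of readability and project convention, not of correctness.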

The free-form nature of C++ is used (or abused, depending on your point of view) by some programmers in crafting obfuscated C++ (C++ that is purposefully written to be difficult to understand). The use of obfuscation is regarded by some as a security device, ensuring that the source code is harder for the average user to analyze.

Language Comparisons

There isn't a perfect language. It all depends on the tools and the objective. The optimal language (in terms of run-time performance) is machine code but machine code (binary) is the least efficient programming language in terms of coder time. The complexity of writing large systems is enormous with high-level languages, and beyond human capabilities with machine code. In the next section C++ will be compared with other closely related languages like C, Java, C# and C++/CLI.

No programming language at present can translate concepts or ideas directly into useful code, but there are solutions that will help. We will cover the use of Computer-aided software engineering (CASE) tools that address part of this problem, although their use does require planning and adds some degree of complexity.

The intention of these sections is not to promote one language above another; each has its applicability. Some are better in specific tasks, some are simpler to learn, others only provide a better level of control to the programmer. This all may depend also on the level of control the programmer has of a given language.

Garbage Collection

In C++ garbage collection is optional rather than required. The Garbage Collection Section of this book covers this issue in depth.

Why doesn't C++ include a finally keyword?

As we will see in the Resource Acquisition Is Initialization (RAII) Section of the book, RAII can be used to provide a better solution for most issues. When finally is used to clean up, it has to be written by the clients of a class each time that class is used (for example, clients of a File class have to do I/O in a try/catch/finally block so that they can guarantee that the File is closed). With RAII, the destructor of the File class can make that guarantee. Now the cleanup code has to be coded only once — in the destructor of File; the users of the class don't need to do anything.
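
The idea can be sketched as follows. This File class is a hypothetical, minimal wrapper around the C stdio functions written only for illustration; it is not a class from the standard library.

```cpp
#include <cstdio>
#include <stdexcept>
#include <string>

// Hypothetical File class: the constructor acquires the resource and the
// destructor releases it, so cleanup code is written exactly once.
class File {
public:
    File(const std::string& path, const char* mode)
        : handle_(std::fopen(path.c_str(), mode)) {
        if (!handle_) throw std::runtime_error("cannot open " + path);
    }
    ~File() { std::fclose(handle_); }

    void write(const std::string& text) { std::fputs(text.c_str(), handle_); }

private:
    File(const File&);             // copying is disabled so that the
    File& operator=(const File&);  // handle is never closed twice
    std::FILE* handle_;
};

// A client of the class: no try/catch/finally is needed here.
void save_greeting(const std::string& path) {
    File out(path, "w");
    out.write("hello\n");
}   // the destructor of out closes the file here, even if write() throws
```

If the constructor throws, no File object exists and the destructor is not run, which is correct because nothing was opened.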

Mixing Languages

By default, C++ compilers normally "mangle" the names of functions in order to facilitate function overloading and generic functions. In some cases, you need to gain access to a function that was not compiled by a C++ compiler. For this to work, you need to declare the function with C linkage:

extern "C" void LibraryFunction();
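
A slightly fuller sketch is shown below. The function library_add is a hypothetical name; in a real project its definition would come from a C object file or library, and it is defined here only so that the example links on its own.

```cpp
// Declaration as it might appear in a header shared with C code.
// extern "C" tells the C++ compiler not to mangle this name.
extern "C" int library_add(int a, int b);

// Stand-in for the C implementation (assumption: normally compiled
// separately by a C compiler and linked in).
extern "C" int library_add(int a, int b) { return a + b; }

// Ordinary C++ code calls the function like any other.
int call_library() { return library_add(2, 3); }
```

Because names with C linkage are not mangled, two functions with C linkage cannot share a name the way overloaded C++ functions can.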

C 89/99

C was essentially the core language of C++ when Bjarne Stroustrup decided to create a "better C". Many of the syntax conventions and rules still hold true, so we can even state that C was very nearly a subset of C++. Most recent C++ compilers will also compile C code, taking into consideration the small incompatibilities, since C99 and C++ 2003 are no longer fully compatible. You can find more information about the C language in the C Programming Wikibook ( http://en.wikibooks.org/wiki/C ).

C++ as defined by the ISO standard in 1998 (called C++98 at times) is very nearly, but not quite, a superset of the C language as it was defined by its first ANSI standard in 1989 (known as C89). There are a number of ways in which C++ is not a strict superset, in the sense that not all valid C89 programs are valid C++ programs, but the process of converting C code to valid C++ code is fairly trivial (avoiding reserved words, getting around the stricter C++ type checking with casts, declaring every called function, and so on).

In 1999, C was revised and many new features were added to it. As of 2004, most of these new "C99" features are not present in C++. Some (including Stroustrup himself) have argued that the changes brought about in C99 have a philosophy distinct from what C++98 adds to C89, and hence these C99 changes are directed towards increasing incompatibility between C and C++.

The merging of the languages seems a dead issue as coordinated actions by the C and C++ standards committees leading to a practical result didn't happen and it can be said that the languages started even to diverge.

Some of the differences are:

  • C++ supports function overloading (absent in C89, allowed only for some standard library code in C99).
  • C++ supports inheritance and polymorphism.
  • C++ adds keyword class, but keeps struct from C, with compatible semantics.
  • C++ supports access control for class members.
  • C++ supports generic programming through the use of templates.
  • C++ extends the C89 standard library with its own standard library.
  • C++ and C99 offer different complex number facilities.
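  • C++ has bool and wchar_t as primitive types, while in C they are provided by library headers (wchar_t as a typedef, and bool in C99 as a macro for the built-in type _Bool).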
  • C++ comparison operators return bool, while C returns int.
  • C++ supports overloading of operators.
  • C++ character constants have type char, while C character constants have type int.
  • C++ has additional cast operators (static_cast, dynamic_cast, const_cast and reinterpret_cast).
  • C++ adds mutable keyword to address the imperfect match between physical and logical constness.
  • C++ extends the type system with references.
  • C++ supports member functions, constructors and destructors for user-defined types to establish invariants and to manage resources.
  • C++ supports runtime type identification (RTTI), via typeid and dynamic_cast.
  • C++ includes exception handling.
  • C++ has std::vector as part of its standard library instead of variable-length arrays as in C.
  • C++ treats the sizeof operator as a compile-time operation, while C99 allows it to be a run-time operation when it is applied to a variable-length array.
  • C++ has new and delete operators, while C uses malloc and free library functions exclusively.
  • C++ supports object-oriented programming without extensions.
  • C++ does not require use of macros and careful information-hiding and abstraction for code portability.
  • C++ supports per-line comments denoted by //. (C99 started official support for this comment system, and most compilers supported this as an extension.)
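
Two of the differences listed above can be observed directly from code. In this sketch, which assumes the file is compiled as C++, type_name is a hypothetical helper whose overloads reveal the type of an expression.

```cpp
#include <string>

// In C++ a character constant has type char, so this returns true.
// Compiled as C, 'a' would have type int.
bool char_literal_is_char() { return sizeof('a') == sizeof(char); }

// Overloading (itself a C++ feature) exposes the type of the argument.
std::string type_name(bool) { return "bool"; }
std::string type_name(int)  { return "int"; }
std::string type_name(char) { return "char"; }
```

For example, type_name(1 < 2) selects the bool overload, because a C++ comparison yields bool rather than int.
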
Reasons to choose one of the languages over the other

It is not uncommon to find someone defending C over C++, or vice versa, or complaining about some features of those languages. There is no scientific evidence that puts one language above another in general terms. The only reasons that have some traction apply when a language is still very recent and prone to deep changes or as yet unknown bugs. This is not the case for C or C++: both languages are very mature, and even though both are still evolving, the new features keep a high level of compatibility with old code, making the use of those new constructs a programmer's decision. It is not uncommon to establish rules in a project that limit the use of parts of a language (such as RTTI, exceptions, or virtual functions in inner loops), due to the proficiency of the programmers or the needs of the project. It is also common for new hardware to support lower-level languages first. Because C is less extensive and lower level than C++, it is easier to check compliance with strict industry guidelines and to automate those checks. Another benefit is that it is easier for the programmer to do low-level optimizations: even if most C++ compilers optimize very well automatically, a human can still do more, and C has less complex structures.

Any valid reason to choose one language over another comes down mostly to the programmers' choice, which indirectly means choosing the best tool for the job and having the resources needed to complete it. It would be hard to justify selecting C++ for a project if the available programmers only knew C, or the reverse. A C++ programmer is somewhat expected to be able to produce functional C code, but the mindset and experience needed are not the same. The same rationale is valid for C programmers and assembly language (ASM); this is due to the close relations that exist in the language structure and historic evolution.

One could argue that using the C subset of C++, in a C++ compiler, is the same as using C but in reality we find that it will generate slightly different results depending on the compiler used.

Java

This is a comparison of the Java programming language with the C++ programming language. C++ and Java share many common traits. You can get a better understanding of Java in the Java Programming WikiBook.

Java was created initially to support network computing on embedded systems. Java was designed to be extremely portable, secure, multi-threaded and distributed, none of which were design goals for C++. The syntax of Java was chosen to be familiar to C programmers, but direct compatibility with C was not maintained. Java also was specifically designed to be simpler than C++ but it keeps evolving above that simplification.

C++ | Java
backwards compatible with C | backwards compatibility with previous versions
execution efficiency | developer productivity
trusts the programmer | restrains the programmer's abilities
arbitrary memory access possible | memory access only through objects
concise expression | explicit operation
can arbitrarily override types | type safety
procedural or object-oriented | object-oriented
operator overloading | meaning of operators immutable
powerful capabilities of language | feature-rich, easy to use standard library

Differences between C++ and Java are:

  • C++ parsing is somewhat more complicated than with Java; for example, Foo<1>(3); is a sequence of comparisons if Foo is a variable, but it creates an object if Foo is the name of a class template.
  • C++ allows namespace level constants, variables, and functions. All such Java declarations must be inside a class or interface.
  • const in C++ indicates data to be 'read-only,' and is applied to types. final in java indicates that the variable is not to be reassigned. For basic types such as const int vs final int these are identical, but for complex classes, they are different.
  • C++ did not support constructor delegation before C++11.
  • C++ runs directly on the hardware, while Java runs on a virtual machine, so with C++ you have greater power at the cost of portability.
  • In C++, int main() is a function by itself, without a class.
  • C++ access specification (public, private) is done with labels and in groups.
  • C++ access to class members defaults to private; in Java it is package access.
  • C++ class declarations end in a semicolon.
  • C++ lacks language level support for garbage collection while Java has built-in garbage collection to handle memory deallocation.
  • C++ supports goto statements; Java does not, but its labeled break and labeled continue statements provide some structured goto-like functionality. In fact, Java enforces structured control flow, with the goal of code being easier to understand.
  • C++ provides some low-level features which Java lacks. In C++, pointers can be used to manipulate specific memory locations, a task necessary for writing low-level operating system components. Similarly, many C++ compilers support inline assembler. In Java, assembly code can still be accessed as libraries, through the Java Native Interface. However, there is significant overhead for each call.
  • C++ allows a range of implicit conversions between native types, and also allows the programmer to define implicit conversions involving compound types. However, Java only permits widening conversions between native types to be implicit; any other conversions require explicit cast syntax.
    • A consequence of this is that although conditions (in if, while and the exit condition in for) in Java and C++ both expect a boolean expression, code such as if(a = 5) will cause a compile error in Java because there is no implicit conversion from int to boolean. This is handy if the code was a typo for if(a == 5), but the need for an explicit comparison can add verbosity when statements such as if (x) are translated from C++ to Java.
  • For passing parameters to functions, C++ supports both true pass-by-reference and pass-by-value. As in C, the programmer can simulate by-reference parameters with by-value parameters and indirection. In Java, all parameters are passed by value, but object (non-primitive) parameters are reference values, meaning indirection is built-in.
  • Generally, Java built-in types are of a specified size and range; whereas C++ types have a variety of possible sizes, ranges and representations, which may even change between different versions of the same compiler, or be configurable via compiler switches.
    • In particular, Java characters are 16-bit Unicode characters, and strings are composed of a sequence of such characters. C++ offers both narrow and wide characters, but the actual size of each is platform dependent, as is the character set used. Strings can be formed from either type.
  • The rounding and precision of floating point values and operations in C++ is platform dependent. Java provides a strict floating-point model that guarantees consistent results across platforms, though normally a more lenient mode of operation is used to allow optimal floating-point performance.
  • In C++, pointers can be manipulated directly as memory address values. Java does not have pointers—it only has object references and array references, neither of which allow direct access to memory addresses. In C++ one can construct pointers to pointers, while Java references only access objects.
  • In C++ pointers can point to functions or member functions (function pointers or functors). The equivalent mechanism in Java uses object or interface references.
  • C++ features programmer-defined operator overloading. The only overloaded operators in Java are the "+" and "+=" operators, which concatenate strings as well as performing addition.
  • Java features standard API support for reflection and dynamic loading of arbitrary new code.
  • Java has generics. C++ has templates.
  • Both Java and C++ distinguish between native types (these are also known as "fundamental" or "built-in" types) and user-defined types (these are also known as "compound" types). In Java, native types have value semantics only, and compound types have reference semantics only. In C++ all types have value semantics, but a reference can be created to any object, which will allow the object to be manipulated via reference semantics.
  • C++ supports multiple inheritance of arbitrary classes. Java supports multiple inheritance of types, but only single inheritance of implementation. In Java, a class can derive from only one class, but a class can implement multiple interfaces.
  • Java explicitly distinguishes between interfaces and classes. In C++ multiple inheritance and pure virtual functions makes it possible to define classes that function just as Java interfaces do.
  • Java has both language and standard library support for multi-threading. The synchronized keyword in Java provides simple and secure mutex locks to support multi-threaded applications. While mutex lock mechanisms are available through libraries in C++, the lack of language semantics makes writing thread safe code more difficult and error prone.
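
The point about implicit conversions in conditions can be sketched in C++ as follows. The function name is hypothetical, and the doubled parentheses are only there because many compilers otherwise warn about an assignment used as a condition.

```cpp
// An assignment used as a condition: legal C++, a compile error in Java.
bool assigned_in_condition(int& a) {
    if ((a = 5)) {      // assigns 5 to a, then converts the int 5 to true
        return true;
    }
    return false;
}
```

After the call the caller's variable holds 5, whatever it held before, which is exactly the kind of typo for a == 5 that Java rejects at compile time.
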
Memory management
  • Java requires automatic garbage collection. Memory management in C++ is usually done by hand, or through smart pointers. The C++ standard permits garbage collection, but does not require it; garbage collection is rarely used in practice. When permitted to relocate objects, modern garbage collectors can improve overall application space and time efficiency over using explicit deallocation.
  • C++ can allocate arbitrary blocks of memory. Java only allocates memory through object instantiation. (Note that in Java, the programmer can simulate allocation of arbitrary memory blocks by creating an array of bytes. Still, Java arrays are objects.)
  • Java and C++ use different idioms for resource management. Java relies mainly on garbage collection, while C++ relies mainly on the RAII (Resource Acquisition Is Initialization) idiom. This is reflected in several differences between the two languages:
    • In C++ it is common to allocate objects of compound types as local stack-bound variables which are destructed when they go out of scope. In Java compound types are always allocated on the heap and collected by the garbage collector (except in virtual machines that use escape analysis to convert heap allocations to stack allocations).
    • C++ has destructors, while Java has finalizers. Both are invoked prior to an object's deallocation, but they differ significantly. A C++ object's destructor must be implicitly (in the case of stack-bound variables) or explicitly invoked to deallocate the object. The destructor executes synchronously at the point in the program at which the object is deallocated. Synchronous, coordinated uninitialization and deallocation in C++ thus satisfy the RAII idiom. In Java, object deallocation is implicitly handled by the garbage collector. A Java object's finalizer is invoked asynchronously some time after it has been accessed for the last time and before it is actually deallocated, which may never happen. Very few objects require finalizers; a finalizer is only required by objects that must guarantee some clean up of the object state prior to deallocation—typically releasing resources external to the JVM. In Java safe synchronous deallocation of resources is performed using the try/finally construct.
    • In C++ it is possible to have a dangling pointer – a reference to an object that has been destructed; attempting to use a dangling pointer typically results in program failure. In Java, the garbage collector won't destruct a referenced object.
    • In C++ it is possible to have an object that is allocated, but unreachable. An unreachable object is one that has no reachable references to it. An unreachable object cannot be destructed (deallocated), and results in a memory leak. By contrast, in Java an object will not be deallocated by the garbage collector until it becomes unreachable (by the user program). (Note: weak references are supported, which work with the Java garbage collector to allow for different strengths of reachability.) Garbage collection in Java prevents many memory leaks, but leaks are still possible under some circumstances.
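
The synchronous destruction described above can be observed with a small counter. Tracked is a hypothetical class for this sketch, and std::unique_ptr assumes a C++11 or later compiler.

```cpp
#include <memory>

// Counts how many Tracked objects are alive at any moment.
struct Tracked {
    static int live;
    Tracked()  { ++live; }
    ~Tracked() { --live; }
};
int Tracked::live = 0;

int live_inside_scope() {
    std::unique_ptr<Tracked> owned(new Tracked);  // heap object owned by a smart pointer
    Tracked local;                                // stack-bound object
    return Tracked::live;                         // both objects are alive here
}   // local is destroyed first, then owned deletes its object
```

Both destructors run at the closing brace, at a point that is known when the program is written, whereas a Java finalizer runs at some unspecified later time, if at all.
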
Libraries
  • C++ standard library only provides components that are relatively general purpose, such as strings, containers, and I/O streams. Java has a considerably larger standard library. This additional functionality is available for C++ by (often free) third party libraries, but third party libraries do not provide the same ubiquitous cross-platform functionality as standard libraries.
  • C++ is mostly backward compatible with C, and C libraries (such as the APIs of most operating systems) are directly accessible from C++. In Java, the richer functionality of the standard library provides cross-platform access to many features typically available only in platform-specific libraries. Direct access from Java to native operating system and hardware functions requires the use of the Java Native Interface.
Runtime
  • C++ is normally compiled directly to machine code which is then executed directly by the operating system. Java is normally compiled to byte-code which the Java virtual machine (JVM) then either interprets or JIT compiles to machine code and then executes.
  • Due to the lack of constraints in the use of some C++ language features (e.g. unchecked array access, raw pointers), programming errors can lead to low-level buffer overflows, page faults, and segmentation faults. The Standard Template Library, however, provides higher-level abstractions (like vector, list and map) to help avoid such errors. In Java, such errors either simply cannot occur or are detected by the JVM and reported to the application in the form of an exception.
  • In Java, bounds checking is implicitly performed for all array access operations. In C++, array access operations on native arrays are not bounds-checked, and bounds checking for random-access element access on standard library collections like std::vector and std::deque is optional.
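
The optional bounds checking mentioned for std::vector is available through the at() member function. The helper below is a hypothetical name used only to show the behaviour.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Returns true if checked access at index is rejected.
// v[index] with a bad index would be undefined behaviour instead.
bool checked_access_throws(const std::vector<int>& v, std::size_t index) {
    try {
        (void)v.at(index);   // at() compares index against v.size()
        return false;
    } catch (const std::out_of_range&) {
        return true;
    }
}
```

The programmer chooses per access between the unchecked operator[] and the checked at(), whereas in Java every array access is checked.
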
Miscellaneous
  • Java and C++ use different techniques for splitting up code in multiple source files. Java uses a package system that dictates the file name and path for all program definitions. In Java, the compiler imports the executable class files. C++ uses a header file source code inclusion system for sharing declarations between source files. (See Comparison of imports and includes.)
  • Templates and macros in C++, including those in the standard library, can result in duplication of similar code after compilation. Dynamic linking with the standard libraries, on the other hand, avoids binding the library code into the program at compile time.
  • C++ compilation features a textual preprocessing phase, while Java does not. Java supports many optimizations that mitigate the need for a preprocessor, but some users add a preprocessing phase to their build process for better support of conditional compilation.
  • In Java, arrays are container objects which you can inspect the length of at any time. In both languages, arrays have a fixed size. Further, C++ programmers often refer to an array only by a pointer to its first element, from which they cannot retrieve the array size. However, C++ and Java both provide container classes (std::vector and java.util.ArrayList respectively) which are resizable and store their size.
  • Java's division and modulus operators are well defined to truncate toward zero. C++ before C++11 did not specify whether these operators truncate toward zero or toward negative infinity. -3/2 will always be -1 in Java, but a pre-C++11 compiler may return either -1 or -2, depending on the platform. C99 defines division in the same fashion as Java, as does C++11. Both languages guarantee that (a/b)*b + (a%b) == a for all a and b (b != 0). Where the choice is left open, the C++ version will sometimes be faster, as it is allowed to pick whichever truncation mode is native to the processor.
  • The sizes of integer types are defined in Java (int is 32-bit, long is 64-bit), while in C++ the size of integers and pointers is compiler-dependent. Thus, carefully-written C++ code can take advantage of the 64-bit processor's capabilities while still functioning properly on 32-bit processors. However, C++ programs written without concern for a processor's word size may fail to function properly with some compilers. In contrast, Java's fixed integer sizes mean that programmers need not concern themselves with varying integer sizes, and programs will run exactly the same. This may incur a performance penalty since Java code cannot run using an arbitrary processor's word size.
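
The identity relating division and remainder can be checked with a one-line function. The name below is hypothetical, and the sketch assumes b is not zero and that a / b does not overflow.

```cpp
// True when the quotient and remainder are consistent with each other.
// This holds whichever direction the implementation rounds the quotient.
bool division_identity_holds(int a, int b) {
    return (a / b) * b + (a % b) == a;
}
```

Code that relies only on this identity is portable; code that depends on the specific value of -3/2 is portable only where the rounding direction is fixed by the standard in use.
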
Performance

Computing performance is a measure of resource consumption when a system of hardware and software performs a piece of computing work such as an algorithm or a transaction. Higher performance is defined to be 'using fewer resources'. Resources of interest include memory, bandwidth, persistent storage and CPU cycles. Because of the high availability of all but the latter on modern desktop and server systems, performance is colloquially taken to mean the least CPU cycles; which often converts directly into the least wall clock time. Comparing the performance of two software languages requires a fixed hardware platform and (often relative) measurements of two or more software subsystems. This section compares the relative computing performance of C++ and Java on common operating systems such as Windows and Linux.

Early versions of Java were significantly outperformed by statically compiled languages such as C++. This is because the program statements of these two closely related Level 6 languages may compile to a few machine instructions with C++, while compiling into several byte codes involving several machine instructions each when interpreted by a Java JVM. For example:

Java/C++ statement | C++ generated code | Java generated byte code
vector[i]++; | mov edx,[ebp+4h] | aload_1
 | mov eax,[ebp+1Ch] | iload_2
 | inc dword ptr [edx+eax*4] | dup2
 | | iaload
 | | iconst_1
 | | iadd
 | | iastore

While this may still be the case for embedded systems because of the requirement for a small footprint, advances in just in time (JIT) compiler technology for long-running server and desktop Java processes have closed the performance gap and in some cases given the performance advantage to Java. In effect, Java byte code is compiled into machine instructions at run time, in a similar manner to C++ static compilation, resulting in similar instruction sequences.

C++ is still faster in most operations than Java at the moment, even at low-level and numeric computation. For in-depth information you could check Performance of Java versus C++. It's a bit pro-Java but very detailed.

C#

C# (pronounced "See Sharp") is a multi-purpose computer programming language suitable for all development needs. There is a WikiBook (http://en.wikibooks.org/wiki/C_sharp) that introduces C# language fundamentals and covers a variety of the base class libraries (BCL) provided by the Microsoft .NET Framework.

C# is very similar to Java in that it takes the basic operators and style of C++ but forces programs to be type safe, in that it executes the code in a controlled sandbox called the virtual machine. As such, all code must be encapsulated inside an object, among other things. C# provides many additions to facilitate interaction with Microsoft's Windows, COM, and Visual Basic.

There are several shortcomings to C++ which are resolved in C#. One of the more subtle ones is the use of reference variables as function arguments. When a code maintainer is looking at C++ source code, if a called function is declared in a header somewhere, the immediate code does not provide any indication that an argument to a function is passed as a reference. An argument passed by reference could be changed after calling the function whereas an argument passed by value cannot be changed. A maintainer who is not familiar with the function and is looking for the location of an unexpected value change of a variable would additionally need to examine the header file for the function in order to determine whether or not that function could have changed the value of the variable. C# insists that the ref keyword be placed in the function call (in addition to the function declaration), thereby cluing the maintainer in that the value could be changed by the function.
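
The C++ side of this problem can be sketched with two hypothetical functions that differ only in how they take their argument.

```cpp
// Takes a copy: the caller's variable is never modified.
int bump_by_value(int n) {
    n += 1;
    return n;
}

// Takes a reference: the caller's variable is modified.
int bump_by_reference(int& n) {
    n += 1;
    return n;
}
```

The calls bump_by_value(x) and bump_by_reference(x) look identical at the call site, yet only the second changes x. A maintainer has to read the declarations to tell them apart.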

Managed C++ (C++/CLI)

Managed C++ is a shorthand notation for Managed Extensions for C++, which are part of the .NET framework from Microsoft. This extension of the C++ language was developed to add functionality like automatic garbage collection and heap management, automatic initialization of arrays, and support for multidimensional arrays, simplifying all those details of programming in C++ that would otherwise have to be done by the programmer.

Managed C++ is not compiled to machine code. Rather, it is compiled to Common Intermediate Language, which is an object-oriented machine language and was formerly known as MSIL.

D

The D programming language, also known simply as D, was developed in-house by Digital Mars. The language specification and the D compiler (for Windows and Linux) are distributed for free on their web site. Only the compiler front-end is licensed under both the Artistic License and the GNU GPL; sources for the front-end are distributed along with the compiler binaries. The compiler back-end is proprietary.

Digital Mars is a small US software company, also known for producing a C compiler (known over time as the Datalight C compiler, then Zorland C and then Zortech C) and a C++ compiler (known as Zortech C++, which is credited as the first C++ compiler for Windows, later renamed Symantec C++ and now Digital Mars C++ (DMC++)), along with associated utilities such as an IDE for Windows (supporting the MFC library).

D originated as a re-engineering of C++. Even though it is predominantly influenced by that language and is also a multi-paradigm language, it is not a variant of it. D has redesigned some C++ features and has been influenced by concepts used in other programming languages, such as Java, C# and Eiffel.

Wikibooks has a D Programming book; in this section we will point out some features where D is distinct from C++:

  • D does not support multiple inheritance.

Chapter Summary

  1. Introducing C++ Development stage: 100% (as of Dec 11, 2006)
  2. Programming languages Development stage: 100% (as of Dec 11, 2006)
    1. Programming paradigms Development stage: 75% (as of Dec 11, 2006) - the versatility of C++ as a multi-paradigm language, concepts of Object-Oriented Programming (Objects and Classes, Inheritance, Polymorphism).
  3. Comparisons Development stage: 75% (as of Sep 20, 2005) - to other languages, relation to other computer science constructs and idioms.
    1. with C Development stage: 75% (as of Sep 20, 2005)
    2. with Java Development stage: 75% (as of Sep 20, 2005)
    3. with C# Development stage: 25% (as of Sep 20, 2005)
    4. with Managed C++ (C++/CLI) Development stage: 25% (as of Sep 20, 2005)
    5. with D Development stage: 25%

Contents

Fundamentals

The Code

The task of programming, while not easy in its execution, is actually fairly simple in its goals. A programmer will envision, or be tasked with, a specific goal. Goals are usually provided in the form of "I want a program that will perform...fill in the blank..." The job of the programmer then is to come up with a "working model" (a model that may consist of one or more algorithms). That "working model" is sort of an idea of how a program will accomplish the goal set out for it. It gives a programmer an idea of what to write in order to turn the idea into a working program. Once the programmer has an idea of the structure their program will need to take in order to accomplish the goal, they set about actually writing the program itself with all of the proper commands, functions and syntax. The code that they write is what actually implements the program, or causes it to perform the necessary task, and for that reason, it is sometimes called "implementation code".

How the instructions of a program are written out and stored is generally not a concept determined by a programming language. Punch cards used to be in common use, however under most modern operating systems the instructions are commonly saved as plain text files that can be edited with any text editor. These files are the source of the instructions that make up a program and so are sometimes referred to as source files but a more exclusive definition is source code.

When referring to source code or just source, you are considering only the files that contain code, the actual text that makes up the functions (actions) for the computer to execute. By referring to source files you are extending the idea to include not only the files with the instructions that make up the program but also all the raw resource files that together are used to build the program.

Source code

Source code is the halfway point between human language and machine code. As mentioned before, it can be read by people to an extent, but it can also be translated (compiled) into machine code by a computer. Machine code consists of the strings of 1s and 0s that the computer can directly understand and act on.

In a small program, you might have as little as a few dozen lines of code, whereas in larger programs this number might stretch into the thousands or even millions. For this reason, it is often more practical to split large amounts of code across many files. This makes the code easier to read, as you can do it bit by bit, and it also reduces the compile time of each source file. More importantly, after a change only the source files that were affected need to be compiled again, which usually takes much less time than recompiling a single massive source file.

Managing size is not the only reason to split code, though. Often, especially when a piece of software is being developed by a large team, source code is split. Instead of one massive file, the program is divided into separate files, and each individual file contains the code to perform one particular set of tasks for the overall program. This creates a condition known as Modularity. Modularity is a quality that allows source code to be changed, added to, or removed a piece at a time. This has the advantage of allowing many people to work on separate aspects of the same program, thereby allowing it to move faster and more smoothly. Source code for a large project should always be written with modularity in mind. Even when working with small or medium sized projects, it is good to get in the habit of writing code with ease of editing and use in mind.

C++ source code is case sensitive. This means that it distinguishes between lowercase and capital letters, so that it sees the words "hello," "Hello," and "HeLlO" as being totally different things. This is important to remember and understand; it will be discussed further in the Coding Style Section.

File Organization

Many compilers and development tools expect C++ files to be designated by a name followed by a specific extension. The C++ standard itself doesn't impose any specific rules on how files are named or organized.

The specific conventions for file organization have both technical reasons and organizational benefits, very similar to the code style conventions we will examine later. Most of the conventions governing files derive from historical preferences and practices that are especially related to the lower level languages that preceded C++. This is especially true when we take into consideration that C++ was built over the C89 ANSI standard with compatibility in mind, so most practices have remained mostly static, except for the operating systems' improved support for files and greater ease of management of file resources.

One of the evolutions when dealing with filenames on the language standard was that the default include files would have no extension. Most implementations still provide the old C style headers that use C's file extension ".h" for the C Standard Library, but C++-specific header filenames that were terminated in the same fashion now have no extension (e.g. iostream.h is now iostream). This change to old C++ headers was simultaneous with the implementation of namespaces, in particular the std namespace.

NOTE:
Please note that file names and extensions don't include quotes; the quotes were added for clarity in this text.

File Names

TODO
Add reference to over .cpp .h, common rules to file naming and code distribution.

Do not reuse a standard header file name

As you will see later, the C++ Standard defines a list of headers. The behavior is undefined if a file with the same name as a standard header is placed in the search path for included source files.

Extensions

The extension serves one purpose: to indicate to the Operating System, the IDE or the compiler what resides within the file. By itself an extension will not serve as a guarantee for the content.

Since C language sources usually have the extensions ".c" and ".h", in the beginning it was common for C++ source files to share the same extensions or to use a distinct variation to clearly indicate a C++ code file. Today the common practice is that C++ implementation files use the ".cpp" extension and declaration or header files use ".h" (the latter is still shared with most assemblers and C compilers).

There are other common extension variations, such as ".cc", ".C", ".cxx", and ".c++" for "implementation" code. For header files, the same extension variations are used, but the first letter of the extension is usually replaced with an "h", as in ".hh", ".H", ".hxx", ".hpp", ".h++", etc.

Header files will be discussed with more detail later in the Preprocessor Section when introducing the #include directive and the standard headers, but in general terms a header file is a special kind of source code file that is included (by the preprocessor) by way of the #include directive, traditionally used at the beginning of a ".cpp" file.

Source Code

C++ programs would be compilable even if using a single file, but any complex project will benefit from being split into several source files in order to be manageable and permit re-usability of the code. The beginning programmer sees this as an extra complication, where the benefits are obscure, especially since most of the first attempts will probably result in problems. This section will cover not only the benefits and best practices but also explain how a standardized method will avoid and reduce complexity.

Why split code into several files?
  • Improves organization and code structure.
  • Improves compilation speed.
  • Promotes code reuse.
TODO
Complete

Source File Types

Some authors will refer to files with a .cpp extension as "source files" and files with the .h extension as "header files". However, both of those qualify as source code. As a convention for this book, all code, whether contained within a .cpp file (where a programmer would put it) or within a .h file (for headers), will be called source code. Any time we're talking about a .cpp file, we'll call it an "implementation file", and any time we're referring to a header file, we'll call it a "declaration file". You should check your editor/IDE and alter its configuration to a setup that best suits you and others that will read and use these files.

Declaration vs Definition

In general terms a declaration specifies, for the compiler, the identifier, type and other aspects of language elements such as variables and functions. It is used to announce the existence of the element to the compiler, which requires names to be declared before use.

The definition provides the element itself: for variables, it is the definition that causes storage to be reserved (and, optionally, initialized); for functions, definitions supply the function body. While a variable or function may be declared many times, it is typically defined once.

An object may be declared many times but may only be defined one time.

This concept will be further explained, with some particulars noted (such as inline), as we introduce other components of C++. Here are some examples; ignore them for now if some of the concepts are beyond you and come back later to see the distinctions.

    int an_integer;                                 // defines an_integer
    extern const int a = 1;                         // defines a
    int function( int b ) { return b+an_integer; }  // defines function and defines b
    struct a_struct { int a; int b; };              // defines a_struct, a_struct::a, and a_struct::b
    struct another_struct {                         // defines another_struct
      int a;                                        // defines nonstatic data member a
      static int b;                                 // declares static data member b
      another_struct(): a(0) { } };                 // defines a constructor of another_struct
    int another_struct::b = 1;                      // defines another_struct::b
    enum { right, left };                           // defines right and left
    namespace FirstNamespace { int d; }             // defines FirstNamespace and FirstNamespace::d
    namespace NextNamespace = FirstNamespace ;      // defines NextNamespace
    another_struct MyStruct;                        // defines MyStruct
    extern int b;                                   // declares b
    extern const int c;                             // declares c
    int another_function( int );                    // declares another_function
    struct aStruct;                                 // declares aStruct
    typedef int MyInt;                              // declares MyInt
    extern another_struct yet_another_struct;       // declares yet_another_struct
    using NextNamespace::d;                         // declares d (a name for FirstNamespace::d)
.cpp

An implementation file includes the specific details, that is the definitions, for what is done by the program. While the header file for the light declared what a light could do, the light's .cpp file defines how the light acts.

We will go into much more detail on class definition later; here is a preview:

.cpp files
#include "light.h"
 
Light::Light () : on(false) {
}
 
void Light::toggle() {
  on = (!on);
}
 
bool Light::isOn() const {
  return on;
}
.h

Header files contain mostly declarations, to be used in the rest of the program. The skeleton of a class is usually provided in a header file, while an accompanying implementation file provides the definitions to put the meat on the bones of it. Header files are not compiled on their own, but rather provided to other parts of the program through the use of #include.

A typical header file looks like the following:

// Inside sample.h
#ifndef SAMPLE_H
#define SAMPLE_H
 
// Contents of the header file are placed here.
 
#endif /* SAMPLE_H */

Since header files are included in other files, problems can occur if they are included more than once. For this reason header files usually contain "header guards" built from preprocessor directives (#ifndef, #define, and #endif). #ifndef checks whether SAMPLE_H has already been defined; if it has not, the contents of the header are included and SAMPLE_H is defined. If SAMPLE_H was already defined, then the file has already been included and its contents are skipped.

Classes are usually declared inside header files. We will go into much more detail on class declaration later; here is a preview:

// Inside light.h
#ifndef LIGHT_H
#define LIGHT_H
 
// A light which may be on or off.
class Light {
  private:
    bool on;
 
  public:
    Light ();       // Makes a new light.
    void toggle (); // If light is on, turn it off, if off, turn it on
    bool isOn() const; // Is the light on?
};
 
#endif /* LIGHT_H - comment indicating which if this goes with */

This header file "light.h" declares that there is going to be a light class, and gives the properties of the light, and the methods provided by it. Other programmers can now include this file by typing #include "light.h" in their implementation files, which allows them to use this new class. Note how these programmers do not include the actual .cpp file that goes with this class that contains the details of how the light actually works. We'll return to this case study after we discuss implementation files.

Object Files

An object file is a temporary file used by the compiler as an intermediate step between the source code and the final executable file.

All other files that neither are nor result from source code are resource files: the support data needed for the build (creation) of the program. The extensions of these files may vary from system to system, since they depend on the IDE/compiler and the necessities of the program; they may include graphic files or raw data formats.

Object Code

The compiler produces the machine code equivalent (object code) of the source code. Object code contains the binary (machine language) instructions that the computer uses to do what was instructed in the source code, and it can then be linked into the final program. This step ensures that the code is valid and will sequence into an executable program. Most object files have the file extension (.o), with the same restrictions explained above for the (.cpp/.h) files.

Libraries

Libraries are commonly distributed in binary form, using the (.lib) extension together with a header (.h) that provides the interface for their use. Libraries can also be dynamically linked, and in that case the extension may depend on the target OS; for instance, Windows libraries as a rule have the (.dll) extension. This will be covered later in the libraries section of this book.

Makefiles

It is common for source code to come with a specific script file named "Makefile" (without a standard extension or a standard interpreter). This type of script file is not covered by the C++ Standard, even though it is in common use.

In some projects, especially when dealing with a high level of external dependencies or specific configurations, like supporting special hardware, there is a need to automate a vast number of incompatible compile sequences. These scripts are intended to alleviate the task. Explaining in detail the myriad of variations and possible choices a programmer may make in using (or not) such a system goes beyond the scope of this book. You should check the documentation of the IDE or make tool, or the information available with the source you are attempting to compile.

TODO
If someone wants to tackle this problem please change the text and point it to the relevant section, it was on the TODO list, do attempt to cover at least two distinct ones or the most used...


  • The Apache Ant wikibook describes how to write and use a "build.xml", one way to automate the build process.
  • The "make" wikibook describes how to write and use a "Makefile", another way to automate the build process.
  • ... many IDEs have a "build" button ...

Statements

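Most programming languages share the concept of a statement. A statement is a command the programmer gives to the computer. Statements are distinct from expressions, although in C++ an expression followed by a semicolon forms a statement (an expression statement).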

Example

cout << "Hi there!"; // a single statement

Simple C++ statements such as this one are terminated by a semicolon (;). Each statement performs an action. This statement will be examined in detail later on; for now consider that it has a verb ("cout") and other details that supply information (what to print). In this case, the command "cout" means "send to the standard output stream" (here we assume the default, the console).

The programmer either enters the statement directly to the computer (by typing it while running a special program), or creates a text file with the command in it (you can use any text editor for that). You could create a file called "hi.txt", put the above command in it, and give the file to the computer.

If one were to write multiple statements, it is recommended that each statement be entered on a separate line and should end with a semicolon (;).

cout << "Hi there!";                   // a statement
cout << "Strange things are afoot..."; // another statement

However, there is no problem writing the code this way:

cout << "Hi there!"; cout << "Strange things are afoot...";

The former layout is the one generally preferred by developers. Writing statements as in the second example only makes your code look more complex and harder to understand. We will discuss this in depth in the Coding Style Conventions Section of the book.

If you have more than one command in the file, each will be performed in order, top to bottom.

The computer will perform each of these commands sequentially. It's invaluable to be able to "play computer" when programming. Ask yourself, "If I were the computer, what would I do with these statements?" If you're not sure what the answer is, then you are very likely to write incorrect code. Stop and check the manual for the programming language you're using.

In the above case, the computer will look at the first statement, determine that it's a cout statement, look at what needs to be printed, and display that text on the computer screen. It'll look like this:

Hi there!

Note that the quotation marks aren't there. Their purpose in the program is to tell the computer where the text begins and ends, just like in English prose. The computer will then continue to the next statement, perform its command, and the screen will look like this:

Hi there!Strange things are afoot...

Note that the second text follows the first directly: cout prints exactly what it is given and does not add a space or a line break by itself.

When the computer gets to the end of the text file, it stops. There are many different kinds of statements, depending on which programming language is being used. For example, there could be a beep statement that causes the computer to output a beep on its speaker, or a window statement that causes a new window to pop up.

Also, the way statements are written will vary depending on the programming language. These differences are fairly superficial. The set of rules that governs how statements are written (such as ending each statement with a semicolon) is called a programming language's syntax. The set of verbs is called its library.

cout << "Hi there!";
Statement Blocks

Statement blocks, also referred to as code blocks (or in C++-speak, compound statements), consist of one or more statements or commands that are contained between a pair of curly braces { }. Such a block of statements can be named or be provided with a condition for execution. Below is how you'd place a series of statements in a block.

{
  int a = 10;
  int b = 20;
  int result = a + b;
}

Blocks are used primarily in loops, conditionals and functions. Blocks can be nested inside one another, for instance as an if structure inside of a loop inside of a function.

Program Control Flow

As seen above, statements are executed in the order in which they occur (sequentially). The flow of execution begins at the topmost statement and proceeds downwards until the last statement is encountered. Any statement can be substituted by a statement block. There are special statements that can redirect the execution flow based on a condition; those statements are called branching statements, and are described in detail in the Control Flow Construct Statements Section of the book.

Coding Style Conventions

The use of a guide or set of conventions gives programmers a set of rules for code normalization or coding style that establishes how they should format code, name variables, place comments or make any other non-language-dependent structural decision that is used in the code. This is very important when you share a project with others. Agreeing on a common set of coding standards and recommendations saves time and effort by enabling greater understanding and transparency of the code base, providing a common ground for undocumented structures, making debugging easier, and increasing code maintainability. These rules can also be referred to as Source Code Style, Code Conventions, Coding Standards or a variation of those.

A list of different approaches can be found in the C++ coding conventions Reference Section. A commonly used style in C++ programming is ANSI or Allman, while much C programming is still done in the Kernighan and Ritchie (K&R) style. Be warned that choosing a style should be one of the first decisions you make on a project, and that in a democratic environment a consensus can be very hard to achieve.

Programmers tend to stick to one coding style; they have it automated, and they can find it very hard to conform to any deviation. If you don't have a favorite style, try to use the smallest possible variation of a common one, or get as broad a view as you can, so that you can adapt easily to changes or defend your approach. There is software that can help to format or beautify the code, but automation can have its drawbacks. As seen earlier, indentation and the use of white spaces or tabs are completely ignored by the compiler. A coding style should vary depending on the lowest common denominator of the needs to standardize.

Fields affected by the selection of a Code Style are:

  • Reusability
    • Self documenting code
    • Internationalization
    • Maintainability
    • Portability
  • Optimization
  • Build process
  • Error avoidance
  • Security
Standardization is Important

No matter which particular coding style you pick, once it is selected, it should be kept throughout the same project. Reading code that follows different styles can become very difficult. In the next sections we try to explain why some of the options are common practice without forcing you to adopt a specific style.

NOTE:
Using a bad Coding Style is worse than having no Coding Style at all, since you will be extending bad practices to all the code base.

25 lines 80 columns

This is a commonly recommended but often inapplicable rule. Many people say it's an outdated rule, that it comes from prehistoric times when terminals could only display 25 lines 80 columns.

This rule signifies that if you are writing code that goes beyond 80 columns or 25 lines, it's time to think about splitting the code into functions, as this allows you to review code without having to scroll the display. This practice will save you precious time when you have to return to a project you haven't been working on for 6 months.

For example, you may want to split long output statements across multiple lines:

    fprintf(stdout,"The quick brown fox jumps over the lazy dog. "
                   "The quick brown fox jumps over the lazy dog.\n"
                   "The quick brown fox jumps over the lazy dog - %d", 2);


This recommended practice also relates to the "0 means success" convention for functions, which we will cover in the Functions Section of this book.

Whitespace and Indentation

Definition:
Spaces, tabs and newlines (line breaks) are called whitespace. Whitespace is required to separate adjacent words and numbers; it is ignored everywhere else except within quotes and preprocessor directives.

The set of conventions followed when using whitespace to improve the readability of code is called an indentation style. Every block of code and every definition should follow a consistent indentation style. This usually means everything within { and }. However, the same thing goes for one-line code blocks.

Use a fixed number of spaces for indentation. Recommendations vary; 2, 3, 4, 8 are all common numbers. If you use tabs for indentation you have to be aware that editors and printers may deal with, and expand, tabs differently. The K&R standard recommends an indentation size of 4 spaces. [1]

For example, a program could just as well be written as follows:

// Everything on one line, with no indentation
if ( a > 5 )  { b=a; a++; }

However, the same code could be made much more readable with proper indentation:

// Using an indentation size of 2
if ( a > 5 )  {
  b = a;
  a++;
}
 
// Using an indentation size of 4
if ( a > 5 )
{
    b = a;
    a++;
}

Placement of braces (curly brackets)

As we saw earlier in the Statements Section, compound statements are very important in C++. They are also subject to different coding styles that recommend different placements of opening and closing braces ({ and }). Some recommend putting the opening brace on the line with the statement, at the end (K&R). Others recommend putting each brace on a line by itself, but not indented (ANSI C++). GNU recommends putting each brace on a line by itself and indenting it half-way. We recommend picking one brace-placement style and sticking with it.

Examples:

if (a > 5) {
  // This is K&R style
}
 
if (a > 5) 
{
  // This is ANSI C++ style
}
 
if (a > 5) 
  {
    // This is GNU style
  }

Comments

Comments are portions of the code ignored by the compiler which allow the user to make simple notes in the relevant areas of the source code. Comments come either in block form or as single lines.

  • Single-line comments (informally, C++ style), start with // and continue until the end of the line. If the last character in a comment line is a \ the comment will continue in the next line.
  • Multi-line comments (informally, C style), start with /* and end with */.

NOTE:
Since the 1999 revision, C also allows C++ style comments, so the informal names are largely of historical interest that serves to make a distinction of the two methods of commenting.

We will now describe how a comment can be added to the source code, but not where, how, and when to comment; we will get into that later.

C style Comments

If you use this kind of comment, try to use it like this (code commented out):

/*void EventLoop(); /**/

or for multiple lines

/*
void EventLoop();
void EventLoop();
/**/

This gives you the option to do this (code uncommented):

void EventLoop(); /**/

or for multiple lines

void EventLoop();
void EventLoop();
/**/

NOTE:
Some compilers may generate errors/warnings.
Try to avoid using C style inside a function because of the non nesting facility of C style (most editors now have some sort of coloring ability that prevents this kind of error, but it was very common to miss it, and you shouldn't make assumptions on how the code is read).

By removing only the start of the comment you re-activate the commented code, while the trailing /**/ remains a valid (empty) comment. This works because a comment started with /* stays in effect until the first close of comment */ is found.

NOTE:
Remember that C-style comments /* like this */ do not "nest", i.e., you can't write

int function() /* This is a comment /*
{              
 return 0;  
}              and this is the same comment */
               so this isn't in the comment, and will give an error*/

This fails because the comment ends at the first */ pair the compiler finds, ignoring any interim /* pairs which might look to human readers like the start of a nested comment. The text "so this isn't in the comment, and will give an error*/" on the last line is therefore not inside the comment.

C++ style Comments

Examples:

// This is a single one line comment

or

if (expression) // This needs a comment
{
  statements;   
}
else
{
  statements;
}

The backslash is a continuation character and will continue the comment to the following line:

// This comment will also comment the following line \
std::cout << "This line will not print" << std::endl;

Using comments to temporarily ignore code

Comments are also sometimes used to enclose code that we temporarily want the compiler to ignore. This can be useful in finding errors in the program. If a program does not give the desired result, it might be possible to track which particular statement contains the error by commenting out code.

Example with C style comments
/* This is a single line comment */

or

/*
   This is a multiple line comment
*/
C and C++ style

Combining multi-line comments (/* */) with C++ comments (//) to comment out multiple lines of code:

Commenting out the code:

/*
void EventLoop();
void EventLoop();
void EventLoop();
void EventLoop();
void EventLoop();
//*/

uncommenting the code chunk

//*
void EventLoop();
void EventLoop();
void EventLoop();
void EventLoop();
void EventLoop();
//*/

This works because //* is still a C++ comment, and //*/ acts both as a C++ comment and as a multi-line comment terminator. However, this doesn't work if any multi-line comments are used inside the code chunk, for example for function descriptions.

Note on doing it with preprocessor statements

Another way (considered bad practice) is to selectively enable or disable sections of code with preprocessor directives:

#if(0)   // Change this to 1 to uncomment.
void EventLoop();
#endif

This is considered a bad practice because the code often becomes illegible when several #if's are mixed. If you use them, don't forget to add a comment at the #endif saying which #if it corresponds to:

#if (FEATURE_1 == 1)
do_something;
#endif //FEATURE_1 == 1

You can prevent illegibility by using inline functions (often considered better than macros for legibility, with no performance cost) containing only 2 sections in #if #else #endif:

inline void do_test()
  {
    #if (FEATURE_1 == 1)
      do_something;
    #endif  //FEATURE_1 == 1
  }

and call

do_test();

in the program.

NOTE:
The use of one-line C-style comments should be avoided as they are considered outdated. Mixing C and C++ style single-line comments is considered poor practice. One exception, that is commonly used, is to disable a specific part of code in the middle of a single line statement for test/debug purposes, in release code any need for such action should be removed.

Naming identifiers

Identifiers are names given to variables, functions, objects, etc. to refer to them in the program. C++ identifiers must start with a letter or an underscore character "_", possibly followed by a series of letters, underscores or digits. None of the C++ keywords can be used as identifiers. Identifiers containing two successive underscores are reserved for use in the header files or by the compiler for special purposes, e.g. name mangling.

This leaves a lot of freedom in naming, one could use specific prefixes or suffixes, start names with an initial upper or lower case letter, keep all the letters in a single case or, with compound words, use a word separator character like "_" or flip the case of the first letter of each component word.

Hungarian Notation

Hungarian notation, which would now be called Apps Hungarian, was invented by Charles Simonyi, a programmer who worked at Xerox PARC circa 1972-1981 and who later became Chief Architect at Microsoft. It has been, until recently, the preeminent naming convention used in most Microsoft code. It uses prefixes: for example, "m_" indicates a member variable and "p" indicates a pointer, while the rest of the name is normally written out with a capital on the first letter of each word. We mention this convention because you will very probably find it in use, even more so if you do any programming in Windows. If you are interested in learning more you can check Wikipedia's entry on this notation.

This notation is considered outdated, since it is highly prone to errors and requires some effort to maintain, with no real gains in today's IDEs. Today refactoring is an everyday task, and IDEs have evolved to provide help with identifier popups and the use of color schemes. All these informational aids remove the need to use this notation.

Leading underscores

In most contexts, leading underscores are better avoided. They are reserved for the compiler or internal variables of a library, and can make your code less portable and more difficult to maintain. Those variables can also be stripped from a library (i.e. the variable isn't accessible anymore, it is hidden from external world) so unless you want to override an internal variable of a library, don't do it.

Reusing existing names

Do not use the names of standard library functions and objects for your identifiers. Many of these names are reserved in certain contexts, and programs may become difficult to understand when familiar names are used in unexpected ways.

Sensible names

Always use good, unabbreviated, correctly-spelled meaningful names.

Prefer the English language (since C++ and most libraries already use English) and avoid short cryptic names. This will make it easier to read and to type a name without having to look it up.

NOTE:
It is acceptable to ignore this rule for loop variables and variables used within a small scope (~20 lines); they may be given short names to save space if the purpose of the variable is obvious enough. Historically the most commonly used variable name in these cases is "i".

The "i" is probably derived from the word "increment"; it is very commonly found in for loops, which fit nicely the conditions for using such variable names.

Names indicate purpose

An identifier should indicate the function of the variable/function/etc. that it represents, e.g. foobar is probably not a good name for a variable storing the age of a person.

Identifier names should also be descriptive. n might not be a good name for a global variable representing the number of employees. However, a good medium between long names and lots of typing has to be found. Therefore, this rule can be relaxed for variables that are used in a small scope or context. Many programmers prefer short variables (such as i) as loop iterators.

Capitalization

Conventionally, variable names start with a lower case character. In identifiers which contain more than one natural language word, either underscores or capitalization is used to delimit the words, e.g. num_chars (K&R style) or numChars (Java style). It is recommended that you pick one notation and do not mix them within one project.

Constants

When naming #defines, constant variables, enum constants and macros, use all uppercase with '_' separators; this makes it very clear that the value is not alterable and, in the case of macros, makes it clear that you are using a construct that requires care.

NOTE:
There is a large school of thought that names LIKE_THIS should be used only for macros, so that the name space used for macros (which do not respect C++ scopes) does not overlap with the name space used for other identifiers. As is usual in C++ naming conventions, there is not a single universally agreed standard. The most important thing is usually to be consistent.

Functions and Member Functions

The name given to functions and member functions should be descriptive and make it clear what it does. Since usually functions and member functions perform actions, the best name choices typically contain a mix of verbs and nouns in them such as CheckForErrors() instead of ErrorCheck() and dump_data_to_file() instead of data_file(). Clear and descriptive names for functions and member functions can sometimes make guessing correctly what functions and member functions do easier, aiding in making code more self documenting. By following this and other naming conventions programs can be read more naturally.

People seem to have very different intuitions when using names containing abbreviations. It's best to settle on one strategy so the names are absolutely predictable. Take for example NetworkABCKey. Notice how the C from ABC and the K from Key run together. Some people don't mind this and others just hate it, so you'll find different policies in different code and can never be sure what something will be called.

Prefixes and suffixes are sometimes useful:

  • Min - to mean the minimum value something can have.
  • Max - to mean the maximum value something can have.
  • Cnt - the current count of something.
  • Count - the current count of something.
  • Num - the current number of something.
  • Key - key value.
  • Hash - hash value.
  • Size - the current size of something.
  • Len - the current length of something.
  • Pos - the current position of something.
  • Limit - the current limit of something.
  • Is - asking if something is true.
  • Not - asking if something is not true.
  • Has - asking if something has a specific value, attribute or property.
  • Can - asking if something can be done.
  • Get - get a value.
  • Set - set a value.

Examples

For example, these are valid identifiers:

  • i - loop value
  • numberOfCharacters - number of characters
  • number_of_chars - number of characters
  • num_chars - number of characters
  • get_number_of_characters() - get the number of characters
  • get_number_of_chars() - get the number of characters
  • is_character_limit() - is this the character limit?
  • is_char_limit() - is this the character limit?
  • character_max() - maximum number of a character
  • charMax() - maximum number of a character
  • CharMin() - minimum number of a character

These are also valid identifiers but can you tell what they mean?:

  • num1
  • do_this()
  • g()
  • hxq

The following are valid identifiers but better avoided:

  • _num as it could be used by the compiler/system headers
  • num__chars as it could be used by the compiler/system headers
  • main as there is potential for confusion
  • cout as there is potential for confusion

The following are not valid identifiers:

  • if as it is a keyword
  • 4nums as it starts with a digit
  • number of characters as spaces are not allowed within an identifier

Reduced/Abuse the use of keywords

This can be argued both ways. Using fewer keywords means less typing, but it can make the reader and the compiler (depending on the situation) do extra work. Writing more keywords makes the resulting code clearer, more precisely defined (self-documented) and less error-prone, but it can also add limitations to the code's evolution. This is a thin line where an equilibrium must be reached in accord with the project's nature. As with any other rule, the important thing is to be consistent.

inline

Use inline if the member function is implicitly inlined.

const

Unless you plan on modifying it, you're arguably better off using const data types. The compiler can optimize more easily with this restriction, and you're unlikely to accidentally corrupt the data. Ensure that your methods take const data types unless you absolutely have to modify the parameters. Similarly, when implementing accessors for private member data, you should in most cases return a const and declare the accessor itself const. This will ensure that if the object that you're operating on is passed as const, methods that do not affect the data stored in the object still work as they should and can be called. For example, for an object containing a person, getName() should be const and return a const data type, whereas walk() might be non-const as it might change some internal data in the Person such as tiredness.
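The Person example above could be sketched roughly as follows. This is only an illustration: the class layout, the member names name_ and tiredness_, and the helper functions getTiredness() and greet() are assumptions made for this sketch and do not come from any particular library.

```cpp
#include <string>

// Hypothetical Person class, used only to illustrate const-correctness.
class Person {
public:
    explicit Person(const std::string& name) : name_(name), tiredness_(0) {}

    // Accessor: does not modify the object, so it is a const member function
    // and it returns a reference to const data.
    const std::string& getName() const { return name_; }

    // Another accessor that leaves the object untouched.
    int getTiredness() const { return tiredness_; }

    // Changes internal state, so it cannot be declared const.
    void walk() { ++tiredness_; }

private:
    std::string name_;
    int tiredness_;
};

// Takes the Person by const reference: only const member functions
// such as getName() may be called on it here.
std::string greet(const Person& person)
{
    return "Hello, " + person.getName();
}
```

Calling person.walk() inside greet() would be rejected by the compiler, which is exactly the kind of accidental modification that const is meant to catch.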

typedef

It is common practice to avoid using this keyword since it can obfuscate code if not properly used or it can cause programmers to accidentally misuse large structures thinking them to be simple types. If used, define a set of rules for the types you rename and be sure to document them.

volatile

This keyword informs the compiler that the variable it qualifies is volatile (can change at any time) and must be excluded from optimizations that assume its value stays the same between accesses. Use of this keyword should be reserved for variables that are known to be modified by an influence external to the program's normal flow (whether it's a hardware update, a third-party application, or another thread in the application).

Since the volatile keyword impacts performance, you should consider a different design that avoids this situation: most platforms where this keyword is necessary provide an alternative that helps maintain scalable performance.

Note that volatile was not intended to be used as a threading or synchronization primitive, nor are operations on a volatile variable guaranteed to be atomic.

Pointer declaration

For historical reasons, programmers place the asterisk of a pointer declaration in one of two ways:

// C codestyle
int *z;
 
// C++ codestyle
int* z;

The second variation is by far preferred by C++ programmers, so seeing the first one often helps to identify a C programmer or legacy code.

One argument against the C++ codestyle version is when chaining declarations of more than one item, like:

// C codestyle
int *ptrA, *ptrB;
 
// C++ codestyle
int* ptrC, ptrD;

As you can see, in this case the C codestyle makes it more obvious that ptrA and ptrB are pointers to int, while the C++ codestyle makes it less obvious that ptrD is an int, not a pointer to int.

It is rare to declare multiple objects in one statement in C++ code, except for the basic types, and even then it is not often done. It is extremely rare to see it done with pointers or other complex types, since it makes the code harder for a human to parse visually.

In any case, most programmers would use a naming convention for pointer variables that addresses this downside. In the example above we would see something like:

// C++ codestyle
int* ptrC, D;

Making it easily detectable.
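To confirm what such a chained declaration really declares, the compiler itself can be asked. The following is a minimal sketch using static_assert and decltype (C++11 or later); the extra names ptrE and ptrF are invented for the illustration.

```cpp
#include <type_traits>

int* ptrC, D;  // ptrC is a pointer to int; D is a plain int

// Both checks are evaluated at compile time.
static_assert(std::is_same<decltype(ptrC), int*>::value, "ptrC is an int*");
static_assert(std::is_same<decltype(D), int>::value, "D is an int, not an int*");

// Declaring one variable per statement avoids the question entirely.
int* ptrE = nullptr;
int* ptrF = nullptr;
```

If D were a pointer, the second static_assert would stop the compilation with its message.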

Document your code

There are a number of good reasons to document your code, and a number of aspects of it that can be documented. Documentation provides you with a shortcut for obtaining an overview of the system or for understanding the code that provides a particular feature.

Why?

The purpose of comments is to explain and clarify the source code to anyone examining it (or just as a reminder to yourself). Good commenting conventions are essential to any non-trivial program so that a person reading the code can understand what it is expected to do and can easily follow the rest of the code. In the next topics some of the most common How? and When? rules for using comments are listed.

Documentation is essential when programming, not just in C++ but in any programming language. Many companies have moved away from the idea of "hero programmers" (i.e., one programmer who codes for the entire company) to a concept of groups of programmers working in a team. Many times programmers will only be working on small parts of a larger project. In this particular case, documentation is essential because:

  • Other programmers may be tasked to develop your project;
  • Your finished project may be submitted to editors to assemble your code into other projects;
  • A person other than you may be required to read, understand, and present your code.

Even if you are not programming for a living or for a company, documentation of your code is still essential. Though many programs can be completed in a few hours, more complex programs can take a longer time to complete (days, weeks, etc.). In this case, documentation is essential because:

  • You may not be able to work on your project in one session;
  • It provides a reference to what was changed the last time you programmed;
  • It allows you to record why you made the decisions you did, including why you chose not to explore certain solutions;
  • It can provide a place to document known limitations and bugs (for the latter a defect tracking system may be the appropriate place for documentation);
  • It allows easy searching and referencing within the program (from a non-technical stance);
  • It is considered to be good programming practice.

Comments Should Be Written For the Appropriate Audience

When writing code to be read by those who are in the initial stages of learning a new programming language, it can be helpful to include a lot of comments about what the code does. For "production" code, written to be read by professionals, it is considered unhelpful and counterproductive to include comments which say things that are already clear in the code. Some from the Extreme Programming community say that excessive commenting is indicative of code smell -- which is not to say that comments are bad, but that they are often a clue that code would benefit from refactoring. Adding comments as an alternative to writing understandable code is considered poor practice.

What?

What needs to be documented in a program/source code can be divided into what is documented before the specific program execution (that is before "main") and what is executed ("what is in main").

Documentation before program execution:

  • Programmer information and license information (if applicable)
  • User defined function declarations
  • Interfaces
  • Context
  • Relevant standards/specifications
  • Algorithm steps
  • How to convert the source code into executable file(s) (perhaps by using make)

Documentation for code inside main:

  • Statements, Loops, and Cases
  • Public and Private Sections within Classes
  • Algorithms used
  • Unusual features of the implementation
  • Reasons why other choices have been avoided
  • User defined function implementation

If used carelessly comments can make source code hard to read and maintain and may be even unnecessary if the code is self-explanatory -- but remember that what seems self-explanatory today may not seem the same six months or six years from now.

Document Decisions

Comments should document decisions. At every point where you had a choice of what to do place a comment describing which choice you made and why. Archaeologists will find this the most useful information.

Comment Layout

Each part of the project should at least have a single comment layout, and it would be better yet to have the complete project share the same layout if possible.

How?

Documentation can be done within the source code itself through the use of comments (as seen above) in a language understandable to the intended audience. It is good practice to do it in English as the C++ language is itself English based and English being the current lingua franca of international business, science, technology and aviation, you will ensure support for the broadest audience possible.

Comments are useful in documenting portions of an algorithm to be executed, explaining function calls and variable names, or providing reasons as to why a specific choice or method was used. Block comments are used as follows:

/*
get timepunch algorithm - this algorithm gets a time punch for use later
1. user enters their number and selects "in" or "out"
2. time is retrieved from the computer
3. time punch is assigned to user
*/

Alternately, line comments can be used as follows:

GetPunch(user_id, time, punch); //this function gets the time punch

An example of a full program using comments as documentation is:

/*
Chris Seedyk
BORD Technologies
29 December 2006
Test
*/
#include <iostream> //provides std::cout and std::endl

int main()
{
 std::cout << "Hello world!" << std::endl; //predefined cout prints stuff in " " to screen
 return 0;
}

It should be noted that while comments are useful for in-program documentation, it is also a good idea to have an external form of documentation separate from the source code as well, but remember to think first on how the source will be distributed before making references to external information on the code comments.

Commenting code is also no substitute for well-planned and meaningful variable, function, and class names. This is often called "self-documenting code," as it is easy to see from a carefully chosen and descriptive name what the variable, function, or class is meant to do. To illustrate this point, note the relatively equal simplicity with which the following two ways of documenting code, despite the use of comments in the first and their absence in the second, are understood. The first style is often encountered in very old C source by people who understood well what they were doing and had no doubt anyone else might not comprehend it. The second style is more "human-friendly" and while much easier to read is nevertheless not as frequently encountered.

// Returns the area of a triangle cast as an int
int area_ftoi(float a, float b) { return (int) (a * b / 2); }
 
int iTriangleArea(float fBase, float fHeight)
{
   return (int) (fBase * fHeight / 2);
}

Both functions perform the same task, however the second has such practical names chosen for the function and the variables that its purpose is clear even without comments. As the complexity of the code increases, well-chosen naming schemes increase vastly in importance.

Regardless of what method is preferred, comments in code are helpful, save time (and headaches), and ensure that both the author and others understand the layout and purpose of the program fully.

Automatic Documentation

Various tools are available to help with documenting C++ code. Literate Programming is a whole school of thought on how to approach this, but a very effective tool is Doxygen, which also supports several other languages. It can use hand-written comments to generate more than the bare structure of the code, bringing Javadoc-like documentation comments to C++, and can generate documentation in HTML, PDF and other formats.
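As a rough illustration of what a Javadoc-like comment for Doxygen can look like, here is a minimal sketch. The function triangle_area() is a made-up example; @brief, @param and @return are standard Doxygen commands, but the exact comment style a project uses is a matter of its own conventions.

```cpp
/**
 * @brief Computes the area of a triangle.
 *
 * @param base   Length of the base of the triangle.
 * @param height Height measured perpendicular to the base.
 * @return The area, truncated to a whole number.
 */
int triangle_area(float base, float height)
{
    return static_cast<int>(base * height / 2);
}
```

Running Doxygen over a source file containing such comments produces a reference page for the function, with the brief description, the parameter list and the return value taken from the comment.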

Comments Should Tell a Story

Consider your comments a story describing the system. Expect your comments to be extracted by a robot and formed into a manual page. Class comments are one part of the story, method signature comments are another part of the story, method arguments another part, and method implementation yet another part. All these parts should weave together and inform someone else at another point of time just exactly what you did and why.

Do not use comments for flowcharts or pseudocode

You should refrain from using comments to do ASCII art or pseudocode (some programmers attempt to explain their code with an ASCII-art flowchart). If you want to flowchart or otherwise model your design there are tools that will do a better job at it using standardized methods. See for example: UML.

Internal storage of data types

Bits and Bytes

The byte is the smallest individual piece of data that we can access or modify on a computer. The computer only works on bytes or groups of bytes, never on bits. If you want to modify individual bits, you have to use binary operations on the whole byte that tell the computer how to modify individual bits, but the operation is still done on whole bytes. Before getting too far ahead of ourselves, we'll look at the internal representation of a byte.

Here's a look at a byte as the computer stores it.

[Figure Byte45.png: a byte holding the value 45]
Bit number:  7  6  5  4  3  2  1  0
Bit:         0  0  1  0  1  1  0  1

There is actually quite a lot of information here. A byte (usually) contains 8 bits. A bit can only have a value of 0 or 1. The bit number is used to label each bit in the byte (so that we can tell which bit we are talking about). You may be wondering why the bits are labeled from 7 to 0 instead of 0 to 7 or even 1 to 8. The reason 0 is used is that computers start counting at 0. Technically, we COULD start counting at 1, but this would go against the counting nature of the computer. It is simply more convenient to use 0 for computers, as we shall see.

Now as to why we numbered them in descending order. In decimal numbers (normal base 10), we put the more significant digits to the left. Example: 254. The 2 here is more significant than the other digits because it represents hundreds, as opposed to tens for the 5 or ones for the 4. The same is done in binary: the more significant digits are put towards the left. Counting in binary and in decimal is done in exactly the same manner, except that in binary, instead of counting from 0 to 9, we only count from 0 to 1. If we want to count higher than 1, then we need a more significant digit to the left, just as in decimal, when we count beyond 9, we need to add a 1 to the next significant digit. It may sometimes look confusing or different only because we as humans are used to counting with 10 digits. In binary, there are only 2 digits, but counting is done by the exact same principles as counting in decimal.

NOTE:
The most significant digit in a byte is bit#7 and the least significant digit is bit#0. These are otherwise known as "msb" and "lsb" respectively in lowercase. If written in uppercase, MSB will mean most significant BYTE. You will see these terms often in programming or hardware manuals. Also, lsb is always bit#0, but msb can vary depending on how many bytes we use to represent numbers. However, we won't look into that right now.

In decimal, each digit represents a multiple of a power of 10. Let's take another look at the decimal number 254.

  • The 4 represents four multiples of one (4 \times 10^0, since 10^0 = 1).
  • Since we're working in decimal (base 10), the 5 represents five multiples of 10 (5 \times 10^1).
  • Finally, the 2 represents two multiples of 100 (2 \times 10^2).

All this is elementary. The key point to recognize is that as we move from right to left in the number, the significance of the digits increases by a factor of 10. This should be obvious when we look at the following equation:

(2 \times 10^2) + (5 \times 10^1) + (4 \times 10^0) = 254

Do you see any similarities between this and the diagram above? In binary, each digit can only be one of two possibilities (0 or 1), therefore when we work with binary we work in base 2 instead of base 10. So, to convert the binary number 1101 to decimal we can use the following base 10 equation, which you should find very much like the one above:

(1 \times 2^3) + (1 \times 2^2) + (0 \times 2^1) + (1 \times 2^0) = 8 + 4 + 0 + 1 = 13

So, to convert the number we simply add the bit values (2^n) where a 1 shows up. Let's take a look at our example byte again, and try to find its value in decimal.

[Figure Byte45.png: bits 7 to 0 are 0 0 1 0 1 1 0 1]

First off, we see that bit#5 is a 1, so we have 2^5 = 32 in our total. Next we have bit#3, so we add 2^3 = 8. This gives us 40. Then next is bit#2, so 40 + 4 is 44. And finally bit#0 gives 44 + 1 = 45. So this binary number is 45 in decimal.
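The same procedure can be written as a short function. This is a minimal sketch: the name binary_to_decimal() and the choice of passing the bits as a string of '0' and '1' characters are assumptions made for the example, and any other character is simply treated as a 0 bit.

```cpp
#include <string>

// Converts a string of '0' and '1' characters (most significant bit first)
// to its unsigned value by adding 2^n for every bit n that is set.
unsigned int binary_to_decimal(const std::string& bits)
{
    unsigned int total = 0;
    unsigned int bit_value = 1;  // 2^0, the value of the rightmost bit

    // Walk from the rightmost character (bit#0) to the leftmost one.
    for (std::string::size_type i = bits.size(); i > 0; --i) {
        if (bits[i - 1] == '1') {
            total += bit_value;
        }
        bit_value *= 2;  // the next bit to the left is worth twice as much
    }
    return total;
}
```

For the example byte, binary_to_decimal("00101101") adds 32 + 8 + 4 + 1 and returns 45.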

As can be seen, it is impossible for different bit combinations to give the same decimal value. Here is a quick example to show the relationship between counting in binary (base 2) and counting in decimal (base 10). The bases that these numbers are in are shown in subscript to the right of the number.

00_2 = 0_{10}

01_2 = 1_{10}

10_2 = 2_{10}

11_2 = 3_{10}

Data and Variables

When programming in C++, we need a way to store data that can be manipulated by our program. Data comes in a variety of formats, so the compiler needs a way to differentiate between the different types. Right now, we'll concentrate on using bytes. The type name for a byte in C++ is 'char'. It's called char because a byte is often used to represent characters. We won't go into that right now. We only want to use it for its numerical representation.

Let's write a program that will print each value that a byte can hold. How do we do that? We could write a loop that goes from 0 to 255. We'll set our byte to 0 and add one to it every time through the loop. As a side note, do you know what would happen if you added 1 to 255? No combination will represent 256 unless we add more bits. If you look at the diagram above, you will see that the next value (if we could have another digit) would be 256. So our byte would look like this.

[Figure Byte256.png: bits 8 to 0 are 1 0 0 0 0 0 0 0 0]

But this 9th bit (bit#8) doesn't exist. So where does it go? It actually goes into the carry bit. The carry bit, you say? The processor of the computer has an internal bit used exclusively for carry operations such as this. So if you add 1 to 255 stored in a byte, you'd get 0 with the carry bit set in the CPU. Of course, being a C++ programmer, you never get to use this bit directly. You'll need to learn assembler if you want to do that, but that's a whole other ball game.

In our program, we can start off with a value of 0 and wait until it becomes 0 again before exiting. This will make sure we go through every value a byte can hold.

Inside your main() function, write the following. Don't worry about the loop just yet. We are more concerned with the output right now.

char b=0;
 
do
{
  cout << (int)b << " ";
  b++;
  if ((b&15)==0) cout << endl;
} while(b!=0);

b is our byte and we initialize it to 0. Inside the loop we print its value. We cast it to an int so that a number is printed instead of a character. Don't worry about casting or int's right now. The next line increments the value in our byte b. Then we print a new line (carriage return/endl) after every 16 numbers by using the code "b&15" which uses '&', the bitwise AND operator, and the value 15 (00001111 in binary), to give the value of the lowest 4 bits, which will be 0 every multiple of 16, because 16 is 00010000 in binary, so multiplying this by something will just change the higher bits, and will leave the lower bits alone. We do this so that we can see all 256 values on the screen at once by printing 16 values per line, and so only using 16 lines, assuming 16 values will fit on one line.

If you were to run this program, you would probably notice something strange. After 127, we got -128! Negative numbers! Where did these come from? Well, it just so happens that the compiler needs to be told whether we're using numbers that can be negative or numbers that can only be positive. These are called signed and unsigned numbers respectively. For most integer types the compiler assumes we want signed numbers unless we explicitly tell it otherwise; for a plain char the choice is left to the compiler, and most PC compilers treat it as signed. To fix our little problem, add "unsigned" in front of the declaration for b so that it reads:

unsigned char b=0;

Problem solved!
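For reference, the loop with the unsigned fix applied could be packaged as shown below. This is only a sketch: the function names print_byte_values() and byte_values_as_text() are invented here, and the output stream is passed in as a parameter so that the same code can write to std::cout or to a string.

```cpp
#include <ostream>
#include <sstream>
#include <string>

// Writes every value an unsigned byte can hold, 16 values per line.
void print_byte_values(std::ostream& out)
{
    unsigned char b = 0;
    do {
        out << static_cast<int>(b) << " ";  // print as a number, not a character
        b++;
        if ((b & 15) == 0) out << '\n';     // lowest 4 bits are 0 every 16 values
    } while (b != 0);                       // b wraps around from 255 back to 0
}

// Hypothetical helper that collects the output in a string.
std::string byte_values_as_text()
{
    std::ostringstream out;
    print_byte_values(out);
    return out.str();
}
```

Calling print_byte_values(std::cout) from main(), with <iostream> included, prints the values 0 to 255 on 16 lines.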

Two's Complement

Two's complement is a way to store negative numbers in a pure binary representation. The reason that the two's complement method of storing negative numbers was chosen is because this allows the CPU to use the same add and subtract instructions on both signed and unsigned numbers.

To convert a positive number into its negative two's complement format, you begin by flipping all the bits in the number (1's become 0's and 0's become 1's) and then add 1. (This also works to turn a negative number back into a positive number, e.g. -34 into 34 or vice-versa.)

Let's try to convert our number 45.

[Figure Byte45.png: bits 7 to 0 are 0 0 1 0 1 1 0 1]

First, we flip all the bits...

[Figure Byte45flip.png: bits 7 to 0 are 1 1 0 1 0 0 1 0]

And add 1.

[Figure Byte45flip1.png: bits 7 to 0 are 1 1 0 1 0 0 1 1]

Now if we add up the values for all the one bits, we get... 128+64+16+2+1=211? What happened here? Well, this number actually is 211. It all depends on how you interpret it. If you decide this number is unsigned, then its value is 211. But if you decide it's signed, then its value is -45. It is completely up to you how you treat the number.

If and only if you decide to treat it as a signed number, then look at the msb (most significant bit [bit#7]). If it's a 1, then it's a negative number. If it's a 0, then it's positive. In C++, using "unsigned" in front of a type will tell the compiler you want to use this variable as an unsigned number, otherwise it will be treated as a signed number.

Now, if you see the msb is set, then you know it's negative. So convert it back to a positive number to find out its real value using the process just described above.

Let's go through a few examples.

Treat the following number as an unsigned byte. What is its value in decimal?

[Figure Byte228.png: bits 7 to 0 are 1 1 1 0 0 1 0 0]

Since this is an unsigned number, no special handling is needed. Just add up all the values where there's a 1 bit. 128+64+32+4=228. So this binary number is 228 in decimal.

Now treat the number above as a signed byte. What is its value in decimal?

Since this is now a signed number, we first have to check if the msb is set. Let's look. Yup, bit #7 is set. So we have to do a two's complement conversion to get its value as a positive number (then we'll add the negative sign afterwards).

Ok, so let's flip all the bits...

[Figure Byte228flip.png: bits 7 to 0 are 0 0 0 1 1 0 1 1]

And add 1. This is a little trickier since a carry propagates to the third bit. For bit#0, we do 1+1 = 10 in binary. So we have a 0 in bit#0. Now we have to add the carry to the second bit (bit#1). 1+1=10. bit#1 is 0 and again we carry a 1 over to the 3rd bit (bit#2). 0+1 = 1 and we're done with the conversion.

[Figure Byte228flip1.png: bits 7 to 0 are 0 0 0 1 1 1 0 0]

Now we add the values where there's a one bit. 16+8+4 = 28. Since we did a conversion, we add the negative sign to give a value of -28. So if we treat 11100100 (base 2) as a signed number, it has a value of -28. If we treat it as an unsigned number, it has a value of 228.

Let's try one last example.

Give the decimal value of the following binary number both as a signed and unsigned number.

[Figure Byte5.png: bits 7 to 0 are 0 0 0 0 0 1 0 1]

First as an unsigned number. So we add the values where there's a 1 bit set. 4+1 = 5. For an unsigned number, it has a value of 5.

Now for a signed number. We check if the msb is set. Nope, bit #7 is 0. So for a signed number, it also has a value of 5.

As you can see, if a signed number doesn't have its msb set, then you treat it exactly like an unsigned number.
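The rules used in these examples can be sketched in code. The function names twos_complement() and as_signed() are invented for this illustration, and std::uint8_t is used to get an unsigned 8-bit type.

```cpp
#include <cstdint>

// Two's complement of an 8-bit pattern: flip all the bits, then add 1.
std::uint8_t twos_complement(std::uint8_t bits)
{
    return static_cast<std::uint8_t>(~bits + 1);
}

// Interprets an 8-bit pattern as a signed number using the rule above:
// if the msb (bit#7) is set, convert to the positive value and negate it;
// otherwise the value is the same as the unsigned one.
int as_signed(std::uint8_t bits)
{
    if (bits & 0x80) {
        return -static_cast<int>(twos_complement(bits));
    }
    return bits;
}
```

With these definitions, twos_complement(45) gives 211, as_signed(228) gives -28 and as_signed(5) gives 5, matching the worked examples.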

NOTE:
A special case of two's complement is where the sign bit (msb or bit#7 in a byte) is set to one and all other bits are zero; the two's complement of this pattern is the pattern itself. It is a fact that two's complement notation (signed numbers) has one more negative number than positive numbers. So for bytes, you have a range of -128 to +127. The reason for this is that the number zero uses a bit pattern (all zeros). Out of all the 256 possibilities, this leaves 255 to be split between positive and negative numbers. As you can see, this is an odd number and cannot be divided equally. If you were to try and split them, you would be left with the bit pattern described above where the sign bit is set (to 1) and all other bits are zeros. Since the sign bit is set, it has to be a negative number.

If you see this bit pattern of a sign bit set with everything else a zero, you cannot convert it to a positive number using two's complement conversion. The way you find out its value is to figure out the maximum number of bit patterns the value or type can hold. For a byte, this is 256 possibilities. Divide that number by 2 and put a negative sign in front. So -128 is this number for a byte. The following will be discussed below, but if you had 16 bits to work with, you have 65536 possibilities. Divide by 2 and add the negative sign gives a value of -32768.

Endian

Now that we've seen many ways to use a byte, it is time to look at ways to represent numbers larger than 255. By grouping bytes together, we can represent numbers that are much larger than 255. If we use 2 bytes together, we double the number of bits in our number. In effect, 16 bits allows us to represent numbers up to 65535 (unsigned). And 32 bits allows us to represent numbers above 4 billion. We already saw the type for a byte. It is called a 'char'.

Here are a few basic primitive types:

1. char (1 byte (by definition), max unsigned value: at least 255)

2. short int (at least 16 bits, max unsigned value: at least 65535)

3. long int (at least 32 bits, max unsigned value: at least 4294967295)

4. float (typically 4 bytes, floating point)

5. double (typically 8 bytes, floating point)

For 'short int' and 'long int', you can leave out the 'int' because the compiler will know what type you want. You can also use 'int' by itself and it will default to whatever your compiler is set at for an int. On most recent compilers, int defaults to a 32-bit type.
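The guaranteed minimum sizes listed above can be checked at compile time, and the actual sizes on a given compiler can be queried with sizeof. The following is a minimal sketch (C++11 or later); the helper template bits_in() is an invented name.

```cpp
#include <climits>
#include <limits>

// Minimums guaranteed by the language; these hold on every conforming compiler.
static_assert(sizeof(char) == 1, "char is one byte by definition");
static_assert(sizeof(short) * CHAR_BIT >= 16, "short has at least 16 bits");
static_assert(sizeof(long) * CHAR_BIT >= 32, "long has at least 32 bits");
static_assert(std::numeric_limits<unsigned short>::max() >= 65535,
              "unsigned short holds at least 65535");
static_assert(std::numeric_limits<unsigned long>::max() >= 4294967295UL,
              "unsigned long holds at least 4294967295");

// Number of bits actually used by type T on the current platform.
template <typename T>
constexpr int bits_in()
{
    return static_cast<int>(sizeof(T) * CHAR_BIT);
}
```

Printing bits_in<int>() shows what your compiler has chosen for a plain int; the value differs between platforms.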

[Figure: the 3 basic primitive integer types: char, short int and long int.]

All of the topics explained above also apply to short int's and long int's. The difference is simply the number of bits used is different and the msb is now bit#15 for a short and bit#31 for a long (assuming a 32-bit long type).

Let's look at a (16-bit) short. You may think that in memory the byte for bits 15 to 8 would be followed by the byte for bits 7 to 0 (because bits 15 to 8 appears first). In other words, byte #0 would be the high byte and byte #1 would be the low byte. This is true for some other systems. For example, the Motorola 68000 series CPUs do work this way. The Amiga and old Macintoshes use the M68000 and they indeed do use this byte ordering.

However, on PCs (with 8088/286/386/486/Pentiums) this is not so. The ordering is reversed so that the low byte comes before the high byte. The byte that represents bits 0 to 7 always comes before all other bytes on PCs. This is called little-endian ordering. The other ordering, such as on the M68000, is called big-endian ordering. This is very important to remember when doing low level byte operations.
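One way to see which ordering a machine uses is to copy a multi-byte value into an array of bytes and look at the first one. This is a sketch only; the function names are invented, and std::memcpy is used so that the bytes are inspected without any pointer casting tricks.

```cpp
#include <cstdint>
#include <cstring>

// Returns the byte stored at the lowest memory address of a 32-bit value.
unsigned char first_byte_in_memory(std::uint32_t value)
{
    unsigned char bytes[sizeof value];
    std::memcpy(bytes, &value, sizeof value);
    return bytes[0];
}

// True when the low byte comes first in memory (little-endian ordering).
bool is_little_endian()
{
    return first_byte_in_memory(0x0A0B0C0D) == 0x0D;
}
```

On a PC, is_little_endian() is expected to return true, because the low byte 0x0D is stored first; on a big-endian machine the first byte would be 0x0A.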

For big-endian computers, the basic idea is to keep the higher bits on the left or in front. For little-endian computers, the idea is to keep the low bits in the low byte. There is no inherent advantage to either scheme except perhaps for an oddity. Using a little-endian long int as a smaller type of int is theoretically possible as the low byte(s) is/are always in the same location (first byte). With big-endian the low byte is always located differently depending on the size of the type. For example (in big-endian), the low byte is the 4th byte in a long int and the 2nd byte in a short int. So a proper cast must be done and low level tricks become rather dangerous.

To convert from one endianness to the other, you reverse the order of the bytes: the value of the highest byte goes into the lowest byte, the value of the lowest byte goes into the highest byte, and the in-between bytes are swapped in the same way. For example, the 4-byte little-endian integer 0x0A0B0C0D (the 0x signifies that the value is hexadecimal) becomes 0x0D0C0B0A when converted to big-endian.
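
The byte reversal described above can be sketched with shifts and masks. This is only an illustration, assuming a compiler that provides the fixed-width types from <cstdint> (C++11 or later); the function names swap_endianness and is_little_endian are made up for this example.

```cpp
#include <cstdint>

// Hypothetical helper: reverses the byte order of a 32-bit value.
std::uint32_t swap_endianness(std::uint32_t value)
{
    return ((value & 0x000000FFu) << 24) |  // lowest byte moves to the top
           ((value & 0x0000FF00u) << 8)  |  // the two middle bytes trade places
           ((value & 0x00FF0000u) >> 8)  |
           ((value & 0xFF000000u) >> 24);   // highest byte moves to the bottom
}

// Hypothetical helper: reports whether this machine stores the low byte first.
bool is_little_endian()
{
    const std::uint16_t probe = 0x0001;
    // Looking at an object's bytes through an unsigned char pointer is allowed.
    return *reinterpret_cast<const unsigned char*>(&probe) == 0x01;
}
```

Calling swap_endianness(0x0A0B0C0D) yields 0x0D0C0B0A, and swapping twice gives back the original value.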

Bit endianness, where the bit order inside the bytes changes, is rarely used in data storage and only really ever matters in serial communication links, where the hardware deals with it.

Floating point representation

A generic real number with a decimal part can also be expressed in binary format. For instance 110.01 in binary corresponds to:

 1 \times 2^2 + 1 \times 2^1 + 0 \times 2^0 + 0 \times 2^{-1} + 1 \times 2^{-2} = {\color{Blue} 2^2 + 2^1 + 2^{-2}} = 6.25

Exponential notation (also known as scientific notation, or standard form, when used with base 10, as in 3 \times 10^8) can also be used, and the same number can be expressed as:

 1.1001 \times 2^2 \qquad ( = 11.001 \times 2^1 = 110.01 )

When there is only one non-zero digit on the left of the decimal point, the notation is termed normalized.

In computing applications a real number is represented by a sign bit (S), an exponent (e) and a mantissa (M). The exponent field needs to represent both positive and negative exponents. To do this, a bias E is added to the actual exponent in order to get the stored exponent. The sign bit (S), which indicates whether or not the number is negative, is transformed into either +1 or -1, giving s. A real number is thus represented as:

 f = s \times M \times 2^{e-E}

S, e and M are concatenated one after the other in a 32-bit word to create a single precision floating point number and in a 64-bit doubleword to create a double precision one. For the single float type, 8 bits are used for the exponent and 23 bits for the mantissa, and the exponent offset is E=127. For the double type 11 bits are used for the exponent and 52 for the mantissa, and the exponent offset is E=1023.

There are two types of floating point numbers: normalized and denormalized. A normalized number has an exponent e in the range 0 < e < 2^8 - 1 (between 00000000 and 11111111, non-inclusive) in a single precision float, and an exponent e in the range 0 < e < 2^11 - 1 (between 00000000000 and 11111111111, non-inclusive) in a double. Normalized numbers are represented as sign times 1.Mantissa times 2^(e-E). Denormalized numbers are numbers where the stored exponent is 0. They are represented as sign times 0.Mantissa times 2^(1-E). Denormalized numbers are also used to store the value 0, where the exponent and mantissa are both 0. Floating point numbers can store both +0 and -0, depending on the sign. When the number is neither normalized nor denormalized (its exponent is all 1s), the number is plus or minus infinity (depending on the sign) if the mantissa is zero, or NaN (Not a Number) if the mantissa is not zero.

For instance the binary representation of the number 5.0 (using float type) is:

0 10000001 01000000000000000000000

The first bit is 0, meaning the number is positive, the exponent is 129-127=2, and the mantissa is 1.01 (note the leading one is not included in the binary representation). 1.01 corresponds to 1.25 in decimal representation. Hence 1.25*4=5.
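
To check the three fields of 5.0 yourself, the bits of a float can be copied into an integer and picked apart. The sketch below assumes that float is a 32-bit IEEE single precision value (which the standard does not guarantee) and a C++11 compiler; FloatFields and decompose are names invented for this example.

```cpp
#include <cstdint>
#include <cstring>

// Assumption of this sketch: float occupies exactly 32 bits.
static_assert(sizeof(float) == sizeof(std::uint32_t), "float must be 32 bits wide");

// Hypothetical holder for the three parts of a single precision number.
struct FloatFields {
    std::uint32_t sign;      // 1 bit
    std::uint32_t exponent;  // 8 bits, stored with the bias E = 127
    std::uint32_t mantissa;  // 23 bits, without the implicit leading 1
};

FloatFields decompose(float value)
{
    std::uint32_t bits = 0;
    std::memcpy(&bits, &value, sizeof bits);  // copy the raw bytes of the float

    FloatFields fields;
    fields.sign     = bits >> 31;
    fields.exponent = (bits >> 23) & 0xFFu;
    fields.mantissa = bits & 0x7FFFFFu;
    return fields;
}
```

Under these assumptions decompose(5.0f) gives a sign of 0, a stored exponent of 129 and a mantissa of 0x200000, which is the bit pattern 01000000000000000000000 shown above.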

Floating point numbers are not always exact representations of values. A number like 1010110110001110101001101 (in binary) cannot be represented exactly by a single precision floating point number because, disregarding the leading 1 which isn't part of the stored mantissa, there are 24 bits, and a single precision float can only store 23 bits in its mantissa, so the number must be rounded and the least significant bit (the 1 at the end) is lost. Also, there are some values which can be easily represented in decimal but simply cannot be represented exactly in binary. For example, 0.3 in decimal is 0.0100110011001100110011... in binary, with the group 1001 repeating forever. Many other numbers cannot be exactly represented by a binary floating point number, no matter how many bits it uses for its mantissa, because they would require a repeating pattern like this.
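
The loss of the last bit can be observed by converting the 25-bit integer from the text to a float and back. This is a small sketch that assumes float is an IEEE single precision type; the names kTwentyFiveBits and round_trip_through_float are made up for illustration.

```cpp
// 1010110110001110101001101 in binary (22748493 in decimal), written in hex.
const long kTwentyFiveBits = 0x15B1D4DL;

// Hypothetical helper: converts to float and back to an integer.
long round_trip_through_float(long value)
{
    float as_float = static_cast<float>(value);  // rounded to 24 significant bits
    return static_cast<long>(as_float);
}
```

With a 24-bit significand the round trip cannot return kTwentyFiveBits unchanged, while a value such as 16777216 (a 1 followed by 24 zero bits) survives intact because its low bits are all zero.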

Variables

Much like a person has a name that distinguishes him or her from other people, a variable gives a particular instance of an object type a name or label by which the instance can be referred to. Depending on its use in the code, a variable also has a specific locality in relation to the hardware and, based on the structure of the code, a specific scope in which the compiler will recognize it as valid. All these characteristics are defined by the programmer.

Locality (hardware)

With respect to locality, variables fall into two distinct categories: those that are created on the stack (local variables), and those that are accessed via a hard-coded memory address (global variables).

Global variables

Typically a variable is bound to a particular address in computer memory that is automatically assigned at runtime, with a fixed number of bytes determined by the size of the variable's object type. Any operation performed on the variable affects one or more values stored in that particular memory location.

The only scope that can be defined for a global variable is a namespace; this deals with the visibility of the variable, not its validity.

All globally defined variables have static lifetime. Only those not defined as const have external linkage by default.

Local variables

If the size and location of a variable are unknown beforehand, the location in memory of that variable is stored in another variable instead, and the size of the original variable is determined by the type of the second variable, the one storing the memory location of the first. This is called referencing, and the variable holding the other variable's memory location is called a pointer.

Scope (code)

Variables also reside in a specific scope. The scope of a variable determines the life-time of a variable. Entrance into a scope begins the life of a variable and leaving scope ends the life of a variable. This becomes important later as the constructors of variables are called when entering scope and the destructors of variables are called when leaving scope. A variable is visible when in scope unless it is hidden by a variable with the same name inside an enclosed scope. A variable can be in global scope, namespace scope, file scope or block scope.
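
The following sketch illustrates block scope, name hiding and the order in which constructors and destructors run. The Tracer type and the g_events log are hypothetical helpers that exist only to record what happens.

```cpp
#include <string>

// Hypothetical log that records construction (+) and destruction (-) events.
std::string g_events;

struct Tracer {
    explicit Tracer(char name) : m_name(name) { g_events += '+'; g_events += m_name; }
    ~Tracer() { g_events += '-'; g_events += m_name; }
    char m_name;
};

int scope_demo()
{
    int value = 1;            // visible in the whole function body
    Tracer outer('a');        // constructed on entering this scope
    {
        int value = 2;        // hides the outer 'value' inside this block
        Tracer inner('b');
        g_events += (value == 2) ? '2' : '?';
    }                         // 'inner' is destroyed here
    g_events += (value == 1) ? '1' : '?';
    return value;             // 'outer' is destroyed when the function returns
}
```

After one call to scope_demo(), g_events holds "+a+b2-b1-a": the inner object is destroyed as soon as its block ends, and the outer value becomes visible again.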

Definition vs. Declaration

There is an important distinction between the declaration of a variable and its definition. The declaration announces the properties (the type, size, etc.); the definition, on the other hand, causes storage to be allocated in accordance with the declaration.
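
As a brief illustration of the difference, the sketch below first declares a variable and a function and only later defines them. The names shared_counter and next_id are invented for this example.

```cpp
// Declarations: they announce the names and types but reserve no storage.
extern int shared_counter;
int next_id();

// Definitions: storage for the variable and code for the function now exist.
int shared_counter = 0;

int next_id()
{
    return ++shared_counter;
}
```

A declaration may be repeated (for instance in a header included by several files), whereas each variable or function should be defined only once in the program.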

Types

Just as there are different types of values (integer, character, etc.), there are different types of variables. A variable can refer to a simple value like an integer or a character, called a primitive type, or to a set of values, called a composite type, made up of primitive types and other composite types. Types consist of a set of valid values and a set of valid operations which can be performed on these values. A variable must declare what type it is before it can be used, in order to enforce value and operation safety and to know how much space is needed to store a value.

Major functions that type systems provide are:

  • Safety - types make it impossible to code some operations which cannot be valid in a certain context. This mechanism effectively catches the majority of common mistakes made by programmers. For example, an expression "Hello, Wikipedia"/1 is invalid because a string literal cannot be divided by an integer in the usual sense. As discussed below, strong typing offers more safety, but it does not necessarily guarantee complete safety (see type-safety for more information).
  • Optimization - static type checking might provide useful information to a compiler. For example, if a type says a value is aligned at a multiple of 4, the memory access can be optimized.
  • Documentation - using types in languages also improves documentation of code. For example, the declaration of a variable as being of a specific type documents how the variable is used. In fact, many languages allow programmers to define semantic types derived from primitive types; either composed of elements of one or more primitive types, or simply as aliases for names of primitive types.
  • Abstraction - types allow programmers to think about programs in higher level, not bothering with low-level implementation. For example, programmers can think of strings as values instead of a mere array of bytes.
  • Modularity - types allow programmers to express the interface between two subsystems. This localizes the definitions required for interoperability of the subsystems and prevents inconsistencies when those subsystems communicate.

Table of Data Types

Type Size in Bits Comments Alternate Names
Primitive Types
char ≥ 8
  • sizeof gives the size in units of chars. These "C++ bytes" need not be 8-bit bytes (though commonly they are); the number of bits is given by the CHAR_BIT macro in the climits header.
  • Signedness is implementation-defined.
  • Any encoding of 8 bits or less (e.g. ASCII) can be used to store characters.
  • Integer operations can be performed portably only for the range 0 ~ 127.
  • All bits contribute to the value of the char, i.e. there are no "holes" or "padding" bits.
signed char same as char
  • Characters stored like for type char.
  • Can store integers in the range -127 ~ 127 portably[1].
unsigned char same as char
  • Characters stored like for type char.
  • Can store integers in the range 0 ~ 255 portably.
short ≥ 16, ≥ size of char
  • Can store integers in the range -32767 ~ 32767 portably[2].
  • Used to reduce memory usage (although the resulting executable may be larger and probably slower as compared to using int).
short int, signed short, signed short int
unsigned short same as short
  • Can store integers in the range 0 ~ 65535 portably.
  • Used to reduce memory usage (although the resulting executable may be larger and probably slower as compared to using int).
unsigned short int
int ≥ 16, ≥ size of short
  • Represents the "normal" size of data the processor deals with (the word-size); this is the integral data-type used normally.
  • Can store integers in the range -32767 ~ 32767 portably[2].
signed, signed int
unsigned int same as int
  • Can store integers in the range 0 ~ 65535 portably.
unsigned
long ≥ 32, ≥ size of int
  • Can store integers in the range -2147483647 ~ 2147483647 portably[3].
long int, signed long, signed long int
unsigned long same as long
  • Can store integers in the range 0 ~ 4294967295 portably.
unsigned long int
bool ≥ size of char, ≤ size of long
  • Can store the constants true and false.
wchar_t ≥ size of char, ≤ size of long
  • Signedness is implementation-defined.
  • Can store "wide" (multi-byte) characters, which include those stored in a char and probably many more, depending on the implementation.
  • Integer operations are better not performed with wchar_ts. Use int or unsigned int instead.
float ≥ size of char
  • Used to reduce memory usage when the values used do not vary widely.
  • The floating-point format used is implementation defined and need not be the IEEE single-precision format.
  • unsigned cannot be specified.
double ≥ size of float
  • Represents the "normal" size of data the processor deals with; this is the floating-point data-type used normally.
  • The floating-point format used is implementation defined and need not be the IEEE double-precision format.
  • unsigned cannot be specified.
long double ≥ size of double
  • unsigned cannot be specified.
User Defined Types
struct or class ≥ sum of size of each member
  • Default access modifier for structs for members and base classes is public.
  • For classes the default is private.
  • The convention is to use struct only for POD types.
  • Said to be a compound type.
union ≥ size of the largest member
  • Default access modifier for members and base classes is public.
  • Said to be a compound type.
enum ≥ size of char
  • Enumerations are a distinct type from ints. ints are not implicitly converted to enums, unlike in C. Also ++/-- cannot be applied to enums unless overloaded.
typedef same as the type being given a name
  • typedef has syntax similar to a storage class like static, register or extern.
template ≥ size of char
Derived Types[4]
type&

(reference)
≥ size of char
  • References (unless optimized out) are usually internally implemented using pointers and hence they do occupy extra space separate from the locations they refer to.
type*

(pointer)
≥ size of char
  • 0 always represents the null pointer (an address where no data can be placed), irrespective of what bit sequence represents the value of a null pointer.
  • Pointers to different types may have different representations, which means they could also be of different sizes. So they are not convertible to one another.
  • Even in an implementation which guarantees all data pointers to be of the same size, function pointers and data pointers are in general incompatible with each other.
  • For functions taking variable number of arguments, the arguments passed must be of appropriate type, so even 0 must be cast to the appropriate type in such function-calls.
type [integer]

(array)
integer × size of type
  • The brackets ([]) follow the identifier name in a declaration.
  • In a declaration which also initializes the array (including a function parameter declaration), the size of the array (the integer) can be omitted.
  • type [] is not the same as type*. Only under some circumstances one can be converted to the other.
type (comma-delimited list of types/declarations)

(function)
  • The parentheses (()) follow the identifier name in a declaration, e.g. a 2-arg function pointer: int (* fptr) (int arg1, int arg2).
  • Functions declared without any storage class are extern.
type aggregate_type::*

(member pointer)
≥ size of char
  • 0 always represents the null pointer (a value which does not point to any member of the aggregate type), irrespective of what bit sequence represents the value of a null pointer.
  • Pointers to different types may have different representations, which means they could also be of different sizes. So they are not convertible to one another.

Table of Data Types Footnotes

[1] -128 can be stored in two's-complement machines (i.e. most machines in existence).
[2] -32768 can be stored in two's-complement machines (i.e. most machines in existence).
[3] -2147483648 can be stored in two's-complement machines (i.e. most machines in existence).
[4] The precedences in a declaration are: [], () (left associative) — Highest
&, *, ::* (right associative) — Lowest

NOTE:
The long long and unsigned long long data types have been part of the C++ Standard since C++11 (and have been standard in C since 1999). Many older compilers support them only as a non-standard extension.

The signedness of plain char is implementation-defined in both C and C++; neither standard fixes it as signed or unsigned. This information is important if you are moving code between compilers or reviewing old code that assumes one or the other.

Standard types

C++ has five basic primitive types called standard types, specified by particular keywords, that store a single value.

The type of a variable determines what kind of values it can store:

  • bool - a boolean value: true; false
  • int - Integer: -5; 10; 100
  • char - a character in some encoding, often something like ASCII, ISO-8859-1 ("Latin 1") or ISO-8859-15: 'a', '=', 'G', '2'.
  • float - floating-point number: 1.25; -2.35*10^23
  • double - double-precision floating-point number: like float but more decimals

NOTE:
A char variable cannot store sequences of characters (strings), such as "C++" ({'C', '+', '+', '\0'}); it takes 4 char variables (including the null-terminator) to hold it. This is a common confusion for beginners. There are several types in C++ that store string values, but we will discuss them later.

The float and double primitive data types are called 'floating point' types and are used to represent real numbers (numbers with decimal places, like 1.435324 and 853.562). Floating point numbers and floating point arithmetic can be very tricky, due to the nature of how a computer calculates floating point numbers.

NOTE:
Don't use floating-point variables where discrete values are needed. Using a float for a loop counter is a great way to shoot yourself in the foot. Always test floating-point numbers as <= or >=, never use an exact comparison (== or !=).
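
One common way to follow this advice is to drive loops with an integer counter and to compare floating-point results against a tolerance. The sketch below is only one possible approach; nearly_equal, sum_of_tenths and the default tolerance of 1e-9 are assumptions chosen for illustration, and a suitable tolerance depends on the magnitudes involved.

```cpp
#include <cmath>

// Hypothetical helper: true when a and b differ by at most 'tolerance'.
bool nearly_equal(double a, double b, double tolerance = 1e-9)
{
    return std::fabs(a - b) <= tolerance;
}

// Adds 0.1 ten times. An integer drives the loop, so it runs exactly ten times.
double sum_of_tenths()
{
    double total = 0.0;
    for (int i = 0; i < 10; ++i) {
        total += 0.1;
    }
    return total;
}
```

Because 0.1 has no exact binary representation, the result of sum_of_tenths() may differ from 1.0 in its last bits, so nearly_equal(sum_of_tenths(), 1.0) is a safer test than an exact == comparison.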

Declaration

C++ is a statically typed language. Hence, no variable can be used without specifying its type. This is why the type appears in the declaration. This way the compiler can protect you from trying to store a value of an incompatible type into a variable, e.g. storing a string in an integer variable. Declaring variables before use also allows spelling errors to be easily detected. Consider a variable used in many statements, but misspelled in one of them. Without declarations, the compiler would silently assume that the misspelled variable actually refers to some other variable. With declarations, an "Undeclared Variable" error would be flagged. Another reason for specifying the type of the variable is so the compiler knows how much space in memory must be allocated for this variable.

The simplest variable declarations look like this (the parts in []s are optional):

[specifier(s)] type variable_name [ = initial_value];


To create an integer variable for example, the syntax is

int sum;

where sum is the name you made up for the variable. This kind of statement is called a declaration. It declares sum as a variable of type int, so that sum can store an integer value. Every variable has to be declared before use and it is common practice to declare variables as close as possible to the moment where they are needed. This is unlike languages such as C before the C99 standard, where all declarations in a block must precede all other statements and expressions.

In general, you will want to make up variable names that indicate what you plan to do with the variable. For example, if you saw these variable declarations:

char firstLetter; 
char lastLetter; 
int hour, minute;

you could probably make a good guess at what values would be stored in them. This example also demonstrates the syntax for declaring multiple variables with the same type in the same statement: hour and minute are both integers (int type). Notice how a comma separates the variable names.

int a = 123;
int b (456);

Those lines also declare variables, but this time the variables are initialized to some value. What this means is that not only is space allocated for the variables but the space is also filled with the given value. The two lines illustrate two different but equivalent ways to initialize a variable. The assignment operator '=' in a declaration has a subtle distinction in that it assigns an initial value instead of assigning a new value. The distinction becomes important especially when the values we are dealing with are not of simple types like integers but more complex objects like the input and output streams provided by the iostream library.

The expression used to initialize a variable need not be constant. So the lines:

int sum;
sum = a + b;

can be combined as:

int sum = a + b;

or:

int sum (a + b);

Declare a floating point variable 'f' with an initial value of 1.5:

float f = 1.5 ;

Floating point constants should always have a '.' (decimal point) somewhere in them. Any number that does not have a decimal point is interpreted as an integer, which then must be converted to a floating point value before it is used.

For example:

double a = 5 / 2;

will not set a to 2.5 because 5 and 2 are integers and integer arithmetic will apply for the division, cutting off the fractional part. A correct way to do this would be:

double a = 5.0 / 2.0;

You can also declare floating point values using scientific notation. The constant .05 in scientific notation would be 5 \times 10^{-2}. The syntax for this is the significant digits (the mantissa), followed by an e, followed by the power of ten. For example, to use .05 as a scientific notation constant:

double a = 5e-2;
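
The difference between integer and floating point division, and the scientific notation syntax, can be checked with a few small functions. The function names below are made up for this illustration.

```cpp
// Both operands are integers, so the division truncates before the
// result is converted to double.
double integer_division()  { return 5 / 2; }      // 2.0

// Both operands are floating point constants.
double floating_division() { return 5.0 / 2.0; }  // 2.5

// One floating point operand is enough: the int 5 is converted to double first.
double mixed_division()    { return 5 / 2.0; }    // 2.5

// 5e-2 is just another way of writing the constant 0.05.
double scientific()        { return 5e-2; }
```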

NOTE:
Single letters can sometimes be a bad choice for variable names when their purpose cannot be determined. However, some single-letter variable names are so commonly used that they're generally understood. For example i, j, and k are commonly used for loop variables and iterators; n is commonly used to represent the number of some elements or other counts; s, and t are commonly used for strings (that don't have any other meaning associated with them, as in utility routines); c and d are commonly used for characters; and x and y are commonly used for Cartesian co-ordinates.

Below is a program storing two values in integer variables, adding them and displaying the result:

// This program adds two numbers and prints their sum.
#include <iostream>
 
int main()
{
  int a = 123;
  int b (456);
  int sum;
 
  sum = a + b;
 
  std::cout << "The sum of " << a << " and " << b << " is " << sum << "\n";
 
  return 0;
}

Or, if you like to save some space, the same program can be written as:

// This program adds two numbers and prints their sum, variation 1
#include <iostream>
#include <ostream>
 
using namespace std;
 
int main()
{
  int a = 123, b (456), sum = a + b;
 
  cout << "The sum of " << a << " and " << b << " is " << sum << endl;
 
  return 0;
}

typedef

typedef is a language keyword used to give a data type a new name. The intent is to make source code easier to understand. It is encountered most often in old external libraries. The Style Conventions section of this book also mentions this keyword.

typedef int Apples;
typedef int Oranges;
Apples coxes;
Oranges jaffa;

NOTE:
You only need to redeclare a typedef if you want to redefine the same name.
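
Keep in mind that a typedef only introduces another name for an existing type; it does not create a new, distinct type. The sketch below reuses the Apples and Oranges names from above, while total_fruit and as_oranges are hypothetical functions added for illustration.

```cpp
typedef int Apples;
typedef int Oranges;

// Both parameter types are really int, so they can be added directly.
int total_fruit(Apples apples, Oranges oranges)
{
    return apples + oranges;
}

// The compiler accepts this silently: Apples and Oranges are the same type.
Oranges as_oranges(Apples apples)
{
    return apples;
}
```

The names therefore document intent for the reader, but they do not stop apples from being mixed with oranges.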

Type Modifiers

There are several modifiers that can be applied to data types, for example to change the range of numbers they can represent or to restrict how a variable may be modified or stored.

const

A variable declared with this specifier cannot be changed (as in read only). Either local or class-level variables (scope) may be declared const, indicating that you don't intend to change their value after they're initialized. You declare a variable as being constant using the const keyword. Global const variables have internal linkage by default. If you need to use a global constant across multiple files, the best option is to use a special header file that can be included across the project.

const unsigned int DAYS_IN_WEEK = 7 ;

declares an unsigned integer constant, called DAYS_IN_WEEK, with the value 7. Because this value cannot be changed, you must give it a value when you declare it. If you later try to assign another value to a constant variable, the compiler will report an error.

int main(){
  const int i = 10;
 
  i = 3;            // ERROR - we can't change "i"
 
  int &j = i;       // ERROR - we promised not to
                    // change "i" so we can't
                    // create a non-const reference
                    // to it
 
  const int &x = i; // fine - "x" is a const
                    // reference to "i"
 
  return 0;
}

The full meaning of const is more complicated than this; when working through pointers or references, const can be applied to mean that the object pointed (or referred) to will not be changed via that pointer or reference. There may be other names for the object, and it may still be changed using one of those names so long as it was not originally defined as being truly const.

const has an advantage for programmers over the #define directive because it is understood by the compiler, not just substituted into the program text by the preprocessor, so any error messages can be much more helpful.

With pointers it can get messy...

T const *p;                     // p is a pointer to a const T
T *const p;                     // p is a const pointer to T
T const *const p;               // p is a const pointer to a const T

If the pointer is a local variable, making the pointer itself const is rarely useful. The order of T and const can be reversed:

const T *p;

is the same as

T const *p;
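
The three combinations can be tried out in a small function. This is only a sketch with made-up names; the lines that would not compile are left as comments.

```cpp
int demo_const_pointers()
{
    int first = 1;
    int second = 2;

    const int* to_const = &first;    // pointer to a const int
    to_const = &second;              // OK: the pointer itself may be changed
    // *to_const = 5;                // error: cannot modify through this pointer

    int* const const_ptr = &first;   // const pointer to a (non-const) int
    *const_ptr = 10;                 // OK: the pointed-to int may be changed
    // const_ptr = &second;          // error: the pointer itself is const

    const int* const both = &second; // const pointer to a const int
    // Neither 'both' nor '*both' can be modified.

    return *to_const + *const_ptr + *both;  // 2 + 10 + 2
}
```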

NOTE:
const can be used in the declaration of variables (arguments, return values and methods) - some of which we will mention later on.

Using const has several advantages:

  • To users of the class, it is immediately obvious that the const methods will not modify the object.
  • Many accidental modifications of objects will be caught at compile time.
  • Compilers like const since it allows them to do better optimization.

volatile

A hint to the compiler that a variable's value can be changed externally; therefore the compiler must avoid aggressive optimization on any code that uses the variable.

Unlike in Java, C++'s volatile specifier does not have any meaning in relation to multi-threading. Before C++11, Standard C++ did not include support for multi-threading (though it was a common extension); variables that must be synchronized between threads need a synchronization mechanism such as a mutex. Keep in mind that volatile implies only safety in the presence of implicit or unpredictable actions by the same thread (or by a signal handler, in the case of a volatile sig_atomic_t object). Some compilers treat accesses to mutable volatile variables and fields as synchronization operations: such accesses can affect control flow and thus determine whether or not other shared variables are accessed, which implies that ordinary memory operations cannot in general be reordered with respect to a mutable volatile access, and that these accesses are sequentially consistent. This behavior is not part of the standard and should not be relied upon.
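
The signal handler case mentioned above can be sketched as follows. The names g_signal_seen, on_signal and raise_and_check are invented for this example, and SIGINT is chosen arbitrarily; the signal is raised by the program itself, so the handler runs in the same thread.

```cpp
#include <csignal>

// Flag written by the signal handler and read by the normal program flow.
volatile std::sig_atomic_t g_signal_seen = 0;

// Signal handlers should do very little; setting a flag is the classic case.
extern "C" void on_signal(int)
{
    g_signal_seen = 1;
}

bool raise_and_check()
{
    std::signal(SIGINT, on_signal);  // install the handler
    std::raise(SIGINT);              // the handler runs before raise returns
    std::signal(SIGINT, SIG_DFL);    // restore the default behavior
    return g_signal_seen == 1;
}
```

Declaring the flag volatile tells the compiler that its value may change outside the visible flow of the code, so the read in raise_and_check() is not optimized away.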

mutable

This specifier may only be applied to non-static, non-const member variables. It allows the variable to be modified within const member functions.

mutable is usually used when an object might be logically constant, i.e. no outside observable behavior changes, but not bitwise const, i.e. some internal member might change state.

The canonical example is the proxy pattern. Suppose you have created an image catalog application that shows all images in a long, scrolling list. This list could be modeled as:

class image {
 public:
   // construct an image by loading from disk
   image(const char* const filename); 
 
   // get the image data
   char const * data() const;
 private:
   // The image data
   char* m_data;
};
 
class scrolling_images {
   image const* images[1000];
};

Note that for the image class, bitwise const and logically const are the same: if m_data changes, the public function data() returns different output.

At a given time, most of those images will not be shown, and might never be needed. To avoid having the user wait for a lot of data being loaded which might never be needed, the proxy pattern might be invoked:

class image_proxy {
  public:
   image_proxy( char const * const filename )
      : m_filename( filename ),
        m_image( 0 ) 
   {}
   ~image_proxy() { delete m_image; }
   char const * data() const {
      if ( !m_image ) {
         m_image = new image( m_filename );
      }
      return m_image->data();
   }
  private:
   char const* m_filename;
   mutable image* m_image;
};
 
class scrolling_images {
   image_proxy const* images[1000];
};

Note that the image_proxy does not change observable state when data() is invoked: it is logically constant. However, it is not bitwise constant since m_image changes the first time data() is invoked. This is made possible by declaring m_image mutable. If it had not been declared mutable, the image_proxy::data() would not compile, since m_image is assigned to within a constant function.
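
The same idea can be tried in a smaller, self-contained form. The Circle class below is a hypothetical example, not part of the image catalog: it computes its area only on the first request and caches the result in mutable members.

```cpp
class Circle {
  public:
    explicit Circle(double radius)
      : m_radius(radius), m_area(0.0), m_area_valid(false), m_computations(0)
    {}

    // Logically const: the observable result never changes, but the
    // cache members are updated the first time the function is called.
    double area() const {
        if (!m_area_valid) {
            m_area = 3.141592653589793 * m_radius * m_radius;
            m_area_valid = true;
            ++m_computations;
        }
        return m_area;
    }

    // Reports how many times the area was actually computed.
    int computations() const { return m_computations; }

  private:
    double m_radius;
    mutable double m_area;        // cached result
    mutable bool m_area_valid;    // whether the cache has been filled
    mutable int m_computations;   // bookkeeping for this illustration only
};
```

Even for an object declared as const Circle c(1.0), calling c.area() several times computes the area only once; without mutable, the assignments inside area() would not compile.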

NOTE:
Like exceptions to most rules, the mutable keyword exists for a reason, but should not be overused. If you find that you have marked a significant number of the member variables in your class as mutable you should probably consider whether or not the design really makes sense.

short

The short specifier can be applied to the int data type. It can decrease the number of bytes used by the variable, which decreases the range of numbers that the variable can represent. Typically, a short int is half the size of a regular int -- but this will be different depending on the compiler and the system that you use. When you use the short specifier, the int type is implicit. For example:

short a;

is equivalent to:

short int a;

NOTE:
Although short variables may take up less memory, they can be slower than regular int types on some systems. Because most machines have plenty of memory today, it is rare that using a short int is advantageous.

long

The long specifier can be applied to the int and double data types. It can increase the number of bytes used by the variable, which increases the range of numbers that the variable can represent. On some platforms a long int is twice the size of an int, and a long double can represent larger numbers more precisely. When you use long by itself, the int type is implied. For example:

long a;

is equivalent to:

long int a;

The shorter form, with the int implied rather than stated, is more idiomatic (i.e., seems more natural to experienced C++ programmers).

Use the long specifier when you need to store larger numbers in your variables. Be aware, however, that on some compilers and systems the long specifier may not increase the size of a variable. Indeed, most common 32-bit platforms (and one 64-bit platform) use 32 bits for int and also 32 bits for long int.
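
Since the sizes vary between platforms, it can be worth checking them on your own compiler. The sketch below uses sizeof and the CHAR_BIT macro from the climits header; the function names are made up for this example.

```cpp
#include <climits>
#include <cstddef>

// sizeof reports sizes in chars; multiplying by CHAR_BIT gives bits.
std::size_t bits_in_short() { return sizeof(short) * CHAR_BIT; }
std::size_t bits_in_int()   { return sizeof(int) * CHAR_BIT; }
std::size_t bits_in_long()  { return sizeof(long) * CHAR_BIT; }
```

Printing the three results with std::cout shows what your platform uses. The language only guarantees the minimum sizes and the ordering short ≤ int ≤ long, so int and long may well report the same number.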

NOTE:
C++ standards before C++11 do not allow long long int as modern C does; since C++11 it is part of the language and is guaranteed to be at least a 64-bit type. Most older C++ implementations offer long long or an equivalent as an extension to standard C++.

unsigned

The unsigned specifier makes a variable represent only non-negative numbers (positive numbers and zero). It can be applied only to the char, short, int and long types. For example, if an int is 16 bits wide and holds values from -32768 to 32767, an unsigned int will hold values from 0 to 65535. You can use this specifier when you know that your variable will never need to be negative. For example, if you declared a variable 'myHeight' to hold your height, you could make it unsigned because you know that you would never be negative inches tall.

NOTE:
unsigned types use modular arithmetic. The default overflow behavior is to wrap around, instead of raising an exception or saturating. This can be useful, but can also be a source of bugs to the unwary.
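
The wrap-around can be demonstrated at both ends of the range. This is a small sketch with invented function names; std::numeric_limits from the limits header supplies the largest value for the platform.

```cpp
#include <limits>

// Adding 1 to the largest unsigned value wraps around to 0.
unsigned int wrap_above_max()
{
    unsigned int value = std::numeric_limits<unsigned int>::max();
    return value + 1u;
}

// Subtracting 1 from 0 wraps around to the largest unsigned value.
unsigned int wrap_below_zero()
{
    unsigned int value = 0u;
    return value - 1u;
}

// A common pitfall: with an unsigned counter the condition i >= 0 is
// always true, so a loop like the following would never terminate:
//   for (unsigned int i = n - 1; i >= 0; --i) { ... }
```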

signed

The signed specifier makes a variable represent both positive and negative numbers. It can be applied only to the char, short, int and long data types. The signed specifier is applied by default for short, int and long, so you typically will never use it in your code.

NOTE:
Plain char is a distinct type from both signed char and unsigned char although it has the same range and representation as one or the other. On some platforms plain char can hold negative values, on others it cannot. char should be used to represent a character; for a small integral type, use signed char, or for a small type supporting modular arithmetic use unsigned char.

static

Using the static modifier makes a variable have static lifetime and on global variables makes them require internal linkage (variables will not be accessible form code of the same project that resides in other files).

static lifetime
Means that a static variable will needs to be initialized in the file scope and at run time, will exist and maintain changes across until the program's process is closed, the particular order of destruction of static variables is undefined.

The static keyword can also be used on functions, inside functions, on classes, on class members (data and functions), in structs and in unions (but not on a union's members). We will cover each use separately.

Enumerated data types

In programming it is often necessary to deal with data types that describe a fixed set of alternatives. For example, when designing a program to play a card game it is necessary to keep track of the suit of an individual card.

One method for doing this may be to create unique constants to keep track of the suit. For example one could define

const int Clubs=0;
const int Diamonds=1;
const int Hearts=2;
const int Spades=3;
.
.
.
int current_card_suit=Diamonds;

Unfortunately there are several problems with this method. The most minor problem is that this can be a bit cumbersome to write. A more serious problem is that this data is indistinguishable from integers. It becomes very easy to start using the associated numbers instead of the suits themselves, such as

int current_card_suit=1;

and, worse, to make mistakes that may be very difficult to catch, such as a typo

current_card_suit=11;

which produces a valid expression in C++, but would be meaningless in representing the card's suit.

One way around these difficulties is to create a new data type specifically designed to keep track of the suit of the card, one that restricts you to using only valid possibilities. We can accomplish this with an enumerated data type, using the C++ "enum" keyword. In this case we could create the desired data type with the code

enum card_suit {Clubs,Diamonds,Hearts,Spades};
card_suit first_cards_suit=Diamonds;
card_suit second_cards_suit=Hearts;
card_suit third_cards_suit=0; //Would cause an error, 0 is an "integer" not a "card_suit" 
card_suit forth_cards_suit=first_cards_suit; //OK, they both have the same type.

The first line of code creates a new data type "card_suit" that may take on only one of four possible values: "Clubs", "Diamonds", "Hearts", and "Spades". In general the enum command takes the form

enum new_type_name {possible_value_1, possible_value_2, ..., possible_value_n} Optional_Variable_With_This_Type;

The second line of code creates a new variable with this data type and initializes its value to "Diamonds". The other lines create new variables of this new type and show some initializations that are (and are not) possible.

Internally enumerated types are stored as integers, that begin with 0 and increment by 1 for each new possible value for the data type.

enum apples {Fuji,Macintosh,GrannySmith};
enum oranges {Blood,Navel,Persian};
apples pie_filling=Navel; //error can't make an apple pie with oranges.
apples my_fav_apple=Macintosh;
oranges my_fav_orange=Navel; //This has the same internal integer value as my_fav_apple
if(my_fav_apple==my_fav_orange) //Many compilers will produce an error or warning letting you know you're comparing two different quantities.
  std::cout << "You shouldn't compare apples and oranges" << std::endl;

While enumerated types are not integers, they are in some cases converted into integers, such as when we send an enumerated type to standard output. For example

enum color {Red, Green, Blue};
color hair=Red;
color eyes=Blue;
color skin=Green;
std::cout << "My hair color is " << hair << std::endl;
std::cout << "My eye color is " << eyes << std::endl;
std::cout << "My skin color is " << skin << std::endl;
if (skin==Green)
  std::cout << "I am seasick!" << std::endl;

Will produce the output

My hair color is 0
My eye color is 2
My skin color is 1
I am seasick!

We could improve this example by introducing an array that holds the names of our enumerated type such as

std::string color_names[3]={"Red", "Green", "Blue"};
enum color {Red, Green, Blue};
color hair=Red;
color eyes=Blue;
color skin=Green;
std::cout << "My hair color is " << color_names[hair] << std::endl;
std::cout << "My eye color is " << color_names[eyes] << std::endl;
std::cout << "My skin color is " << color_names[skin] << std::endl;

In this case hair is automatically converted to an integer when it is used to index the array. This technique is intimately tied to the fact that the color Red is internally stored as "0", Green is internally stored as "1", and Blue is internally stored as "2". Be careful! One may override these default choices for the internal values of the enumerated types.

This is done by simply setting the value in the "enum" such as:

enum color {Red=2, Green=4, Blue=6};

In fact it is not necessary to specify an integer for every value of an enumerated type. When a value is not specified, the compiler will simply take the value of the previous possible value and increase it by one. Consider the following example,

enum colour {Red=2, Green, Blue=6, Orange};

Here the internal value of "Red" is 2, "Green" is 3, "Blue" is 6 and "Orange" is 7. Be careful to keep in mind when using this that the internal values do not need to be unique.

Enumerated types are also automatically converted into integers in arithmetic expressions, which makes it useful to be able to choose particular integers for the internal representations of an enumerated type.

One may have enumerated types for the width and height of a standard computer screen. This may allow a program to do meaningful calculations, while still maintaining the benefits of an enumerated type.

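enum screen_width {SMALL_WIDTH=800, MEDIUM_WIDTH=1280};    //the names must differ between the two types,
enum screen_height {SMALL_HEIGHT=600, MEDIUM_HEIGHT=768};  //as both are declared in the same scope
screen_width MyScreenW=SMALL_WIDTH;
screen_height MyScreenH=SMALL_HEIGHT;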
std::cout << "The number of pixels on my screen is " << MyScreenW*MyScreenH << std::endl;

It should be noted that the internal values used in an enumerated type are constant, and cannot be changed during the execution of the program.

It is perhaps useful to notice that while the enumerated types can be converted to integers for the purpose of arithmetic, they cannot be iterated through directly.

For example

enum month { JANUARY=1, FEBRUARY, MARCH, APRIL, MAY, JUNE, JULY, AUGUST, SEPTEMBER, OCTOBER, NOVEMBER, DECEMBER};
 
for( month cur_month = JANUARY; cur_month <= DECEMBER; cur_month=cur_month+1)
{
  std::cout << cur_month << std::endl;
}

will fail to compile. The problem is with the for loop. The first two statements in the loop are fine. We may certainly create a new month variable and initialize it. We may also compare two months, where they will be compared as integers. We may not increment the cur_month variable. "cur_month+1" evaluates to an integer which may not be stored into a "month" data type.

In the code above we might try to fix this by replacing the for loop with:

for( int monthcount = JANUARY; monthcount <= DECEMBER; monthcount++)
{
  std::cout << monthcount << std::endl;
}

and this will work because we can increment the integer "monthcount".

Derived Types

TODO
Complete... ref to:
C++ Programming/Operators/Pointers
C++ Programming/Operators/Arrays

Scope

As with any other type of language, context (i.e. what is the background of a given action or statement) has a high impact on its validity. The same is true in a programming language. For instance, variables have a finite lifetime when your program executes. The scope of an object or variable is simply that part of a program in which the variable name exists or is visible to the compiler.

We will see that a program contains various constructs, be they objects, variables or any other such. They come into existence at the point where you declare them (before they are declared they are unknown) and then, at some point, they are destroyed (as we will see, there are many reasons for this); all are destroyed when your program terminates.

As an example, in the following fragment of code, the variable 'i' is in scope only in the lines between the appropriate comments:

{
  int i; /*'i' is now in scope */
  i = 5;
  i = i + 1;
  cout << i;
}/* 'i' is now no longer in scope */

The concept of scope is straightforward unless procedures are included in the mix; then it becomes more difficult to follow:

// Confusing Scope Program
#include <iostream>
 
using namespace std;
 
int i = 5;           /* The first version of the variable 'i' is now in scope */
 
void p(){
  int i = -1;        /* A new variable, also called 'i' has come into existence. The 'i' declared above is now out of scope,*/
  i = i + 1;
  cout << i << ' ';  /* so this local 'i' will print out the value 0.*/
}                    /* The newest variable 'i' is now out of scope. The first variable, also called 'i', is in scope again now.*/
 
 
int main(){
  cout << i << ' ';  /* The first variable 'i' is still in scope here so a 5 will be output here.*/
  int i = 6;         /* A new variable, also called 'i' has come into existence. The first variable 'i' is now out of scope again,*/
  i = i + 1;
  p();
  cout << i << endl; /* so this line will print out a 7.*/
}                    /* End of program: all variables are, of course, now out of scope.*/

The first variable 'i' is put out of scope in two separate sections. Thus the repeated statement cout << i << ' '; means something very different each time it is written. The 'i' referred to is a different location in memory on each occasion. This is what was meant by 'context' in the introduction: the context or background in which the statement is placed is different each time, so the statement does something different in each place.

It is always an error to declare the same variable name twice within the same level of scope. Reusing a name at different levels of scope is legal but, as a rule, should be avoided; avoiding such name clashes was one of the reasons behind the implementation of namespaces.

The default scope is the global scope. It is commonly used to define and use global variables and other global constructs (classes, structures, functions, etc.), which makes them valid and visible to the compiler at all times.

NOTE:

It is considered a good practice, if possible and as a way to reduce complexity and name collisions, to use a namespace scope for hiding the otherwise global elements, without removing their validity.

The purpose of Scope

There are important things to note about the above example. That program is only an example program and unusually convoluted; it is useful to demonstrate the idea of scope and little else. So while it illustrates the concept of scope, it fails to illustrate usefully the purpose of scope.

Some variables are required to store information for an entire program, while other variables are short-term variables which are brought into existence momentarily for a single small purpose and then disposed of, by going out of scope. In the following program, an array of numbers is read in and then a procedure is called that computes the average of the numbers in the array. Within the procedure, in order to move through the array and select the elements in the array in turn, a variable 'i' is created for that one purpose. Contrast the two kinds of variable: the array itself, which is in scope throughout the entire program, and the variable 'i' which is in scope only for a small section of the code, to do its own little job.

// Program Average
#include <iostream>
 
using namespace std;
 
float a[20];                      /* a is now in scope.*/
int length;                       /* length is now in scope.*/
 
float average(){
  float result = 0.0;             /* result is now in scope.*/
 
  for(int i = 0; i < length; i++){ /* i is now in scope.*/
   result += a[i];
  }                               /* i is now out of scope.*/
 
  return result/length;
}                                 /* result is now out of scope.*/
 
int main(){
 length = 0;
 while(length < 20 && cin >> a[length]) /* stop when the array is full or the input ends */
   length++;
 float av = average();            /* av is now in scope.*/
 cout << av << endl;
}                                 /* All variables now out of scope.*/

Scope and control structures

Within a single procedure it is possible to begin a new level of scoping. In fact it occurs every time a left brace `{' is written and it ends where its matching right brace '}' is written. Thus layers of scope to any depth can be built up as can be seen in the following program. The following program has four levels of scoping. The innermost level is said to be enclosed by the levels around it, thus we talk about inner levels and enclosing levels of scoping. Again this program is for illustration purposes only and does nothing of real value.

// Complicated Scope Program
#include <iostream>
 
using namespace std;  /* outermost level of scope starts here */
 
int i;
 
int main(){           /* next level of scope starts here */
  int i;
  i = 5;
 
  {                   /* next level of scope starts here */
    int j,i;
    j = 1;
    i = 0;
 
    {                 /* innermost level of scope of this program starts here */
      int k, i;
      i = -1;
      j = 6;
      k = 2;
    }                 /* innermost level of scope of this program ends here */
 
    cout << j << ' ';
  }                   /* next level of scope ends here */
 
  cout << i << endl;
}                     /* next and outermost levels of scope end here */

The output of this program is 6 5. To understand why, we look first at the simpler situation of 'i', and we see why a five is printed for it, and then next at the more complicated situation with 'j' and why a six is printed for it.

A new variable 'i' is created at each new level of scope. Thus only the first assignment to 'i', where 'i' is assigned the value of five, affects this particular variable 'i': the variable that is printed. That variable does not alter its value after that first assignment and so the final statement prints a five. The other assignments to 'i' are irrelevant to the final print statement.

In contrast, the variable 'j' is created only once, hence the assignment where `j' is assigned the value of six, alters this only existing variable called 'j' even though the variable is declared at an enclosing level of scope. If a program has a scope level inside another, and no variable of the right name is declared at this inner level, then the computer `looks outwards' to the next enclosing level of scope. If there is no variable of that name declared there either, then it will look out further and again further outwards until the variable's declaration is found. (Of course, if a declaration for the variable is never found then the compiler indicates an error, stating that the variable is undeclared and so the program is not compiled.)

Scope using other control structures

Above we stated that a new level of scope started with each left brace `{'. However, it is not common to use just a naked left brace: usually the left brace is associated with an if statement or a while statement or some such. We add these to the above program to make the more usual (but still useless) example program following. The program does compile and it prints a 5 when it runs, but does little of value as a program.

// Complicated Scope Program, variation 1
#include <iostream>  
 
using namespace std;  /* outermost level of scope starts here */
 
int i;
 
int main(){           /* next level of scope starts here */
  int i;
  i = 5;
 
  while(i != 5) {     /* next level of scope starts here */
    int j,i;
    j = 1;
    i = 0;
    switch (i) {      /* next level of scope starts here */
      int i;
 
      case 1:
        if (i != 4) { /* innermost level of scope of this program starts here */
          int k, i;
          i = -1;
          j = 6;
          k = 2;
        }             /* innermost level of scope of this program ends here */
        break;
 
      case 2: 
        j = 5;
        break;
    }                 /* next level of scope ends here */
 
    cout << j << ' ';
  }                   /* next level of scope ends here */
 
  cout << i << endl;
}                     /* next and outermost levels of scope end here */

We added an extra level of scoping at the switch statement because this statement demands a left brace, and it shows that even at that point a new level of scope is opened (because we were able to declare yet another variable called `i' without getting an error). As you can see, the various control structures (i.e. the while, if and switch statements) all end at their own matching right brace and hence at the same point as the scope ends. It is fair to say that each of these control statements operates over an area of scope. But remember, scope is a concept about variable names and the area in which they are defined, not control structures.

It is not essential that either the if or the while statements have an opening brace following, and program average above shows an example of this with its while statement. In this situation, no new level of scope is begun. It is the left brace that opens the new level of scope, not the if or while statement itself. The switch statement demands a left brace at that point, but notice that the `i' switch variable is at the old level of scope. The new level of scope comes into existence immediately following the left brace as always.

For all practical purposes, the for loop control structure can be seen to have scoping that operates in the same manner. However it is permissible to declare the for loop variable within the for statement itself, as can be seen in the average() function of program average above. This is good programming practice as it makes the structure of the program clearer.

With all the programs in this section the statements start further and further to the right as the level of scope deepens. This is referred to as indentation and is a very important feature of program presentation. It makes the scope level explicit at all times and also makes it clear where a statement ends. For example, in the above program the line cout << j << ' '; is within the while loop whereas cout << i << endl; is outside that loop. The loop and the scope level end at the same point: the right brace between these two statements signals the end of both of these.

The scoping of the for control statement in detail

This subsection is probably more confusing than useful, but it is offered for completeness; feel free to skip it.

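The for control statement has an unusual scoping, in that the left round bracket also starts its own level of scope: a variable declared inside the round brackets belongs to the for statement and hides any variable of the same name at the enclosing levels. Standard C++ does not allow that variable to be declared again directly in the outermost block of the loop body (although some older compilers accepted it), so the program below uses an extra pair of braces inside the body in order to declare yet another 'i'. Thus the following program is legal:

// Complicated Scope Program, variation 2
#include <iostream>
 
using namespace std; /* outermost level of scope starts here */
 
int i;
 
int main(){          /* next level of scope starts here */
  int i;
  i = 5;
 
  for(               /* next level of scope starts here */
      int i = 1;
      i<10 && cout << i << ' ';
      ++i )  
  {                  /* loop body: the loop variable 'i' may not be redeclared directly here */
    {                /* next level of scope starts here */
      int i = -1;
      cout << i << ' ';
    }                /* innermost level of scope ends here */
  }                  /* the scope of the for statement ends here */
 
  cout << i << endl;
}                    /* next and outermost levels of scope end here */

and it gives the output:

 1 -1 2 -1 3 -1 4 -1 5 -1 6 -1 7 -1 8 -1 9 -1 5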

This special feature of the for statement is only partly shared by the while statement. The condition of a while statement may declare a variable only if the declaration also initializes it; a bare declaration followed by a comparison is a syntax error, so while(int i < 22)i++; gives a syntax error.

This special scope level enables one to declare a for loop variable within the for loop itself (like the variable 'i' in the above program), instead of having to declare it in the enclosing scope level, creating a neater program. But it is a little peculiar.

The above program does show one interesting feature: in order to check if a new level of scope has been opened, it is only necessary to attempt to declare an existing variable again at that level, as was done with the variable `i' in the above example program, which has a new variable `i' declared at every level possible.

The above program also illustrates one further very important point: there is a saying in computer programming that it is possible to write bad code in any language. The above program is quite unclear in its operation and a good example of bad coding: it is good code to illustrate a point, but bad code to read. All code should be written to make the program as clear as possible as discussed elsewhere under the topic of program style.

Scope and lifetime

The scope of a variable should be contrasted with its lifetime. In the program above called `Confusing Scope Program' the first variable `i' goes out of scope for a time but it remains in existence, thus its lifetime is continuing while it is out of scope. In older programming languages, it is difficult to contrive examples where scope and lifetime are different - in general in these older languages the two are the same, so lifetime equals scope. Not only is it difficult, it is not all that useful in the general case when it does occur. In recently created computer languages like C++ however, the idea of having variables that are out of scope but still alive is heavily exploited and creates the principal distinguishing feature of C++: the C++ class. This is Program Average from above, rewritten with a class:

// Program Average rewritten using a class
#include <iostream>
 
using namespace std;
 
class StatisticsPackage{
private:
  float aa[20];                     /* aa scope start*/
  int length;                       /* length scope start*/
public:                           
  float average(){
    float result = 0.0;             /* result scope start*/
 
    for(int i = 0; i < length; ++i) /* i scope start*/
      result += aa[i];
 
    return result/length;
  }                                 /* result and i scope end*/
 
  void get_data(){
    length = 0;
    while(length < 20 && cin >> aa[length]) /* stop when aa is full or the input ends */
      ++length;
  }
};                                  /* aa and length scope end*/
 
int main(){
  StatisticsPackage sp;             /* aa and length lifetimes start */
  sp.get_data();
  float av = sp.average();          /* av scope start*/
  cout << av << endl;
}                                   /* av scope end*/

In this version of the program, the variables `length' and `aa' are alive after the object `sp' comes into existence. However, their scope has been limited: they are alive but out of scope, storing the information but not being directly available in the main program. Keeping variables out of scope in this manner is very helpful in debugging a program, by narrowing the number of lines in which the variable in question can possibly change its value.

TODO
evolve and add references to the further insight and practical usefulness in sections like class space.

Namespace

The namespace keyword was used in several of the code examples examined before; this section explains the line using namespace std;.

In many programming languages, a namespace is a context for identifiers. C++ can handle multiple namespaces within the language. By using namespace (or the using namespace keyword), one is offered a clean way to aggregate code under a shared label, so as to prevent naming collisions or just to ease recall and use of very specific scopes. There are other "name spaces" besides "namespaces"; this can be confusing.

Name spaces (note the space there), as we will see, go beyond the concept of scope by providing an easy way to differentiate what is being called/used. As we will see, classes are also name spaces, but they are not namespaces.

NOTE:
Use namespaces only for convenience or real need, such as the aggregation of related code. Don't use them in a way that makes the code overcomplicated for you and others.

A namespace is defined with a namespace block.

namespace foo {
  int bar;
}

Within this block, identifiers can be used exactly as they are declared. Outside of this block, the namespace specifier must be prefixed (that is, it must be qualified). For example, outside of namespace foo, bar must be written foo::bar. C++ includes another construct which makes this verbosity unnecessary. By adding the line using namespace foo; to a piece of code, the prefix foo:: is no longer needed.

using namespace std;

This using-directive indicates that any names used but not declared within the program should be sought in the ‘standard (std) namespace’.

NOTE:
It is always a bad idea to use a using directive in a header file, as it affects every file that includes that header and would make the header difficult to use in other derived projects; there is no way to "undo" or restrict the use of that directive. Also don't use it before an #include directive.

To make a single name from a namespace available, the following using-declaration exists:

using foo::bar;

After this declaration, the name bar can be used inside the current namespace instead of the more verbose version foo::bar. Note that programmers often use the terms declaration and directive interchangeably, despite their technically different meanings.

It is good practice to use the narrow second form (using declaration), because the broad first form (using directive) might make more names available than desired. Example:

namespace foo {
  int bar;
  double pi;
}
 
using namespace foo;
 
int* pi;
pi = &bar;  // ambiguity: pi or foo::pi?

In that case the declaration using foo::bar; would have made only foo::bar available, avoiding the clash of pi and foo::pi. This problem (the collision of identically-named variables or functions) is called "namespace pollution" and as a rule should be avoided wherever possible.

using-declarations can appear in a lot of different places. Among them are:

  • namespaces (including the default namespace)
  • functions

A using-declaration makes the name (or namespace) available in the scope of the declaration. Example:

namespace foo {
  namespace bar {
   double pi;
  }
 
  using bar::pi;
  // bar::pi can be abbreviated as pi
}
 
// here, plain pi is no longer available. Instead, foo::pi or foo::bar::pi must be used.

Namespaces are hierarchical. Within the hypothetical namespace food::fruit, the identifier orange refers to food::fruit::orange if it exists, or if not, then food::orange if that exists. If neither exists, orange refers to an identifier in the default namespace.

Code that is not explicitly declared within a namespace is considered to be in the default namespace.

Another property of namespaces is that they are open. Once a namespace is declared, it can be redeclared (reopened) and namespace members can be added. Example:

namespace foo {
  int bar;
}
 
// ...
 
namespace foo {
  double pi;
}

Namespaces are most often used to avoid naming collisions. Although namespaces are used extensively in recent C++ code, most older code does not use this facility. For example, the entire standard library is defined within namespace std, whereas before the language was standardized its names were in the default namespace.

For a long namespace name, a shorter alias can be defined (a namespace alias declaration). Example:

namespace ultra_cool_library_for_image_processing_version_1_0 {
  int foo;
}
 
namespace improc1 = ultra_cool_library_for_image_processing_version_1_0;
// from here, the above foo can be accessed as improc1::foo

There exists a special namespace: the unnamed namespace. This namespace is used for names which are private to a particular source file or other namespace:

namespace {
  int some_private_variable;
}
// can use some_private_variable here

In the surrounding scope, members of an unnamed namespace can be accessed without qualifying, i.e. without prefixing with the namespace name and :: (since the namespace doesn't have a name). If the surrounding scope is a namespace, members can be treated and accessed as a member of it. However, if the surrounding scope is a file, members cannot be accessed from any other source file, as there is no way to name the file as a scope. An anonymous namespace declaration is semantically equivalent to the following construct

namespace $$$ {
  // ...
}
using namespace $$$;

where $$$ is a unique identifier manufactured by the compiler.

As you can nest an anonymous namespace in an ordinary namespace, and vice versa, you can also nest two anonymous namespaces.

namespace {
 
  namespace {
    // ok
  }
 
}

NOTE:
A using directive stays in effect for the rest of the scope in which it appears; within that scope you cannot enable it for some sections of code and exclude others. You can, however, restrict its reach by placing it inside a nested namespace declaration or a block.

Because of space considerations, we cannot actually show the namespace command being used properly: it would require a very large program to show it working usefully. However, we can illustrate the concept itself easily.

// Namespaces Program, an example to illustrate the use of namespaces
#include <iostream>
 
namespace first {
  int first1;
  int x;
}
 
namespace second {
  int second1;
  int x;
}
 
namespace first {
  int first2;
}
 
int main(){
  //first1 = 1;
  first::first1 = 1;
  using namespace first;
  first1 = 1;
  x = 1;
  second::x = 1;
  using namespace second;
 
  //x = 1;
  first::x = 1;
  second::x = 1;
  first2 = 1;
 
  //cout << 'X';
  std::cout << 'X';
  using namespace std;
  cout << 'X';
}

We will examine the code moving from the start down to the end of the program, examining fragments of it in turn.

#include <iostream>

This just includes the iostream library so that we can use std::cout to print stuff to the screen.

namespace first {
  int first1;
  int x;
}
 
namespace second {
  int second1;
  int x;
}
 
namespace first {
  int first2;
}

We create a namespace called first and add to it two variables, first1 and x. Then we close it. Then we create a new namespace called second and put two variables in it: second1 and x. Then we re-open the namespace first and add another variable called first2 to it. A namespace can be re-opened in this manner as often as desired to add in extra names.

  int main(){
1  //first1 = 1;
2  first::first1 = 1;

The first line of the main program is commented out because it would cause an error. In order to get at a name from the first namespace, we must qualify the variable's name with the name of its namespace before it and two colons; hence the second line of the main program is not a syntax error. The name of the variable is in scope: it just has to be referred to in that particular way before it can be used at this point. This therefore cuts up the list of global names into groups, each group with its own prefixing name.

3  using namespace first;
4  first1 = 1;
5  x = 1;
6  second::x = 1;

The third line of the main program introduces the using namespace command. This command pulls all the names in the first namespace into scope. They can then be used in the usual way from there on. Hence the fourth and fifth lines of the program compile without error. In particular, the variable x is available now: in order to address the other variable x in the second namespace, we would call it second::x as shown in line six. Thus the two variables called x can be separately referred to, as they are on the fifth and sixth lines.

7  using namespace second;
8  //x = 1;
9  first::x = 1;
10 second::x = 1;

We then pull the declarations in the namespace called second in, again with the using namespace command. The line following is commented out because it is now an error (whereas before it was correct). Since both namespaces have been brought into the global list of names, the variable x is now ambiguous, and needs to be talked about only in the qualified manner illustrated in the ninth and tenth lines.

11 first2 = 1;

The eleventh line of the main program shows that even though first2 was declared in a separate section of the namespace called first, it has the same status as the other variables in namespace first. A namespace can be re-opened as many times as you wish. The usual rules of scoping apply, of course: it is not legal to try to declare the same name twice in the same namespace.

12 //cout << 'X';
13 std::cout << 'X';
14 using namespace std;
15 cout << 'X';
}

There is a namespace defined in the computer in a special group of files. Its name is std and all the system-supplied names, such as cout, are declared in that namespace in a number of different files: it is a very large namespace. Note that the #include statement at the very top of the program does not fully bring the namespace in: the names are there but must still be referred to in qualified form. Line twelve has to be commented out because currently the system-supplied names like cout are not available, except in the qualified form std::cout as can be seen in line thirteen. Thus we need a line like the fourteenth line: after that line is written, all the system-supplied names are available, as illustrated in the last line of the program. At this point we have the names of three namespaces incorporated into the program.

As the example program illustrates, the declarations that are needed can be brought in as desired, while unwanted ones are left out and can still be reached in a controlled manner using the qualified form with the double colon. This gives the finer control over names that large programs need. In the example above we used only the names of variables, but namespaces control the names of functions and classes in exactly the same way.
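The same rules can be sketched for functions and classes. The following is only an illustration: the names scale, Point and demo are made up for this example, and the namespaces first and second merely mirror those of the program above.

```cpp
#include <cassert>  // only needed if the results are checked with assert

namespace first {
    int scale(int n) { return n * 10; }
    struct Point { int x; int y; };
}

namespace second {
    // Same function name as in first; no clash, because the namespace differs.
    int scale(int n) { return n * 100; }
}

int demo() {
    using namespace first;        // first::scale and first::Point are now in scope
    Point p = {1, 2};             // unqualified name refers to first::Point
    int a = scale(p.x);           // calls first::scale, giving 10
    int b = second::scale(p.y);   // qualified form reaches the other namespace, giving 200
    return a + b;
}
```

Because only first was brought into scope, the unqualified call to scale is not ambiguous; a second using namespace directive for second would make it so.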

The Compiler

A compiler is a program that translates a computer program written in one computer language (the source code) into an equivalent program written in the computer's native machine language. This process of translation is called compilation.

Compilation

The output of a compiler is the result of translating (compiling) a program. The most important part of that output is saved to a file called an object file. As we have seen in The Code Section of the book, compilation transforms source files into object files.

NOTE:
Other files may be created or needed for a successful compilation. They are not part of the C++ language: they may result from the compilation of external code (a library, for example), may be specific to the compiler you use (MS Visual Studio, for example, adds several extra files to a project), or may belong to a framework that your code needs to access. In such cases you should check the relevant documentation. Be aware that some of these constructs may limit the portability of the code.

The instructions of this compiled program can then be run (executed) by the computer if the object file is in an executable format. Often, however, there are additional steps that may be required to create an executable program: preprocessing and linking.

Compile Time

Compile time refers to the period in which a compiler builds a program (executable or not), and to the operations it performs while doing so (i.e., compile-time operations).

The operations performed at compile time usually include lexical analysis, syntax analysis, various kinds of semantic analysis (e.g., type checks and template instantiation) and code generation.

The definition of a programming language will specify compile time requirements that source code must meet to be successfully compiled.

Compile time occurs before link time (when the output of one or more compiled files is joined together) and runtime (when a program is executed). In some programming languages it may be necessary for some compilation and linking to occur at runtime. The concept of runtime will be introduced later.

TODO
Add run time concept, and mention it here (probably on Debugging)

Lexical analysis

This stage is alternatively known as scanning or tokenisation. It happens before syntax analysis and converts the source code, expressed as characters arranged on lines, into a sequence of tokens, which are the units that the later stages of the compiler actually work with: a special token for each reserved keyword, and tokens for identifiers, operators, punctuation and literal values. The lexical analyzer is also the part of the compiler that discards whitespace and other characters that carry no meaning for the compiler, such as comments. It uses whitespace to separate tokens, but does not pass the whitespace on.

To give a simple illustration of the process:

int main()
{
    std::cout << "hello world" << std::endl;
    return 0;
}

Depending on the lexical rules used it might be tokenized as:

1 = string "int"
2 = string "main"
3 = opening parenthesis
4 = closing parenthesis
5 = opening brace
6 = string "std"
7 = namespace operator
8 = string "cout"
9 = << operator
10 = string ""hello world""
11 = string "endl"
12 = semicolon
13 = string "return"
14 = number 0
15 = closing brace

And so for this program the lexical analyzer might send something like:

1 2 3 4 5 6 7 8 9 10 9 6 7 11 12 13 14 12 15

to the syntactical analyzer, which is discussed next, to be parsed. It is easier for the syntactical analyzer to apply the rules of the language when it can work with numerical values, can distinguish between language syntax (such as the semicolon) and everything else, and knows what kind of token each item is.

Syntax Analysis

This step (sometimes also called syntax checking) ensures that the code is well formed, so that it can be turned into an executable program. The syntactical analyzer applies the rules of the language to the token sequence, checking, for instance, that each opening brace has a corresponding closing brace, that each declaration has a type, and that the type exists. Syntax analysis is more complicated than lexical analysis. As an example:

int main()
{
    std::cout << "hello world" << std::endl;
    return 0;
}

The syntax analyzer would first look at the string "int", check it against the defined keywords, and find that it is the type for integers. It would then take the next token as an identifier and check that it is a valid identifier name. Then it would look at the next token. Because that token is an opening parenthesis, it treats "main" as a function; had it found a semicolon, "main" would be the declaration of a variable, and had it found an equals sign, the initialization of an integer variable. After the opening parenthesis it finds a closing parenthesis, meaning that the function has no parameters. The next token is an opening brace, so this is the definition of the function main; had the next token been a semicolon, it would have been only a declaration. The analyzer would probably also keep a counter of the nesting level of the statement blocks, to make sure that the braces come in pairs.

After that it would look at the next token, "std", and probably not do anything with it until it sees the :: operator, at which point it checks that "std" is a valid namespace. It then takes the next token, "cout", as the name of an identifier in the namespace "std", and finds that it is an output stream object. Next comes the << operator, so the analyzer checks that << can be used with cout, and that the following token, ""hello world"", can be used with the << operator. The same happens for the << that follows the ""hello world"" token. Then it reaches the "std" token again, looks past it to the :: operator, checks once more that the namespace exists, and checks that "endl" is in that namespace. The semicolon that follows marks the end of the statement.

Next it would see the keyword "return", and expect an integer value as the next token, because main returns an integer; it finds 0, which is an integer. The next symbol is a semicolon, so that is the end of the statement. The next token is a closing brace, so that is the end of the function. There are no more tokens, so if the syntax analyzer has found no errors in the code, it passes the result on to the next stage of the compiler, so that the program can be converted to machine language. This is a simplified view: real syntax analyzers do not work exactly this way, and checks such as whether a name or a type exists are usually counted as semantic analysis, but the idea is the same.

The syntax analyzer checks identifiers against the keywords of the language, both to make sure that you are not using any of them as identifier names and to recognize the types and constructs that are built into the C++ language. The keywords are listed below, under ISO C++ (C++98) Keywords.

Compile Speed

There are several factors that dictate how fast a compilation proceeds, like:

  • Hardware
    • Resources (Slow CPU, low memory and even a slow HDD can have an influence)
  • Software
    • The compiler itself: a newer version is usually the better choice, but the choice may depend on how portable you want the project to be.
    • The design selected for the program (structure of object dependencies, includes) will also factor in.

Experience shows that if you are suffering from slow compile times, the program you are trying to compile is most likely poorly structured; take the time to organize your code so as to minimize re-compilation after changes. Large projects will always compile more slowly. Pre-compiled headers and external header guards can help. We will discuss ways to reduce compile time in the Optimization Section of this book.

ISO C++ (C++98) Keywords

  • and
  • and_eq
  • asm
  • auto
  • bitand
  • bitor
  • bool
  • break
  • case
  • catch
  • char
  • class
  • compl
  • const
  • const_cast
  • continue
  • default
  • delete
  • do
  • double
  • dynamic_cast
  • else
  • enum
  • explicit
  • export
  • extern
  • false
  • float
  • for
  • friend
  • goto
  • if
  • inline
  • int
  • long
  • mutable
  • namespace
  • new
  • not
  • not_eq
  • operator
  • or
  • or_eq
  • private
  • protected
  • public
  • register
  • reinterpret_cast
  • return
  • short
  • signed
  • sizeof
  • static
  • static_cast
  • struct
  • switch
  • template
  • this
  • throw
  • true
  • try
  • typedef
  • typeid
  • typename
  • union
  • unsigned
  • using
  • virtual
  • void
  • volatile
  • wchar_t
  • while
  • xor
  • xor_eq

Specific compilers may (in a non-standard compliant mode) also treat some other words as keywords, including cdecl, far, fortran, huge, interrupt, near, pascal, typeof. Old compilers may recognize the overload keyword, an anachronism that has been removed from the language.

The C++11 revision of the standard (informally known as C++0x while it was being prepared) added further keywords, including:

  • static_assert
  • decltype
  • nullptr

(These were considered carefully to minimize breakage to existing code; see http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2006/n2105.html for some details.)

Old compilers may not recognize some or all of the following keywords:

  • and
  • and_eq
  • bitand
  • bitor
  • bool
  • catch
  • compl
  • const_cast
  • dynamic_cast
  • explicit
  • export
  • false
  • mutable
  • namespace
  • not
  • not_eq
  • or
  • or_eq
  • reinterpret_cast
  • static_cast
  • template
  • throw
  • true
  • try
  • typeid
  • typename
  • using
  • wchar_t
  • xor
  • xor_eq

C++ Reserved Identifiers

Some "nonstandard" identifiers are reserved for distinct uses, to avoid conflicts on the naming of identifiers by vendors, library creators and users in general.

Reserved identifiers include all names that contain two consecutive underscores (__), all names that start with an underscore followed by an uppercase letter, and some other categories of reserved identifiers carried over from the C library specification.

A list of C reserved identifiers can be found at the Internet Wayback Machine archived page: http://web.archive.org/web/20040209031039/http://oakroadsystems.com/tech/c-predef.htm#ReservedIdentifiers

TODO
It would be nice to list those C reserved identifiers, for the moment All Standard C Library Functions have already been listed

Compiler Keywords

A limited set of keywords exists to control the compiler's behavior directly. These keywords are very powerful and must be used with care, as they can make a huge difference to the program's compile time and running speed.

In the C++ Standard, these keywords are called specifiers.

auto

NOTE:

This functionality is not available in C++98; it was added in the revision of the standard developed as C++0x and published as C++11.

The auto keyword used to have a different meaning (it was a storage class specifier), but from C++0x onward it allows one to omit the type of a variable and let the compiler deduce it from the initializer. This is particularly useful for generic programming, in which the return type of a function may depend on the type of its arguments. Thus, rather than this:

int x = 42;
std::vector<double> numbers;
numbers.push_back(1.0);
numbers.push_back(2.0);
for(std::vector<double>::iterator i = numbers.begin();
    i != numbers.end(); ++i) {
  cout << *i << " ";
}

we could write this:

auto x = 42; // We can use auto on base types...
std::vector<double> numbers;
numbers.push_back(1.0);
numbers.push_back(2.0);
// But auto is most useful for complicated types.
for(auto i = numbers.begin(); i != numbers.end(); ++i) {
  cout << *i << " ";
}

inline

A function declaration with the inline keyword declares an inline function. The inline keyword suggests to the compiler that a particular function be subjected to in-line expansion; that is, that the compiler insert the complete body of the function at every point where the function is called. This avoids the overhead of making the CPU jump from one place in the code to another and back again to execute a subroutine, as is done in a straightforward implementation of a function call.

Example:

inline void swap( int& a, int& b) { int const tmp(b); b=a; a=tmp; }

Marking a function as inline (possibly implicitly, by defining a member function inside a class/struct definition) is a (non-binding) request to the compiler to consider inlining the function, i.e., expanding its code at the call site; it is legal, but redundant, to add the inline keyword in that context, and good style is to omit it.

Example:

struct length
{
  explicit length(int metres) : m_metres(metres) {}
  operator int&() { return m_metres; }
  private:
  int m_metres;
};

Inlining can be an optimization, or a pessimization. It can increase code size (by duplicating the code for a function at multiple call sites) or can decrease it (if the code for the function, after optimization, is less than the size of the code needed to call a non-inline function). It can increase speed (by allowing for more optimization and by avoiding jumps) or can decrease speed (by increasing code size and hence cache misses).

One important side-effect of inlining is that more code is then accessible to the optimizer.

Marking a function as inline also has an effect on linking: multiple definitions of an inline function are permitted, so long as each is in a different translation unit and all of them are identical. This allows inline function definitions to appear in header files; defining non-inline functions in header files is almost always an error (though function templates can also be defined in header files, and often are).
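As a rough sketch, the contents of a header that relies on this rule might look as follows. The names clamp_to_range and Celsius are hypothetical and chosen only for the illustration.

```cpp
#include <cassert>  // only needed if the results are checked with assert

// Explicitly inline: every source file that includes this header gets its own
// identical definition, and the linker accepts the duplicates.
inline int clamp_to_range(int value, int low, int high) {
    return value < low ? low : (value > high ? high : value);
}

struct Celsius {
    explicit Celsius(int degrees) : m_degrees(degrees) {}
    // Defined inside the class definition, so implicitly inline;
    // writing the keyword here would be redundant.
    int degrees() const { return m_degrees; }
private:
    int m_degrees;
};
```

Without the inline keyword on clamp_to_range, including such a header from two source files of the same program would normally produce a "multiple definition" error at link time.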

Mainstream C++ compilers like Microsoft Visual C++ and GCC support an option that lets the compilers automatically inline any suitable function, even those that are not marked as inline functions. A compiler is often in a better position than a human to decide whether a particular function should be inlined; in particular, the compiler may not be willing or able to inline many functions that the human asks it to.

Excessive use of inline functions can greatly increase coupling/dependencies and compilation time, as well as making header files less useful as documentation of interfaces.

extern

The extern keyword tells the compiler that a variable is defined in another source module (or later in the same one). The linker then finds the actual definition and makes every use of the extern variable refer to it. If a variable is declared extern and the linker finds no definition of it, the linker reports an error such as "Unresolved external symbol".

Examples:

extern int i;
declares that there is a variable named i of type int, defined somewhere in the program.
extern int j = 0;
defines a variable j with external linkage; the extern keyword is redundant here.
extern void f();
declares that there is a function f taking no arguments and with no return value defined somewhere in the program; extern is redundant, but sometimes considered good style.
extern void f() {;}
defines the function f() declared above; again, the extern keyword is technically redundant here as external linkage is default.
extern const int k = 1;
defines a constant int k with value 1 and external linkage; extern is required because const variables have internal linkage by default.
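To see how declarations and definitions fit together, here is a single-file sketch of what would normally be split across a header and a source file. The file names in the comments and the names counter, limit and increment are hypothetical.

```cpp
#include <cassert>  // only needed if the results are checked with assert

// --- what a hypothetical header "counter.h" would contain ---
extern int counter;        // declaration only: no storage is reserved here
extern const int limit;    // declares a constant that is defined elsewhere
void increment();          // functions have external linkage by default

// --- what a hypothetical "counter.cpp" would contain ---
int counter = 0;           // the single definition that the linker resolves to
const int limit = 3;       // external linkage, because of the extern declaration above
void increment() {
    if (counter < limit) ++counter;   // stops counting once the limit is reached
}
```

Any other source file that includes the header can then read counter and call increment without defining either of them again.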

Storage Class Specifiers

  • register - A hint to the compiler that the specified variable will be heavily used; therefore the compiler should consider allocating a CPU register to the variable. The compiler may ignore this hint.
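  • static - Gives a local variable or a class data member static storage duration: one memory location is retained for the whole run of the program, so a local variable keeps its value between calls and a class member is shared by all instances of the class.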
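A short sketch of the two common uses of static follows; the names next_id and Widget are invented for the example.

```cpp
#include <cassert>  // only needed if the results are checked with assert

// A static local variable is initialized once and keeps its value between calls.
int next_id() {
    static int last = 0;
    return ++last;
}

struct Widget {
    static int live_count;                 // one copy shared by every Widget
    Widget() { ++live_count; }
    Widget(const Widget&) { ++live_count; }
    ~Widget() { --live_count; }
};

int Widget::live_count = 0;                // definition of the static data member
```

Each call to next_id continues from the previous value, and Widget::live_count tracks how many Widget objects currently exist, whichever instance is asked.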

The Preprocessor

The preprocessor is either a separate program invoked by the compiler or part of the compiler itself. It performs intermediate operations that modify the original source code and internal compiler options before the compiler tries to compile the resulting source code.

The instructions that the preprocessor parses are called directives and come in two forms: preprocessor and compiler directives. Preprocessor directives direct the preprocessor on how it should process the source code, and compiler directives direct the compiler on how it should modify internal compiler options. Directives are used to make writing source code easier (by making it more portable, for instance) and to make the source code more understandable. They are also the only valid way to make use of facilities (classes, functions, templates, etc.) provided by the C++ Standard Library.

NOTE:
Check the documentation of your compiler/preprocessor for information on how it implements the preprocessing phase and for any additional features not covered by the standard that may be available. For in depth information on the subject of parsing, you can read "Compiler Construction" (http://en.wikibooks.org/wiki/Compiler_Construction)

All directives start with '#' at the beginning of a line. The standard directives are:

  • #define
  • #elif
  • #else
  • #endif
  • #error
  • #if
  • #ifdef
  • #ifndef
  • #include
  • #line
  • #pragma
  • #undef

Inclusion of Header Files (#include)

The #include directive allows a programmer to include contents of one file inside another file. This is commonly used to separate information needed by more than one part of a program into its own file so that it can be included again and again without having to re-type all the source code into each file.

C++ generally requires you to declare what will be used before using it. So, files called headers usually include declarations of what will be used in order for the compiler to successfully compile source code. This is further explained in the File Organization Section of the book. The standard library (a repository of code that is available with every standards-compliant C++ compiler) and 3rd party libraries make use of headers in order to allow the inclusion of the needed declarations in your source code, allowing you to make use of features or resources that are not part of the language itself.

The first lines in any source file should usually look something like this:

#include <iostream>
#include "other.h"

The above lines cause the contents of the files iostream and other.h to be included for use in your program. Usually this is implemented by just inserting into your program the contents of iostream and other.h. When angle brackets (<>) are used in the directive, the preprocessor is instructed to search for the specified file in a compiler-dependent location. When double quotation marks (" ") are used, the preprocessor is expected to search in some additional, usually user-defined, locations for the header file and to fall back to the standard include paths only if it is not found in those additional locations. Commonly when this form is used, the preprocessor will also search in the same directory as the file containing the #include directive.

The iostream header contains various declarations for input/output (I/O) using an abstraction of I/O mechanisms called streams. For example there is an output stream object called std::cout (where "cout" is short for "character output") which is used to output text to the standard output, which usually displays the text on the computer screen.

NOTE:
When including standard libraries, compilers are allowed to make an exception as to whether a header file by a given name actually exists as a physical file or is simply a logical entity that causes the preprocessor to modify the source code, with the same end result as if the entity existed as a physical file. Check the documentation of your preprocessor/compiler for any vendor-specific implementation of the #include directive and for specific search locations of standard and user-defined headers. This can lead to portability problems and confusion.

A list of standard C++ header files is listed below:


Standard Template Library

and the

Standard C Library

Everything inside C++'s standard library is kept in the std:: namespace.

Old compilers may include headers with a .h suffix (e.g. the non-standard <iostream.h> vs. the standard <iostream>) instead of the standard headers. These names were common before the standardization of C++ and some compilers still include these headers for backwards compatibility. Rather than using the std:: namespace, these older headers pollute the global namespace and may otherwise only implement the standard in a limited way.

Some vendors use the SGI STL headers. This was one of the earliest implementations of the standard template library, derived from the original Hewlett-Packard implementation.

Non-standard but somewhat common C++ libraries

NOTE:
Before standardization of the headers, they were presented as separate files, like <iostream.h> and so on. This may still be required by very old (non-standards-compliant) compilers, while some newer compilers accept both forms. There is also no requirement in the standard that headers should exist in file form. The old method of referring to standard libraries as separate files is obsolete.

#pragma

The pragma (pragmatic information) directive is part of the standard, but the meaning of any pragma directive depends on the software implementation of the standard that is used.

Pragma directives are used within the source program.

#pragma token(s)

You should check the software implementation of the C++ standard you intend to use for a list of the supported tokens.

For example, one of the most widely used preprocessor pragma directives, #pragma once, when placed at the beginning of a header file, indicates that the file where it resides will be skipped if included several times by the preprocessor.

NOTE:
Another method exists, commonly referred to as include guards, that provides this same functionality using only standard preprocessor directives (#ifndef, #define and #endif).

Older versions of the GCC documentation described #pragma once as an obsolete preprocessor directive.
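The include-guard idiom can be sketched in a single file by writing out what the preprocessor sees when the same header is included twice. The header name "point.h", the guard macro POINT_H and the names Point and sum are all hypothetical.

```cpp
#include <cassert>  // only needed if the results are checked with assert

// First inclusion of a hypothetical header "point.h":
// POINT_H is not defined yet, so the contents are processed.
#ifndef POINT_H
#define POINT_H
struct Point { int x; int y; };
inline int sum(const Point& p) { return p.x + p.y; }
#endif // POINT_H

// Second inclusion of the same header: POINT_H is now defined,
// so the preprocessor skips everything up to the matching #endif.
#ifndef POINT_H
#define POINT_H
struct Point { int x; int y; };   // would be a redefinition error without the guard
inline int sum(const Point& p) { return p.x + p.y; }
#endif // POINT_H
```

The guard macro name only needs to be unique within the program; deriving it from the file name is a common convention.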

Macros

The C++ preprocessor includes facilities for defining "macros", which roughly means the ability to replace a use of a named macro with one or more tokens. This has various uses from defining simple constants (though const is more often used for this in C++), conditional compilation, code generation and more -- macros are a powerful facility, but if used carelessly can also lead to code that is hard to read and harder to debug!

NOTE:

Macros do not depend only on the C++ Standard or on your own code. They may exist because of external frameworks or libraries that you use, or even because of the compiler and the specific OS. We will not cover that information in this book, but you can find more in the Pre-defined C/C++ Compiler Macros page ( http://predef.sourceforge.net/ ); the project maintains a list of the macros that compilers pre-define to identify the compiler, the OS and other aspects of the platform.

#define and #undef

The #define directive is used to define values or macros that are used by the preprocessor to manipulate the program source code before it is compiled:

#define USER_MAX (1000)

The #undef directive deletes a current macro definition:

#undef USER_MAX

It is an error to use #define to change the definition of an existing macro (repeating an identical definition is allowed), but it is not an error to use #undef on a macro name that is not currently defined. Therefore, if you need to override a previous macro definition, first #undef it, and then use #define to set the new definition.
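A minimal sketch of overriding a definition in this way is given below. The variable names original_max and updated_max, and the macro name NEVER_DEFINED, are made up for the illustration.

```cpp
#include <cassert>  // only needed if the results are checked with assert

#define USER_MAX (1000)
const int original_max = USER_MAX;   // expands to (1000)

#undef USER_MAX                      // remove the old definition first...
#define USER_MAX (2000)              // ...then supply the new one
const int updated_max = USER_MAX;    // expands to (2000)

#undef NEVER_DEFINED                 // allowed: the name was never defined
```

Each use of USER_MAX is replaced with whatever definition is in force at that point in the file, which is why the two constants end up with different values.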

NOTE:
Because preprocessor definitions are substituted before the compiler acts on the source code, any errors that are introduced by #define are difficult to trace. For example using value or macro names that are the same as some existing identifier can create subtle errors, since the preprocessor will substitute the identifier names in the source code.

Today, for this reason, #define is primarily used to handle compiler and platform differences. E.g, a define might hold a constant which is the appropriate error code for a system call. The use of #define should thus be limited unless absolutely necessary; typedef statements, constant variables, enums, templates and inline functions can often accomplish the same goal more efficiently and safely.

By convention, names defined using #define are written in uppercase with "_" separators. This makes it clear to readers that a value is not alterable and, in the case of macros, that the construct requires care, and it lets such names be identified easily when reading the source code. Although the convention is not a requirement, it is considered very bad practice to do otherwise.

Try to use const and inline instead of #define.

\ (line continuation)

If for some reason you need to break a given line into more than one line, use the \ (backslash) symbol to "escape" the line ends. For example,

#define MULTIPLELINEMACRO \
        will use what you write here \
        and here etc...

is equivalent to

#define MULTIPLELINEMACRO will use what you write here and here etc...

because the preprocessor joins lines ending in a backslash ("\") to the line after them. That happens even before directives (such as #define) are processed, so it works for just about all purposes, not just for macro definitions. The backslash is sometimes said to act as an "escape" character for the newline, changing its interpretation.

In some (fairly rare) cases macros can be more readable when split across multiple lines. Good modern C++ code will use macros only sparingly, so the need for multi-line macro definitions won't arise often.

It's certainly possible to overuse this feature. It's quite legal but entirely indefensible, for example, to write

int ma\
in//ma/
()/*ma/
in/*/{}

That's an abuse of the feature though: while an escaped newline can appear in the middle of a token, there should never be any reason to use it there. Don't try to write code that looks like it belongs in the International Obfuscated C Code Competition.

Warning: there is one occasional "gotcha" with using escaped newlines: if there are any invisible characters after the backslash, the lines will not be joined, and there will almost certainly be an error message produced later on, though it might not be at all obvious what caused it.
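For a more realistic use of line splicing, here is a small sketch with a macro and an ordinary statement each split across lines. The names SUM_OF_SQUARES, spliced and sum_of_squares_3_4 are hypothetical; note that every backslash must be the very last character on its line.

```cpp
#include <cassert>  // only needed if the results are checked with assert

// A macro definition split over three physical lines; after splicing,
// the preprocessor sees it as one logical line.
#define SUM_OF_SQUARES(a, b) \
        ( (a) * (a) +        \
          (b) * (b) )

// Splicing is not limited to directives: this is a single declaration.
const int spliced = 1 + \
                    2;

int sum_of_squares_3_4() { return SUM_OF_SQUARES(3, 4); }
```

The second use is shown only to make the point that splicing happens everywhere; ordinary statements can span lines without any backslash, so in practice the feature is needed almost exclusively for macro definitions.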

Function-like Macros

Another feature of the #define directive is that it can take arguments, making it rather useful as a pseudo-function creator. Consider the following code:

#define ABSOLUTE_VALUE( x ) ( ((x) < 0) ? -(x) : (x) )
// ...
int x = -1;
while( ABSOLUTE_VALUE( x ) ) {
// ...
}

NOTE:

It is generally a good idea to put extra parentheses around macro parameters; this keeps the arguments from being parsed in unintended ways. There are some exceptions to consider:

  1. Where a parameter stands alone between commas (for example, as a function argument), the extra parentheses are not needed: since the comma operator has lower precedence than any other operator, there is no possibility of problems.
  2. When concatenating tokens with the ## operator, converting to strings using the # operator, or concatenating adjacent string literals, parameters cannot be individually parenthesized.

Notice that in the above example, the parameter "x" is always within its own set of parentheses. This way it is evaluated as a whole before being compared to 0 or negated. Also, the entire macro body is surrounded by parentheses, to prevent it from being affected by the surrounding code. If you are not careful, you run the risk of having the compiler misinterpret your code.

Macros replace each occurrence of the macro parameter used in the text with the literal contents of the macro parameter without any validation checking. Badly written macros can result in code which won't compile or create hard to discover bugs. Because of side-effects it is considered a very bad idea to use macro functions as described above. However as with any rule, there may be cases where macros are the most efficient means to accomplish a particular goal.

int z = -10;
int y = ABSOLUTE_VALUE( z++ );

If ABSOLUTE_VALUE() were a real function, z would now have the value -9 and y the value 10. But because z++ was an argument to a macro, it appears three times in the expansion and (in this case) is executed twice, setting z to -8 and y to 9. In similar cases it is very easy to write code that has "undefined behavior", meaning that what it does is completely unpredictable in the eyes of the C++ Standard.

// ABSOLUTE_VALUE( z++ ); expanded
( ((z++) < 0 ) ? -(z++) : (z++) );

and

// An example on how to use a macro correctly
 
#include <iostream>
 
#define SLICES 8
#define PART(x) ( (x) / SLICES ) // Note the extra parentheses around 'x'
 
int main() {
   int b = 10, c = 6;
 
   int a = PART(b + c);
   std::cout << a;
 
   return 0;
}

The value of "a" is 2: b + c is passed to PART, which expands to ( (b + c) / SLICES ), and 16 / 8 is 2.

NOTE:

Variadic Macros
A variadic macro is a feature of the preprocessor whereby a macro is declared to accept a varying number of arguments (similar to a variadic function).

They were not part of C++98, though many C++ implementations supported variable-argument macros as an extension (e.g., GCC and MS Visual Studio C++); they were added to the language in the revision developed as C++0x and published as C++11.

Variable-argument macros were introduced in the ISO/IEC 9899:1999 (C99) revision of the C Programming Language standard in 1999.
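As a sketch of the syntax, assuming a compiler that accepts variadic macros (as standard C++11 or as an extension), the macro below forwards its extra arguments to std::snprintf. The names FORMAT_INTO and format_demo are invented for the example.

```cpp
#include <cassert>  // only needed if the results are checked with assert
#include <cstdio>
#include <string>

// "..." accepts any further arguments; __VA_ARGS__ stands for them in the
// replacement text. buffer must be a character array (not a pointer),
// because sizeof is used to find its capacity.
#define FORMAT_INTO(buffer, ...) \
    std::snprintf(buffer, sizeof(buffer), __VA_ARGS__)

std::string format_demo() {
    char buffer[32];
    FORMAT_INTO(buffer, "%d-%s", 42, "ok");   // extra arguments: format, 42, "ok"
    return buffer;
}
```

The call expands to std::snprintf(buffer, sizeof(buffer), "%d-%s", 42, "ok"), so the macro behaves like a thin wrapper that fills in the buffer size.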

# and ##

The # and ## operators are used within macro definitions made with #define. Using # causes the macro argument that follows the # to be turned into a string in quotes. For example

#define as_string( s ) # s

will make the preprocessor turn

std::cout << as_string( Hello  World! ) << std::endl;

into

std::cout << "Hello World!" << std::endl;

NOTE:
Observe that leading and trailing whitespace is removed from the argument to #, and that consecutive sequences of whitespace between tokens are converted to single spaces.

Using ## concatenates what's before the ## with what's after it; the result must be a well-formed preprocessing token. For example

#define concatenate( x, y ) x ## y
...
int xy = 10;
...

will make the preprocessor turn

std::cout << concatenate( x, y ) << std::endl;

into

std::cout << xy << std::endl;

which will, of course, display 10 to standard output.

String literals cannot be concatenated using ##, but the good news is that this isn't a problem: just writing two adjacent string literals is enough to make the preprocessor concatenate them.
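The two operators can be tried together in a small sketch. The macro names are written in uppercase here, and the variables item1 and item2 and the functions total and names are hypothetical.

```cpp
#include <cassert>  // only needed if the results are checked with assert
#include <string>

#define AS_STRING(s) #s
#define CONCATENATE(x, y) x ## y

int CONCATENATE(item, 1) = 10;   // pastes item and 1 into the identifier item1
int CONCATENATE(item, 2) = 20;   // pastes item and 2 into the identifier item2

int total() { return item1 + item2; }

// AS_STRING turns its argument into a string literal; the whitespace
// around the second argument is dropped.
std::string names() {
    return std::string(AS_STRING(item1)) + "," + AS_STRING(  item2  );
}
```

Note that the pasted tokens item1 and item2 are ordinary identifiers afterwards and can be used anywhere in the rest of the file.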

The dangers of macros

To illustrate the dangers of macros, consider this naive macro

#define MAX(a,b) a>b?a:b

and the code

int i = MAX(2,3)+5;
int j = MAX(3,2)+5;

Take a look at this and consider what the value after execution might be. The statements are turned into

 
int i = 2>3?2:3+5;
int j = 3>2?3:2+5;

Thus, after execution i=8 and j=3 instead of the expected result of i=j=8! This is why you were cautioned to use an extra set of parentheses above, but even with these, the road is fraught with dangers. The alert reader might quickly realize that if a and b are expressions, the definition must parenthesize every use of a and b in the macro definition, like this:

#define MAX(a,b) ((a)>(b)?(a):(b))

This works, provided a,b have no side effects. Indeed,

 
 i = 2;
 j = 3;
 k = MAX(i++, j++);

would result in k=4, i=3 and j=5. This would be highly surprising to anyone expecting MAX() to behave like a function.

So what is the correct solution? The solution is not to use a macro at all. A global, inline function, like this

inline int max(int a, int b) { return a>b?a:b; }

has none of the pitfalls above, but will not work with all types. A template (see below) takes care of this

template<typename T> inline const T& max(const T& a, const T& b) { return a>b?a:b; }

Indeed, this is (a variation of) the definition used in the C++ Standard Library for std::max(), which is declared in the <algorithm> header. This library is included with all conforming C++ compilers, so the ideal solution would be to use this.

std::max(3,4);
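The three behaviours discussed in this section can be compared side by side. The sketch below renames the unparenthesized macro to NAIVE_MAX (a name chosen here only to keep both versions in one file); the wrapper function names are likewise made up.

```cpp
#include <algorithm>
#include <cassert>

#define NAIVE_MAX(a,b) a>b?a:b
#define MAX(a,b) ((a)>(b)?(a):(b))

// Precedence problem: expands to 3>2?3:2+5, so the +5 is never applied.
int naive_result()
{
    return NAIVE_MAX(3,2)+5;
}

// Side-effect problem: the larger argument is incremented twice.
int macro_with_side_effects(int& i, int& j)
{
    return MAX(i++, j++);
}

// std::max evaluates each argument exactly once.
int function_with_side_effects(int& i, int& j)
{
    int before_i = i;
    int result = std::max(i++, j++);
    assert(i == before_i + 1);
    return result;
}
```

Starting from i=2 and j=3, the macro version returns 4 and leaves i=3, j=5, while the std::max version returns 3 and leaves i=3, j=4.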

String Literal Concatenation

One minor function of the preprocessor is in joining strings together, "string literal concatenation" -- turning code like

 std::cout << "Hello " "World!\n";

into

 std::cout << "Hello World!\n";

Apart from obscure uses, this is most often useful when writing long messages, as it's not legal in C++ (at this time) to have a string literal which spans multiple lines in your source code (i.e., one which has a newline character inside it). It also helps to keep program lines down to a reasonable length; we can write

 function_name("This is a very long string literal, which would not fit "
               "onto a single line very nicely -- but with string literal "
               "concatenation, we can split it across multiple lines and "
               "the preprocessor will glue the pieces together");

Note that this joining happens before compilation; the compiler sees only one string literal here, and there's no work done at runtime, i.e., your program won't run any slower at all because of this joining together of strings.

Concatenation also applies to wide string literals (which are prefixed by an L):

 L"this " L"and " L"that"

is converted by the preprocessor into

 L"this and that".

NOTE:
For completeness, note that C99 has different rules for this than C++98: C99 allows joining a narrow string literal to a wide string literal, something which was not valid in C++98. C++11 (known as C++0x during its development) adopted C99's more tolerant rules.

Conditional compilation

Conditional compilation is useful for two main purposes:

  • To allow certain functionality to be enabled/disabled when compiling a program
  • To allow functionality to be implemented in different ways, such as when compiling on different platforms

It is also used sometimes to temporarily "comment-out" code, though using a version control system is often a more effective way to do so.

  • Syntax:
#if condition
  statement(s)
#elif condition2
  statement(s)
...
#elif conditionN
  statement(s)
#else
  statement(s)
#endif

#ifdef defined-value
  statement(s)
#else
  statement(s)
#endif

#ifndef defined-value
  statement(s)
#else
  statement(s)
#endif

#if

The #if directive allows compile-time conditional checking of preprocessor values such as created with #define. If condition is non-zero the preprocessor will include all statement(s) up to the #else, #elif or #endif directive in the output for processing. Otherwise if the #if condition was false, any #elif directives will be checked in order and the first condition which is true will have its statement(s) included in the output. Finally if the condition of the #if directive and any present #elif directives are all false the statement(s) of the #else directive will be included in the output if present; otherwise, nothing gets included.

The expression used after #if can include boolean and integral constants and arithmetic operations as well as macro names. The allowable expressions are a subset of the full range of C++ expressions (with one exception), but are sufficient for many purposes. The one extra operator available to #if is the defined operator, which can be used to test whether a macro of a given name is currently defined.
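A minimal sketch of #if, #elif and the defined operator follows. FEATURE_LEVEL, ENABLE_LOGGING and ENABLE_TRACING are hypothetical configuration macros invented for this example; in a real project they would usually come from a header or from the build system.

```cpp
#include <cassert>
#include <string>

// Hypothetical configuration macros (assumptions for this sketch).
#define FEATURE_LEVEL 2
#define ENABLE_LOGGING

std::string selected_configuration()
{
#if FEATURE_LEVEL >= 3
    return "full";
#elif FEATURE_LEVEL == 2 && defined(ENABLE_LOGGING)
    return "standard with logging";
#elif FEATURE_LEVEL == 2
    return "standard";
#else
    return "minimal";
#endif
}

bool tracing_enabled()
{
#if defined ENABLE_TRACING   // ENABLE_TRACING was never defined above
    return true;
#else
    return false;
#endif
}
```

Only the branch whose condition holds reaches the compiler; the other return statements are removed before compilation.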

#ifdef and #ifndef

The #ifdef and #ifndef directives are short forms of '#if defined(defined-value)' and '#if !defined(defined-value)' respectively. defined(identifier) is valid in any expression evaluated by the preprocessor, and returns true (in this context, equivalent to 1) if a preprocessor variable by the name identifier was defined with #define and false (in this context, equivalent to 0) otherwise. In fact, the parentheses are optional, and it is also valid to write defined identifier without them.

(Possibly the most common use of #ifndef is in creating "include guards" for header files, to ensure that the header files can safely be included multiple times. This is explained in the section on header files.)

#endif

The #endif directive ends #if, #ifdef, #ifndef, #elif and #else directives.

  • Example:
#if defined(__BSD__) || defined(__LINUX__)
#include <unistd.h>
#endif

This can be used for example to provide multiple platform support or to have one common source file set for different program versions. Another example of use is using this instead of the (non-standard) #pragma once.

  • Example:

foo.hpp:

#ifndef FOO_HPP
# define FOO_HPP

 // code here...

#endif // FOO_HPP

bar.hpp:

#include "foo.hpp"

 // code here...

foo.cpp:

#include "foo.hpp"
#include "bar.hpp"

 // code here

When we compile foo.cpp, only one copy of foo.hpp will be included due to the use of the include guard. When the preprocessor reads the line #include "foo.hpp", the content of foo.hpp will be expanded. Since this is the first time that foo.hpp is read (and assuming that the macro FOO_HPP has not been defined elsewhere), FOO_HPP will not yet be defined, and so the code will be included normally. When the preprocessor reads the line #include "bar.hpp" in foo.cpp, the content of bar.hpp will be expanded as usual, and the file foo.hpp will be read again. Owing to the previous definition of FOO_HPP, no code in foo.hpp will be inserted this time. This achieves our goal: the content of the file is never included more than once.
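What the preprocessor sees after expanding both #include lines can be imitated in a single file by pasting the guarded contents twice. This is only a sketch: the struct Foo and the function foo_value are placeholders standing in for the "code here" of the example.

```cpp
#include <cassert>

// ---- first copy of the guarded header contents ----
#ifndef FOO_HPP
#define FOO_HPP
struct Foo { int value; };
#endif // FOO_HPP

// ---- second copy: FOO_HPP is already defined, so everything is skipped ----
#ifndef FOO_HPP
#define FOO_HPP
struct Foo { int value; };   // would be a redefinition error without the guard
#endif // FOO_HPP

int foo_value()
{
    Foo f = { 42 };
    assert(f.value == 42);
    return f.value;
}
```

Removing the #ifndef/#define/#endif lines from both copies makes the compiler reject the second definition of Foo.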

Compile-time warnings and errors

  • Syntax:
#warning message
#error message

#error and #warning

The #error directive causes the compiler to stop and report the line number together with the given message when it is encountered. The #warning directive causes the compiler to emit a warning with the line number and the given message, and then continue. These directives are mostly used for debugging.

NOTE:
#error is part of Standard C++, whereas #warning is not (though it is widely supported).

  • Example:
#if defined(__BSD__)
#warning Support for BSD is new and may not be stable yet
#endif

#if defined(__WIN95__)
#error Windows 95 is not supported
#endif

Source file names and line numbering Macros

The current filename and line number where the preprocessing is being performed can be retrieved using the predefined macros __FILE__ and __LINE__. Line numbers are measured before any escaped newlines are removed. The current values of __FILE__ and __LINE__ can be overridden using the #line directive; it is very rarely appropriate to do this in hand-written code, but it can be useful for code generators which create C++ code based on other input files, so that (for example) error messages will refer back to the original input files rather than to the generated C++ code.
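The following sketch shows __LINE__ and __FILE__ before and after a #line directive. The file name grammar.input and all function names are invented for the illustration; a code generator would emit whatever name and line number match its own input.

```cpp
#include <cassert>
#include <string>

int line_before() { return __LINE__; }
int line_after()  { return __LINE__; }   // exactly one more than line_before()

// A code generator might emit a directive like this one (hypothetical values):
#line 100 "grammar.input"
int remapped_line() { return __LINE__; }            // this line now counts as 100
std::string remapped_file() { return __FILE__; }    // now reports the overridden name

bool lines_are_consecutive()
{
    assert(line_after() == line_before() + 1);
    return true;
}
```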

Linker

The linker is a program that is responsible for linking and resolving linkage issues, such as the use of symbols or identifiers that are defined in one translation unit and needed by other translation units; the information it works from is created by the compiler. Symbols or identifiers that are needed outside a single translation unit must have external linkage. In short, the linker's job is to resolve references to undefined symbols by finding out which other object file defines the symbol in question, and replacing placeholders with the symbol's address. Of course, the process is more complicated than this, but the basic ideas apply.

Linkers can take objects from a collection called a library. Depending on the library (system or language or external libraries) and options passed, they may only include its symbols that are referenced from other object files or libraries. Libraries for diverse purposes exist, and one or more system libraries are usually linked in by default. We will take a closer look into libraries on the Libraries Section of this book.

Linking

The process of connecting or combining object files produced by a compiler with the libraries necessary to make a working executable program (or a library) is called linking. Linkage refers to the way in which a program is built out of a number of translation units.

C++ programs can be compiled and linked with programs written in other languages, such as C, Fortran, and Pascal. When programs have two or more source programs written in different languages, you should do the following:

  • Compile each program module separately with the appropriate compiler.
  • Link them together in a separate step.
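When the other language is C, the C++ side usually declares the foreign functions with extern "C", so that the linker looks for the plain, unmangled symbol name. The sketch below assumes a hypothetical C function c_add; its definition would normally sit in a separate .c file compiled by a C compiler, and is placed in the same file here only so that the example links on its own.

```cpp
#include <cassert>

// Declaration as it might appear in a header shared with C code.
// extern "C" requests C linkage, so the symbol name is not mangled.
extern "C" int c_add(int a, int b);

// C++ caller: the linker resolves c_add to whichever object file defines it.
int add_three(int a, int b, int c)
{
    return c_add(c_add(a, b), c);
}

// Stand-in for the definition that would normally be compiled separately as C.
extern "C" int c_add(int a, int b)
{
    assert(b >= 0 || a >= 0 || true);   // no special preconditions in this sketch
    return a + b;
}
```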

Static Linkage

TODO: Complete, use global const case as example

Internal Linkage

TODO: Complete

External Linkage

TODO: Complete

Operators

Operators are special symbols that are used to represent simple computations. They are of significant importance in programming, since they define how data interacts in a useful way.

Computers are mathematical devices, but compilers and interpreters require a full syntactic theory of all operations in order to parse formulas involving any combination of them correctly. In particular they depend on operator precedence rules, that is, on the order of operations that is tacitly assumed in mathematical writing; the same applies to programming languages. Conventionally, the computing usage of the term operator also goes beyond the mathematical usage (for functions).

C++, like all programming languages, uses a set of operators. They are subdivided into several groups:

  • arithmetic operators (like addition and multiplication).
  • boolean operators.
  • string operators (used to manipulate strings of text).
  • pointer operators.
  • named operators (operators such as sizeof, new, and delete defined by alphanumeric names rather than a punctuation character).

Most of the operators in C++ do exactly what you would expect them to do, because they are common mathematical symbols. For example, the operator for adding two integers is +. C++ allows the re-definition of some operators (operator overloading); this will be covered later on.

The following are all legal expressions whose meaning is more or less obvious:

  • 1+1
  • hour-1
  • hour*60 + minute
  • minute/60

Take this line:

sum = a + b;

it uses the + operator to add the values stored in the locations a and b and the assignment operator (=) to store the result in the location sum. a and b are said to be the operands of +. The combination a + b is called an expression, specifically an arithmetic expression since + is an arithmetic operator. Similarly, = and its operands, sum and a + b together form the assignment expression sum = a + b (Note that the semicolon is not part of the expression). Other arithmetic operations that can be performed on integers (also common in many other languages) include:

  • Subtraction, using the - operator
  • Multiplication, using the * operator
  • Division, using the / operator
  • Remainder, using the % operator

Expressions can contain both variable names and integer values. In each case the name of the variable is replaced with its value before the computation is performed.

Addition, subtraction and multiplication all do what you expect, but you might be surprised by division. For example, the following program:

int hour, minute; 
hour = 11; 
minute = 59; 
std::cout << "Number of minutes since midnight: "; 
std::cout << hour*60 + minute << std::endl; 
std::cout << "Fraction of the hour that has passed: "; 
std::cout << minute/60 << std::endl;

would generate the following output:

Number of minutes since midnight: 719
Fraction of the hour that has passed: 0

The first line is what we expected, but the second line is odd. The value of the variable minute is 59, and 59 divided by 60 is 0.98333, not 0. The reason for the discrepancy is that C++ is performing integer division.

When both of the operands are integers (operands are the things operators operate on), the result must also be an integer, and integer division discards the fractional part: for positive values like these it always rounds down, even in cases like this where the next integer is so close.

A possible alternative in this case is to calculate a percentage rather than a fraction:

std::cout << "Percentage of the hour that has passed: "; 
std::cout << minute*100/60 << std::endl;

The result is:

Percentage of the hour that has passed: 98

Again the result is rounded down, but at least now the answer is approximately correct. In order to get an even more accurate answer, we could use a different type of variable, called floating-point, that is capable of storing fractional values.
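As a small sketch of the difference, the functions below compute the same quantity with integer and with floating-point division. The function names are made up; the only change in the last one is the literal 60.0, which has type double and makes the whole division a floating-point one.

```cpp
#include <cassert>
#include <cmath>

int whole_fraction(int minute)    { return minute / 60; }         // integer division
int whole_percentage(int minute)  { return minute * 100 / 60; }   // still integer division
double exact_fraction(int minute) { return minute / 60.0; }       // floating-point division

bool fraction_is_close(int minute, double expected)
{
    // Floating-point results are compared with a tolerance, not with ==.
    bool close = std::fabs(exact_fraction(minute) - expected) < 0.00001;
    assert(exact_fraction(minute) >= whole_fraction(minute));
    return close;
}
```

For minute equal to 59 these give 0, 98 and approximately 0.98333 respectively.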

Table of Operators

Operators in the same group have the same precedence and the order of evaluation is decided by the associativity (left-to-right or right-to-left). Operators in a preceding group have higher precedence than those in a subsequent group.

NOTE:
Binding of operators actually cannot be completely described by "precedence" rules, and as such this table is an approximation. Correct understanding of the rules requires an understanding of the grammar of expressions.

Each group below gives its associativity after the group name; each entry gives the operator, then its description and example usage.

Scope Resolution Operator
  • ::
    unary scope resolution operator for globals. Example: ::NUM_ELEMENTS
  • ::
    binary scope resolution operator for class and namespace members. Example: std::cout

Function Call, Member Access, Post-Increment/Decrement Operators, RTTI and C++ Casts (Left to right)
  • ()
    function call operator. Example: swap (x, y)
  • []
    array index operator. Example: arr [i]
  • .
    member access operator for an object of class/union type or a reference to it. Example: obj.member
  • ->
    member access operator for a pointer to an object of class/union type. Example: ptr->member
  • ++ --
    post-increment/decrement operators. Example: num++
  • typeid()
    run time type identification operator for an object or type. Examples: typeid (std::cout), typeid (std::iostream)
  • static_cast<>() dynamic_cast<>() const_cast<>() reinterpret_cast<>()
    C++ style cast operators for type conversion (dynamic_cast is checked at run time). See Type Casting for more info. Examples: static_cast<float> (i), dynamic_cast<std::istream&> (stream), const_cast<char*> ("Hello, World!"), reinterpret_cast<const long*> ("C++")
  • type()
    functional cast operator (static_cast is preferred for conversion to a primitive type). Example: float (i). Also used as a constructor call for creating a temporary object, esp. of a class type. Example: std::string ("Hello, world!", 0, 5)

Unary Operators (Right to left)
  • !, not
    logical not operator. Example: !eof_reached
  • ~, compl
    bitwise not operator. Example: ~mask
  • + -
    unary plus/minus operators. Example: -num
  • ++ --
    pre-increment/decrement operators. Example: ++num
  • &, bitand
    address-of operator. Example: &data
  • *
    indirection operator. Example: *ptr
  • new new[] new() new()[]
    new operators for single objects or arrays. Examples: new std::string (5, '*'), new int [100], new (raw_mem) int, new (arg1, arg2) int [100]
  • delete delete[]
    delete operator for pointers to single objects or arrays. Examples: delete ptr, delete[] arr
  • sizeof sizeof()
    sizeof operator for expressions or types. Examples: sizeof 123, sizeof (int)
  • (type)
    C-style cast operator (discouraged; the C++ style casts are preferred). Example: (float)i

Member Pointer Operators (Left to right)
  • .*
    member pointer access operator for an object of class/union type or a reference to it. Example: obj.*memptr
  • ->*
    member pointer access operator for a pointer to an object of class/union type. Example: ptr->*memptr

Multiplicative Operators (Left to right)
  • * / %
    multiplication, division and modulus operators. Example: celsius_diff * 9 / 5

Additive Operators (Left to right)
  • + -
    addition and subtraction operators. Example: end - start + 1

Bitwise Shift Operators (Left to right)
  • << >>
    left and right shift operators. Examples: bits << shift_len, bits >> shift_len

Relational Inequality Operators (Left to right)
  • < > <= >=
    less-than, greater-than, less-than or equal-to, greater-than or equal-to. Example: i < num_elements

Relational Equality Operators (Left to right)
  • == !=, not_eq
    equal-to, not-equal-to. Example: choice != 'n'

Bitwise And Operator (Left to right)
  • &, bitand
    Example: bits & clear_mask_complement

Bitwise Xor Operator (Left to right)
  • ^, xor
    Example: bits ^ invert_mask

Bitwise Or Operator (Left to right)
  • |, bitor
    Example: bits | set_mask

Logical And Operator (Left to right)
  • &&, and
    Example: arr != 0 && arr->len != 0

Logical Or Operator (Left to right)
  • ||, or
    Example: arr == 0 || arr->len == 0

Conditional Operator (Right to left)
  • ?:
    Example: size >= 0 ? size : 0

Assignment Operators (Right to left)
  • =
    assignment operator. Example: i = 0
  • += -= *= /= %= &=, and_eq |=, or_eq ^=, xor_eq <<= >>=
    shorthand assignment operators (foo op= bar represents foo = foo op bar). Example: num /= 10

Exceptions
  • throw
    Example: throw "Array index out of bounds"

Comma Operator (Left to right)
  • ,
    Example: i = 0, j = i + 1, k = 0

Order of operations

When more than one operator appears in an expression the order of evaluation depends on the rules of precedence. A complete explanation of precedence can get complicated, but just to get you started:

Multiplication and division happen before addition and subtraction. So 2*3-1 yields 5, not 4, and 2/3-1 yields -1, not 1 (remember that in integer division 2/3 is 0). If the operators have the same precedence they are evaluated from left to right. So in the expression minute*100/60, the multiplication happens first, yielding 5900/60, which in turn yields 98. If the operations had gone from right to left, the result would be 59*1 which is 59, which is wrong. Any time you want to override the rules of precedence (or you are not sure what they are) you can use parentheses. Expressions in parentheses are evaluated first, so 2 * (3-1) is 4. You can also use parentheses to make an expression easier to read, as in (minute * 100) / 60, even though it doesn't change the result.
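The arithmetic in the paragraph above can be verified directly. This is just a sketch with made-up function names; each function returns one of the expressions discussed.

```cpp
#include <cassert>

int multiply_first()        { return 2 * 3 - 1; }        // (2*3) - 1
int divide_first()          { return 2 / 3 - 1; }        // (2/3) - 1, and 2/3 is 0
int left_to_right()         { return 59 * 100 / 60; }    // (59*100) / 60
int forced_right_to_left()  { return 59 * (100 / 60); }  // 59 * 1
int parenthesized()         { return 2 * (3 - 1); }      // parentheses go first

bool parentheses_can_be_redundant()
{
    // Adding parentheses that match the default grouping changes nothing.
    assert((59 * 100) / 60 == left_to_right());
    return true;
}
```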

Chaining Insertion Operators

std::cout << "The sum of " << a << " and " << b << " is " << sum << "\n";

The line illustrates what is called chaining of insertion operators to print multiple expressions. How this works is as follows:

  1. The leftmost insertion operator takes as its operands std::cout and the string "The sum of "; it prints the latter using the former, and returns a reference to the former.
  2. Now std::cout << a is evaluated. This prints the value contained in the location a (for example, 123) and again returns std::cout.
  3. This process continues. Thus, successively the expressions std::cout << " and ", std::cout << b, std::cout << " is ", std::cout << sum, std::cout << "\n" are evaluated and the whole series of chained values is printed.
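Chaining works for user-defined types as well, provided the insertion operator returns the stream it was given. The sketch below writes into a std::ostringstream instead of std::cout so that the result can be inspected; the Point type and the function names are hypothetical.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical type used only for this illustration.
struct Point { int x; int y; };

// Returning the stream is what allows the next << in a chain to work.
std::ostream& operator<<(std::ostream& out, const Point& p)
{
    return out << "(" << p.x << ", " << p.y << ")";
}

std::string describe_sum(int a, int b)
{
    std::ostringstream out;
    int sum = a + b;
    // Grouped from the left: ((out << "The sum of ") << a) << " and " ...
    out << "The sum of " << a << " and " << b << " is " << sum << "\n";
    assert(out.good());
    return out.str();
}

std::string describe_point(const Point& p)
{
    std::ostringstream out;
    out << "p = " << p << "\n";
    return out.str();
}
```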

Precedence (Composition)

At this point we have looked at some of the elements of a programming language like variables, expressions, and statements in isolation, without talking about how to combine them.

One of the most useful features of programming languages is their ability to take small building blocks and compose them (solving big problems by taking small steps at a time). For example, we know how to multiply integers and we know how to output values; it turns out we can do both at the same time:

std::cout << 17 * 3;

Actually, I shouldn't say "at the same time," since in reality the multiplication has to happen before the output, but the point is that any expression, involving numbers, characters, and variables, can be used inside an output statement. We've already seen one example:

std::cout << hour * 60 + minute << std::endl;

You can also put arbitrary expressions on the right-hand side of an assignment statement:

int percentage; 
percentage = ( minute * 100 ) / 60;

This ability may not seem so impressive now, but we will see other examples where composition makes it possible to express complex computations neatly and concisely.

NOTE:

There are limits on where you can use certain expressions; most notably, the left-hand side of an assignment statement (normally) has to be a variable name, not an expression. That's because the left side indicates the storage location where the result will go. Expressions do not represent storage locations, only values.
The following is illegal:
 minute+1 = hour;
(The exact rule for what can go on the left-hand side of an assignment expression is not so simple as it was in C; operator overloading and reference types complicate the picture.)

Assignment

The most basic assignment operator is the "=" operator. It assigns one variable to have the value of another. For instance, the statement x = 3 assigns x the value of 3, and y = x assigns whatever was in x to be in y. When the "=" operator is used to assign a class or struct, it acts like using the "=" operator on every single element. For instance:

//Example to demonstrate default "=" operator behavior.
 
struct A
 {
  int i;
  float f;
  A * next_a;
 };
 
//Inside some function
 {
  A a1, a2;              // Create two A objects.
 
  a1.i = 3;              // Assign 3 to i of a1.
  a1.f = 4.5;            // Assign the value of 4.5 to f in a1
  a1.next_a = &a2;       // a1.next_a now points to a2
 
  a2.next_a = NULL;      // a2.next_a is guaranteed to point at nothing now.
  a2.i = a1.i;           // Copy over a1.i, so that a2.i is now 3.
  a1.next_a = a2.next_a; // Now a1.next_a is NULL
 
  a2 = a1;               // Copy a1 to a2, so that now a2.f is 4.5. The other two members are unchanged, since they already held the same values.
 }

Assignments can also be chained since the assignment operator returns the value it assigns. But this time the chaining is from right to left. For example, to assign the value of z to y and assign the same value (which is returned by the = operator) to x you use:

x = y = z;
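Both behaviours described in this section, right-to-left chaining and member-by-member copying of a struct, can be sketched as follows. The struct A mirrors the one in the example above; the function names are made up.

```cpp
#include <cassert>

struct A
{
    int i;
    float f;
    A* next_a;
};

// Returns x + y after the chained assignment, i.e. twice z.
int chained_assignment(int z)
{
    int x = 0;
    int y = 0;
    x = y = z;            // grouped as x = (y = z)
    assert(x == z && y == z);
    return x + y;
}

// The default "=" copies every member of the struct.
A copy_of(const A& source)
{
    A target = { 0, 0.0f, 0 };
    target = source;
    return target;
}
```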

When "=" is used in a declaration, it has a special meaning: it is not an assignment. It tells the compiler to initialize the variable from whatever is on the right-hand side. This is called initializing the variable (the declaration is also the variable's definition, in the same way that you define a class or a function). With classes, this can make a difference, especially when the right-hand side is a function call:

class A { /* ... */ };
A foo () { /* ... */ };
 
// In some function
 {
  A a;
  a = foo();
 
  A a2 = foo();
 }

In the first case, a is constructed, then is changed by the "=" operator. In the second statement, a2 is constructed directly from the return value of foo(). In many cases, the compiler can save a lot of time by constructing foo()'s return value directly into a2's memory, which makes the program run faster.
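The difference can be made visible with a class that counts what happens to its objects. This is only a sketch: Tracked, make_tracked and the counters are invented for the illustration, and copy constructions are deliberately not counted because compilers are allowed (and in recent standards required) to skip them here.

```cpp
#include <cassert>

// Hypothetical class that counts default constructions and assignments.
struct Tracked
{
    static int default_constructions;
    static int assignments;

    Tracked() { ++default_constructions; }
    Tracked(const Tracked&) {}   // copies are not counted in this sketch
    Tracked& operator=(const Tracked&) { ++assignments; return *this; }
};

int Tracked::default_constructions = 0;
int Tracked::assignments = 0;

Tracked make_tracked() { return Tracked(); }

void reset_counts()
{
    Tracked::default_constructions = 0;
    Tracked::assignments = 0;
}

// Construct first, assign afterwards: a and the function's result are both built.
void construct_then_assign()
{
    Tracked a;
    a = make_tracked();
}

// Initialize from the return value: no assignment takes place at all.
void initialize_directly()
{
    Tracked a2 = make_tracked();
    (void)a2;
    assert(Tracked::assignments == 0 || Tracked::default_constructions > 1);
}
```

After reset_counts(), construct_then_assign() leaves the counters at two default constructions and one assignment, whereas initialize_directly() leaves them at one and zero.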

Whether or not you define can also matter in a few cases where a definition can result in different linkage, making the variable more or less available to other source files.

Arithmetic Operators

The multiplicative operators *, / and % are always evaluated before the additive operators + and -. Among operators of the same class, evaluation proceeds from left to right. This order can be overridden using grouping by parentheses, ( and ); the expression contained within parentheses is evaluated before any other neighboring operator is evaluated. Note that a compiler may reorder or simplify these operations when it optimizes the generated code, but only when doing so cannot change the answer.

For example the following statements convert a temperature expressed in degrees Celsius to degrees Fahrenheit and vice versa:

deg_f = deg_c * 9 / 5 + 32;
deg_c = ( deg_f - 32 ) * 5 / 9;
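Wrapped in functions, the two statements above look as follows. The third function is a deliberately wrong variant, added here as an assumption-free arithmetic check of what happens when the parentheses are left out; all function names are hypothetical, and integer arithmetic is used throughout.

```cpp
#include <cassert>

int to_fahrenheit(int deg_c) { return deg_c * 9 / 5 + 32; }
int to_celsius(int deg_f)    { return ( deg_f - 32 ) * 5 / 9; }

// Without parentheses, 32 * 5 / 9 (which is 17) is computed first and then subtracted.
int to_celsius_without_parentheses(int deg_f) { return deg_f - 32 * 5 / 9; }

bool round_trip_boiling_point()
{
    assert(to_celsius(to_fahrenheit(100)) == 100);
    return true;
}
```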

Compound Assignment

One of the most common patterns in software with regards to operators is to update a value:

a = a + 1;
b = b * 2;
c = c / 4;

Since this pattern is used many times, there is a shorthand for it called compound assignment operators. They are a combination of an existing arithmetic or bitwise operator and the assignment operator:

  • +=
  • -=
  • *=
  • /=
  •  %=
  • <<=
  • >>=
  • |=
  • &=
  • ^=

Thus the example given in the beginning of the section could be rewritten as

a += 1;  // Equivalent to (a = a + 1)
b *= 2;  // Equivalent to (b = b * 2)
c /= 4;  // Equivalent to (c = c / 4)
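A quick way to convince yourself of the equivalence is to run the same steps in both spellings. The two functions below are a sketch with invented names; they apply an arbitrary sequence of operations to a non-negative starting value.

```cpp
#include <cassert>

// Every step written out in full.
int longhand(int v)
{
    v = v + 1;
    v = v * 2;
    v = v / 4;
    v = v % 5;
    v = v << 3;
    v = v | 1;
    return v;
}

// The same steps with compound assignment operators.
int shorthand(int v)
{
    v += 1;
    v *= 2;
    v /= 4;
    v %= 5;
    v <<= 3;
    v |= 1;
    return v;
}

bool same_result(int start)
{
    assert(start >= 0);   // keeps the shift well-defined in this sketch
    return longhand(start) == shorthand(start);
}
```

Starting from 13, both versions pass through 14, 28, 7, 2 and 16 and end at 17.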

Character Operators

Interestingly, the same mathematical operations that work on integers also work on characters.

char letter; 
letter = 'a' + 1; 
std::cout << letter << std::endl;

The above example outputs the letter b (on most systems -- note that C++ doesn't assume use of ASCII, EBCDIC, Unicode etc. but rather allows for all of these and other charsets). Although it is syntactically legal to multiply characters, it is almost never useful to do it.

Earlier I said that you can only assign integer values to integer variables and character values to character variables, but that is not completely true. In some cases, C++ converts automatically between types. For example, the following is legal.

int number; 
number = 'a'; 
std::cout << number << std::endl;

On most mainstream desktop computers the result is 97, which is the number that is used internally by C++ on that system to represent the letter 'a'. However, it is generally a good idea to treat characters as characters, and integers as integers, and only convert from one to the other if there is a good reason. Unlike some other languages, C++ does not make strong assumptions about how the underlying platform represents characters; ASCII, EBCDIC and others are possible, and portable code will not make assumptions (except that '0', '1', ..., '9' are sequential, so that e.g. '9'-'0' == 9).
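The one portable guarantee mentioned above, that the digit characters are sequential, is what makes the usual digit conversion work. The following is a minimal sketch with made-up function names; next_char relies on the character set placing the next character one position higher, which holds for digits everywhere and for letters on ASCII systems.

```cpp
#include <cassert>

// Portable for '0' .. '9', because the digits are guaranteed to be sequential.
int digit_value(char c)
{
    assert(c >= '0' && c <= '9');
    return c - '0';
}

// The character one position later in the execution character set.
char next_char(char c)
{
    return static_cast<char>(c + 1);
}

// The integer used internally for a character (97 for 'a' on ASCII systems).
int code_of(char c)
{
    return c;
}
```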

Automatic type conversion is an example of a common problem in designing a programming language, which is that there is a conflict between formalism, which is the requirement that formal languages should have simple rules with few exceptions, and convenience, which is the requirement that programming languages be easy to use in practice.

More often than not, convenience wins, which is usually good for expert programmers, who are spared from rigorous but unwieldy formalism, but bad for beginning programmers, who are often baffled by the complexity of the rules and the number of exceptions. In this book I have tried to simplify things by emphasizing the rules and omitting many of the exceptions.

Bitwise Operators

These operators perform bitwise operations. Working with them requires an understanding of binary numeration, since they operate on one or two bit patterns (binary numerals) at the level of their individual bits. On most microprocessors, bitwise operations are sometimes slightly faster than addition and subtraction operations and usually significantly faster than multiplication and division operations.

Bitwise operations are especially important for much low-level programming, from optimizations to writing device drivers, low-level graphics, and communications protocol packet assembly and decoding.

Although machines often have efficient built-in instructions for performing arithmetic and logical operations, in fact all these operations can be performed just by combining the bitwise operators and zero-testing in various ways.

The bitwise operators work bit by bit on the operands. The operands must be of integral type (one of the types used for integers).

For this section, recall that a number starting with 0x is hexadecimal (hex for short, also referred to as base-16). Unlike the normal decimal system using powers of 10 and the digits 0123456789, hex uses powers of 16 and the symbols 0123456789abcdef. In the examples remember that 0xc equals 1100 in binary and 12 in decimal. C++ did not originally support binary notation in source code; binary literals with the 0b prefix (such as 0b1100) were added in C++14.

NOT
~a  
bitwise complement of a.
~0xc produces the value -1-0xc (in binary, ~1100 produces ...11110011 where "..." may be many more 1 bits)

The bitwise complement operator is a unary operator which precedes the operand. This operator must not be confused with the "logical not" operator, "!" (exclamation point), which treats the entire value as a single Boolean, changing a true value to false and vice versa. The "logical not" is not a bitwise operation.

These others are binary operators which lie between the two operands. The precedence of &, ^ and | is lower than that of the relational and equality operators (the shift operators, in contrast, bind more tightly than the relational operators), so it is often required to parenthesize expressions involving bitwise operators.

AND
a & b 
bitwise boolean and of a and b
0xc & 0xa produces the value 0x8 (in binary, 1100 & 1010 produces 1000)
OR
a | b 
bitwise boolean or of a and b
0xc | 0xa produces the value 0xe (in binary, 1100 | 1010 produces 1110)
XOR
a ^ b 
bitwise xor of a and b
0xc ^ 0xa produces the value 0x6 (in binary, 1100 ^ 1010 produces 0110)
Bit shifts
a << b 
shift a left by b (multiply a by 2^b)
0xc << 1 produces the value 0x18 (in binary, 1100 << 1 produces the value 11000)
a >> b 
shift a right by b (divide a by 2^b)
0xc >> 1 produces the value 0x6 (in binary, 1100 >> 1 produces the value 110)
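The values listed above can be checked mechanically, and the same operators combine into the usual idioms for setting, clearing, toggling and testing a single bit. The helper names in this sketch are hypothetical; note the parentheses, which are needed because == binds more tightly than &, ^ and |.

```cpp
#include <cassert>

// Re-checks the worked examples from the text.
void check_bitwise_examples()
{
    assert((0xc & 0xa) == 0x8);
    assert((0xc | 0xa) == 0xe);
    assert((0xc ^ 0xa) == 0x6);
    assert((0xc << 1) == 0x18);
    assert((0xc >> 1) == 0x6);
    assert(~0xc == -1 - 0xc);   // two's complement representation
}

// Common single-bit idioms; n counts from 0 at the least significant bit.
unsigned set_bit(unsigned value, unsigned n)    { return value | (1u << n); }
unsigned clear_bit(unsigned value, unsigned n)  { return value & ~(1u << n); }
unsigned toggle_bit(unsigned value, unsigned n) { return value ^ (1u << n); }
bool     test_bit(unsigned value, unsigned n)   { return (value & (1u << n)) != 0; }
```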

Derived Types Operators

There are three kinds of types known as pointers, references, and arrays, that have their own operators for dealing with them. Those are *, &, [], ->, .*, and ->*.

Pointers, references, and arrays are types that deal with accessing other variables. Pointers are used to pass around a variable's address (where it is in memory), which can be used to have multiple ways to access a single variable. References are aliases to other objects, and are similar in use to pointers, but still very different. Arrays are blocks of contiguous memory that can be used to store multiple objects of the same type, like a sequence of characters to make a string.

Subscript Operator "[]"

This operator is used to access an element of an array. It is also used when declaring array types, allocating them, or deallocating them.

Arrays

An array stores a constant-sized sequential set of blocks, each block containing a value of the elected type under a single name. Arrays often help organize collections of data efficiently and intuitively.

It is easiest to think of an array as simply a list with each value as an item of the list. Individual elements are accessed by their position in the array, called the index (also known as the subscript). Each item in the array has an index from 0 to (the size of the array) -1, indicating its position in the array.

Advantages of arrays include:

  • Random access in O(1) (Big O notation)
  • Ease of use/port: Integrated into most modern languages

Disadvantages include:

  • Constant size
  • Constant data-type
  • Large free sequential block to accommodate large arrays
  • When used as non-static data members, the element type must allow default construction
  • Arrays do not support copy assignment (you cannot write arraya = arrayb)
  • Arrays cannot be used as the value type of a standard container
  • Syntax of use differs from standard containers
  • Arrays and inheritance don't mix (an array of Derived is not an array of Base, but can too easily be treated like one)

NOTE:
If complexity allows, you should consider the use of containers (as in the C++ Standard Library). For example, you can and should use std::vector, which is as fast as an array in most situations, can be dynamically resized, supports iterators, and lets you treat the storage of the vector just like an array.

(Modern C allows VLAs, variable length arrays, but these are not part of standard C++, which already has a facility for resizable arrays in std::vector.)
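As a rough sketch of what the note describes, the following code sizes a std::vector at run time and then walks its storage through a plain pointer, as if it were an array. The function names make_squares and sum_as_array are made up for this illustration.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: returns the squares 0, 1, 4, ... of the first n integers.
std::vector<int> make_squares(std::size_t n)
{
    std::vector<int> v;
    v.resize(n);                       // size chosen at run time
    for (std::size_t i = 0; i < v.size(); ++i)
        v[i] = static_cast<int>(i * i);
    return v;
}

// Sums the elements through a raw pointer, treating the storage as an array.
int sum_as_array(const std::vector<int>& v)
{
    if (v.empty())
        return 0;
    const int* p = &v[0];              // vector storage is contiguous
    int total = 0;
    for (std::size_t i = 0; i < v.size(); ++i)
        total += p[i];
    return total;
}
```

Calling make_squares(4) yields a vector holding 0, 1, 4 and 9; a later push_back grows it without any manual memory management.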

The pointer operator as you will see is similar to the array operator.


For example, here is an array of integers, called List, with 5 elements, numbered 0 to 4. Each element of the array is an integer. Like other local integer variables, the elements of the array start out uninitialized. That means the array is filled with unknown values until we initialize it by assigning something to it. (Remember that local variables of primitive types in C++ are not automatically initialized to 0.)

Index Data
00 unspecified
01 unspecified
02 unspecified
03 unspecified
04 unspecified

Since an array stores values, the type of the values and the number of values to store must be defined as part of the array declaration, so that the needed space can be allocated. The size of the array must be a constant integral expression greater than zero, so it has to be known at compile time. That means that you cannot use user input to size an array declared this way; for that you need to allocate the memory dynamically (with operator new[]). Another disadvantage of the sequential storage method is that there has to be a free sequential block large enough to hold the array. If you have an array of 500,000,000 blocks, each 1 byte long, you need roughly 500 megabytes of sequential space to be free; when memory is fragmented, finding such a block can take a long time or fail altogether.

To declare an array you can do:

int numbers[30]; // creates an array of 30 integers

or

char letters[4]; // create an array of 4 characters

and so on...

To initialize an array as you declare it, you can use:

int vector[6]={0,0,1,0,0,0};

This will not only create the array with 6 int elements but also initialize them to the given values.

Assigning and accessing data

You can assign data to the array by using the name of the array, followed by the index.

For example to assign the number 200 into the element at index 2 in the array

 
List[2] = 200;

will give

Index Data
00 unspecified
01 unspecified
02 200
03 unspecified
04 unspecified

You can access the data at an element of the array the same way.

std::cout << List[2] << std::endl;

This will print 200.

Basically, working with individual elements in an array is no different than working with normal variables.

As you see accessing a value stored in an array is easy. Take this other example:

int x;
x = vector[2];

The code above assigns to x the value stored at index 2 of the array vector, which is 1.

Arrays are indexed starting at 0, as opposed to starting at 1. The first element of the array above is vector[0]. The index to the last value in the array is the array size minus one. In the example above the subscripts run from 0 through 5. C++ does not do bounds checking on array accesses. The compiler will not complain about the following:

char y;
int z = 9;
char vector[6] = { 1, 2, 3, 4, 5, 6 };
 
// examples of accessing outside the array. A compile error is not raised
y = vector[15];
y = vector[-4];
y = vector[z];

During program execution, an out of bounds array access does not always cause a run time error. Your program may happily continue after retrieving a value from vector[-1]. To alleviate indexing problems, the sizeof() expression is commonly used when coding loops that process arrays.

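std::size_t ix; // needs #include <cstddef>
short anArray[]= { 3, 6, 9, 12, 15 };
 
// sizeof(anArray) / sizeof(anArray[0]) is the number of elements
for (ix=0; ix< (sizeof(anArray)/sizeof(anArray[0])); ++ix) {
  DoSomethingWith( anArray[ix] );
}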

Notice in the above example, the size of the array was not explicitly specified. The compiler knows to size it at 5 because of the five values in the initializer list. Adding an additional value to the list will cause it to be sized to six, and because of the sizeof expression in the for loop, the code automatically adjusts to this change.
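The sizeof idiom can be tried out with a small sketch like the one below. The function sum_shorts is made up for this illustration, and array_length is a hypothetical template helper that lets the compiler deduce the element count instead of computing it with sizeof; neither comes from the standard library.

```cpp
#include <cstddef>

// Hypothetical helper: number of elements in a built-in array, deduced at
// compile time. It accepts only real arrays, not pointers.
template <typename T, std::size_t N>
std::size_t array_length(const T (&)[N])
{
    return N;
}

// Sums a fixed array of shorts using the sizeof idiom from the text.
int sum_shorts()
{
    short anArray[] = { 3, 6, 9, 12, 15 };
    int total = 0;
    for (std::size_t ix = 0; ix < sizeof(anArray) / sizeof(anArray[0]); ++ix)
        total += anArray[ix];
    return total;
}
```

One reason to prefer a helper such as array_length is that the sizeof division silently gives a wrong answer when applied to a pointer, whereas the template refuses to compile.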

You can also use multi-dimensional arrays. The simplest type is a two dimensional array. This creates a rectangular array - each row has the same number of columns. To get a char array with 3 rows and 5 columns we write...

char two_d[3][5];

To access/modify a value in this array we need two subscripts:

char ch;
ch = two_d[2][4];

or

two_d[0][0] = 'x';

There are also weird notations possible:

int a[100];
int i = 0;
if (a[i]==i[a])
  printf("Hello World!\n");

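a[i] and i[a] refer to the same element. You will understand this better after learning about pointers.

A built-in array cannot be resized. To get an array of a different size, you must explicitly deal with memory yourself, allocating a new block (with new[], or with malloc and realloc in C-style code) and copying the elements across (for example with memcpy).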

Why start at 0?

Many programming languages number arrays from 0. This is useful in languages where arrays are used interchangeably with a pointer to the first element of the array. In C++ the address of an element in the array can be computed from (address of first element) + i, where i is the index starting at 0 (a[1] == *(a + 1)). Notice here that "(address of the first element) + i" is not a literal addition of numbers. Different types of data have different sizes, and the compiler correctly takes this into account by scaling i by the size of one element. Therefore, pointer arithmetic is simpler when the index starts at 0.
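The relationship between indexing and pointer arithmetic can be checked with a short sketch. Both function names below are made up for this illustration.

```cpp
#include <cstddef>

// Returns true if a[i] and *(a + i) name the same element for every index.
bool index_matches_pointer_arithmetic()
{
    int a[4] = { 10, 20, 30, 40 };
    for (std::size_t i = 0; i < 4; ++i) {
        if (&a[i] != a + i || a[i] != *(a + i))
            return false;
    }
    return true;
}

// Distance in bytes between consecutive int elements: sizeof(int), not 1.
std::size_t byte_step()
{
    int a[2] = { 0, 0 };
    const char* first  = reinterpret_cast<const char*>(&a[0]);
    const char* second = reinterpret_cast<const char*>(&a[1]);
    return static_cast<std::size_t>(second - first);
}
```

The second function shows why "+ i" is not a literal addition: moving one element forward advances the address by the size of one element.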

Why no bounds checking on array indexes?

C++ allows for, but doesn't force, bounds-checking implementations; in practice little or no checking is done. Checking affects storage requirements (needing "fat pointers") and impacts runtime performance. However, the std::vector template class, as we will see, is an object representing an array, and it provides the at() method, which does enforce bounds checking. Also, in many implementations the standard containers include particularly complete bounds checking in debug mode. They might not support these checks in release builds, as any performance reduction in container classes relative to built-in arrays might prevent programmers from migrating from arrays to the more modern, safer container classes.
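A minimal sketch of the checked access provided by at() follows. The helper name access_is_rejected is an assumption made for this example; std::vector::at and std::out_of_range are the standard library facilities.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Returns true if reading index i with at() was rejected as out of range.
bool access_is_rejected(const std::vector<int>& v, std::size_t i)
{
    try {
        (void)v.at(i);                 // throws std::out_of_range on a bad index
        return false;
    } catch (const std::out_of_range&) {
        return true;
    }
}
```

By contrast, v[i] with a bad index is not checked and has undefined behavior, just like an out-of-bounds access to a built-in array.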

address-of operator "&"

To get the address of a variable so that you can assign it to a pointer, you use the "address of" operator, which is denoted by the ampersand symbol &. The "address of" operator does exactly what it says: it returns the address of a variable, a named constant, or an element in an array, in the form of a pointer of the corresponding type. To use the "address of" operator, you place it in front of the variable whose address you want. The same symbol is also used when declaring reference types.

Now, do not confuse the "address of" operator with the declaration of a reference. Because operators can only appear in expressions, the compiler knows that & in front of a variable in an expression is the "address of" operator, which yields the address of that variable as a pointer, whereas & after a type name in a declaration introduces a reference.

References

References are a way of assigning a "handle" to a variable. References can also be thought of as "aliases"; they're not real objects, they're just alternative names for other objects.

Assigning References
This is the less often used variety of references, but still worth noting as an introduction to the use of references in function arguments. Here we create a reference that looks and acts like a standard variable except that it operates on the same data as the variable that it references.
int tZoo = 3;       // tZoo == 3
int &refZoo = tZoo; // tZoo == 3
refZoo = 5;         // tZoo == 5

refZoo is a reference to tZoo. Changing the value of refZoo also changes the value of tZoo.

NOTE:
One use of references is to pass function arguments by reference. This allows the function to update or change the data in the variable being referenced.

For example say we want to have a function to swap 2 integers

void swap(int &a, int &b){
  int temp = a; 
  a = b; 
  b = temp;
}
int main(){
   int x = 5; 
   int y = 6; 
   int &refx = x; 
   int &refy = y; 
   swap(refx, refy); // now x = 6 and y = 5
   swap(x, y); // and now x = 5 and y = 6 again
}

References cannot be null as they refer to instantiated objects, while pointers can be null. References cannot be reassigned, while pointers can be.

int main(){
   int x = 5;
   int y = 6;
   int &refx = x;
   &refx = y; // won't compile
}

As references provide strong guarantees when compared with pointers, using references makes the code simpler. Therefore using references should usually be preferred over using pointers. Of course, pointers have to be used at the time of dynamic memory allocation (new) and deallocation (delete).

Pointers, Operator "*"

The "*" operator is used when declaring pointer types but it is also used to get the variable pointed to by a pointer.

Figure: pointer a pointing to variable b. Note that b stores a number, whereas a stores the address of b in memory (1462).

Pointers are important data types due to their special characteristics. They may be used to refer to a variable without actually creating a variable of that type. They can be a difficult concept to understand, so some special effort should be spent on understanding the power they give to programmers.

Pointers have a very descriptive name. Pointer variables only store memory addresses, usually the addresses of other variables. Essentially, they point to another variable's memory location, a reserved location in the computer's memory. You can use a pointer to pass the location of a variable to a function; this enables the function to use the variable's storage through the pointer, so that it can retrieve or modify its data. You can even have pointers to pointers, and pointers to pointers to pointers, and so on.

Declaring

Pointers are declared by adding a * before the variable name in the declaration, as in the following example:

int* x;  // pointer to int.
int * y; // pointer to int. (legal, but rarely used)
int *z;  // pointer to int.
int*i;   // pointer to int. (legal, but rarely used)

NOTE:
As always, whitespace does not matter, so the position of the * relative to the spaces doesn't matter, only the order of the tokens.
For historical reasons, some programmers associate each placement with a particular style:

// C codestyle
int *z;
 
// C++ codestyle
int* z;

As seen before, check the coding style conventions in use and adhere to a single style.

Watch out, though, because the * applies only to the declarator that immediately follows it:

int* i, j;  // CAUTION! i is pointer to int, j is int.
int *i, *j; // i and j are both pointer to int.

You can also have multiple pointers chained together, as in the following example:

int **i;  // Pointer to pointer to int.
int ***i; // Pointer to pointer to pointer to int (rarely used).
Assigning values
TODO: Missing info
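A pointer is given a value by assigning it the address of an object of the matching type, or by copying another pointer of the same type. The sketch below shows both; the function name is made up for this illustration, and the * used to write through the pointers is explained in the Dereferencing section.

```cpp
// Demonstrates giving a pointer a value and re-pointing it.
int pointer_assignment_demo()
{
    int first = 1;
    int second = 2;

    int* p = &first;    // p holds the address of first
    int* q = p;         // q is a copy of p: both point to first
    p = &second;        // p now points to second; q is unchanged

    *p = 20;            // writes to second
    *q = 10;            // writes to first

    return first + second;
}
```

Assigning to p changes where p points; it never changes the variable that p pointed to before.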

Dereferencing

The "*" operator is used to get the variable pointed to by a pointer. It is also used when declaring pointer types.

TODO: Missing info

Null Pointer

The null pointer is a special value of a pointer. It means that the pointer points to absolutely nothing. It is an error to attempt to dereference (using the * or -> operators) a null pointer. A null pointer can be written using the constant zero, as in the following example:

int i;
int *p;
 
p = 0; //Null pointer.
p = &i; //Not the null pointer.

Note that you can't assign an integer variable to a pointer, even if its value is zero. It has to be the constant 0. The following code is an error:

int i = 0;
int *p = i; //Error: only the constant 0 converts to a null pointer, an int variable does not

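There is an old macro, NULL, defined in the standard library and inherited from the C language. In C it is commonly defined as #define NULL ((void *)0); in C++ it is an implementation-defined null pointer constant, typically plain 0, because a void * does not convert implicitly to other pointer types. Either way, NULL always converts to a null pointer value.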

NOTE:
It is considered good practice to avoid the use of macros and defines as much as possible. In the particular case at hand, NULL isn't type-safe. Any rationale for using it to make the use of a pointer more visible can be addressed by the proper naming of the pointer variable.

Since a null pointer is written as the constant 0 in source code, it always compares equal to 0. Like an integer, if you use it in a true/false expression, it evaluates to false if it is the null pointer, and true if it's anything else:

#include <cstddef> // defines NULL
#include <iostream>
 
void IsNull (int * p)
{
  if (p)
    std::cout<<"Pointer is not NULL"<<std::endl;
  else
    std::cout<<"Pointer is NULL"<<std::endl;
}
 
int main()
{
  int * p;
  int i;
 
  p = NULL;
  IsNull(p);
  p = &i;
  IsNull(&i);
  IsNull(p);
  IsNull(NULL);
 
  return 0;
}

This program will output that the pointer is NULL, then that it isn't NULL twice, then again that it is.

TODO: Make short introduction to pointers as data members (so it can be cross linked from the function and class sections of the texts)

Pointers to Classes
Indirection Operator "->"

This pointer indirection operator is used to access a member of a class object through a pointer to that object.

Member Dereferencing Operator ".*"

This pointer-to-member dereferencing operator is used to access the member associated with a specific class instance, given an appropriate pointer to member.

Member Indirection Operator "->*"

This pointer-to-member indirection operator is used to access the member of a class instance pointed to by one pointer, given another, appropriate pointer to member.
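The two pointer-to-member operators are easier to follow with a concrete case. In the sketch below, the struct Point and the functions read_member and call_member are all made up for this illustration.

```cpp
// Hypothetical class used only for this example.
struct Point {
    int x;
    int y;
    int sum() const { return x + y; }
};

// Reads whichever data member `member` designates, once through the object
// itself and once through a pointer to the object.
int read_member(const Point& obj, int Point::* member)
{
    const Point* ptr = &obj;
    int via_object  = obj.*member;    // .*  : object on the left
    int via_pointer = ptr->*member;   // ->* : pointer to object on the left
    return (via_object == via_pointer) ? via_object : -1;
}

// Calls a member function through a pointer to member function.
int call_member(const Point& obj, int (Point::* fn)() const)
{
    return (obj.*fn)();               // the parentheses are required
}
```

A pointer to member such as &Point::x does not refer to any particular object; it only says which member to pick once an object is supplied on the left of .* or ->*.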

Pointers to functions

When used to point to functions, pointers can be exceptionally powerful. A call can be made to a function anywhere in the program, knowing only what kinds of parameters it takes. Pointers to functions are used several times in the standard library, and provide a powerful system for other libraries which need to adapt to any sort of user code. This case is examined more in depth in the Functions Section of this book.
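As a small illustration of calling a function while knowing only its parameter and return types, consider the sketch below. The names add_ints, multiply_ints and apply are assumptions made for this example.

```cpp
// Two functions with the same signature.
int add_ints(int a, int b)      { return a + b; }
int multiply_ints(int a, int b) { return a * b; }

// Applies whichever function `op` points to; only its signature is known here.
int apply(int (*op)(int, int), int a, int b)
{
    return op(a, b);
}
```

The caller decides which operation runs: apply(add_ints, 2, 3) and apply(multiply_ints, 2, 3) go through the same code path in apply but call different functions.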

Dereferencing

Now that you have a pointer, you need some way to access the memory that it points to. This is the * operator. When it's put in front of a pointer, it gives the variable pointed to. This is an lvalue, so you can assign values to it, or even initialize a reference from it.

#include <iostream>
 
int main()
{
  int i;
  int * p = &i;
  i = 3;
 
  std::cout<<*p<<std::endl; // prints "3"
 
  return 0;
}

Since the result of an & operator is a pointer, *&i is valid, though it has no effect: it is simply i.

Now, when you combine the * operator with classes, you may notice a problem: it has lower precedence than the . operator. See the example:

struct A { int num; };
 
A a;
int i;
A * p;
 
p = &a;
a.num = 2;
 
i = *p.num; // Error! "p" isn't a class, so you can't use "."
i = (*p).num;

The error happens because the compiler looks at p.num first ("." has higher precedence than "*") and because p does not have a member named num the compiler gives you an error. Using grouping symbols to change the precedence gets around this problem.

It would be very time-consuming to have to write (*p).num a lot, especially when you have a lot of classes. Imagine writing (*(*(*(*MyPointer).Member).SubMember).Value).WhatIWant! As a result, a special operator, ->, exists. Instead of (*p).num, you can write p->num, which is completely identical for all purposes. Now you can write MyPointer->Member->SubMember->Value->WhatIWant. It's a lot easier on the brain!

sizeof()

The sizeof operator works at compile time to report on the number of bytes of storage occupied by a type (equivalently, by a variable of that type).

Syntactically, sizeof appears like a function call when taking the size of a type, but may be used without parentheses when taking the size of an object. Style guidelines vary on whether using the latitude to omit parentheses in the latter case is desirable.

sizeof has also found new life in recent years in template meta programming in C++, where the fact that it can turn types into numbers, albeit in a primitive manner, is often useful, given that the template metaprogramming environment of C++ typically does most of its calculations with types.

//Examples of sizeof use
std::size_t int_size( sizeof( int ) );// Might give 1, 2, 4, 8 or other values.
 
// or
 
int answer( 42 );
std::size_t answer_size( sizeof( answer ) );// Same value as sizeof( int )
std::size_t answer_size2( sizeof answer);   // Equivalent syntax

Note that sizeof measures the size of an object in the simple sense of a contiguous area of storage; for types which include pointers to other storage, the indirect storage is not included in the number of bytes returned by sizeof. A common mistake made by programming newcomers working with C++ is to try to use sizeof to determine the length of a string; the std::strlen or std::string::length functions are more appropriate for that task.
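The difference between the size of the storage and the length of the string can be seen in a short sketch. The three function names are made up for this illustration; std::strlen and std::string::length are the standard facilities mentioned above.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Size in bytes of a character array holding "hello" (includes the '\0').
std::size_t array_bytes()
{
    char text[] = "hello";
    return sizeof(text);
}

// Length of the same text as counted by std::strlen (excludes the '\0').
std::size_t c_string_length()
{
    const char* text = "hello";
    return std::strlen(text);         // sizeof(text) would be the pointer size
}

// Length reported by std::string.
std::size_t string_length()
{
    std::string text("hello");
    return text.length();
}
```

Only the first function uses sizeof, and it counts the terminating null character as well; applied to a char pointer or a std::string object, sizeof says nothing about the text at all.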

Dynamic Memory Allocation

Dynamic memory allocation is the allocation of memory storage for use in a computer program during the runtime of that program. It is a way of distributing ownership of limited memory resources among many pieces of data and code. Importantly, the amount of memory allocated is determined by the program at the time of allocation and need not be known in advance. A dynamic allocation exists until it is explicitly released, either by the programmer or by a garbage collector implementation; this is notably different from automatic and static memory allocation, which require advance knowledge of the required amount of memory and have a fixed duration. It is said that an object so allocated has dynamic lifetime.

The task of fulfilling an allocation request, which involves finding a block of unused memory of sufficient size, is complicated by the need to avoid both internal and external fragmentation while keeping both allocation and deallocation efficient. Also, the allocator's metadata can inflate the size of (individually) small allocations; chunking attempts to reduce this effect.

Usually, memory is allocated from a large pool of unused memory area called the heap (also called the free store). Since the precise location of the allocation is not known in advance, the memory is accessed indirectly, usually via a pointer. The precise algorithm used to organize the memory area and allocate and deallocate chunks is hidden behind an abstract interface.

You have probably wondered how programmers allocate memory efficiently without knowing, prior to running the program, how much memory will be necessary. Here is when the fun starts with dynamic memory allocation.

new and delete

For dynamic memory allocation we use the new and delete keywords. The old malloc and free functions from C can now be avoided, but they are still accessible for compatibility and low-level control reasons.

TODO: add info on malloc

As covered before, we assign values to pointers using the "address of" operator because it returns the address in memory of the variable or constant in the form of a pointer. However, the "address of" operator is not the only operator that produces a value you can assign to a pointer. Another operator that returns a pointer is the new operator. The new operator allows the programmer to allocate memory for a specific data type, struct, class, etc., and gives the programmer the address of that allocated block of memory in the form of a pointer. The new operator is used as an rvalue, similar to the "address of" operator. Take a look at the code below to see how the new operator works.

By pointing a pointer at an allocated block of memory, rather than having to use a variable declaration, you basically bypass the "middleman" (the variable declaration). Now you can allocate memory dynamically without having to know in advance how many variables you should declare.

int n = 10; 
SOMETYPE *parray, *pS; 
int *pint; 
 
parray = new SOMETYPE[n]; 
pS = new SOMETYPE; 
pint = new int;

As the code above shows, you can use the new operator to allocate memory for arrays too, which comes in handy when the size of an array is only known at run time. The memory obtained from the new operator can also be deallocated: the pointer itself is not destroyed, but the memory it points to is returned for reuse. The delete operator is used in front of a pointer and frees the memory to which the pointer is pointing.

delete [] parray;// note the use of [] when destroying an array allocated with new
delete pS;
delete pint;

The memory pointed to by parray, pS and pint has been freed, which is a very good thing because when you're manipulating multiple large arrays, you want to avoid leaking memory. Any allocation of memory needs to be properly deallocated or a leak will occur, and your program will consume more memory than it needs. Essentially, every time you use the new operator on something, you should use the delete operator to free that memory before exiting. The delete operator can also safely be applied to a null pointer: this compiles and does nothing, so there is no need to test a pointer against null before deleting it.

You must keep in mind that new T and new T() are not equivalent. This will be more understandable after you are introduced to more complex types like classes, but keep in mind that new T() value-initializes the object: for built-in types, and for classes without a user-provided constructor, the memory is zero-initialized ("zeroed out"), so members that would otherwise be left uninitialized start at zero. With plain new T, such objects start with indeterminate values.
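For a built-in type such as int, the difference between the two forms can be sketched as follows. Both function names are made up for this illustration.

```cpp
// Allocates an int with new int(), which zero-initializes it,
// reads it back and releases it.
int value_initialized_int()
{
    int* p = new int();   // *p is guaranteed to be 0
    int result = *p;
    delete p;
    return result;
}

// Allocates with plain new int; the value is indeterminate until assigned.
int default_initialized_int()
{
    int* p = new int;     // indeterminate value: must not be read yet
    *p = 42;              // assign before reading
    int result = *p;
    delete p;
    return result;
}
```

Reading *p in the second function before the assignment would be a bug, even though it may appear to work on some systems.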

The new and delete operators do not have to be used in conjunction with each other within the same function or block of code. It is proper and often advised to write functions that allocate memory and other functions that deallocate memory. Indeed, the currently favored style is to release resources in objects' destructors, using the so-called resource acquisition is initialization (RAII) idiom.
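The RAII idiom relies on classes, constructors and destructors, which are covered later in the book, but a minimal sketch may still help. The class IntBuffer and the function fill_and_sum below are hypothetical names invented for this example, not part of any library.

```cpp
#include <cstddef>

// Hypothetical minimal owner of a dynamic int array (RAII sketch).
class IntBuffer {
public:
    explicit IntBuffer(std::size_t n) : data_(new int[n]()), size_(n) {}
    ~IntBuffer() { delete[] data_; }          // released automatically

    int& operator[](std::size_t i) { return data_[i]; }
    std::size_t size() const { return size_; }

private:
    IntBuffer(const IntBuffer&);              // copying disabled: declared
    IntBuffer& operator=(const IntBuffer&);   // but never defined

    int* data_;
    std::size_t size_;
};

// Uses the buffer; no explicit delete is needed on any return path.
int fill_and_sum(std::size_t n)
{
    IntBuffer buf(n);
    int total = 0;
    for (std::size_t i = 0; i < buf.size(); ++i) {
        buf[i] = static_cast<int>(i);
        total += buf[i];
    }
    return total;
}
```

The allocation happens in the constructor and the release in the destructor, so the memory is freed whenever buf goes out of scope. In practice std::vector already provides this behavior.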

TODO: Move or split some of the information or add references; classes, destructors and constructors have yet to be introduced, and below we are using a vector for the example

As we will see when we get to the Classes, a class destructor is the ideal location for its deallocator. Some programmers also advise leaving memory allocation out of classes' constructors: using new to create an array of objects, each of which also uses new to allocate memory during its construction, can result in runtime errors if copying and cleanup are not handled carefully. If a class or structure contains members which must point at dynamically created objects, one option is to initialize the elements of an array of the parent object sequentially, rather than leaving the task to their constructors.

NOTE:
If possible you should use new and delete instead of malloc and free.

// Example of a dynamic array
 
const int b = 5;
int *a = new int[b];
 
//to delete
delete[] a;

The ideal way is to not use arrays at all, but rather the STL's vector type (a container similar to an array). To achieve the above functionality, you should do:

const int b = 5;
std::vector<int> a;
a.resize(b);
 
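//no delete is needed: the vector frees its storage when it goes out of scope
a.clear(); //optional: removes the elements (the size becomes 0)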

Vectors allow for easy insertions even when "full." If, for example, you filled up a, you could easily make room for a 6th element like so:

int new_number = 99;
a.push_back( new_number );//expands the vector to fit the 6th element

You can similarly dynamically allocate a rectangular multidimensional array (be careful about the type syntax for the pointers):

const int d = 5;
int (*two_d_array)[4] = new int[d][4];
 
//to delete
delete[] two_d_array;

You can also emulate a ragged multidimensional array (sub-arrays not the same size) by allocating an array of pointers, and then allocating an array for each of the pointers. This involves a loop.

const int d1 = 5, d2 = 4;
int **two_d_array = new int*[d1];
for( int i = 0; i < d1; ++i)
  two_d_array[i] = new int[d2];
 
//to delete
for( int i = 0; i < d1; ++i)
  delete[] two_d_array[i];
 
delete[] two_d_array;


TODO: Add missing information on Bitwise and Relational operators.

Logical operators

The operators and (can also be written as &&) and or (can also be written as ||) allow two or more conditions to be chained together. The and operator checks whether all conditions are true, and the or operator checks whether at least one of the conditions is true. Both operators can be mixed in one expression; in that case and has higher precedence than or, so a or b and c means a or (b and c), and parentheses should be used whenever a different grouping is intended. The symbolic spellings && and || are inherited from C; and and or are alternative tokens that standard C++ defines for them. Both operators are said to short circuit: operands are evaluated from left to right, and if an earlier and condition is false, or an earlier or condition is true, the later conditions are not evaluated.
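Short-circuit evaluation can be observed by counting how many operands actually get evaluated. The counter check_calls and the three functions below are made up for this illustration; the sketch uses the symbolic spellings && and ||, which behave identically to and and or.

```cpp
// Counts how many times check() is evaluated.
static int check_calls = 0;

bool check(bool value)
{
    ++check_calls;
    return value;
}

// Returns the number of operand evaluations for "a and b".
int evaluations_for_and(bool a, bool b)
{
    check_calls = 0;
    bool result = check(a) && check(b);   // right side skipped if a is false
    (void)result;
    return check_calls;
}

// Returns the number of operand evaluations for "a or b".
int evaluations_for_or(bool a, bool b)
{
    check_calls = 0;
    bool result = check(a) || check(b);   // right side skipped if a is true
    (void)result;
    return check_calls;
}
```

This matters whenever the right-hand operand has a side effect or is only valid when the left-hand one holds, as in testing a pointer against null before dereferencing it.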

NOTE:
The iso646.h header file has been part of the C standard library since 1995, when it was added as an amendment to the C90 standard. It defines a number of macros which allow programmers to use C language bitwise and logical operators in textual form, which, without the header file, cannot be quickly or easily typed on some international and non-QWERTY keyboards. These symbols are keywords in the ISO C++ programming language and do not require the inclusion of a header file. For consistency, however, the C++98 standard provides the header <ciso646>. On MS Visual Studio, which historically implements nonstandard language extensions, including this header is the only way to enable these keywords (via macros) without disabling the extensions.

The not (can also be written as !) operator is used to return the inverse of a condition.

  • Syntax:
condition1 and condition2
condition1 or condition2
not condition
  • Semantics:

if both condition1 and condition2 are true the result is true else the result is false.

condition1 condition2 condition1 and condition2
true true true
true false false
false true false
false false false

if condition1 or condition2 is true the result is true else the result is false.

condition1 condition2 condition1 or condition2
true true true
true false true
false true true
false false false

if condition is true the result is false and if the condition is false the result is true.

condition not condition
true false
false true
  • Examples:


When something should not be true. It is often combined with other conditions. If x must be greater than 5 but not equal to 10, it would be written:

if ((x > 5) and not (x == 10)) // if (x greater than 5) and ( not (x equal to 10) ) 
{
  //...code...
}

When all conditions must be true. If x must be between 10 and 20:

if (x > 10 and x < 20) // if x greater than 10 and x less than 20
{
  //....code...
}

When at least one of the conditions must be true. If x must be equal to 5 or equal to 10 or less than 2:

if (x == 5 or x == 10 or x < 2) // if x equal to 5 or x equal to 10 or x less than 2
{
  //...code...
}

When at least one of a group of conditions must be true. If x must be between 10 and 20 or between 30 and 40.

if ((x >= 10 and x <= 20) or (x >= 30 and x <= 40)) // >= -> greater or equal etc...
{
  //...code...
}

Things get a bit more tricky with more conditions. The trick is to make sure the parentheses are in the right places to establish the intended order of evaluation. However, when things get this complex, it can often be easier to split up the logic into nested if statements, or to put the intermediate results into bool variables, but it is still useful to be able to express complex boolean logic directly.

In the example if (x > 10 and x < 20), parentheses around x > 10 and around x < 20 are implied, as the < and > operators have a higher precedence than and. First x is compared to 10. If x is greater than 10, x is compared to 20, and if x is also less than 20, the code is executed.


AND Operator

Logical AND:

AND True False
True True False
False False False

The logical AND operator, and, evaluates its left operand and its right operand. If both are true, then the expression evaluates to true. Otherwise, it evaluates to false.

if ((var1 > var2) and (var2 > var3))
{
  std::cout << var1 << " is bigger than " << var2 << " and " << var3 << std::endl;
}

In this snippet, the if statement checks to see if var1 is greater than var2. Then, it checks if var2 is greater than var3. If it is, it proceeds by telling us that var1 is bigger than both var2 and var3.

NOTE:
The logical AND operator and is often written as &&, which is not the same as the address operator and the bitwise AND operator, both of which are represented with &.

OR Operator

Logical OR:

OR True False
True True True
False True False

The logical OR operator is represented with or. Like the logical AND operator, it evaluates its left and right operands. If either operand is true, then the expression is true. The expression is also true if both operands are true.

if ((var1 > var2) or (var1 > var3))
{
  std::cout << var1 << " is either bigger than " << var2 << " or " << var3 << std::endl;
}

Let's take a look at the previous expression with an OR operator. If var1 is bigger than either var2 or var3 or both of them, the statements in the if expression are executed. Otherwise, the program proceeds with the rest of the code.

NOT Operator

The logical NOT operator, not, returns true if its operand is false, and false if its operand is true. Be careful about precedence when using the NOT operator, as with any logical operator.

not x > 10

The not operator has a higher precedence than the relational operators. Therefore, the expression above compares whether "not x" is greater than 10. However, this expression is always false, no matter what "x" is, because not x yields a boolean value, which converts to 1 or 0 and is never greater than 10.
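The two possible groupings can be compared side by side. In the sketch below the function names are made up for this illustration, the limit is passed as a parameter instead of the literal 10, and the grouping that the compiler applies is written out with explicit parentheses; ! is the symbolic spelling of not.

```cpp
// How "not x > limit" is actually grouped: (not x) is 0 or 1, then compared.
bool parsed_as_written(int x, int limit)
{
    return (!x) > limit;
}

// What was probably intended: negate the whole comparison.
bool intended(int x, int limit)
{
    return !(x > limit);
}
```

With a limit of 10 the first function returns false for every x, while the second returns true exactly when x is 10 or less.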

Conditional Operator

The conditional operator (also known as the ternary operator) allows a programmer to choose between two values inside a single expression, based on a condition such as: x is more than 10 and eggs is less than 20 and x is not equal to a.

Most operators take one or two operands. The conditional operator, ?:, is the only C++ operator that takes three: it chooses between two expressions based on the value of a condition expression. The basic syntax is:

 condition-expression ? expression-if-true : expression-if-false

If condition-expression is true, the expression returns the value of expression-if-true. Otherwise, it returns the value of expression-if-false. Because of this, the ternary operator can often be used in place of an if statement.

  • For example:
int foo = 8;
std::cout << "foo is " << (foo < 10 ? "smaller than" : "greater than or equal to") << " 10." << std::endl;

The output will be "foo is smaller than 10.".
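The same operator is often used to compute a value inside a function. The sketch below wraps the example above in a function and adds a second one; the names larger and describe are assumptions made for this illustration.

```cpp
#include <string>

// Returns the larger of two ints using the conditional operator.
int larger(int a, int b)
{
    return (a > b) ? a : b;
}

// Same wording as the example in the text, returned as a string.
std::string describe(int foo)
{
    return (foo < 10) ? "smaller than" : "greater than or equal to";
}
```

Only one of the two result expressions is evaluated each time, which is why the operator can stand in for a simple if/else.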

TODO: Note the short-cut semantics of evaluation. Note the conditions on the types of the expressions, and the conversions that will be applied if they have different types. Note that code that discards the value of the conditional expression can be more clearly written using an if statement.

Type Checking

Type checking is the process of verifying and enforcing the constraints of types, which can occur at either compile time or run time. Compile-time checking, also called static type checking, is carried out by the compiler when a program is compiled. Run-time checking, also called dynamic type checking, is carried out by the program as it is running. A programming language is said to be strongly typed if the type system ensures that conversions between types must be either valid or result in an error. A weakly typed language, on the other hand, makes no such guarantees and generally allows automatic conversions between types which may have no useful purpose. C++ falls somewhere in the middle, allowing a mix of automatic type conversions and programmer-defined conversions, allowing for almost complete flexibility in interpreting one type as being of another type. Converting variables or expressions of one type into another type is called type casting.

Type Conversion

Type conversion (often a result of type casting) refers to changing an entity of one data type into another. This is done to take advantage of certain features of type hierarchies. For instance, values from a more limited set, such as integers, can be stored in a more compact format and later converted to a different format enabling operations not previously possible, such as division with several decimal places' worth of accuracy. In the object-oriented programming paradigm, type conversion allows programs also to treat objects of one type as one of another. One must do it carefully as type casting can lead to loss of data.

NOTE:
The Wikipedia article about strongly typed suggests that there isn't enough consensus on the term "strongly typed" to use it safely, so you should re-check the intended meaning carefully. The above statement is what C++ programmers refer to as strongly typed in the language scope.

Automatic Type conversion

Automatic type conversion happens whenever the compiler expects data of a particular type but is given data of a different type, so that the compiler converts the data automatically. If the conversion is impossible, the result is an error at compile time; if the conversion may lose information (e.g., converting an int to a char), the compiler may issue a warning. Warnings vary depending on the compiler used and on the compiler options.

NOTE:
This is not "casting" or explicit type conversions. There is no such thing as an "automatic cast".

int a = 5.6;
float b = 7;

In the example above, in the first case an expression of type double is given and automatically converted to an int (the fractional part is discarded, so a holds 5). In the second case (more subtle), an int is given and automatically converted to a float.

There are two types of automatic type conversions between numeric types: promotion and demotion. (Note: the term "demotion" is not normally used in the C++ community.)

TODO
There are other implicit conversions also, such as derived-to-base conversions for pointers/references to objects of class types, and the implicit "decay" from an array to a corresponding pointer type.

Promotion

Promotion occurs whenever a variable or expression of a smaller type is converted to a larger type.

// promoting float to double.....
 
float a = 4;    // 4 is an int constant, gets promoted to float
long b = 7;     // 7 is an int constant, gets promoted to long
double c = a;   // a is a float, gets promoted to double

There is generally no problem with automatic promotion. Programmers should just be aware that it happens.

Demotion

Demotion occurs whenever a variable or expression of a larger type gets converted to a smaller type. Note that a floating-point literal without a suffix, such as 7.5, has type double in C++; the f suffix, as in 7.0f, makes it a float.

// Demotion of float to char.....
 
int a = 7.5;   // double gets down-converted to int;
int b = 7.0f;  // float gets down-converted to int;
char c = b;    // int gets down-converted to char;

Automatic demotion can result in the loss of information. In the first example the variable a will contain the value 7, since int variables cannot handle floating point values.
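
As a minimal sketch of what is lost during demotion, the two helper functions below rely on the same implicit conversions as the example above. Their names are hypothetical and are not part of any library. The second function assumes that unsigned char is 8 bits wide, as it is on most platforms.

```cpp
// Hypothetical helpers, written only to illustrate demotion.

// double -> int: the fractional part is discarded (truncation toward zero).
int demote_to_int(double value)
{
  int result = value;   // implicit conversion, may trigger a compiler warning
  return result;
}

// int -> unsigned char: the value is reduced modulo 256
// (assuming unsigned char is 8 bits wide).
unsigned char demote_to_uchar(int value)
{
  unsigned char result = value;   // implicit conversion
  return result;
}
```

With these definitions, demote_to_int(7.5) yields 7, and demote_to_uchar(300) yields 44 (that is, 300 - 256).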

Most modern compilers will generate a warning if demotion occurs. Should the loss of information be intended, the programmer may do explicit type casting to suppress the warning; bit masking may be a superior alternative.

Explicit type conversion (casting)

Explicit type conversion (casting) is the use of specific notation in the source code to request a conversion or to specify a function from an overload set. There are cases where no automatic type conversion can occur, or where the compiler is unsure about what type to convert to; these cases require explicit instructions from the programmer.

The basic form of type cast

The basic explicit form of typecasting is the static cast.

A static cast looks like this:

static_cast<target type>(expression)

The compiler will try its best to interpret the expression as if it were of the target type. This type of cast will not produce a warning, even if the type is demoted.

int a = static_cast<int>(7.5);

The cast can be used to suppress the warning as shown above. static_cast cannot do all conversions; for example, it cannot remove const qualifiers, and it cannot perform "cross-casts" within a class hierarchy. It can be used to perform most numeric conversions, including conversion from an integral value to an enumerated type.
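
The conversion from an integral value to an enumerated type can be sketched as follows. The enumeration Color and both function names are hypothetical, chosen only for this illustration, and the sketch assumes that the integer passed in corresponds to one of the enumerators.

```cpp
// Hypothetical enumeration used only for this illustration.
enum Color { Red, Green, Blue };   // Red == 0, Green == 1, Blue == 2

// int -> enum requires an explicit cast; there is no implicit conversion.
Color color_from_int(int value)
{
  return static_cast<Color>(value);
}

// enum -> int also happens implicitly; the cast documents the intent.
int color_to_int(Color color)
{
  return static_cast<int>(color);
}
```

Here color_from_int(1) yields Green, and color_to_int(Blue) yields 2.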

Advanced type casts

const_cast

const_cast<T>(expression)
The const_cast<>() is used to add/remove const(ness) (or volatile-ness) of a variable.

static_cast

static_cast<T>(expression)
The static_cast<>() is used to cast between numeric types.
e.g., char->long, int->short, double->int, etc.

Static cast is also used to cast pointers to related types. For example, it can cast void* to the appropriate pointer type or vice-versa.
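
A minimal sketch of the void* case is shown below. The function name is hypothetical; the point is only that the conversion to void* is implicit, while the conversion back to the original pointer type needs a cast.

```cpp
// Hypothetical helper: passes the address of an int through a void*
// and reads the value back through the restored int*.
int read_through_void_pointer(int value)
{
  int stored = value;
  void* untyped = &stored;                  // int* -> void* is implicit
  int* typed = static_cast<int*>(untyped);  // void* -> int* needs a cast
  return *typed;
}
```

The cast back is only safe when the void* really holds the address of an object of the target type, as it does here.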

dynamic_cast

Dynamic cast is used to convert pointers and references at run-time, generally for the purpose of casting a pointer or reference up or down an inheritance chain (inheritance hierarchy).

dynamic_cast<target type>(expression)

The target type must be a pointer or reference type, and the expression must evaluate to a pointer or reference. Casting down or across a hierarchy with dynamic cast requires the base class to have at least one virtual member function; otherwise the code does not compile. The cast succeeds only when the object to which the expression refers is compatible with the target type. If it is not, and the expression being cast is a pointer, a null pointer is returned. If a dynamic cast on a reference fails, a std::bad_cast exception is thrown. When it doesn't fail, dynamic cast returns a pointer or reference of the target type to the object to which expression referred.
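
The two failure modes described above can be sketched as follows. The classes Shape, Circle and Square and both function names are hypothetical and exist only for this illustration.

```cpp
#include <cstddef>    // NULL
#include <typeinfo>   // std::bad_cast

// Hypothetical class hierarchy used only for this illustration.
class Shape  { public: virtual ~Shape() {} };   // polymorphic base class
class Circle : public Shape {};
class Square : public Shape {};

// Pointer form: a failed cast yields a null pointer.
bool is_circle(Shape* shape)
{
  return dynamic_cast<Circle*>(shape) != NULL;
}

// Reference form: a failed cast throws std::bad_cast.
bool reference_cast_throws(Shape& shape)
{
  try {
    Circle& circle = dynamic_cast<Circle&>(shape);
    (void)circle;   // the result itself is not needed here
    return false;
  } catch (const std::bad_cast&) {
    return true;
  }
}
```

Given a Circle object and a Square object, is_circle() returns true only for the address of the Circle, and reference_cast_throws() returns true only for the Square.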

reinterpret_cast

Reinterpret cast simply casts one type bitwise to another. Any pointer or integral type can be cast to any other with reinterpret cast, easily allowing for misuse. For instance, with reinterpret cast one might, unsafely, cast an integer pointer to a string pointer.

reinterpret_cast<target type>(expression)

The reinterpret_cast<>() is used for all non portable casting operations. This makes it simpler to find these non portable casts when porting an application from one OS to another.

The reinterpret_cast<T>() will change the type of an expression without altering its underlying bit pattern. This is useful to cast pointers of a particular type into a void* and subsequently back to the original type.

Older forms of type casts

Other common type casts exist. They are of the form type(expression) (a functional, or function-style, cast) or (type)expression (often known simply as a C-style cast). The format of (type)expression is more common in C (where it is the only cast notation). It has the basic form:

int i = 10;
long l;
 
l = (long)i; // C-style cast
l = long(i); // function-style cast
             // note: for a built-in type such as long, both notations
             // request exactly the same explicit conversion

The more recent keyword casts are more controlled, and should generally be preferred. Some make the code safer, since they allow more errors to be caught at compile time, and all are easier to search for and identify in code. Performance-wise they are the same as the older forms, with the exception of dynamic_cast, for which there is no C equivalent.

Common usage of type casting

Performing arithmetical operations on operands of different types without an explicit cast means that the compiler has to perform an implicit conversion to ensure that the values it uses in the calculation are of the same type. Usually, this means that the compiler will convert all of the values to the type of the value with the highest precision.

The following is an integer division and so a value of 2 is returned.

float a = 5 / 2;

To get the intended behavior, you would need to cast one or both of the constants to a float.

float a = static_cast<float>(5) / static_cast<float>(2);

Or, you would have to define one or both of the constants as a float.

float a = 5.0f / 2.0f;
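
The same effect can be observed with variables instead of constants. The sketch below uses two hypothetical helper functions to contrast the two divisions.

```cpp
// Hypothetical helpers comparing integer and floating-point division.

float integer_half(int value)
{
  return value / 2;   // int / int: the fraction is lost before the conversion
}

float real_half(int value)
{
  return static_cast<float>(value) / 2;   // float / int: 2 is converted to float
}
```

Here integer_half(5) yields 2, while real_half(5) yields 2.5.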

Summary of different casts

reinterpret_cast - mostly non-portable way to convert without changing the representation of a value

unsigned long a = 0xffe38024;        // this value does not fit in a 32-bit int
int * b = reinterpret_cast<int*>(a); // the integer is treated as an address

static_cast - pointer casts from base to derived class, or void* to target type*

BaseClass* a = new DerivedClass();
static_cast<DerivedClass*>(a)->derivedClassMethod();

const_cast - changes a const qualifier

struct A { void func() {} };
 
void f(const A& a) {
  A& b = const_cast<A&>(a);
  b.func();
}

dynamic_cast - similar to static_cast, but has a runtime check which ensures that the object is really of the derived type you're casting to, and is also capable of navigating multiple inheritance hierarchies, including performing so-called "cross casts":

class A { /* ... */ };   // no virtual member functions
 
class B : public A { /* ... */ };
 
void f(A* a) {
  B* b1 = dynamic_cast<B*>(a); // Won't compile: A is not polymorphic
  B* b2 = static_cast<B*>(a);  // Will compile
}

If A declares at least one virtual member function, both casts compile:

class A { virtual void foo() {} };
 
class B : public A { /* ... */ };
 
void f(A* a) {
  B* b1 = dynamic_cast<B*>(a); // Will compile
  B* b2 = static_cast<B*>(a);  // Will compile
}

Control Flow Construct Statements

Usually a program is not a linear sequence of instructions. It may repeat code or choose between alternative paths depending on the situation. Most programming languages provide control flow statements (constructs) that specify the order in which the instructions of the program are carried out and that allow variations from a purely sequential order:

  • statements may only be obeyed under certain conditions (conditionals),
  • statements may be obeyed repeatedly under certain conditions (loops),
  • a group of remote statements may be obeyed (subroutines).

Logical Expressions as conditions

Logical expressions can use logical operators in loops and conditional statements as part of the conditions to be met.

Exceptional and unstructured control statements

Some instructions have no particular structure of their own but are exceptionally useful in shaping how other control flow statements behave. Special care must be taken with them to prevent unstructured and confusing programs.

break

A break forces an exit from the enclosing loop; execution continues with the first statement after the loop. It can only be used inside a loop or inside the switch control statement.

continue

The continue instruction is used inside loops, where it stops the current loop iteration and initiates the next one.
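
The difference between the two instructions can be sketched with a single loop. The function below is a hypothetical helper written for this illustration.

```cpp
// Hypothetical helper: adds up the odd numbers smaller than limit.
int sum_of_odds_below(int limit)
{
  int sum = 0;
  for (int i = 0; ; ++i) {
    if (i >= limit)
      break;        // leave the loop entirely
    if (i % 2 == 0)
      continue;     // skip even numbers and go on with the next iteration
    sum += i;
  }
  return sum;
}
```

For example, sum_of_odds_below(10) adds 1, 3, 5, 7 and 9 and yields 25.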

goto

The goto statement is strongly discouraged because it makes the program logic difficult to follow, which in turn invites errors. In some (mostly rare) cases, the goto statement allows uncluttered code to be written, for example, when multiple exit points must lead to the cleanup code at the end of a function (and neither exception handling nor object destructors are better options). Except in those rare cases, the use of unconditional jumps is frequently a symptom of a complicated design, as is the presence of many levels of nested statements.

In exceptional cases, like heavy optimization, a programmer may need more control over code behavior; a goto allows the programmer to specify that execution flow jumps directly and unconditionally to a desired label. A label is the name given to a label statement elsewhere in the function.

NOTE:
There is a classic paper in software engineering by W. A. Wulf called "A case against the GOTO", presented in the 25th ACM National Conference in October 1972, a time when the debate about goto statements was reaching its peak. In this paper Wulf argues that goto statements should be regarded as dangerous. Wulf is also known for one of his comments regarding efficiency: "More computing sins are committed in the name of efficiency (without necessarily achieving it) than for any other single reason -- including blind stupidity.".

Syntax

label:
  statement(s);
 
goto label;

A goto can, for example, be used to break out of two nested loops. This example breaks after replacing the first encountered non-zero element with zero.

for (int i = 0; i < 30; ++i) {
  for (int j = 0; j < 30; ++j) {
    if (a[i][j] != 0) {
      a[i][j] = 0;
      goto done;
    }
  }
}
done:
/* rest of program */

Although gotos are simple, they quickly lead to illegible and unmaintainable code.

// snarled mess of gotos
 
int i = 0;
  goto test_it;
body:
  a[i++] = 0;
test_it:
  if (a[i]) 
    goto body;
/* rest of program */

is much less understandable than the equivalent:

for (int i = 0; a[i]; ++i) {
  a[i] = 0;
}
/* rest of program */

Gotos are typically used in functions where performance is critical or in the output of machine-generated code (like a parser generated by yacc).

The goto statement should almost always be avoided, but there are rare cases when it enhances the readability of code. One such case is an "error section".

Example

#include <cstddef>
#include <new>
#include <iostream>
 
int main()
{
  // Start with null pointers so the cleanup code can always delete them
  int *my_allocated_1 = NULL;
  char *my_allocated_2 = NULL, *my_allocated_3 = NULL;
 
  my_allocated_1 = new (std::nothrow) int[500];
  if (my_allocated_1 == NULL)
  {
    std::cerr << "error in allocated_1" << std::endl;
    goto error;
  }
 
  my_allocated_2 = new (std::nothrow) char[1000];
  if (my_allocated_2 == NULL)
  {
    std::cerr << "error in allocated_2" << std::endl;
    goto error;
  }
 
  my_allocated_3 = new (std::nothrow) char[1000];
  if (my_allocated_3 == NULL)
  {
    std::cerr << "error in allocated_3" << std::endl;
    goto error;
  }
 
  /* ... use the three buffers ... */
 
  delete [] my_allocated_3;
  delete [] my_allocated_2;
  delete [] my_allocated_1;
  return 0;
 
error:
  // delete [] does nothing when given a null pointer
  delete [] my_allocated_3;
  delete [] my_allocated_2;
  delete [] my_allocated_1;
  return 1;
}

This construct keeps the cleanup code in one place, regardless of where the error occurred, and is cleaner than an equivalent construct built from nested control structures. It is thus less error prone.

NOTE:
While the above example shows a reasonable use of gotos, it is uncommon in practice. Exceptions handle such cases in a clearer, more effective and more organized way. This will be discussed in "Exception Handling" in detail. Using RAII to manage resources such as memory also avoids the need for most of the explicit cleanup code that is shown above.
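
As a rough sketch of the RAII approach mentioned in this note, the buffers can be held in std::vector objects, whose destructors release the memory on every exit path. The function name is hypothetical, and this is only one possible way to restructure the example, not the only one.

```cpp
#include <iostream>
#include <new>      // std::bad_alloc
#include <vector>

// Sketch of the allocation example rewritten with RAII.
// Returns 0 on success and 1 on failure, like the goto version.
// Each vector frees its memory in its destructor, so no explicit
// cleanup code and no goto are needed.
int allocate_buffers()
{
  try {
    std::vector<int>  buffer_1(500);
    std::vector<char> buffer_2(1000);
    std::vector<char> buffer_3(1000);
    /* ... use the three buffers ... */
    return 0;
  } catch (const std::bad_alloc&) {
    std::cerr << "error while allocating a buffer" << std::endl;
    return 1;
  }
}
```

If any of the three allocations fails, the vectors that were already constructed are destroyed automatically before the catch block runs.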

abort(), exit() and atexit()

As we will see later, the Standard C Library that is included in C++ also supplies some useful functions that can alter the flow of control. Some permit you to terminate the execution of a program, to set up a return value, or to initiate special tasks upon the termination request. You will have to jump ahead to the abort() - exit() - atexit() sections for more information.

Conditionals

There is likely no meaningful program written in which a computer does not demonstrate basic decision-making skills. It can actually be argued that there is no meaningful human activity in which no decision-making, instinctual or otherwise, takes place. For example, when driving a car and approaching a traffic light, one does not think, "I will continue driving through the intersection." Rather, one thinks, "I will stop if the light is red, go if the light is green, and if yellow go only if I am traveling at a certain speed a certain distance from the intersection." These kinds of processes can be simulated using conditionals.

A conditional is a statement that instructs the computer to execute a certain block of code or alter certain data only if a specific condition has been met.

The most common conditional is the if-else statement, with conditional expressions and switch-case statements typically used as shorthand alternatives.

if (Fork branching)

The if-statement allows one possible path choice depending on the specified conditions.

Syntax

if (condition)
{
  statement;
}

Semantic

First, the condition is evaluated:

  • if condition is true, statement is executed before continuing with the rest of the program.
  • if condition is false, the program skips statement and continues with the rest of the program.

NOTE:
The condition in an if statement can be any expression that evaluates to a boolean or to a zero/non-zero (null/non-null) value; it may even declare a variable or contain an assignment. The same is true of other flow control statements (e.g., while). Packing such code into a condition is generally regarded as bad style, since its only benefit is ease of typing, at the cost of making the code less readable.

This characteristic can easily lead to simple errors, like typing a=b (assignment) in place of a==b (comparison). This has resulted in the adoption of a coding practice that makes such errors evident: by inverting the expression so that a constant (or a const variable) is on the left-hand side, the mistaken assignment causes the compiler to generate an error.

Recent compilers support the detection of such events and generate compilation warnings.
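
The practice of putting the constant first can be sketched as follows. The function name and the value 42 are arbitrary choices made for this illustration.

```cpp
// Hypothetical helper illustrating the "constant first" comparison style.
bool is_answer(int value)
{
  const int answer = 42;   // arbitrary constant chosen for the illustration
  // Had "=" been typed instead of "==", the expression "answer = value"
  // would be rejected by the compiler, because a const variable cannot be
  // assigned to. The reversed form "value = answer" would compile and
  // would always be treated as true.
  if (answer == value)
    return true;
  return false;
}
```

The comparison itself behaves the same in either order; only the consequence of the typing mistake differs.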

Example

if(condition)
{
  int x; // Valid code
  for(x = 0; x < 10; ++x) // Also valid.
    {
      statement;
    }
}

NOTE:
If you wish to avoid typing std::cout, std::cin, or std::endl all the time, you may include using namespace std; at the beginning of your program, since cout, cin, and endl are members of the std namespace.

Sometimes the program needs to choose one of two possible paths depending on a condition. For this we can use the if-else statement.

if (user_age < 18)
{
    std::cout << "People under the age of 18 are not allowed." << std::endl;
}
else
{
    std::cout << "Welcome to Caesar's Casino!" << std::endl;
}

Here we display a message if the user is under 18. Otherwise, we let the user in. The if part is executed only if 'user_age' is less than 18. In other cases (when 'user_age' is greater than or equal to 18), the else part is executed.

if conditional statements may be chained together to make for more complex condition branching. In this example we expand the previous example by also checking if the user is above 64 and display another message if so.

if (user_age < 18)
{
  std::cout << "People under the age of 18 are not allowed." << std::endl;
}
else if (user_age > 64)
{
  std::cout << "Welcome to Caesar's Casino! Senior Citizens get 50% off." << std::endl;
}
else
{
  std::cout << "Welcome to Caesar's Casino!" << std::endl;
}

NOTE:

  • break and continue don't have any relevance to an if or else.
  • Although you can use multiple else if statements, when handling many related conditions it is recommended that you use the switch statement, which we will be discussing next.

switch (Multiple branching)

The switch statement branches based on specific integer values.

switch (integer expression) {
    case label1:
         statement(s)
         break;
    case label2:
         statement(s)
         break;
    /* ... */
    default:
         statement(s)
}

As you can see in the above scheme, each case ends with a "break;" statement. This statement causes the program to exit from the switch. If break is omitted, the program continues executing the code of the following cases, even though the integer expression is not equal to those case labels. This can be exploited in some situations, as seen in the next example.

We want to distinguish digits from other characters in the input.

 char ch = std::cin.get();  // get the character
 switch (ch) {
     case '0':
          // do nothing, fall through to case '1'
     case '1':
          // do nothing, fall through to case '2'
     case '2':
          // do nothing, fall through to case '3'
     /* ... */
     case '8':
          // do nothing, fall through to case '9'
     case '9':
          std::cout << "Digit" << std::endl; // print to the output stream
          break;
     default:
          std::cout << "Non digit" << std::endl; // print to the output stream
 }

In this small piece of code, any digit from '0' to '8' falls through the cases below it until it reaches case '9', which prints "Digit".

Any other character goes straight to the default case, which prints "Non digit".
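
Sharing one action among several case labels is also useful outside character handling. The sketch below is a hypothetical helper that returns the number of days in a month of a non-leap year. Because each group ends with a return, which leaves the function and therefore the switch, no break is needed.

```cpp
// Hypothetical helper: number of days in a month of a non-leap year.
// Returns 0 when month is not in the range 1..12.
int days_in_month(int month)
{
  switch (month) {
    case 4: case 6: case 9: case 11:   // these labels share one action
      return 30;
    case 2:
      return 28;
    case 1: case 3: case 5: case 7:
    case 8: case 10: case 12:
      return 31;
    default:
      return 0;
  }
}
```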

NOTE:

  • Be sure to use break commands unless you want multiple conditions to have the same action. Otherwise, it will "fall through" to the next set of commands.
  • break can only break out of the innermost level. If for example you are inside a switch and need to break out of an enclosing for loop you might well consider adding a boolean as a flag, and check the flag after the switch block instead of the alternatives available. (Though even then, refactoring the code into a separate function and returning from that function might be cleaner depending on the situation, and with inline functions and/or smart compilers there need not be any runtime overhead from doing so.)
  • continue is not relevant to switch block. Calling continue within a switch block will lead to the "continue" of the loop which wraps the switch block.
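
The flag technique from the note above can be sketched as follows. The function is a hypothetical helper that counts the characters preceding the first 'q' in a string of one-letter commands.

```cpp
// Hypothetical helper: counts the characters that precede the first 'q'.
int count_until_quit(const char* commands)
{
  int count = 0;
  bool done = false;                 // flag set inside the switch
  for (int i = 0; commands[i] != '\0'; ++i) {
    switch (commands[i]) {
      case 'q':
        done = true;
        break;                       // leaves the switch only
      default:
        ++count;
        break;
    }
    if (done)
      break;                         // leaves the for loop
  }
  return count;
}
```

For example, count_until_quit("abcqde") yields 3, because the loop stops at the 'q'.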

Loops (iterations)

A loop (also referred to as an iteration or repetition) is a sequence of statements which is specified once but which may be carried out several times in succession. The code "inside" the loop (the body of the loop) is obeyed a specified number of times, or once for each of a collection of items, or until some condition is met.

Iteration is the repetition of a process, typically within a computer program. Confusingly, it can be used both as a general term, synonymous with repetition, and to describe a specific form of repetition with a mutable state.

When used in the first sense, recursion is an example of iteration.

However, when used in the second (more restricted) sense, iteration describes the style of programming used in imperative programming languages. This contrasts with recursion, which has a more declarative approach.

In C++ the word can be even more ambiguous, because the Standard Template Library (STL) uses the term iterator for a class that performs an iteration over a collection. To simplify things, use "loop" to refer to the simple repetitions described in this section, and reserve "iteration" and "iterator" (the "one" that performs an iteration) for iterator classes (or other uses related to objects/classes) as found in the STL.

Infinite Loops

Sometimes it is desirable for a program to loop forever, or until an exceptional condition such as an error arises. For instance, an event-driven program may be intended to loop forever handling events as they occur, only stopping when the process is killed by the operator.

More often, an infinite loop is due to a programming error in a condition-controlled loop, wherein the loop condition is never changed within the loop.

Condition-controlled loops

Most programming languages have constructions for repeating a loop until some condition changes.

Condition-controlled loops are divided into two categories: preconditional (or entry-condition) loops, which place the test at the start of the loop, and postconditional (or exit-condition) loops, which have the test at the end of the loop. In the former case the body may be skipped completely, while in the latter case the body is always executed at least once.

In condition-controlled loops, the keywords break and continue take on significance. The break keyword causes an exit from the loop, proceeding with the rest of the program. The continue keyword terminates the current iteration of the loop, and the loop proceeds to the next iteration.

while (Preconditional loop)

Syntax

while (condition) statement; statement2;

Semantic

First, the condition is evaluated:

  1. if condition is true, statement is executed and condition is evaluated again.
  2. if condition is false, execution continues with statement2.

Remark: statement can be a block of code { ... } with several instructions.

What makes 'while' statements different from the 'if' is the fact that once the body (referred to as statement above) is executed, it will go back to 'while' and check the condition again. If it is true, it is executed again. In fact, it will execute as many times as it has to until the expression is false.

Example 1

#include <iostream>
using namespace std;
 
int main() 
{
  int i=0;
  while (i<10) {
    cout << "The value of i is " << i << endl;
    i++;
  }
  cout << "The final value of i is : " << i << endl;
  return 0;
}

Execution

 The value of i is 0
 The value of i is 1
 The value of i is 2
 The value of i is 3
 The value of i is 4
 The value of i is 5
 The value of i is 6
 The value of i is 7
 The value of i is 8
 The value of i is 9
 The final value of i is : 10

Example 2

// validation of an input
#include <iostream>
using namespace std;
 
int main() 
{
  int a;
  bool ok=false;
  while (!ok) {
    cout << "Type an integer from 0 to 20 : ";
    cin >> a;
    ok = ((a>=0) && (a<=20));
    if (!ok) cout << "ERROR - ";
  }
  return 0;
}

Execution

 Type an integer from 0 to 20 : 30
 ERROR - Type an integer from 0 to 20 : 40
 ERROR - Type an integer from 0 to 20 : -6
 ERROR - Type an integer from 0 to 20 : 14
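
Example 2 assumes that the user always types an integer. If something else is typed, the extraction fails and the stream stays in an error state, so the loop would never receive a valid value. The sketch below shows one possible way to handle this. The function name and its parameters are hypothetical, and it reads from any std::istream so that it can be tried with in-memory input as well as with std::cin.

```cpp
#include <istream>
#include <limits>
#include <sstream>   // std::istringstream, for trying the function without a keyboard

// Hypothetical helper: reads integers from in until one lies in [low, high].
// Returns false if the input ends before a valid value is found.
bool read_int_in_range(std::istream& in, int low, int high, int& result)
{
  for (;;) {
    int value;
    if (in >> value) {
      if (value >= low && value <= high) {
        result = value;
        return true;
      }
      // out of range: try the next value
    } else if (in.eof()) {
      return false;                  // no more input
    } else {
      in.clear();                    // reset the error state
      // discard the rest of the offending line
      in.ignore(std::numeric_limits<std::streamsize>::max(), '\n');
    }
  }
}
```

Called with a std::istringstream holding the lines 30, abc, -6 and 14, and the range 0 to 20, the function skips the first three lines and stores 14 in result.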

do-while (Postconditional loop)

Syntax

do {
  statement(s)
} while (condition);
 
statement2;

Semantic

  1. statement(s) are executed.
  2. condition is evaluated.
  3. if condition is true, execution goes back to step 1.
  4. if condition is false, execution continues with statement2.

The do-while loop is similar in syntax and purpose to the while loop. The construct moves the test of the loop's continuation condition to the end of the code block, so that the code block is executed at least once before the condition is evaluated.

Example

#include <iostream>
 
using namespace std;
 
int main() 
{
  int i=0;
 
  do {
    cout << "The value of i is " << i << endl;
    i++;
  } while (i<10);
 
  cout << "The final value of i is : " << i << endl;
  return 0;
}

Execution

The value of i is 0
The value of i is 1
The value of i is 2
The value of i is 3
The value of i is 4
The value of i is 5
The value of i is 6
The value of i is 7
The value of i is 8
The value of i is 9
The final value of i is : 10
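
With a starting value of 0, the while and do-while examples behave identically. The difference only shows when the condition is false from the start, as the following sketch illustrates. Both functions are hypothetical helpers that return how many times their loop body runs.

```cpp
// Hypothetical helpers: each returns how many times its loop body runs.

int while_iterations(int start)
{
  int count = 0;
  int i = start;
  while (i < 10) {    // condition tested before the body
    ++count;
    ++i;
  }
  return count;
}

int do_while_iterations(int start)
{
  int count = 0;
  int i = start;
  do {                // body runs once before the first test
    ++count;
    ++i;
  } while (i < 10);
  return count;
}
```

Starting from 0, both functions return 10. Starting from 10, while_iterations returns 0 but do_while_iterations returns 1.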

for (Preconditional and Counter-controlled loop)

The for loop is a special case of a preconditional loop. It supports repeating a loop a certain number of times by means of a counter that is tested in the condition and changed by a step-expression, which sets the step size (the rate of change) by incrementing or decrementing the counter on each pass through the loop.

Syntax

for (initialization ; condition; step-expression)
  statement(s);

The for loop is equivalent to the following while loop:

 initialization
 while( condition )
 {
   statement(s);
   step-expression;
 }


NOTE:

Each part of the loop header (initialization, condition, and step-expression) can contain more than one expression, separated by a , (comma operator). The initialization, condition, and step-expression are all optional. Apart from this use, the comma is very rarely used as an operator in C++; it is mostly used as a separator (e.g., int x, y; ).
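
A loop that walks inwards from both ends of a string is a typical case for two counters. In the sketch below, which uses a hypothetical helper function, the initialization declares both counters and the step-expression updates both by means of the comma operator.

```cpp
#include <string>

// Hypothetical helper: checks whether text reads the same in both directions.
bool is_palindrome(const std::string& text)
{
  // Two counters are declared in the initialization part, and the
  // step-expression updates both of them using the comma operator.
  for (int i = 0, j = static_cast<int>(text.size()) - 1; i < j; ++i, --j) {
    if (text[i] != text[j])
      return false;
  }
  return true;
}
```

Note that the comma in the initialization part separates two declarations, like the comma in int x, y; while the comma in ++i, --j is the comma operator.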

Example 1

// an unbounded loop structure
for (;;)
{
  statement(s);
  if( condition )
    break;
}

Example 2

// calls doSomethingWith() for 0,1,2,..9
for (int i = 0; i != 10; ++i)
{                  
  doSomethingWith(i); 
}

can be rewritten as:

// calls doSomethingWith() for 0,1,2,..9
int i = 0;
while(i != 10)
{
  doSomethingWith(i);
  ++i;
}

The for loop is a very general construct, which can run unbounded loops (Example 1) and does not need to follow the rigid iteration model enforced by similarly named constructs in a number of more formal languages. C++ (just as modern C) allows variables (Example 2) to be declared in the initialization part of the for loop, and it is often considered good form to use that ability to declare objects only when they can be initialized, and to do so in the smallest scope possible. Essentially, the for and while loops are equivalent. Most for statements can also be rewritten as while statements.

Functions

A function (which can also be referred to as subroutine, procedure, subprogram or even method) carries out tasks defined by a sequence of statements called a statement block that need only be written once and called by a program as many times as needed to carry out the same task. Functions may depend on variables passed to them, called arguments, and may pass results of a task on to the caller of the function, this is called the return value.

In C++, it is important to note that a function that exists in the global scope can also be called global function and a function that is defined inside a class is called a member function. (The term method is commonly used in other programming languages to refer to things like member functions, but this can lead to confusion in dealing with C++ which supports both virtual and non-virtual dispatch of member functions.)

NOTE:
When talking or reading about programming, you must consider the language background and the topic of the source. It is very rare to see a C++ programmer use the words procedure or subprogram, although usage varies from language to language. In many programming languages the word function is reserved for subroutines that return a value; this is not the case with C++.

Declarations

A function must be declared before being used, with a name to identify it, the type of value the function returns, and the types of any arguments that are to be passed to it. Each parameter declares the type of value it takes, and is normally given a name as well. Parameters passed by reference or by pointer should be declared const if the function does not modify their arguments. Functions usually perform actions, so the name should make clear what the function does. By using verbs in function names and following other naming conventions, programs can be read more naturally.

The next example defines a function named main that returns an integer value (int) and takes no parameters. The content of the function is called the body of the function. The word int is a keyword. C++ keywords are reserved words, i.e., they cannot be used for any purpose other than what they are meant for. On the other hand, main is not a keyword and you can use it in many places where a keyword cannot be used (though that is not recommended, as confusion could result).

int main()
{
  // code
  return 0;
}
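
The rule that a function must be declared before it is used can be sketched with two hypothetical functions. The declaration (also called a prototype) gives only the name, the return type and the parameter types; the definition, which adds the body, may appear later.

```cpp
// Declaration (prototype): name, return type and parameter types only.
int square(int value);

// square() can be called here because it has already been declared.
int sum_of_squares(int a, int b)
{
  return square(a) + square(b);
}

// Definition: the body may appear later, even in another source file.
int square(int value)
{
  return value * value;
}
```

Without the declaration on the first line, the compiler would reject the calls inside sum_of_squares.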

Parameters and arguments

The function declaration defines its parameters. A parameter is a variable which takes on the meaning of a corresponding argument passed in a call to a function. An argument represents the value you supply to a function parameter when you call it. The calling code supplies the arguments when it calls the function.

The part of the function declaration that declares the expected parameters is called the parameter list and the part of function call that specifies the arguments is called the argument list.

//Global functions declaration
int subtraction_function( int parameter1, int parameter2 ) { return ( parameter1 - parameter2 ); }
 
//Call to the above function using 2 extra variables so the relation becomes more evident
int argument1 = 4;
int argument2 = 3; 
int result = subtraction_function( argument1, argument2 );
// will have the same result as
int same_result = subtraction_function( 4, 3 );

Many programmers use parameter and argument interchangeably, depending on context to distinguish the meaning. In practice, distinguishing between the two terms is usually unnecessary in order to use them correctly or communicate their use to other programmers. Alternatively, the equivalent terms formal parameter and actual parameter may be used instead of parameter and argument.

Parameters

You can define a function with no parameters, one parameter, or more than one, but to use a call to that function with arguments you must take into consideration what is defined.

Empty parameter list

//Global functions with no parameters
void function() { /*...*/ }
//empty parameter declaration, equivalent to the use of void
void function( void ) { /*...*/ }

NOTE:
This is the only valid case where void can be used as a parameter type. Otherwise, void can only be used to form derived types (e.g., void* ).

Multiple parameters

The syntax for declaring and invoking functions with multiple parameters is a common source of errors. First, remember that you have to declare the type of every parameter.

// Example - function using two int parameters by value
void printTime (int hour, int minute) { 
  std::cout << hour; 
  std::cout << ":"; 
  std::cout << minute; 
}

It might be tempting to write (int hour, minute), but that format is only legal for variable declarations, not for parameter declarations.

Another common source of confusion is that you do not have to declare the types of arguments when you call a function (indeed it is an error to attempt to do so).

Example

int hour = 11; 
int minute = 59; 
printTime( int hour, int minute ); // WRONG!

In this case, the compiler can tell the type of hour and minute by looking at their declarations. It is unnecessary and illegal to include the type when you pass them as arguments. The correct syntax is printTime( hour, minute ).

Passing by Pointer

A function may use pass by pointer when the object pointed to might not exist, that is, when you are giving either the address of a real object or NULL. Passing a pointer is no different from passing anything else; it is a parameter like any other. The characteristics of the pointer type are what make it worth distinguishing.

Passing a pointer to a function is very similar to passing by reference. It is used to avoid the overhead of copying, and the slicing problem (since child classes have a bigger memory footprint than the parent) that can occur when passing base class objects by value. This is also the preferred method in C (for historical reasons), where passing by pointer signifies that the function may modify the original variable. In C++ it is preferred to use references rather than pointers where possible, and a function that does take a pointer should verify that the pointer is valid before dereferencing it.

#include <iostream>
 
void MyFunc( int *x ) 
{ 
  std::cout << *x << std::endl; // See next section for explanation
} 
 
int main() 
{ 
  int i = 42; // initialize before the value is read through the pointer
  MyFunc( &i ); 
 
  return 0; 
}
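
The validity check recommended above can be sketched as follows. The names describeValue and doubleInPlace are hypothetical, chosen only for this illustration; describeValue returns a string instead of printing so that its result is easy to inspect.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: describes the int pointed to, or reports that
// no object was supplied.
std::string describeValue(const int *x)
{
  if (x == 0)                 // verify the pointer before dereferencing it
    return "no value";
  std::ostringstream out;
  out << "value " << *x;
  return out.str();
}

// Hypothetical helper: doubles the caller's variable through the pointer.
void doubleInPlace(int *x)
{
  if (x != 0)                 // a null pointer means "nothing to modify"
    *x *= 2;
}
```

Calling doubleInPlace( &i ) changes i in the caller, while passing a null pointer to either function is harmless because the pointer is checked first.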

Since a reference is just an alias, it has exactly the same address as what it refers to, as in the following example:

#include <iostream>
 
void ComparePointers (int * a, int * b)
{
  if (a == b)
    std::cout<<"Pointers are the same!"<<std::endl;
  else
    std::cout<<"Pointers are different!"<<std::endl;
}
 
int main()
{
  int i, j;
  int& r = i;
 
  ComparePointers(&i, &i);
  ComparePointers(&i, &j);
  ComparePointers(&i, &r);
  ComparePointers(&j, &r);
 
  return 0;
}

This program will tell you that the pointers are the same, then that they are different, then the same, then different again.

Arrays are similar to pointers, remember?

Now might be a good time to reread the section on arrays. If you don't feel like flipping back that far, though, here's a brief recap: Arrays are blocks of memory space.

int my_array[5];

In the statement above, my_array is an area in memory big enough to hold five integers. To use an element of the array, it must be dereferenced. The third element in the array (remember they're zero-indexed) is my_array[2]. When you write my_array[2], you're actually saying "give me the third integer in the array my_array". Therefore, my_array is an array, but my_array[2] is an integer.

Passing a single array element

So let's say you want to pass one of the integers in your array into a function. How do you do it? Simply pass in the dereferenced element, and you'll be fine.

Example

#include <iostream>
 
void printInt(int printable){
  std::cout << "The int you passed in has value " << printable << std::endl;
}
int main(){
  int my_array[5];
 
  // Reminder: always initialize your array values!
  for(int i = 0; i < 5; i++)
    my_array[i] = i * 2;
 
  for(int i = 0; i < 5; i++)
    printInt(my_array[i]); // <-- We pass in a dereferenced array element
}

This program outputs the following:

The int you passed in has value 0
The int you passed in has value 2
The int you passed in has value 4
The int you passed in has value 6
The int you passed in has value 8

This passes array elements just like normal integers, because array elements like my_array[2] are integers.

Passing a whole array

Well, we can pass single array elements into a function. But what if we want to pass a whole array? We can't do that directly, but you can treat the array as a pointer.

Example

#include <iostream>
 
void printIntArr(int *array_arg, int array_len){
  std::cout << "The length of the array is " << array_len << std::endl;
  for(int i = 0; i < array_len; i++)
    std::cout << "Array[" << i << "] = " << array_arg[i] << std::endl;
}
 
int main(){
  int my_array[5];
 
  // Reminder: always initialize your array values!
  for(int i = 0; i < 5; i++)
    my_array[i] = i * 2;
 
  printIntArr(my_array, 5);
}

NOTE:
Due to array-pointer interchangeability in the context of parameter declarations only, we can also declare pointers as arrays in function parameter lists. It is treated identically. For example, the first line of the function above can also be written as

void printIntArr(int array_arg[], int array_len)

It is important to note that even if it is written as int array_arg[], the parameter is still a pointer of type int *. It is not an array; an array passed to the function will still be automatically converted to a pointer to its first element.

This will output the following:

The length of the array is 5
Array[0] = 0
Array[1] = 2
Array[2] = 4
Array[3] = 6
Array[4] = 8

As you can see, the array in main is accessed by a pointer. Now here are some important points to realize:

  • Once you pass an array to a function, it is converted to a pointer so that function has no idea how to guess the length of the array. Unless you always use arrays that are the same size, you should always pass in the array length along with the array.
  • You've passed in a POINTER. my_array is an array, not a pointer. If you change array_arg within the function, my_array doesn't change (i.e., if you set array_arg to point to a new array). But if you change any element of array_arg, you're changing the memory space pointed to by array_arg, which is the array my_array.
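
The two points above can be illustrated with a small sketch. The function names zeroFirst, repoint and sum are hypothetical and only serve this example; the last one also shows that a pointer to const can be used when the function only reads the elements.

```cpp
// Hypothetical helper: changes an element, so the caller's array changes.
void zeroFirst(int *array_arg, int array_len)
{
  if (array_len > 0)
    array_arg[0] = 0;
}

// Hypothetical helper: only the local copy of the pointer is redirected,
// so the caller's array is left untouched.
void repoint(int *array_arg)
{
  static int other[3] = { 7, 7, 7 };
  array_arg = other;   // changes the local pointer only
  array_arg[0] = 9;    // modifies other, not the caller's array
}

// A pointer to const promises that the elements are only read.
int sum(const int *array_arg, int array_len)
{
  int total = 0;
  for (int i = 0; i < array_len; i++)
    total += array_arg[i];
  return total;
}
```

After repoint(my_array) the contents of my_array are unchanged, whereas after zeroFirst(my_array, 5) the first element of my_array is 0.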

Passing by Reference

The same concept of references is used when passing variables.

Example

void foo( int &i )
{
  ++i;
}
 
int main()
{
  int bar = 5;   // bar == 5
  foo( bar );    // bar == 6
  foo( bar );    // bar == 7
 
  return 0;
}

Here we display one of the two common uses of references in function arguments -- they allow us to use the conventional syntax of passing an argument by value but manipulate the value in the caller.

NOTE:
If the parameter is a non-const reference, the caller expects it to be modified. If the function does not want to modify the parameter, a const reference should be used instead.

However there is a more common use of references in function arguments -- they can also be used to pass a handle to a large data structure without making multiple copies of it in the process. Consider the following:

#include <iostream>
#include <string>
 
void foo( const std::string & s ) // const reference, explained below
{
  std::cout << s << std::endl;
}
 
void bar( std::string s )
{
  std::cout << s << std::endl;
}
 
int main()
{
  std::string const text = "This is a test.";
 
  foo( text ); // doesn't make a copy of "text"
  bar( text ); // makes a copy of "text"
 
  return 0;
}

In this simple example we're able to see the differences in pass by value and pass by reference. In this case pass by value just expends a few additional bytes, but imagine for instance if text contained the text of an entire book.

We use a constant reference instead of a plain reference so that the user of this function is assured that the value of the variable passed does not change within the function. This book calls this "const-to-reference"; the more common technical name is "reference to const".

The ability to pass it by reference keeps us from needing to make a copy of the string and avoids the ugliness of using a pointer.

NOTE:
It should also be noted that "const-to-reference" only makes sense for complex types -- classes and structs. In the case of fundamental types -- i.e. int, float, bool, etc. -- there is no saving in using a reference instead of simply using pass by value, and indeed the extra costs associated with indirection may make code using a reference slower than code that copies small objects.

Passing an array of fixed-length by using reference

In some cases, a function requires an array of a specific length to work:

void func(int(&para)[4]);

Unlike the array-to-pointer case above, the parameter is not a plain array that is converted into a pointer, but rather a reference to an array of 4 ints. Therefore only an array of 4 ints can be passed to this function, not an array of any other length and not a pointer to int. This helps prevent buffer overflow errors, because the array object is always allocated with the expected size unless you circumvent the type system by casting.

It can be used to pass an array without specifying the number of elements manually:

template<int n>void func(int(&para)[n]);

The compiler deduces the length at compile time, so inside the function n holds the number of elements. However, the use of a template can cause code bloat, since a separate instance of the function is generated for each array length used.
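
As a minimal sketch of this deduction, the hypothetical functions sumArray and countOf below accept an int array of any length. Note that this sketch uses std::size_t for the bound and a reference to const, which are choices made for the illustration rather than part of the declaration above.

```cpp
#include <cstddef>

// The array bound n is deduced by the compiler from the argument.
template<std::size_t n>
int sumArray(const int (&para)[n])
{
  int total = 0;
  for (std::size_t i = 0; i < n; i++)
    total += para[i];
  return total;
}

// Returns the deduced number of elements.
template<std::size_t n>
std::size_t countOf(const int (&)[n])
{
  return n;
}
```

Each distinct array length used in a call produces its own instance of the template, which is the source of the code bloat mentioned above.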

In C++, a multi-dimensional array cannot be converted to a multi-level pointer, therefore, the code below is invalid:

// WRONG
void foo(int**matrix,int n,int m);
int main(){
	int data[10][5];
	// do something on data
	foo(data,10,5);
}

Although an int[10][5] can be converted to an int (*)[5], it cannot be converted to int**. Therefore you may need to hard-code the array bound in the function declaration:

// BAD
void foo(int(*matrix)[5],int n,int m);
int main(){
	int data[10][5];
	// do something on data
	foo(data,10,5);
}

To make the function more generic, templates and function overloading should be used:

// GOOD
template<int junk,int rubbish>void foo(int(&matrix)[junk][rubbish],int n,int m);
void foo(int**matrix,int n,int m);
int main(){
	int data[10][5];
	// do something on data
	foo(data,10,5);
}

The reason for having n and m in the first version is mainly consistency, and also to deal with the case where the allocated array is not used completely. They may also be used to check for buffer overflows by comparing n and m with junk and rubbish.
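
A definition for such a template might look like the following sketch. The name sumMatrix is hypothetical, and for brevity this version relies only on the deduced bounds rows and cols instead of taking the extra n and m parameters.

```cpp
#include <cstddef>

// Both array bounds are deduced from the argument, so any
// two-dimensional int array can be passed without a cast.
template<std::size_t rows, std::size_t cols>
int sumMatrix(int (&matrix)[rows][cols])
{
  int total = 0;
  for (std::size_t i = 0; i < rows; i++)
    for (std::size_t j = 0; j < cols; j++)
      total += matrix[i][j];
  return total;
}
```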

Passing by Value

When we want the value of a function's parameter to be independent of the variable passed in, we use the pass-by-value approach.

int add(int num1, int num2)
{
 num1 += num2; // change of value of "num1"
 return num1;
}
 
int main()
{
 int a = 10, b = 20, ans;
 ans = add(a, b);
 std::cout << a << " + " << b << " = " << ans << std::endl;
 return 0;
}

Output:

10 + 20 = 30

The above example shows a property of pass-by-value: the parameters are copies of the passed variables and exist only in the scope of the corresponding function. This means that we have to pay the cost of copying. However, this cost is usually a concern only for larger and more complex variables.
In this case, the values of "a" and "b" are copied to "num1" and "num2" in the function "add()". We can see that the value of "num1" is changed in line 3. However, we can also observe that the value of "a" is kept after being passed to this function.

Constant Parameters

The keyword const can also be used as a guarantee that a function will not modify a value that is passed in. This is really only useful for references and pointers (and not things passed by value), though there's nothing syntactically to prevent the use of const for arguments passed by value.

Take for example the following functions:

void foo( const std::string &s )
{
   s.append("blah"); // ERROR -- we can't modify the string

   std::cout << s.length() << std::endl; // fine
}

void bar( const Widget *w )
{
    w->rotate(); // ERROR - rotate wouldn't be const

    std::cout << w->name() << std::endl; // fine
}
       

In the first example we tried to call a non-const method -- append() -- on an argument passed as a const reference, thus breaking our agreement with the caller not to modify it and the compiler will give us an error.

The same is true with rotate(), but with a pointer to const in the second example.
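
The Widget class used above is not defined in the text. The following sketch assumes a hypothetical definition (the members name_, angle_ and angle() are invented for this illustration) to show which calls are allowed through a pointer to const and which require a non-const parameter.

```cpp
#include <string>

// Hypothetical class for illustration only.
class Widget
{
public:
  explicit Widget(const std::string &name) : name_(name), angle_(0) {}

  // const member functions: may be called through a pointer or
  // reference to const
  const std::string &name() const { return name_; }
  int angle() const { return angle_; }

  // non-const member function: modifies the object
  void rotate() { angle_ += 90; }

private:
  std::string name_;
  int angle_;
};

// Only reads the widget, so a pointer to const is enough.
std::string label(const Widget *w)
{
  // w->rotate();   // would not compile: rotate() is not const
  return "widget " + w->name();
}

// Needs to modify the widget, so the parameter is not const.
void turn(Widget &w)
{
  w.rotate();
}
```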

Default values

Parameters in C++ functions (including member functions and constructors) can be declared with default values, like this

int foo (int a, int b = 5, int c = 3);

Then if the function is called with fewer arguments (but enough to specify the arguments without default values), the compiler will assume the default values for the missing arguments at the end. For example, if I call

foo(6, 1)

that will be equivalent to calling

foo(6, 1, 3)

In many situations, this saves you from having to define two separate functions that take different numbers of parameters, which are almost identical except for a default value.

The "value" that is given as the default value is often a constant, but may be any valid expression, including a function call that performs arbitrary computation.

Default values can only be given for the last arguments; i.e. you cannot give a default value for a parameter that is followed by a parameter that doesn't have a default value, since it will never be used.

Once you define the default value for a parameter in a function declaration, you cannot re-define a default value for the same parameter in a later declaration, even if it is the same value.
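
As a small sketch, here is one possible definition matching the declaration of foo shown earlier. The body is an assumption made for this example; it simply combines the three values so that it is easy to see which defaults were applied.

```cpp
// Hypothetical body: packs the three arguments into one number so the
// values actually received are visible in the result.
int foo(int a, int b = 5, int c = 3)
{
  return a * 100 + b * 10 + c;
}
```

With this definition, foo(6, 1) and foo(6, 1, 3) both return 613, while foo(6) uses both defaults and returns 653.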

Ellipsis (...) as a parameter

If the parameter list ends with an ellipsis, the number of arguments must be equal to or greater than the number of parameters specified. This creates a variadic function, a function of variable arity; that is, one which can take different numbers of arguments.
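
A familiar variadic function is printf from the Standard C Library. To write your own, the header <cstdarg> provides the va_list type and the va_start, va_arg and va_end macros. The sketch below uses a hypothetical function sumInts; since a variadic function cannot find out by itself how many arguments were passed or what their types are, this example assumes the caller passes the count first and that all remaining arguments are ints.

```cpp
#include <cstdarg>

// Hypothetical example: adds up "count" int arguments.
int sumInts(int count, ...)
{
  va_list args;
  va_start(args, count);           // start after the last named parameter
  int total = 0;
  for (int i = 0; i < count; i++)
    total += va_arg(args, int);    // fetch the next argument as an int
  va_end(args);
  return total;
}
```

The compiler does not check the extra arguments, so passing fewer values than count, or values of another type, results in undefined behavior.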

NOTE:

The variadic function feature is going to be readdressed in the upcoming C++ language standard, C++0x, with the possible inclusion of variadic macros and the ability to create variadic template classes and variadic template functions. Variadic templates will finally allow the creation of true tuple classes in C++.

Returning Values

When declaring a function, you must declare it in terms of the type that it will return, for example:

 int MyFunc(); // returns an int 
 SOMETYPE MyFunc(); // returns a SOMETYPE 
 
 int* MyFunc(); // returns a pointer to an int 
 SOMETYPE *MyFunc(); // returns a pointer to a SOMETYPE 
 SOMETYPE &MyFunc(); // returns a reference to a SOMETYPE 

If you have understood the syntax of pointer declarations, the declaration of a function that returns a pointer or a reference should seem logical. The above piece of code shows how to declare a function that will return a reference or a pointer; below are outlines of what the definitions (implementations) of such functions would look like:

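SOMETYPE *MyFunc(SOMETYPE *p) 
{ 
  ... 
  ... 
  return p; // the type of p must match the declared return type
} 
SOMETYPE &MyFunc(SOMETYPE &r) 
{ 
  ... 
  ... 
  return r; // the type of r must match the declared return type
} 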

Within the body of the function, the return statement should NOT return a pointer or a reference that has the address in memory of a local variable that was declared within the function, because as soon as the function exits, all local variables are destroyed and your pointer or reference will be pointing to some place in memory which you no longer own, so you cannot guarantee its contents. If the object to which a pointer refers is destroyed, the pointer is said to be a dangling pointer until it is given a new value; any use of the value of such a pointer is invalid. Having a dangling pointer like that is dangerous; pointers or references to local variables must not be allowed to escape the function in which those local (aka automatic) variables live.

However, within the body of your function, if your pointer or reference has the address in memory of a data type, struct, or class that you dynamically allocated the memory for, using the new operator, then returning said pointer or reference would be reasonable:

SOMETYPE *MyFunc()  //returning a pointer that has a dynamically 
{           //allocated memory address is valid code 
  SOMETYPE *p = new SOMETYPE[5]; 
  ... 
  ... 
  return p; 
}

(In most cases, a better approach in that case would be to return an object such as a smart pointer which could manage the memory; explicit memory management using widely distributed calls to new and delete (or malloc and free) is tedious, verbose and error prone. At the very least, functions which return dynamically allocated resources should be carefully documented. See this book's section on memory management for more details.)
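
As one possible sketch of the smart pointer approach, the hypothetical function makeBuffer below returns a std::unique_ptr. This assumes a compiler that supports the C++11 standard or later, where std::unique_ptr is available in the header <memory>.

```cpp
#include <memory>

// Assumes C++11 or later. The returned object owns the array and
// releases it with delete[] when it goes out of scope.
std::unique_ptr<int[]> makeBuffer()
{
  std::unique_ptr<int[]> p(new int[5]());  // five ints, value-initialized to 0
  p[0] = 42;
  return p;   // ownership moves to the caller
}
```

The caller can index the result like an ordinary array and never has to call delete[] explicitly.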

A function may also return a pointer (or reference) to const:

const SOMETYPE *MyFunc(SOMETYPE *p) 
{
  ... 
  ... 
  return p; 
} 

In this case the SOMETYPE object pointed to by the returned pointer may not be modified, and if SOMETYPE is a class then only const member functions may be called on the SOMETYPE object.

If such a const return value is a pointer or a reference to a class then we cannot call non-const methods on that pointer or reference since that would break our agreement not to change it.

NOTE:
As a general rule methods should be const except when it's not possible to make them such. While getting used to the semantics you can use the compiler to inform you when a method may not be const -- it will (usually) give an error if you declare a method const that needs to be non-const.

Functions with results 

You might have noticed by now that some of the functions yield results. Other functions perform an action but don't return a value. That raises some questions:

  • What happens if you call a function and you don't do anything with the result (i.e. you don't assign it to a variable or use it as part of a larger expression)?
  • What happens if you use a function without a result as part of an expression, like newLine() + 7?
  • Can we write functions that yield results, or are we stuck with things like newLine and printTwice?

The answer to the third question is "yes, you can write functions that return values," and we'll do it in a couple of chapters. I will leave it up to you to answer the other two questions by trying them out. Any time you have a question about what is legal or illegal in C++, a first step to find out is to ask the compiler. However, you cannot rely on the compiler alone, for two reasons. First, a compiler has bugs just like any other software, so not every piece of source code that is forbidden in C++ is properly rejected by the compiler, and vice versa. The other reason is even more dangerous: you can write programs in C++ which a C++ implementation is not required to reject, but whose behavior is not defined by the language. Needless to say, running such a program can, and occasionally will, do harmful things to the system it is running on or produce corrupt output!

Static Return

When a function returns a variable (or a pointer to one) that is statically allocated, one must keep in mind that its content may be overwritten each time a function that uses it is called. If you want to keep the return value of this function, you should manually save it elsewhere. Most functions with this kind of static return use global variables (see scope).

Of course, when you save it elsewhere, you should make sure to actually copy the value(s) of this variable to another location. If the return value is a struct, you should make a new struct, then copy over the members of the struct.

One example of such a function is the Standard C Library function localtime.
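
The effect can be sketched with a hypothetical function formatId that, like localtime, returns a pointer to a statically allocated buffer. The function and its output format are invented for this illustration, and std::snprintf assumes a C++11 or later standard library.

```cpp
#include <cstdio>
#include <string>

// Hypothetical function: every call overwrites the same static buffer
// and returns a pointer to it.
const char *formatId(int id)
{
  static char buffer[32];
  std::snprintf(buffer, sizeof buffer, "id-%d", id);
  return buffer;
}
```

If the result of formatId(1) is only kept as a pointer, a later call to formatId(2) changes what that pointer refers to. Copying the contents first, for example into a std::string, preserves the earlier value.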

Return Codes (best practices)

There are two kinds of behavior:

NOTE:
The selection of, and consistent use of this practice helps to avoid simple errors. Personal taste or organizational dictates may influence the decision, but a general rule-of-thumb is that you should follow whatever choice has been made in the code base you are currently working in. However, there may be valid reasons for making a different choice in any particular situation.

Positive Means Success

This is the "logical" way to think, and as such the one used by almost all beginners. In C++, this takes the form of a boolean true/false test, where "true" (also 1 or any non-zero number) means success, and "false" (also 0) means failure.

The major problem of this construct is that all errors return the same value (false), so you must have some kind of externally visible error code in order to determine where the error occurred. For example:

 bool bOK;
 if (my_function1())
 {
     // block of instruction 1
     if (my_function2())
     {
         // block of instruction 2
         if (my_function3())
         {
              // block of instruction 3
              // Everything worked
              error_code = NO_ERROR;
              bOK = true;
         }
         else
         {
              //error handler for function 3 errors
              error_code = FUNCTION_3_FAILED;
              bOK = false;
         }
     }
     else
     {
         //error handler for function 2 errors
         error_code = FUNCTION_2_FAILED;
         bOK = false;
     }
 }
 else
 {
     //error handler for function 1 errors
     error_code = FUNCTION_1_FAILED;
     bOK = false;
 }
 return bOK;
 

As you can see, the else blocks (usually error handling) of my_function1 can be really far from the test itself; this is the first problem. When your function begins to grow, it's often difficult to see the test and the error handling at the same time.

This problem can be compensated by source code editor features such as folding, or by testing for a function returning "false" instead of true.

 if (!my_function1()) // or if (my_function1() == false) 
 {
     //error handler for function 1 errors
 ...   

This can also make the code look more like the "0 means success" paradigm, but a little less readable.

The second problem of this construct is that it tends to break up logical tests (my_function2 is one level more indented, my_function3 is 2 levels indented) which causes legibility problems.

One advantage here is that you follow the structured programming principle of a function having a single entry and a single exit.

The Microsoft Foundation Class Library (MFC) is an example of a library that uses this paradigm.

0 means success

This means that if a function returns 0, the function has completed successfully. Any other value means that an error occurred, and the value returned may be an indication of what error occurred.

The advantage of this paradigm is that the error handling is closer to the test itself. For example the previous code becomes:

 if (my_function1())
 {
     //error handler for function 1 errors
     return FUNCTION_1_FAILED;
 }
 // block of instruction 1
 if (my_function2())
 {
     //error handler for function 2 errors
     return FUNCTION_2_FAILED;
 }
 // block of instruction 2
 if (my_function3())
 {
     //error handler for function 3 errors
     return FUNCTION_3_FAILED;
 }
 // block of instruction 3
 // Everything worked
 return 0; // NO_ERROR

In this example, this code is more readable (this will not always be the case). However, this function now has multiple exit points, violating a principle of structured programming.

The C Standard Library (libc) is an example of a standard library that uses this paradigm.
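
A compact, compilable sketch of the "0 means success" convention follows. The error codes reuse the names from the fragment above, while the function process and the rules checked by my_function1 and my_function2 are assumptions made purely for this illustration.

```cpp
// Error codes; 0 is reserved for success.
enum ErrorCode { NO_ERROR = 0, FUNCTION_1_FAILED = 1, FUNCTION_2_FAILED = 2 };

// Hypothetical step: returns 0 (success) only for non-negative input.
int my_function1(int value) { return value < 0 ? 1 : 0; }

// Hypothetical step: returns 0 (success) only for even input.
int my_function2(int value) { return value % 2 != 0 ? 1 : 0; }

// Each failure is handled right next to its test, and the returned code
// tells the caller which step failed.
int process(int value)
{
  if (my_function1(value))
    return FUNCTION_1_FAILED;
  if (my_function2(value))
    return FUNCTION_2_FAILED;
  return NO_ERROR;
}
```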

NOTE:
Some people argue that using functions results in a performance penalty. In this case just use inline functions and let the compiler do the work. Small functions mean visibility, easy debugging and easy maintenance.

Composition

Just as with mathematical functions, C++ functions can be composed, meaning that you use one expression as part of another. For example, you can use any expression as an argument to a function:

double x = cos (angle + pi/2); 

This statement takes the value of pi, divides it by two and adds the result to the value of angle. The sum is then passed as an argument to the cos function.

You can also take the result of one function and pass it as an argument to another:

double x = exp (log (10.0)); 

This statement finds the log base e of 10 and then raises e to that power. The result gets assigned to x; I hope you know what it is.

Recursion

In programming languages, recursion was first implemented in Lisp on the basis of a mathematical concept that existed earlier. It is a concept that allows us to break down a problem into one or more subproblems that are similar in form to the original problem; in this case, it means having a function call itself in some circumstances. It is generally distinguished from iteration, that is, from loops.

A simple example of a recursive function is:

 void func(){
    func();
 }

It should be noted that non-terminating recursive functions as shown above are almost never used in programs (indeed, some definitions of recursion would exclude such non-terminating definitions). A terminating condition is used to prevent infinite recursion.

Example
 double power(double x, int n)
 {
  if(n < 0)
  {
     std::cout << std::endl
               << "Negative index, program terminated.";
     exit(1);
  }
  if(n)
     return x * power(x, n-1);
  else
     return 1.0;
 }

The above function can be called like this:

 x = power(x, static_cast<int>(power(2.0, 2)));

Why is recursion useful? Although, theoretically, anything possible by recursion is also possible by iteration (that is, while), it is sometimes much more convenient to use recursion. Recursive code can be much easier to follow, as in the example below. The problem with recursive code is that it can take a lot of memory: since the data of each calling function is kept until the call returns, memory requirements grow with the depth of the recursion. But often the simplicity and elegance of recursive code outweighs the memory requirements.

The classic example of recursion is the factorial: n! = (n − 1)!n, where 0! = 1 by convention. In recursion, this function can be succinctly defined as

unsigned factorial(unsigned n)
{
  if(n != 0) 
  {
    return n * factorial(n-1);
  } 
  else 
  {
    return 1;
  }
}

With iteration, the logic is harder to see:

unsigned factorial2(unsigned n)
{
  unsigned a = 1; // same type as the parameter and the return value
  while(n > 0)
  {
    a = a*n;
    n = n-1;
  }
  return a;
}

Although recursion tends to be slightly slower than iteration, it should be used where using iteration would yield long, difficult-to-understand code. Also, keep in mind that recursive functions take up additional memory (on the stack) for each level. Thus they can run out of memory where an iterative approach may just use constant memory.

Each recursive function needs to have a Base Case. A base case is where the recursive function stops calling itself and returns a value. The value returned is (hopefully) the desired value.

For the previous example,

unsigned factorial(unsigned n)
{
  if(n != 0) 
  {
    return n * factorial(n-1);
  } 
  else 
  {
    return 1;
  }
}

the base case is reached when n = 0. In this example, the base case is everything contained in the else branch (which happens to return the number 1). The overall value that is returned is every value from n down to 1 multiplied together. So, suppose we call the function and pass it the value 3. The function then does the math 3 * 2 * 1 = 6 and returns 6 as the result of calling factorial(3).

Another classic example of recursion is the sequence of Fibonacci numbers:

0 1 1 2 3 5 8 13 21 34 ...

The zeroth element of the sequence is 0. The next element is 1. Any other number of this series is the sum of the two elements coming before it. As an exercise, write a function that returns the nth Fibonacci number using recursion.

Inline

Normally when calling a function, a program will evaluate and store the arguments, and then call (or branch to) the function's code, and then the function will later return back to the caller. While function calls are fast (typically taking much less than a microsecond on modern processors), the overhead can sometimes be significant, particularly if the function is simple and is called many times.

One approach which can be a performance optimization in some situations is to use so-called "inline" functions. Marking a function as inline is a request (sometimes called a hint) to the compiler to consider replacing a call to the function by a copy of the code of that function.

The result is in some ways similar to the use of a #define macro, but as mentioned before, macros can lead to problems since they are expanded textually by the preprocessor rather than evaluated by the compiler. Inline functions do not suffer from the same problems.

If the inlined function is large, this replacement process (known for obvious reasons as "inlining") can lead to "code bloat", leading to bigger (and hence usually slower) code. However, for small functions it can even reduce code size, particularly once a compiler's optimizer runs.

Note that the inlining process requires that the function's definition (including the code) be available to the compiler. In particular, inline functions that are used from more than one source file must be completely defined within a header file (whereas with regular functions that would be an error).

The most common way to designate that a function is inline is by the use of the inline keyword. One must keep in mind that compilers can be configured to ignore this keyword and use their own optimizations.

Further considerations apply when dealing with inline member functions; these will be covered in the Object-Oriented Programming Chapter.
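
The difference between an inline function and a macro can be sketched as follows. The names square, nextValue and SQUARE_MACRO are hypothetical and exist only for this illustration.

```cpp
// Small function marked inline; the compiler may replace calls to it
// with the function body, but it is not obliged to.
inline int square(int x)
{
  return x * x;
}

// Helper with a side effect: increments the counter on every call.
inline int nextValue(int &counter)
{
  return ++counter;
}

// A macro attempting the same thing as square() copies its argument
// text twice, so the argument expression is evaluated twice.
#define SQUARE_MACRO(x) ((x) * (x))
```

With a counter starting at 0, square(nextValue(counter)) calls nextValue once and yields 1, whereas SQUARE_MACRO(nextValue(counter)) calls it twice and yields 2. The inline function evaluates its argument exactly once, like any other function.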

main

The function main also happens to be the entry point of any (standard-compliant) C++ program and must be defined. The compiler arranges for the main function to be called when the program begins execution. main may call other functions which may call yet other functions.

NOTE:
main is special in C++ in that user code is not allowed to call it; in particular, it cannot be directly or indirectly recursive. This is one of the many small ways in which C++ differs from C.

The main function returns an integer value. In certain systems, this value is interpreted as a success/failure code. The return value of zero signifies a successful completion of the program. Any non-zero value is considered a failure. Unlike other functions, if control reaches the end of main(), an implicit return 0; for success is automatically added. To make return values from main more readable, the header file cstdlib defines the constants EXIT_SUCCESS and EXIT_FAILURE (to indicate successful/unsuccessful completion respectively).

NOTE:
The ISO C++ Standard (ISO/IEC 14882:1998) specifically requires main to have a return type of int. But the ISO C Standard (ISO/IEC 9899:1999) actually does not, though most compilers treat this as a minor warning-level error.

The main function can also be declared like this:

int main(int argc, char **argv){
  // code
}

which defines the main function as returning an integer value int and taking two parameters. The first parameter of the main function, argc, is an integer value int that specifies the number of arguments passed to the program, while the second, argv, is an array of strings containing the actual arguments. There is almost always at least one argument passed to a program; the name of the program itself is the first argument, argv[0]. Other arguments may be passed from the system.

Example

#include <iostream>
 
int main(int argc, char **argv){
  std::cout << "Number of arguments: " << argc << std::endl;
  for(size_t i = 0; i < argc; i++)
    std::cout << "  Argument " << i << " = '" << argv[i] << "'" << std::endl;
}

NOTE:
size_t is the type of the result of the sizeof operator. size_t is a typedef for some unsigned type and is often defined as unsigned int or unsigned long, but not always.

If the program above is compiled into the executable arguments and executed from the command line like this in *nix:

$ ./arguments I love chocolate cake

Or in Command Prompt in Windows or MS-DOS:

C:\>arguments I love chocolate cake

It will output the following (but note that argument 0 may not be quite the same as this -- it might include a full path, or it might include the program name only, or it might include a relative path, or it might even be empty):

Number of arguments: 5
  Argument 0 = './arguments'
  Argument 1 = 'I'
  Argument 2 = 'love'
  Argument 3 = 'chocolate'
  Argument 4 = 'cake'

You can see that the command line arguments of the program are stored into the argv array, and that argc contains the length of that array. This allows you to change the behavior of a program based on the command line arguments passed to it.

NOTE:
argv is a (pointer to the first element of an) array of strings. As such, it can be written as char **argv or as char *argv[]. However, char argv[][] is not allowed. Read up on C++ arrays for the exact reasons for this.

Also, argc and argv are the two most common names for the two arguments given to the main function. You can think of them as standing for "argument count" and "argument vector" respectively. They can, however, be changed if you'd like. The following code is just as legal:

int main(int foo, char **bar){
  // code
}

However, any other programmer that sees your code might get mad at you if you code like that.

From the example above, we can also see that C++ does not really care what the parameters are named (of course, you cannot use reserved words as names), only what their types are.

Pointers to functions

The pointers we have looked at so far have all been data pointers. Pointers to functions (more often called function pointers) are very similar and share the characteristics of other pointers, but instead of pointing to a variable they point to a function. They create an extra level of indirection that can be used to apply the functional programming paradigm in C++, since it lets the same piece of code call functions that are determined at runtime. They allow a function to be passed around as a parameter or return value of another function.

Using a function pointer has the same overhead as any other function call plus the additional pointer indirection. Since the function to call is determined only at runtime, the compiler will typically not inline the call as it could elsewhere. Because of these characteristics, calls through function pointers may add up to be significantly slower than regular function calls, and should be avoided in performance-critical code.

NOTE:
Function pointers are mostly used in C. C++ also permits other constructs for this purpose, called functors (class type functors and template type functors), that have some advantages over function pointers.

To declare a pointer to a function directly (without a typedef), the name of the pointer must be parenthesized, otherwise a function returning a pointer will be declared. You also have to declare the function's return type and its parameters. These must be exact!

Consider:

int (*ptof)(int arg);

The function to be referenced must obviously have the same return type and the same parameter type as that of the pointer to function. The address of the function can be assigned just by using its name, optionally prefixed with the address-of operator &. Calling the function can be done by using either ptof(<value>) or (*ptof)(<value>).

So:

int (*ptof)(int arg);
int func(int arg){
    //function body
}
ptof = &func; // get a pointer to func
ptof = func;  // same effect as ptof = &func
(*ptof)(5);   // calls func
ptof(5);      // same thing.

The types must match exactly: a function returning a float cannot be assigned to a pointer to a function returning a double. Only different spellings of the same type (such as int and signed, or a typedef name for the type) are interchangeable. You define the pointer by grouping the * with the variable name inside parentheses; without the parentheses, the * is interpreted as part of the return type instead.

It is often clearer to use a typedef for function pointer types; this also provides a place to give a meaningful name to the function pointer's type:

typedef int (*int_to_int_function)(int);
int_to_int_function ptof;
int *func (int);   // WRONG: Declares a function taking an int returning pointer-to-int.
int (*func) (int); // RIGHT: Defines a pointer to a function taking an int returning int.

To help reduce confusion, it is popular to typedef either the function type or the pointer type:

typedef int ifunc (int);    // now "ifunc" means "function taking an int returning int"
typedef int (*pfunc) (int); // now "pfunc" means "pointer to function taking an int returning int"

If you typedef the function type, you can declare, but not define, functions with that type. If you typedef the pointer type, you can neither declare nor define functions with that type. Which to use is a matter of style (although the pointer is more popular).

To assign a pointer to a function, you simply assign it to the function name. The & operator is optional (it's not ambiguous). The compiler will automatically select an overloaded version of the function appropriate to the pointer, if one exists:

int f (int, int);
int f (int, double);
int g (int, int = 4);
double h (int);
int i (int);
 
int (*p) (int) = &g; // ERROR: The default parameter needs to be included in the pointer type.
p = &h;              // ERROR: The return type needs to match exactly.
p = &i;              // Correct.
p = i;               // Also correct.
 
int (*p2) (int, double);
p2 = f;              // Correct: The compiler automatically picks "int f (int, double)".

Using a pointer to a function is even simpler - you simply call it like you would a function. You are allowed to dereference it using the * operator, but you don't have to:

#include <iostream>
 
int f (int i) { return 2 * i; }
 
int main ()
 {
  int (*g) (int) = f;
  std::cout<<"g(4) is "<<g(4)<<std::endl;       // Will output "g(4) is 8"
  std::cout<<"(*g)(5) is "<<(*g)(5)<<std::endl; // Will output "(*g)(5) is 10"
  return 0;
 }

Callback

In computer programming, a callback is executable code that is passed as an argument to other code. It allows a lower-level abstraction layer to call a function defined in a higher-level layer. A callback usually runs back on the level of the original caller.

Usually, the higher-level code starts by calling a function within the lower-level code, passing to it a pointer or handle to another function. While the lower-level function executes, it may call the passed-in function any number of times to perform some subtask. In another scenario, the lower-level function registers the passed-in function as a handler that is to be called asynchronously by the lower-level at a later time in reaction to something.

A callback can be used as a simpler alternative to polymorphism and generic programming, in that the exact behavior of a function can be dynamically determined by passing different (yet compatible) function pointers or handles to the lower-level function. This can be a very powerful technique for code reuse. In another common scenario, the callback is first registered and later called asynchronously.


Overloading

Function overloading is the use of a single name for several different functions in the same scope. Functions that share the same name must be differentiated by their parameter lists. They can differ in the number of parameters they expect, or their parameters can differ in type. This way, the compiler can figure out the exact function to call by looking at the arguments the caller supplied. This is called overload resolution, and is quite complex.

// Overloading Example
 
// (1)
double geometric_mean( int, int );
 
// (2)
double geometric_mean( double, double );
 
// (3)
double geometric_mean( double, double, double );
 
// ...
 
// Will call (1):
geometric_mean( 10, 25 );
// Will call (2):
geometric_mean( 22.1, 421.77 );
// Will call (3):
geometric_mean( 11.1, 0.4, 2.224 );

Under some circumstances, a call can be ambiguous, because two or more functions match with the supplied arguments equally well.

Example, supposing the declaration of geometric_mean above:

// This is an error, because (1) could be called with the second
// argument converted to an int, and (2) could be called with the first
// argument converted to a double. Neither of the two functions is
// unambiguously a better match.
geometric_mean(7, 13.21);
// This will call (3) too, despite its last argument being an int,
// because (3) is the only function which can be called with 3
// arguments.
geometric_mean(1.1, 2.2, 3);

Templates and non-templates can be overloaded. A non-template function takes precedence over a template, if both forms of the function match the supplied arguments equally well.

Note that you can overload many operators in C++ too.

Overloading resolution

Please beware that overload resolution in C++ is one of the most complicated parts of the language. This is probably unavoidable with language features such as automatic template instantiation, user-defined implicit conversions, built-in implicit conversions and more. So don't despair if you do not understand this at first go. It's really quite natural once you have the ideas, but written down it seems extremely complicated.

NOTE:
This section does not cover the selection of constructors, which is even more involved. Namespaces are also not considered below.

The easiest way to understand overloading is to imagine that the compiler first finds every function which might possibly be called, using any legal conversions and template instantiations. The compiler then selects the best match, if any, from this set. Specifically, the set is constructed like this:

  • All functions with a matching name, including function templates, are put into the set. Return types and accessibility are not considered. Templates are added with as closely matching parameters as possible. Member functions are treated as functions with an additional first parameter (the implicit object parameter) that refers to an object of the class type.
  • Conversion functions are added as so-called surrogate functions, with two parameters, the first being the class type and the second the return type.
  • All functions that don't match the number of parameters, even after considering defaulted parameters and ellipses, are removed from the set.
  • For each function, each argument is considered to see if a legal conversion sequence exists to convert the caller's argument to the function's parameters. If no such conversion sequence can be found, the function is removed from the set.

The legal conversions are detailed below, but in short a legal conversion is a sequence of built-in conversions (like int to float) combined with at most one user-defined conversion. The last part is critical to understand if you are writing replacements for built-in types, such as smart pointers. User-defined conversions are described above, but to summarize, they are:

  1. Implicit conversion operators, like operator short();
  2. One-argument constructors that are not declared explicit (if a constructor has all but one parameter defaulted, it is considered one-argument)

The overloading resolution works by attempting to establish the best matching function.

Easy conversions are preferred

Looking at one parameter, the preferred conversion is roughly based on scope of the conversion. Specifically, the conversions are preferred in this order, with most-preferred highest:

  1. No conversion, adding one or more const, adding reference, convert array to pointer to first member
    1. const are preferred for rvalues (roughly constants) while non-const are preferred for lvalues (roughly assignables)
  2. Conversion from short integral types (bool, char, short) to int, and float to double.
  3. Built-in conversions, such as between int and double, and pointer type conversions. Pointer conversions are ranked as
    1. Derived to base (for pointers) or base to derived (for pointers-to-members), with the most-derived target preferred
    2. Conversion to void*
    3. Conversion to bool
  4. User-defined conversions, see above.
  5. Match with ellipses. (As an aside, this is rather useful knowledge for template meta programming)

The best match is now determined according to the following rules:

  • A function is only a better match if all parameters match at least as well

In short, the function must be at least as good in every respect: if one parameter matches better and another worse, neither function is considered a better match. If no function in the set is a better match than all the others, the call is ambiguous (i.e., it fails). Example:

void foo(void*, bool);
void foo(int*, int);

int main() {
   int a;
   foo(&a, true); // ambiguous 
}
  • Non-templates are preferred over templates

If all else is equal between two functions, but one is a template and the other not, the non-template is preferred. This seldom causes surprises.

  • Most-specialized template is preferred

When all else is equal between two function templates, but one is more specialized than the other, the most specialized version is preferred. Example:

template<typename T> void foo(T);  //1
template<typename T> void foo(T*); //2

int main() {
   int a;
   foo(&a); // Calls 2, since 2 is more specialized.
}

Which template is more specialized is an entire chapter unto itself.

  • Return types are ignored

This rule is mentioned above, but it bears repeating: Return types are never part of overload resolutions, even if the function selected has a return type that will cause the compilation to fail. Example:

void foo(int);
int foo(float);

int main() { 
   // This will fail since foo(int) is best match, and void cannot be converted to int.
   return foo(5); 
}
  • The selected function may not be accessible

If the selected best function is not accessible (e.g., it is a private function and the call is not from a member or friend of its class), the call fails.

Standard C Library

The C standard library is a standardized collection of header files and library routines used to implement common operations, such as input/output and string handling. It became part of the C++ Standard Library as the Standard C Library in its ANSI C 89 form with some small modifications to make it work better with the C++ language.

For a more in-depth look into the C programming language, check the C Programming Wikibook, but be aware of the incompatibilities we have already covered in the Comparing C++ with C Section of this book.

All Standard C Library Functions

Functions Descriptions
abort stops the program
abs absolute value
acos arc cosine
asctime a textual version of the time
asin arc sine
assert stops the program if an expression isn't true
atan arc tangent
atan2 arc tangent, using signs to determine quadrants
atexit sets a function to be called when the program exits
atof converts a string to a double
atoi converts a string to an integer
atol converts a string to a long
bsearch perform a binary search
calloc allocates zero-initialized memory for an array of objects
ceil the smallest integer not less than a certain value
clearerr clears errors
clock returns the amount of time that the program has been running
cos cosine
cosh hyperbolic cosine
ctime returns a specifically formatted version of the time
difftime the difference between two times
div returns the quotient and remainder of a division
exit stop the program
exp returns "e" raised to a given power
fabs absolute value for floating-point numbers
fclose close a file
feof true if at the end-of-file
ferror checks for a file error
fflush writes the contents of the output buffer
fgetc get a character from a stream
fgetpos get the file position indicator
fgets get a string of characters from a stream
floor returns the largest integer not greater than a given value
fmod returns the remainder of a division
fopen open a file
fprintf print formatted output to a file
fputc write a character to a file
fputs write a string to a file
fread read from a file
free releases previously allocated memory
freopen open an existing stream with a different name
frexp decomposes a number into a normalized fraction and a power of two
fscanf read formatted input from a file
fseek move to a specific location in a file
fsetpos move to a specific location in a file
ftell returns the current file position indicator
fwrite write to a file
getc read a character from a file
getchar read a character from STDIN
getenv get environment information about a variable
gets read a string from STDIN
gmtime converts a calendar time to a broken-down time in Coordinated Universal Time (UTC)
isalnum true if a character is alphanumeric
isalpha true if a character is alphabetic
iscntrl true if a character is a control character
isdigit true if a character is a digit
isgraph true if a character is a graphical character
islower true if a character is lowercase
isprint true if a character is a printing character
ispunct true if a character is punctuation
isspace true if a character is a space character
isupper true if a character is an uppercase character
isxdigit true if a character is a hexadecimal character
labs absolute value for long integers
ldexp multiplies a number by a power of two
ldiv returns the quotient and remainder of a division, in long integer form
localtime converts a calendar time to a broken-down local time
log natural logarithm
log10 common logarithm (base 10)
longjmp start execution at a certain point in the program
malloc allocates memory
memchr searches an array for the first occurrence of a character
memcmp compares two buffers
memcpy copies one buffer to another
memmove moves one buffer to another
memset fills a buffer with a character
mktime returns the calendar version of a given time
modf decomposes a number into integer and fractional parts
perror displays a string version of the current error to STDERR
pow returns a given number raised to another number
printf write formatted output to STDOUT
putc write a character to a stream
putchar write a character to STDOUT
puts write a string to STDOUT
qsort sorts an array (the algorithm used is not required to be quicksort)
raise send a signal to the program
rand returns a pseudo-random number
realloc changes the size of previously allocated memory
remove erase a file
rename rename a file
rewind move the file position indicator to the beginning of a file
scanf read formatted input from STDIN
setbuf set the buffer for a specific stream
setjmp set execution to start at a certain point
setlocale sets the current locale
setvbuf set the buffer and size for a specific stream
signal register a function as a signal handler
sin sine
sinh hyperbolic sine
sprintf write formatted output to a buffer
sqrt square root
srand initialize the random number generator
sscanf read formatted input from a buffer
strcat concatenates two strings
strchr finds the first occurrence of a character in a string
strcmp compares two strings
strcoll compares two strings in accordance to the current locale
strcpy copies one string to another
strcspn searches one string for any characters in another
strerror returns a text version of a given error code
strftime formats the date and time as a string
strlen returns the length of a given string
strncat concatenates a certain amount of characters of two strings
strncmp compares a certain amount of characters of two strings
strncpy copies a certain amount of characters from one string to another
strpbrk finds the first location of any character in one string, in another string
strrchr finds the last occurrence of a character in a string
strspn returns the length of a substring of characters of a string
strstr finds the first occurrence of a substring of characters
strtod converts a string to a double
strtok finds the next token in a string
strtol converts a string to a long
strtoul converts a string to an unsigned long
strxfrm converts a substring so that it can be used by string comparison functions
system passes a command to the host environment's command processor
tan tangent
tanh hyperbolic tangent
time returns the current calendar time of the system
tmpfile return a pointer to a temporary file
tmpnam return a unique filename
tolower converts a character to lowercase
toupper converts a character to uppercase
ungetc puts a character back into a stream
va_arg use variable length parameter lists
vprintf, vfprintf, and vsprintf write formatted output with variable argument lists
vscanf, vfscanf, and vsscanf read formatted input with variable argument lists

The routines included in the Standard C Library can be subdivided into:

Standard C I/O

The Standard C Library includes routines that are somewhat outdated, but due to the history of the C++ language and its objective to maintain compatibility these are included in the package.

C I/O calls still appear in old code (not only ANSI C 89 but even old C++ code). Whether they are used today depends on many factors: the age of the code base, the complexity of the project, or the experience of the programmers. A programmer who is proficient in C has little reason to switch to something unfamiliar, and in some cases C-style I/O routines are superior to their C++ I/O counterparts; for instance, they are more compact and may be good enough for simple projects that don't make use of classes.

NOTE:
If you're learning I/O for the first time you probably should program using the C++ I/O system and not bring legacy I/O systems into the mix. Learn C-style I/O only if you have to.

clearerr

Syntax
#include <cstdio>
void clearerr( FILE *stream );

The clearerr function resets the error flags and EOF indicator for the given stream. If an error occurs, you can use perror() or strerror() to figure out which error actually occurred, or read the error from the global variable errno.

Related topics
feof - ferror - perror - strerror

fclose

Syntax
#include <cstdio>
int fclose( FILE *stream );

The function fclose() closes the given file stream, deallocating any buffers associated with that stream. fclose() returns 0 upon success, and EOF otherwise.

Related topics
fflush - fopen - freopen - setbuf

feof

Syntax
#include <cstdio>
int feof( FILE *stream );

The function feof() returns a non-zero value if the end-of-file indicator is set for the stream (that is, a previous read reached the end of the file), or zero otherwise.

Related topics
clearerr - ferror - getc - perror - putc

ferror

Syntax
#include <cstdio>
int ferror( FILE *stream );

The ferror() function looks for errors with stream, returning zero if no errors have occurred, and non-zero if there is an error. In case of an error, use perror() to determine which error has occurred.

Related topics
clearerr - feof - perror

fflush

Syntax
#include <cstdio>
int fflush( FILE *stream );

If the given file stream is an output stream, then fflush() causes the output buffer to be written to the file. If the given stream is of the input type, the behavior of fflush() depends on the library being used (for example, some libraries ignore the operation, others report an error, and others clear pending input).

fflush() is useful when debugging (for example, if a program segfaults before the buffer is sent to the screen), or to ensure a partial display of output before a long processing period.

By default, most implementations have stdout transmit the buffer at the end of each line, while stderr is flushed whenever there is output. This behavior changes if there is a redirection or pipe, where calling fflush(stdout) can help maintain the flow of output.

printf( "Before first call\n" );
fflush( stdout );
shady_function();
printf( "Before second call\n" );
fflush( stdout );
dangerous_dereference();
Related topics
fclose - fopen - fread - fwrite - getc - putc

fgetc

Syntax
#include <cstdio>
int fgetc( FILE *stream );

The fgetc() function returns the next character from stream, or EOF if the end of file is reached or if there is an error.

Related topics
fopen - fputc - fread - fwrite - getc - getchar - gets - putc

fgetpos

Syntax
#include <cstdio>
int fgetpos( FILE *stream, fpos_t *position );

The fgetpos() function stores the file position indicator of the given file stream in the given position variable. The position variable is of type fpos_t (which is defined in cstdio) and is an object that can hold every possible position in a FILE. fgetpos() returns zero upon success, and a non-zero value upon failure.

Related topics
fseek - fsetpos - ftell

fgets

Syntax
#include <cstdio>
char *fgets( char *str, int num, FILE *stream );

The function fgets() reads up to num - 1 characters from the given file stream and stores them in str. The string that fgets() produces is always null-terminated. fgets() will stop when it reaches the end of a line, in which case str will contain that newline character. Otherwise, fgets() will stop when it has read num - 1 characters or reaches the end of the file. fgets() returns str on success, and NULL on an error or if the end of the file is reached before any characters are read.

Related topics
fputs - fscanf - gets - scanf

fopen

Syntax
#include <cstdio>
FILE *fopen( const char *fname, const char *mode );

The fopen() function opens a file indicated by fname and returns a stream associated with that file. If there is an error, fopen() returns NULL. mode is used to determine how the file will be treated (i.e. for input, output, etc.).

The mode contains up to three characters. The first character is either "r", "w", or "a", which indicates how the file is opened. A file opened for reading allows input starting from the beginning of the file. For writing, the file is created, or erased if it already exists. For appending, the file is kept and writing to the file will start at the end. The second character, "b", is an optional flag that opens the file as binary, omitting any conversions between different text formats. The third character, "+", is an optional flag that allows both read and write operations on the file (but the file itself is opened in the same way).

Mode Meaning Mode Meaning
"r" Open a text file for reading "r+" Open a text file for read/write
"w" Create a text file for writing "w+" Create a text file for read/write
"a" Append to a text file "a+" Open a text file for read/write
"rb" Open a binary file for reading "rb+" Open a binary file for read/write
"wb" Create a binary file for writing "wb+" Create a binary file for read/write
"ab" Append to a binary file "ab+" Open a binary file for read/write

An example:

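int ch = EOF;
FILE *input = fopen( "stuff", "r" );
if( input != NULL ) {   // fopen() returns NULL if the file cannot be opened
  ch = getc( input );
  fclose( input );
}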
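
The three basic modes can also be contrasted in a slightly longer sketch. The function write_append_read and the file name modes_demo.txt are arbitrary choices for this illustration.

```cpp
#include <cstdio>

// Writes, appends to, and then reads back a file.
// Returns the number of characters read, or -1 on failure.
long write_append_read(const char *fname) {
  std::FILE *f = std::fopen(fname, "w");  // "w": create or truncate
  if (f == NULL) return -1;
  std::fputs("one\n", f);
  std::fclose(f);

  f = std::fopen(fname, "a");             // "a": keep contents, write at the end
  if (f == NULL) return -1;
  std::fputs("two\n", f);
  std::fclose(f);

  f = std::fopen(fname, "r");             // "r": read from the beginning
  if (f == NULL) return -1;
  long count = 0;
  while (std::fgetc(f) != EOF) count++;
  std::fclose(f);
  return count;
}

int main() {
  const char *fname = "modes_demo.txt";   // arbitrary file name for the demo
  std::printf("%ld characters\n", write_append_read(fname));  // 8 characters
  std::remove(fname);
  return 0;
}
```
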
Related topics
fclose - fflush - fgetc - fputc - fread - freopen - fseek - fwrite - getc - getchar - setbuf

fprintf

Syntax
#include <cstdio>
int fprintf( FILE *stream, const char *format, ... );

The fprintf() function sends information (the arguments) according to the specified format to the file indicated by stream. fprintf() works just like printf() as far as the format goes. The return value of fprintf() is the number of characters written, or a negative number if an error occurs. An example:

char name[20] = "Mary";
FILE *out = fopen( "output.txt", "w" );
if( out != NULL ) {
  fprintf( out, "Hello %s\n", name );
  fclose( out );   // flush the buffer and release the stream
}
Related topics
fputc - fputs - fscanf - printf - sprintf

fputc

Syntax
#include <cstdio>
int fputc( int ch, FILE *stream );

The function fputc() writes the given character ch to the given output stream. The return value is the character, unless there is an error, in which case the return value is EOF.

Related topics
fgetc - fopen - fprintf - fread - fwrite - getc - getchar - putc

fputs

Syntax
#include <cstdio>
int fputs( const char *str, FILE *stream );

The fputs() function writes an array of characters pointed to by str to the given output stream. The return value is non-negative on success, and EOF on failure.

Related topics
fgets - fprintf - fscanf - gets - getc - puts

fread

Syntax
#include <cstdio>
size_t fread( void *buffer, size_t size, size_t num, FILE *stream );

The function fread() reads num objects (where each object is size bytes) and places them into the array pointed to by buffer. The data comes from the given input stream. The return value of the function is the number of objects read, which is less than num if the end of the file is reached or an error occurs. You can use feof() or ferror() to find out which of the two happened.

Related topics
fflush - fgetc - fopen - fputc - fscanf - fwrite - getc

freopen

Syntax
#include <cstdio>
FILE *freopen( const char *fname, const char *mode, FILE *stream );

The freopen() function is used to reassign an existing stream to a different file and mode. After a call to this function, the given file stream will refer to fname with access given by mode. The return value of freopen() is the new stream, or NULL if there is an error.

Related topics
fclose - fopen

fscanf

Syntax
#include <cstdio>
int fscanf( FILE *stream, const char *format, ... );

The function fscanf() reads data from the given file stream in a manner exactly like scanf(). The return value of fscanf() is the number of variables that are actually assigned values, including zero if there were no matches. EOF is returned if there was an error reading before the first match.

Related topics
fgets - fprintf - fputs - fread - fwrite - scanf - sscanf

fseek

Syntax
#include <cstdio>
int fseek( FILE *stream, long offset, int origin );

The function fseek() sets the file position data for the given stream. The origin value should have one of the following values (defined in cstdio):

Name Explanation
SEEK_SET Seek from the start of the file
SEEK_CUR Seek from the current location
SEEK_END Seek from the end of the file

fseek() returns zero upon success, non-zero on failure. You can use fseek() to move beyond a file, but not before the beginning. Using fseek() clears the EOF flag associated with that stream.

Related topics
fgetpos - fopen - fsetpos - ftell - rewind

fsetpos

Syntax
#include <cstdio>
int fsetpos( FILE *stream, const fpos_t *position );

The fsetpos() function moves the file position indicator for the given stream to a location specified by the position object. fpos_t is defined in cstdio. The return value for fsetpos() is zero upon success, non-zero on failure.

Related topics
fgetpos - fseek - ftell

ftell

Syntax
#include <cstdio>
long ftell( FILE *stream );

The ftell() function returns the current file position for stream, or -1 if an error occurs.

Related topics
fgetpos - fseek - fsetpos

fwrite

Syntax
#include <cstdio>
size_t fwrite( const void *buffer, size_t size, size_t count, FILE *stream );

The fwrite() function writes, from the array buffer, count objects of size size to stream. The return value is the number of objects written, which is less than count only if an error occurs.

Related topics
fflush - fgetc - fopen - fputc - fread - fscanf - getc

getc

Syntax
#include <cstdio>
int getc( FILE *stream );

The getc() function returns the next character from stream, or EOF if the end of file is reached. getc() behaves like fgetc(), except that it may be implemented as a macro. For example:

int ch;
FILE *input = fopen( "stuff", "r" );

if( input != NULL ) {
  // ch is an int so that EOF can be told apart from every valid character
  while( (ch = getc( input )) != EOF ) {
    printf( "%c", ch );
  }
  fclose( input );
}
Related topics
feof - fflush - fgetc - fopen - fputc - fread - fwrite - putc - ungetc

getchar

Syntax
#include <cstdio>
int getchar( void );

The getchar() function returns the next character from stdin, or EOF if the end of file is reached.

Related topics
fgetc - fopen - fputc - putc

gets

Syntax
#include <cstdio>
char *gets( char *str );

The gets() function reads characters from stdin and loads them into str, until a newline or EOF is reached. The newline character is translated into a null termination. The return value of gets() is the read-in string, or NULL if there is an error.

NOTE:
gets() does not perform bounds checking, and thus risks overrunning str. For this reason it was deprecated in C++11 and removed from the standard library in C++14. For a similar (and safer) function that includes bounds checking, see fgets().

Related topics
fgetc - fgets - fputs - puts

perror

Syntax
#include <cstdio>
void perror( const char *str );

The perror() function prints str and an implementation-defined error message corresponding to the global variable errno. For example:

const char* input_filename = "not_found.txt";
FILE* input = fopen( input_filename, "r" );
if( input == NULL ) {
  char error_msg[255];
  sprintf( error_msg, "Error opening file '%s'", input_filename );
  perror( error_msg );
  exit( -1 );
}

If the file called not_found.txt is not found, this code will produce the following output:

 Error opening file 'not_found.txt': No such file or directory
Related topics
clearerr - feof - ferror

printf

Syntax
#include <cstdio>
int printf( const char *format, ... );

The printf() function prints output to stdout, according to format and other arguments passed to printf(). The string format consists of two types of items - characters that will be printed to the screen, and format commands that define how the other arguments to printf() are displayed. Basically, you specify a format string that has text in it, as well as "special" characters that map to the other arguments of printf(). For example, this code

char name[20] = "Bob";
int age = 21;
printf( "Hello %s, you are %d years old\n", name, age );

displays the following output:

 Hello Bob, you are 21 years old

The %s means, "insert the first argument, a string, right here." The %d indicates that the second argument (an integer) should be placed there. There are different %-codes for different variable types, as well as options to control the width and precision of the output.

Control Character Explanation
 %c a single character
 %d a decimal integer
 %i an integer
 %e scientific notation, with a lowercase "e"
 %E scientific notation, with an uppercase "E"
 %f a floating-point number
 %g use %e or %f, whichever is shorter
 %G use %E or %f, whichever is shorter
 %o an octal number
 %x unsigned hexadecimal, with lowercase letters
 %X unsigned hexadecimal, with uppercase letters
 %u an unsigned integer
 %s a string
 %p a pointer
 %n the argument shall be a pointer to an integer into which is placed the number of characters written so far
 %% a percent sign

A length modifier may appear before the final control character to indicate the size of the argument:

  • h, when inserted inside %d, causes the argument to be a short int.
  • l, when inserted inside %d, causes the argument to be a long.
  • l, when inserted inside %f, has no effect, since %f already expects a double.
  • ll, when inserted inside %d, causes the argument to be a long long.
  • L, when inserted inside %f, causes the argument to be a long double.

An integer placed between a % sign and the format command acts as a minimum field width specifier, and pads the output with spaces or zeros to make it long enough. If you want to pad with zeros, place a zero before the minimum field width specifier:

  %012d

You can also include a precision modifier, in the form of a .N where N is some number, before the format command:

 %012.4d

The precision modifier has different meanings depending on the format command being used:

  • With %e, %E, and %f, the precision modifier lets you specify the number of decimal places desired. For example, %12.6f will display a floating-point number at least 12 characters wide, with six decimal places.
  • With %g and %G, the precision modifier determines the maximum number of significant digits displayed.
  • With %s, the precision modifier simply acts as a maximum field length, to complement the minimum field length that precedes the period.

All of printf()'s output is right-justified, unless you place a minus sign right after the % sign. For example,

 %-12.4f 

will display a floating point number with a minimum of 12 characters, 4 decimal places, and left justified. You may modify the %d, %i, %o, %u, and %x type specifiers with the letter l and the letter h to specify long and short data types (e.g. %hd means a short integer). The %e, %f, and %g type specifiers can have the letter L before them to indicate that a long double follows. The %g, %f, and %e type specifiers can be preceded with the character '#' to ensure that the decimal point will be present, even if there are no decimal digits. The use of the '#' character with the %x type specifier indicates that the hexadecimal number should be printed with the '0x' prefix. The use of the '#' character with the %o type specifier indicates that the octal value should be displayed with a 0 prefix.

Inserting a plus sign '+' into the type specifier will force positive values to be preceded by a '+' sign. Putting a space character ' ' there will force positive values to be preceded by a single space character.

You can also include constant escape sequences in the output string.

The return value of printf() is the number of characters printed, or a negative number if an error occurred.
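The width, precision, and flag rules above can be tried out with a small sketch. The helpers format_double() and format_int() are hypothetical names introduced only for this illustration, and they rely on snprintf(), which is assumed to be available in cstdio (C++11 or later), so that the formatted text can be inspected as a string instead of being printed:

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper: formats one double with the given format string
// and returns the result as a std::string.
std::string format_double( const char *format, double value ) {
  char buffer[64];
  // snprintf() never writes more than sizeof buffer bytes.
  std::snprintf( buffer, sizeof buffer, format, value );
  return std::string( buffer );
}

// Hypothetical helper: the same idea for a single int.
std::string format_int( const char *format, int value ) {
  char buffer[64];
  std::snprintf( buffer, sizeof buffer, format, value );
  return std::string( buffer );
}
```

For instance, format_int( "%012d", 42 ) yields "000000000042", format_int( "%+d", 42 ) yields "+42", and format_double( "%-12.4f|", 3.14159 ) yields "3.1416" followed by six spaces and the bar.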

Related topics
fprintf - puts - scanf - sprintf

putc

Syntax
#include <cstdio>
int putc( int ch, FILE *stream );

The putc() function writes the character ch to stream. The return value is the character written, or EOF if there is an error. For example:

int ch;
FILE *input, *output;
input = fopen( "tmp.c", "r" );
output = fopen( "tmpCopy.c", "w" );
ch = getc( input );
while( ch != EOF ) {
  putc( ch, output );
  ch = getc( input );
}
fclose( input );
fclose( output );

Generates a copy of the file tmp.c called tmpCopy.c.

Related topics
feof - fflush - fgetc - fputc - getc - getchar - putchar - puts

putchar

Syntax
#include <cstdio>
int putchar( int ch );

The putchar() function writes ch to stdout. The code

putchar( ch );

is the same as

putc( ch, stdout );

The return value of putchar() is the written character, or EOF if there is an error.

Related topics
putc

puts

Syntax
#include <cstdio>
int puts( const char *str );

The function puts() writes str to stdout, followed by a newline. puts() returns non-negative on success, or EOF on failure.

Related topics
fputs - gets - printf - putc

remove

Syntax
#include <cstdio>
int remove( const char *fname );

The remove() function erases the file specified by fname. The return value of remove() is zero upon success, and non-zero if there is an error.

Related topics
rename

rename

Syntax
#include <cstdio>
int rename( const char *oldfname, const char *newfname );

The function rename() changes the name of the file oldfname to newfname. The return value of rename() is zero upon success, non-zero on error.

Related topics
remove

rewind

Syntax
#include <cstdio>
void rewind( FILE *stream );

The function rewind() moves the file position indicator to the beginning of the specified stream, also clearing the error and EOF flags associated with that stream.

Related topics
fseek

scanf

Syntax
#include <cstdio>
int scanf( const char *format, ... );

The scanf() function reads input from stdin, according to the given format, and stores the data in the other arguments. It works a lot like printf(). The format string consists of control characters, whitespace characters, and non-whitespace characters. The control characters are preceded by a % sign, and are as follows:

Control Character Explanation
 %c a single character
 %d a decimal integer
 %i an integer
 %e, %f, %g a floating-point number
 %lf a double
 %o an octal number
 %s a string
 %x a hexadecimal number
 %p a pointer
 %n an integer equal to the number of characters read so far
 %u an unsigned integer
 %[] a set of characters
 %% a percent sign

scanf() reads the input, matching the characters from format. When a control character is read, it puts the value in the next variable. Whitespace (tabs, spaces, etc) are skipped. Non-whitespace characters are matched to the input, then discarded. If a number comes between the % sign and the control character, then only that many characters will be converted into the variable. If scanf() encounters a set of characters, denoted by the %[] control character, then any characters found within the brackets are read into the variable. The return value of scanf() is the number of variables that were successfully assigned values, or EOF if there is an error.

This code snippet uses scanf() to read an int, float, and a double from the user. Note that the variable arguments to scanf() are passed in by address, as denoted by the ampersand (&) preceding each variable:

int i;
float f;               
double d;
 
printf( "Enter an integer: " );
scanf( "%d", &i );             
 
printf( "Enter a float: " );
scanf( "%f", &f );             
 
printf( "Enter a double: " );
scanf( "%lf", &d );             
 
printf( "You entered %d, %f, and %f\n", i, f, d );
Related topics
fgets - fscanf - printf - sscanf

setbuf

Syntax
#include <cstdio>
void setbuf( FILE *stream, char *buffer );

The setbuf() function sets stream to use buffer, or, if buffer is NULL, turns off buffering. This function expects that the buffer be BUFSIZ characters long - since this function does not support specifying the size of the buffer, buffers larger than BUFSIZ will be partly unused.

Related topics
fclose - fopen - setvbuf

setvbuf

Syntax
#include <cstdio>
int setvbuf( FILE *stream, char *buffer, int mode, size_t size );

The function setvbuf() sets the buffer for stream to be buffer, with a size of size. mode can be one of:

  • _IOFBF, which indicates full buffering
  • _IOLBF, which means line buffering
  • _IONBF, which means no buffering
Related topics
fflush - setbuf

sprintf

Syntax
#include <cstdio>
int sprintf( char *buffer, const char *format, ... );

The sprintf() function is just like printf(), except that the output is sent to buffer. The return value is the number of characters written. For example:

char string[50];
int file_number = 0;         
 
sprintf( string, "file.%d", file_number );
file_number++;
FILE *output_file = fopen( string, "w" );

Note that sprintf() does the opposite of a function like atoi() -- where atoi() converts a string into a number, sprintf() can be used to convert a number into a string.

For example, the following code uses sprintf() to convert an integer into a string of characters:

char result[100];
int num = 24;
sprintf( result, "%d", num );

This code is similar, except that it converts a floating-point number into an array of characters:

char result[100];
float fnum = 3.14159;
sprintf( result, "%f", fnum );
Related topics
fprintf - printf
(Standard C String and Character) atof - atoi - atol

sscanf

Syntax
#include <cstdio>
int sscanf( const char *buffer, const char *format, ... );

The function sscanf() is just like scanf(), except that the input is read from buffer.
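As a rough illustration, sscanf() can pull numbers out of a string that is already in memory, and its return value tells how many fields were converted. The helper parse_size() and the "WIDTHxHEIGHT" text layout are assumptions made up for this sketch:

```cpp
#include <cstdio>

// Hypothetical helper: parses text of the form "WIDTHxHEIGHT"
// (for example "640x480"). Returns true only if both integers
// were converted; the outputs are left untouched otherwise.
bool parse_size( const char *text, int *width, int *height ) {
  int w = 0, h = 0;
  // sscanf() returns the number of fields it assigned.
  if( std::sscanf( text, "%dx%d", &w, &h ) != 2 )
    return false;
  *width = w;
  *height = h;
  return true;
}
```

Checking the return value against the number of expected fields is what separates a complete match from a partial one such as "640".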

Related topics
fscanf - scanf

tmpfile

Syntax
#include <cstdio>
FILE *tmpfile( void );

The function tmpfile() opens a temporary file with a unique filename and returns a pointer to that file. If there is an error, NULL is returned.

Related topics
tmpnam

tmpnam

Syntax
#include <cstdio>
char *tmpnam( char *name );

The tmpnam() function creates a unique filename and stores it in name. tmpnam() can be called up to TMP_MAX times.

Related topics
tmpfile

ungetc

Syntax
#include <cstdio>
int ungetc( int ch, FILE *stream );

The function ungetc() puts the character ch back in stream.

Related topics
getc
(C++ I/O) putback

vprintf, vfprintf, and vsprintf

Syntax
#include <cstdarg>
#include <cstdio>
int vprintf( const char *format, va_list arg_ptr );
int vfprintf( FILE *stream, const char *format, va_list arg_ptr );
int vsprintf( char *buffer, const char *format, va_list arg_ptr );

These functions are very much like printf(), fprintf(), and sprintf(). The difference is that the argument list is a pointer to a list of arguments. va_list is defined in cstdarg, and is also used by (Other Standard C Functions) va_arg().

For example:

void error( const char *fmt, ... ) {
  va_list args;
  va_start( args, fmt );
  fprintf( stderr, "Error: " );
  vfprintf( stderr, fmt, args );
  fprintf( stderr, "\n" );
  va_end( args );
  exit( 1 );
}
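The same va_list technique can also build a string instead of writing to a stream. The following is a minimal sketch, assuming vsnprintf() is available in cstdio (C++11 or later); format_message() is a hypothetical name, and the fixed 256-byte buffer is an arbitrary choice for the example:

```cpp
#include <cstdarg>
#include <cstdio>
#include <string>

// Hypothetical helper: printf-style formatting into a std::string.
// Output longer than the local buffer is truncated.
std::string format_message( const char *fmt, ... ) {
  char buffer[256];
  va_list args;
  va_start( args, fmt );
  // vsnprintf() takes the va_list and respects the buffer size.
  std::vsnprintf( buffer, sizeof buffer, fmt, args );
  va_end( args );
  return std::string( buffer );
}
```

A call such as format_message( "%s has %d items", "cart", 3 ) returns the string "cart has 3 items".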

Standard C String & Character

The Standard C Library also includes routines that deal with characters and strings. Keep in mind that in C, a string of characters is stored in successive elements of a character array and terminated by the null character ('\0').

/* "Hello" is stored in a character array */
char note[SIZE];
note[0] = 'H'; note[1] = 'e'; note[2] = 'l'; note[3] = 'l'; note[4] = 'o'; note[5] = '\0';

Although dated, these C string and character functions still appear in older code, more often than the preceding I/O functions.

atof
Syntax
#include <cstdlib>
double atof( const char *str );

The function atof() converts str into a double, then returns that value. str must start with a valid number, but can be terminated with any non-numerical character, other than "E" or "e". For example,

x = atof( "42.0is_the_answer" );

results in x being set to 42.0.

Related topics
atoi - atol - strtod
(Standard C I/O) sprintf
atoi
Syntax
#include <cstdlib>
int atoi( const char *str );

The atoi() function converts str into an integer, and returns that integer. str should start with a whitespace or some sort of number, and atoi() will stop reading from str as soon as a non-numerical character has been read. For example:

int i;
i = atoi( "512" );
i = atoi( "512.035" );
i = atoi( "   512.035" );
i = atoi( "   512+34" );
i = atoi( "   512 bottles of beer on the wall" );

All five of the above assignments to the variable i would result in it being set to 512.

If the conversion cannot be performed, then atoi() will return zero:

int i = atoi( " does not work: 512" );  // results in i == 0
Related topics
atof - atol
(Standard C I/O) sprintf
atol
Syntax
#include <cstdlib>
long atol( const char *str );

The function atol() converts str into a long, then returns that value. atol() will read from str until it finds any character that should not be in a long. The resulting truncated value is then converted and returned. For example,

x = atol( "1024.0001" );

results in x being set to 1024L.

Related topics
atof - atoi - strtod
(Standard C I/O) sprintf

isalnum

Syntax
#include <cctype>
int isalnum( int ch );

The function isalnum() returns non-zero if its argument is a numeric digit or a letter of the alphabet. Otherwise, zero is returned.

char c;
scanf( "%c", &c );
if( isalnum(c) )
  printf( "You entered the alphanumeric character %c\n", c );
Related topics
isalpha - iscntrl - isdigit - isgraph - isprint - ispunct - isspace - isxdigit

isalpha

Syntax
#include <cctype>
int isalpha( int ch );

The function isalpha() returns non-zero if its argument is a letter of the alphabet. Otherwise, zero is returned.

char c;
scanf( "%c", &c );
if( isalpha(c) )
  printf( "You entered a letter of the alphabet\n" );
Related topics
isalnum - iscntrl - isdigit - isgraph - isprint - ispunct - isspace - isxdigit

iscntrl

Syntax
#include <cctype>
int iscntrl( int ch );

The iscntrl() function returns non-zero if its argument is a control character (between 0 and 0x1F or equal to 0x7F). Otherwise, zero is returned.

Related topics
isalnum - isalpha - isdigit - isgraph - isprint - ispunct - isspace - isxdigit

isdigit

Syntax
#include <cctype>
int isdigit( int ch );

The function isdigit() returns non-zero if its argument is a digit between 0 and 9. Otherwise, zero is returned.

char c;
scanf( "%c", &c );
if( isdigit(c) )
  printf( "You entered the digit %c\n", c );
Related topics
isalnum - isalpha - iscntrl - isgraph - isprint - ispunct - isspace - isxdigit

isgraph

Syntax
#include <cctype>
int isgraph( int ch );

The function isgraph() returns non-zero if its argument is any printable character other than a space (if you can see the character, then isgraph() will return a non-zero value). Otherwise, zero is returned.

Related topics
isalnum - isalpha - iscntrl - isdigit - isprint - ispunct - isspace - isxdigit

islower

Syntax
#include <cctype>
int islower( int ch );

The islower() function returns non-zero if its argument is a lowercase letter. Otherwise, zero is returned.

Related topics
isupper

isprint

Syntax
#include <cctype>
int isprint( int ch );

The function isprint() returns non-zero if its argument is a printable character (including a space). Otherwise, zero is returned.

Related topics
isalnum - isalpha - iscntrl - isdigit - isgraph - ispunct - isspace

ispunct

Syntax
#include <cctype>
int ispunct( int ch );

The ispunct() function returns non-zero if its argument is a printing character but neither alphanumeric nor a space. Otherwise, zero is returned.

Related topics
isalnum - isalpha - iscntrl - isdigit - isgraph - isspace - isxdigit

isspace

Syntax
#include <cctype>
int isspace( int ch );

The isspace() function returns non-zero if its argument is some sort of space (i.e. single space, tab, vertical tab, form feed, carriage return, or newline). Otherwise, zero is returned.
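As a small sketch of how isspace() might be used, the hypothetical helper count_words() below counts runs of non-space characters. The cast to unsigned char is a common precaution, since passing a negative char value to the cctype functions is undefined behavior:

```cpp
#include <cctype>
#include <cstddef>

// Hypothetical helper: counts whitespace-separated words in a
// null-terminated string.
std::size_t count_words( const char *str ) {
  std::size_t words = 0;
  bool in_word = false;
  for( ; *str != '\0'; ++str ) {
    // Cast to unsigned char before calling a <cctype> function.
    if( std::isspace( static_cast<unsigned char>( *str ) ) ) {
      in_word = false;
    } else if( !in_word ) {
      in_word = true;   // first character of a new word
      ++words;
    }
  }
  return words;
}
```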

Related topics
isalnum - isalpha - iscntrl - isdigit - isgraph - isprint - ispunct - isxdigit

isupper

Syntax
#include <cctype>
int isupper( int ch );

The isupper() function returns non-zero if its argument is an uppercase letter. Otherwise, zero is returned.

Related topics
islower - tolower

isxdigit

Syntax
#include <cctype>
int isxdigit( int ch );

The function isxdigit() returns non-zero if its argument is a hexadecimal digit (i.e. A-F, a-f, or 0-9). Otherwise, zero is returned.

Related topics
isalnum - isalpha - iscntrl - isdigit - isgraph - ispunct - isspace

memchr

Syntax
#include <cstring>
void *memchr( const void *buffer, int ch, size_t count );

The memchr() function looks for the first occurrence of ch within count characters in the array pointed to by buffer. The return value points to the location of the first occurrence of ch, or NULL if ch isn't found. For example:

char names[] = "Alan Bob Chris X Dave";
if( memchr(names,'X',strlen(names)) == NULL )
  printf( "Didn't find an X\n" );
else
  printf( "Found an X\n" );
Related topics
memcmp - memcpy - strstr

memcmp

Syntax
#include <cstring>
int memcmp( const void *buffer1, const void *buffer2, size_t count );

The function memcmp() compares the first count characters of buffer1 and buffer2. The return values are as follows:

Return value Explanation
less than 0 buffer1 is less than buffer2
equal to 0 buffer1 is equal to buffer2
greater than 0 buffer1 is greater than buffer2
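A typical use of memcmp() is testing whether a block of bytes begins with a known sequence. The helper starts_with_bytes() is a hypothetical name used only for this sketch:

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical helper: true if the first prefix_size bytes of buffer
// are identical to the bytes of prefix.
bool starts_with_bytes( const void *buffer, std::size_t buffer_size,
                        const void *prefix, std::size_t prefix_size ) {
  if( prefix_size > buffer_size )
    return false;   // never compare past the end of buffer
  return std::memcmp( buffer, prefix, prefix_size ) == 0;
}
```

Unlike strcmp(), memcmp() does not stop at a null character, so the sizes must be supplied by the caller.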
Related topics
memchr - memcpy - memset - strcmp

memcpy

Syntax
#include <cstring>
void *memcpy( void *to, const void *from, size_t count );

The function memcpy() copies count characters from the array from to the array to. The return value of memcpy() is to. The behavior of memcpy() is undefined if to and from overlap.

Related topics
memchr - memcmp - memmove - memset - strcpy - strlen - strncpy

memmove

Syntax
#include <cstring>
void *memmove( void *to, const void *from, size_t count );

The memmove() function is identical to memcpy(), except that it works even if to and from overlap.
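Overlap arises naturally when data is shifted within a single array. In this sketch the hypothetical helper drop_first_char() moves a string one position to the left, so the source and destination regions share all but one byte:

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical helper: removes the first character of a
// null-terminated string by shifting the rest to the left.
void drop_first_char( char *str ) {
  std::size_t len = std::strlen( str );
  if( len == 0 )
    return;
  // Source (str + 1) and destination (str) overlap, so memmove()
  // is used. Copying len bytes includes the terminating '\0'.
  std::memmove( str, str + 1, len );
}
```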

Related topics
memcpy - memset

memset

Syntax
#include <cstring>
void* memset( void* buffer, int ch, size_t count );

The function memset() copies ch into the first count characters of buffer, and returns buffer. memset() is useful for initializing a section of memory to some value. For example, this code:

const int ARRAY_LENGTH = 100;  // example size; a const must be initialized
char the_array[ARRAY_LENGTH];
...
// zero out the contents of the_array
memset( the_array, '\0', ARRAY_LENGTH );

...is a very efficient way to set all values of the_array to zero.

The table below compares two different methods for initializing an array of characters: a for-loop versus memset(). As the size of the data being initialized increases, memset() clearly gets the job done much more quickly:

Input size Initialized with a for-loop Initialized with memset()
1000 0.016 0.017
10000 0.055 0.013
100000 0.443 0.029
1000000 4.337 0.291
Related topics
memcmp - memcpy - memmove

strcat

Syntax
#include <cstring>
char *strcat( char *str1, const char *str2 );

The strcat() function concatenates str2 onto the end of str1, and returns str1. For example:

printf( "Enter your name: " );
scanf( "%s", name );
title = strcat( name, " the Great" );
printf( "Hello, %s\n", title );

Note that strcat() does not perform bounds checking, and thus risks overrunning str1. For a similar (and safer) function that includes bounds checking, see strncat().

Related topics
strchr - strcmp - strcpy - strncat

strchr

Syntax
#include <cstring>
char *strchr( const char *str, int ch );

The function strchr() returns a pointer to the first occurrence of ch in str, or NULL if ch is not found.

Related topics
strcat - strcmp - strcpy - strlen - strncat - strncmp - strncpy - strpbrk - strrchr - strspn - strstr - strtok

strcmp

Syntax
#include <cstring>
int strcmp( const char *str1, const char *str2 );

The function strcmp() compares str1 and str2, then returns:

Return value Explanation
less than 0 str1 is less than str2
equal to 0 str1 is equal to str2
greater than 0 str1 is greater than str2

For example:

printf( "Enter your name: " );
scanf( "%s", name );
if( strcmp( name, "Mary" ) == 0 ) {
  printf( "Hello, Dr. Mary!\n" );
}

Note that if str1 or str2 are missing a null-termination character, then strcmp() may not produce valid results. For a similar (and safer) function that includes explicit bounds checking, see strncmp().

Related topics
memcmp - strcat - strchr - strcoll - strcpy - strlen - strncmp - strxfrm

strcoll

Syntax
#include <cstring>
int strcoll( const char *str1, const char *str2 );

The strcoll() function compares str1 and str2, much like strcmp(). However, strcoll() performs the comparison using the locale specified by the (Standard C Date & Time) setlocale() function.

Related topics
strcmp - strxfrm
(Standard C Date & Time) setlocale

strcpy

Syntax
#include <cstring>
char *strcpy( char *to, const char *from );

The strcpy() function copies characters in the string from to the string to, including the null termination. The return value is to.

Note that strcpy() does not perform bounds checking, and thus risks overrunning to. For a similar (and safer) function that includes bounds checking, see strncpy().

Related topics
memcpy - strcat - strchr - strcmp - strncmp - strncpy

strcspn

Syntax
#include <cstring>
size_t strcspn( const char *str1, const char *str2 );

The function strcspn() returns the index of the first character in str1 that matches any of the characters in str2.

Related topics
strpbrk - strrchr - strstr - strtok

strerror

Syntax
#include <cstring>
char *strerror( int num );

The function strerror() returns an implementation-defined string describing the error code num. After a library call fails, the error code is usually found in the global variable errno.

Related topics
perror

strlen

Syntax
#include <cstring>
size_t strlen( const char *str );

The strlen() function returns the length of str (determined by the number of characters before null termination).

Related topics
memcpy - strchr - strcmp - strncmp

strncat

Syntax
#include <cstring>
char *strncat( char *str1, const char *str2, size_t count );

The function strncat() concatenates at most count characters of str2 onto str1, adding a null termination. The resulting string is returned.
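Because count limits only the number of characters taken from str2, the caller still has to work out how much room is left in str1. The following sketch shows one way to do that; append_bounded() and its dest_size parameter are hypothetical names for this example:

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical helper: appends src to dest without writing past
// dest_size bytes. dest must already hold a null-terminated string.
void append_bounded( char *dest, std::size_t dest_size, const char *src ) {
  std::size_t used = std::strlen( dest );
  if( used + 1 >= dest_size )
    return;   // no room left for anything but the terminator
  // strncat() writes up to count characters plus a terminating
  // '\0', so one byte is reserved for that terminator.
  std::strncat( dest, src, dest_size - used - 1 );
}
```

With a 12-byte array holding "Bob", appending " the Great" leaves "Bob the Gre": eleven characters plus the null termination.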

Related topics
strcat - strchr - strncmp - strncpy

strncmp

Syntax
#include <cstring>
int strncmp( const char *str1, const char *str2, size_t count );

The strncmp() function compares at most count characters of str1 and str2. The return value is as follows:

Return value Explanation
less than 0 str1 is less than str2
equal to 0 str1 is equal to str2
greater than 0 str1 is greater than str2

If there are fewer than count characters in either string, then the comparison will stop after the first null termination is encountered.

Related topics
strchr - strcmp - strcpy - strlen - strncat - strncpy

strncpy

Syntax
#include <cstring>
char *strncpy( char *to, const char *from, size_t count );

The strncpy() function copies at most count characters of from to the string to. If from has fewer than count characters, the remainder is padded with '\0' characters; otherwise, no null termination is added to to. The return value is the resulting string.

NOTE:

Using strings not terminated with the '\0' character can create security vulnerabilities.
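One common way to guard against a missing null termination is to copy one character less than the buffer holds and then write the terminator explicitly. This is only a sketch; copy_bounded() and dest_size are hypothetical names:

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical helper: copies src into dest (dest_size bytes) and
// always leaves dest null-terminated, truncating src if needed.
void copy_bounded( char *dest, std::size_t dest_size, const char *src ) {
  if( dest_size == 0 )
    return;
  // Copy at most dest_size - 1 characters...
  std::strncpy( dest, src, dest_size - 1 );
  // ...and terminate explicitly, since strncpy() does not do so
  // when src is too long.
  dest[dest_size - 1] = '\0';
}
```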
Related topics
memcpy - strchr - strcpy - strncat - strncmp

strpbrk

Syntax
#include <cstring>
char * strpbrk( const char *str, const char *ch );

The function strpbrk() returns a pointer to the first occurrence of any character within ch in str, or NULL if none of the characters are found.

Related topics
strchr - strrchr - strstr

strrchr

Syntax
#include <cstring>
char *strrchr( const char *str, int ch );

The function strrchr() returns a pointer to the last occurrence of ch in str, or NULL if no match is found.

Related topics
strchr - strcspn - strpbrk - strspn - strstr - strtok

strspn

Syntax
#include <cstring>
size_t strspn( const char *str1, const char *str2 );

The strspn() function returns the index of the first character in str1 that doesn't match any character in str2.
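strspn() and its counterpart strcspn() are easiest to understand side by side. In this sketch, leading_digits() and length_before_comma() are hypothetical helper names:

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical helper: length of the leading run of decimal digits.
std::size_t leading_digits( const char *str ) {
  // strspn() counts leading characters that ARE in the set.
  return std::strspn( str, "0123456789" );
}

// Hypothetical helper: number of characters before the first comma,
// or the whole length if the string contains no comma.
std::size_t length_before_comma( const char *str ) {
  // strcspn() counts leading characters that are NOT in the set.
  return std::strcspn( str, "," );
}
```

For example, leading_digits( "123abc" ) is 3 and length_before_comma( "hello, world" ) is 5.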

Related topics
strchr - strpbrk - strrchr - strstr - strtok

strstr

Syntax
#include <cstring>
char *strstr( const char *str1, const char *str2 );

The function strstr() returns a pointer to the first occurrence of str2 in str1, or NULL if no match is found. If the length of str2 is zero, then strstr() will simply return str1.

For example, the following code checks for the existence of one string within another string:

const char* str1 = "this is a string of characters";
const char* str2 = "a string";
const char* result = strstr( str1, str2 );
if( result == NULL ) printf( "Could not find '%s' in '%s'\n", str2, str1 );
  else printf( "Found a substring: '%s'\n", result );

When run, the above code displays this output:

 Found a substring: 'a string of characters'
Related topics
memchr - strchr - strcspn - strpbrk - strrchr - strspn - strtok

strtod

Syntax
#include <cstdlib>
double strtod( const char *start, char **end );

The function strtod() returns whatever it encounters first in start as a double. end is set to point at whatever is left in start after that double. If overflow occurs, strtod() returns either HUGE_VAL or -HUGE_VAL. For example,

x = strtod( "42.0is_the_answer", NULL );

results in x being set to 42.0.
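The end parameter of strtod() makes it possible to see where the conversion stopped. A minimal sketch, in which parse_leading_double() and rest are hypothetical names:

```cpp
#include <cstdlib>

// Hypothetical helper: converts the number at the start of text and
// reports, through rest, where the unconverted remainder begins.
double parse_leading_double( const char *text, const char **rest ) {
  char *end = NULL;
  double value = std::strtod( text, &end );
  *rest = end;   // points just past the last character consumed
  return value;
}
```

If no conversion can be performed, rest ends up equal to text, which gives the caller a way to tell "no number" apart from a genuine 0.0.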

Related topics
atof

strtok

Syntax
#include <cstring>
char *strtok( char *str1, const char *str2 );

The strtok() function returns a pointer to the next "token" in str1, where str2 contains the delimiters that determine the token. strtok() returns NULL if no token is found. In order to convert a string to tokens, the first call to strtok() should have str1 point to the string to be tokenized. All calls after this should have str1 be NULL.

For example:

char str[] = "now # is the time for all # good men to come to the # aid of their country";
char delims[] = "#";
char *result = NULL;
result = strtok( str, delims );
while( result != NULL ) {
  printf( "result is \"%s\"\n", result );
  result = strtok( NULL, delims );
}

The above code will display the following output:

 result is "now "
 result is " is the time for all "
 result is " good men to come to the "
 result is " aid of their country" 
Related topics
strchr - strcspn - strpbrk - strrchr - strspn - strstr

strtol

Syntax
#include <cstdlib>
long strtol( const char *start, char **end, int base );

The strtol() function returns whatever it encounters first in start as a long, interpreting the digits in the given base. end is set to point to whatever is left in start after the long. If the result cannot be represented by a long, then strtol() returns either LONG_MAX or LONG_MIN. Zero is returned if no conversion can be performed.
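Since zero is also a valid result, callers often inspect end and errno to detect failures. The sketch below shows one possible strict check; parse_long() is a hypothetical helper that rejects trailing characters, which may be stricter than some uses need:

```cpp
#include <cerrno>
#include <cstdlib>

// Hypothetical helper: converts text to a long in the given base.
// Returns false if no digits were found, if extra characters follow
// the number, or if the value does not fit in a long.
bool parse_long( const char *text, int base, long *out ) {
  char *end = NULL;
  errno = 0;
  long value = std::strtol( text, &end, base );
  if( end == text )
    return false;   // nothing was converted
  if( *end != '\0' )
    return false;   // trailing characters after the number
  if( errno == ERANGE )
    return false;   // result was clamped to LONG_MAX or LONG_MIN
  *out = value;
  return true;
}
```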

Related topics
atol - strtoul

strtoul

Syntax
#include <cstdlib>
unsigned long strtoul( const char *start, char **end, int base );

The function strtoul() behaves exactly like strtol(), except that it returns an unsigned long rather than a mere long.

Related topics
strtol

strxfrm

Syntax
#include <cstring>
size_t strxfrm( char *str1, const char *str2, size_t num );

The strxfrm() function transforms str2 according to the current locale and stores at most num characters of the result in str1. The result is such that calling strcmp() on two transformed strings gives the same result as calling strcoll() on the two original strings.

Related topics
strcmp - strcoll

tolower

Syntax
#include <cctype>
int tolower( int ch );

The function tolower() returns the lowercase version of the character ch.

Related topics
isupper - toupper

toupper

Syntax
#include <cctype>
int toupper( int ch );

The toupper() function returns the uppercase version of the character ch.

Related topics
tolower

Standard C Math

This section will cover the Math elements of the C Standard Library.

abs
Syntax
#include <cstdlib>
int abs( int num );

The abs() function returns the absolute value of num. For example:

int magic_number = 10;
int x;
cout << "Enter a guess: ";
cin >> x;
cout << "Your guess was " << abs( magic_number - x ) << " away from the magic number." << endl;
Related topics
fabs - labs
acos
Syntax
#include <cmath>
double acos( double arg );

The acos() function returns the arc cosine of arg, which will be in the range [0, pi]. arg should be between -1 and 1. If arg is outside this range, acos() returns NAN and raises a floating-point exception.

Related topics
asin - atan - atan2 - cos - cosh - sin - sinh - tan - tanh
asin
Syntax
#include <cmath>
double asin( double arg );

The asin() function returns the arc sine of arg, which will be in the range [-pi/2, +pi/2]. arg should be between -1 and 1. If arg is outside this range, asin() returns NAN and raises a floating-point exception.

Related topics
acos - atan - atan2 - cos - cosh - sin - sinh - tan - tanh
atan
Syntax
#include <cmath>
double atan( double arg );

The function atan() returns the arc tangent of arg, which will be in the range [-pi/2, +pi/2].

Related topics
acos - asin - atan2 - cos - cosh - sin - sinh - tan - tanh
atan2
Syntax
#include <cmath>
double atan2( double y, double x );

The atan2() function computes the arc tangent of y/x, using the signs of the arguments to compute the quadrant of the return value.
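The quadrant handling is what distinguishes atan2( y, x ) from atan( y/x ). As a rough sketch, the hypothetical helper angle_degrees() converts a point to an angle in degrees:

```cpp
#include <cmath>

// Hypothetical helper: angle of the point (x, y) measured from the
// positive x axis, in degrees.
double angle_degrees( double x, double y ) {
  const double pi = std::acos( -1.0 );
  // Note the argument order: y first, then x.
  return std::atan2( y, x ) * 180.0 / pi;
}
```

The points (1, 1) and (-1, -1) have the same ratio y/x, yet angle_degrees() gives about 45 for the first and about -135 for the second.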

Related topics
acos - asin - atan - cos - cosh - sin - sinh - tan - tanh

ceil

Syntax
#include <cmath>
double ceil( double num );

The ceil() function returns the smallest integer no less than num. For example:

y = 6.04;
x = ceil( y );

would set x to 7.0.

Related topics
floor - fmod

cos

Syntax
#include <cmath>
double cos( double arg );

The cos() function returns the cosine of arg, where arg is expressed in radians. The return value of cos() is in the range [-1,1]. If arg is infinite, cos() will return NAN and raise a floating-point exception.

Related topics
acos - asin - atan - atan2 - cosh - sin - sinh - tan - tanh

cosh

Syntax
#include <cmath>
double cosh( double arg );

The function cosh() returns the hyperbolic cosine of arg.

Related topics
acos - asin - atan - atan2 - cos - sin - sinh - tan - tanh

div

Syntax
#include <cstdlib>
div_t div( int numerator, int denominator );

The function div() returns the quotient and remainder of the operation numerator / denominator. The div_t structure is defined in cstdlib, and has at least:

int quot;   // The quotient
int rem;    // The remainder

For example, the following code displays the quotient and remainder of x/y:

div_t temp;
temp = div( x, y );
printf( "%d divided by %d yields %d with a remainder of %d\n",
  x, y, temp.quot, temp.rem );
Related topics
ldiv

exp

Syntax
#include <cmath>
double exp( double arg );

The exp() function returns e (2.7182818) raised to the argth power.

Related topics
log - pow - sqrt

fabs

Syntax
#include <cmath>
double fabs( double arg );

The function fabs() returns the absolute value of arg.

Related topics
abs - fmod - labs

floor

Syntax
#include <cmath>
double floor( double arg );

The function floor() returns the largest integer not greater than arg. For example:

y = 6.04;
x = floor( y );

would result in x being set to 6.0.

Related topics
ceil - fmod

fmod

Syntax
#include <cmath>
double fmod( double x, double y );

The fmod() function returns the remainder of x/y.
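The result of fmod() carries the sign of x, which matters when wrapping values into a range. The helper wrap_degrees() below is a hypothetical example of this:

```cpp
#include <cmath>

// Hypothetical helper: wraps an angle in degrees so that the result
// is not negative and is a remainder of division by 360.
double wrap_degrees( double angle ) {
  double r = std::fmod( angle, 360.0 );
  // fmod() keeps the sign of its first argument, so negative
  // remainders are shifted up by one full turn.
  if( r < 0.0 )
    r += 360.0;
  return r;
}
```

For example, wrap_degrees( 370.0 ) is 10.0 and wrap_degrees( -90.0 ) is 270.0.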

Related topics
ceil - fabs - floor

frexp

Syntax
#include <cmath>
double frexp( double num, int* exp );

The function frexp() is used to decompose num into two parts: a mantissa between 0.5 and 1 (returned by the function) and an exponent returned as exp. The two parts are related to num like this:

num = mantissa * (2 ^ exp)
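The relationship can be checked by splitting a number with frexp() and putting it back together with ldexp(). In this sketch, split_and_rebuild() is a hypothetical helper name:

```cpp
#include <cmath>

// Hypothetical helper: splits num with frexp(), stores the two parts
// through the pointers, and returns the value rebuilt by ldexp().
double split_and_rebuild( double num, double *mantissa, int *exponent ) {
  *mantissa = std::frexp( num, exponent );
  // ldexp( m, e ) computes m * (2 ^ e), the inverse of frexp().
  return std::ldexp( *mantissa, *exponent );
}
```

For num equal to 8.0, the mantissa is 0.5 and the exponent is 4, since 0.5 * (2 ^ 4) is 8.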
Related topics
ldexp - modf

labs

Syntax
#include <cstdlib>
long labs( long num );

The function labs() returns the absolute value of num.

Related topics
abs - fabs

ldexp

Syntax
#include <cmath>
double ldexp( double num, int exp );

The ldexp() function returns num * (2 ^ exp). If an overflow occurs, HUGE_VAL is returned.

Related topics
frexp - modf

ldiv

Syntax
#include <cstdlib>
ldiv_t ldiv( long numerator, long denominator );

The ldiv() function returns the quotient and remainder of the operation numerator / denominator. The ldiv_t structure is defined in cstdlib and has at least:

long quot;  // the quotient
long rem;   // the remainder
Related topics
div

log

Syntax
#include <cmath>
double log( double num );

The function log() returns the natural (base e) logarithm of num. There's a domain error if num is negative, a range error if num is zero.

In order to calculate the logarithm of x to an arbitrary base b, you can use:

double answer = log(x) / log(b);
Related topics
exp - log10 - pow - sqrt

log10

Syntax
#include <cmath>
double log10( double num );

The log10() function returns the base 10 (or common) logarithm of num. There's a domain error if num is negative, a range error if num is zero.

Related topics
log

modf

Syntax
#include <cmath>
double modf( double num, double *i );

The function modf() splits num into its integer and fraction parts. It returns the fractional part and stores the integer part in the double pointed to by i. Both parts have the same sign as num.
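
A minimal sketch of modf() in use follows. The function name split_number is an assumption for this example; it only repackages the two results in a reference-based interface.

```cpp
#include <cmath>

// Hypothetical helper: hand back both parts of num through references.
void split_number( double num, double& whole, double& fraction )
{
    fraction = std::modf( num, &whole );   // whole receives the integer part
}
```

For 3.25 this yields 3.0 and 0.25. For -3.25 it yields -3.0 and -0.25.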

Related topics
frexp - ldexp

pow

Syntax
#include <cmath>
double pow( double base, double exp );

The pow() function returns base raised to the power exp. There's a domain error if base is zero and exp is less than or equal to zero. There's also a domain error if base is negative and exp is not an integer. There's a range error if an overflow occurs.

Related topics
exp - log - sqrt

sin

Syntax
#include <cmath>
double sin( double arg );

The function sin() returns the sine of arg, where arg is given in radians. The return value of sin() will be in the range [-1,1]. If arg is infinite, sin() will return NAN and raise a floating-point exception.
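
Since sin() expects radians, angles given in degrees must be converted first. The sketch below shows one way to do it. The helper name sin_degrees is hypothetical, and pi is computed with acos( -1.0 ) because the standard header does not have to provide a constant for it.

```cpp
#include <cmath>

// Hypothetical helper: sine of an angle given in degrees.
double sin_degrees( double degrees )
{
    const double pi = std::acos( -1.0 );        // portable way to obtain pi
    return std::sin( degrees * pi / 180.0 );    // degrees -> radians
}
```

Because pi cannot be represented exactly, expect results such as sin_degrees( 30.0 ) to be very close to, but not necessarily exactly, 0.5.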

Related topics
acos - asin - atan - atan2 - cos - cosh - sinh - tan - tanh

sinh

Syntax
#include <cmath>
double sinh( double arg );

The function sinh() returns the hyperbolic sine of arg.

Related topics
acos - asin - atan - atan2 - cos - cosh - sin - tan - tanh

sqrt

Syntax
#include <cmath>
double sqrt( double num );

The sqrt() function returns the square root of num. If num is negative, a domain error occurs.

Related topics
exp - log - pow

tan

Syntax
#include <cmath>
double tan( double arg );

The tan() function returns the tangent of arg, where arg is given in radians. If arg is infinite, tan() will return NAN and raise a floating-point exception.

Related topics
acos - asin - atan - atan2 - cos - cosh - sin - sinh - tanh

tanh

Syntax
#include <cmath>
double tanh( double arg );

The function tanh() returns the hyperbolic tangent of arg.

Related topics
acos - asin - atan - atan2 - cos - cosh - sin - sinh - tan

Standard C Time & Date

This section will cover the Time and Date elements of the C Standard Library.

asctime
Syntax
#include <ctime>
char *asctime( const struct tm *ptr );

The function asctime() converts the time in the struct pointed to by ptr to a character string of the following format:

day month date hours:minutes:seconds year

An example:

Mon Jun 26 12:03:53 2000
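
The example string above can be reproduced by filling in a struct tm by hand, as in this sketch. The function name example_asctime is made up for illustration. Note that asctime() does not work out the weekday for you, and that the returned string ends with a newline character.

```cpp
#include <ctime>
#include <string>

// Hypothetical helper: build the date from the example by hand and
// format it with asctime().
std::string example_asctime()
{
    std::tm t = std::tm();     // zero every field first
    t.tm_year = 2000 - 1900;   // years are counted from 1900
    t.tm_mon  = 6 - 1;         // months are counted from 0 (June is 5)
    t.tm_mday = 26;
    t.tm_hour = 12;
    t.tm_min  = 3;
    t.tm_sec  = 53;
    t.tm_wday = 1;             // Monday; asctime() does not compute this
    return std::asctime( &t ); // copy out of the static buffer
}
```

The returned string is "Mon Jun 26 12:03:53 2000" followed by a newline.
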
Related topics
clock - ctime - difftime - gmtime - localtime - mktime - time

clock

Syntax
#include <ctime>
clock_t clock( void );

The clock() function returns the processor time used since the program started, measured in clock ticks, or -1 if that information is unavailable. To convert the return value to seconds, divide it by CLOCKS_PER_SEC.

NOTE:
If your compiler and library are POSIX compliant, then CLOCKS_PER_SEC is always defined as 1000000.
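
A typical use of clock() is measuring how much processor time a piece of code consumes. The following is a minimal sketch; both function names (sample_work and cpu_seconds) are hypothetical and exist only for this example.

```cpp
#include <ctime>

// Hypothetical workload used only for this illustration.
void sample_work()
{
    volatile double sum = 0.0;   // volatile keeps the loop from being optimized away
    for ( long i = 0; i < 1000000L; ++i )
        sum = sum + static_cast<double>( i );
}

// Hypothetical helper: processor time, in seconds, used by calling work().
double cpu_seconds( void (*work)() )
{
    std::clock_t start = std::clock();
    work();
    std::clock_t end = std::clock();
    // Convert clock ticks to seconds.
    return static_cast<double>( end - start ) / CLOCKS_PER_SEC;
}
```

Calling cpu_seconds( sample_work ) returns the processor time spent in the loop. The value measures processor time, not elapsed wall-clock time, so time spent sleeping or waiting for input is generally not counted.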

Related topics
asctime - ctime - time

ctime

Syntax
#include <ctime>
char *ctime( const time_t *time );

The ctime() function converts the calendar time pointed to by time to local time, as a string of the format:

day month date hours:minutes:seconds year

Using ctime() is equivalent to

asctime( localtime( time ) );
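
The string produced by ctime() ends with a newline character and lives in a static buffer, so it is often convenient to copy it into a std::string right away. The sketch below does that; the name time_to_string is an assumption made for this example.

```cpp
#include <ctime>
#include <string>

// Hypothetical helper: ctime()-style text without the trailing newline.
// Returns an empty string if ctime() fails.
std::string time_to_string( std::time_t t )
{
    const char* text = std::ctime( &t );
    if ( text == 0 )
        return std::string();
    std::string s( text );                  // copy out of the static buffer
    if ( !s.empty() && s[s.size() - 1] == '\n' )
        s.erase( s.size() - 1 );            // drop the newline
    return s;
}
```

A call such as time_to_string( std::time( 0 ) ) gives the current local time as text.
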
Related topics
asctime - clock - gmtime - localtime - mktime - time

difftime

Syntax
#include <ctime>
double difftime( time_t time2, time_t time1 );

The function difftime() returns time2 - time1, in seconds.

Related topics
asctime - gmtime - localtime - time

gmtime

Syntax
#include <ctime>
struct tm *gmtime( const time_t *time );

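The gmtime() function converts the given calendar time to a broken-down time expressed in Coordinated Universal Time (UTC, essentially Greenwich Mean Time). If UTC is not supported by the system, NULL is returned. The returned pointer refers to a statically allocated struct, which later calls to gmtime() or localtime() may overwrite.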
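
Because the result lives in static storage, it is safest to copy the struct before calling any other time function. The following sketch shows the idea; utc_copy is a hypothetical helper name, not a library function.

```cpp
#include <ctime>

// Hypothetical helper: copy the broken-down UTC time out of the shared
// static object so that later gmtime()/localtime() calls cannot change it.
// Returns false if gmtime() fails.
bool utc_copy( std::time_t t, std::tm& out )
{
    const std::tm* p = std::gmtime( &t );
    if ( p == 0 )
        return false;
    out = *p;   // struct copy; out no longer depends on the static buffer
    return true;
}
```

Two results obtained this way can be compared safely, whereas two raw pointers returned by gmtime() may well point to the same object.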

Related topics
asctime - ctime - difftime - localtime - mktime - strftime - time

localtime

Syntax
#include <ctime>
struct tm *localtime( const time_t *time );

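The function localtime() converts the calendar time pointed to by time into a broken-down local time. The returned pointer refers to a statically allocated struct, which later calls to localtime() or gmtime() may overwrite.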

Related topics
asctime - ctime - difftime - gmtime - strftime - time

mktime

Syntax
#include <ctime>
time_t mktime( struct tm *time );

The mktime() function converts the local time in time to calendar time, and returns it. If there is an error, -1 is returned.
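
mktime() and difftime() work well together for date arithmetic. The sketch below counts the days between two calendar dates. Both helper names (make_local_noon and days_between) are made up for this example, and using noon is a design choice so that a one-hour daylight saving shift cannot push the result across a day boundary.

```cpp
#include <cmath>
#include <ctime>

// Hypothetical helper: calendar time for local noon on the given date.
// Returns whatever mktime() returns, including -1 on error.
std::time_t make_local_noon( int year, int month, int day )
{
    std::tm t = std::tm();       // zero every field first
    t.tm_year  = year - 1900;    // tm_year counts from 1900
    t.tm_mon   = month - 1;      // tm_mon is 0-based
    t.tm_mday  = day;
    t.tm_hour  = 12;
    t.tm_isdst = -1;             // let mktime() work out daylight saving
    return std::mktime( &t );
}

// Hypothetical helper: number of days from 'from' to 'to', rounded to the
// nearest whole day. Assumes both values came from make_local_noon().
long days_between( std::time_t from, std::time_t to )
{
    double seconds = std::difftime( to, from );
    return static_cast<long>( std::floor( seconds / 86400.0 + 0.5 ) );
}
```

From 1 January 2000 to 1 March 2000 this gives 60 days (31 in January plus 29 in February, 2000 being a leap year).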

Related topics
asctime - ctime - gmtime - time

setlocale

Syntax
#include <clocale>
char *setlocale( int category, const char * locale );

The setlocale() function is used to set and retrieve the current locale. If locale is NULL, the current locale is returned. Otherwise, locale is used to set the locale for the given category.

category can have the following values:

Value Description
LC_ALL All of the locale
LC_TIME Date and time formatting
LC_NUMERIC Number formatting
LC_COLLATE String collation and regular expression matching
LC_CTYPE Regular expression matching, conversion, case-sensitive comparison, wide character functions, and character classification.
LC_MONETARY For monetary formatting
LC_MESSAGES For natural language messages (a POSIX extension, not part of standard C)
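
Passing NULL as the locale only queries the current setting. The sketch below wraps that query; current_locale is a hypothetical name chosen for this example.

```cpp
#include <clocale>
#include <string>

// Hypothetical helper: name of the current locale for a category.
// Returns an empty string if the query fails.
std::string current_locale( int category )
{
    const char* name = std::setlocale( category, NULL );   // query only, nothing is changed
    return name ? std::string( name ) : std::string();
}
```

A program starts in the "C" locale, so current_locale( LC_ALL ) returns "C" until setlocale() is called with a different locale name. Which other names are accepted depends on the system.
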
Related topics
(Standard C String & Character) strcoll

strftime

Syntax
#include <ctime>
size_t strftime( char *str, size_t maxsize, const char *fmt, const struct tm *time );

The function strftime() formats date and time information from time to a format specified by fmt, then stores the result in str (up to maxsize characters). Certain codes may be used in fmt to specify different types of time:

Code Meaning
 %a abbreviated weekday name (e.g. Fri)
 %A full weekday name (e.g. Friday)
 %b abbreviated month name (e.g. Oct)
 %B full month name (e.g. October)
 %c the standard date and time string
 %d day of the month, as a number (1-31)
 %H hour, 24 hour format (0-23)
 %I hour, 12 hour format (1-12)
 %j day of the year, as a number (1-366)
 %m month as a number (1-12)
 %M minute as a number (0-59)
 %p locale's equivalent of AM or PM
 %S second as a number (0-59)
 %U week of the year (0-53), where the first Sunday is the first day of week 1
 %w weekday as a decimal (0-6), where Sunday is 0