C Programming/Preliminaries

From Wikibooks, open books for an open world

Before learning C syntax and programming constructs, it is important to learn the meaning of a few key terms that are central in understanding C, as well as the basic structure of a C program.

C programs are organized into blocks. So what is in a block? Generally, a block consists of executable statements, so we start with those.

Statements

A statement can be thought of as text describing a single action the compiler will turn into executable instructions. For example:

puts("Hello, World!");

This statement calls puts, a function that prints a line of text to the screen, and passes it the string "Hello, World!". You might have noticed the semicolon at the end of the statement. Simple statements in C, such as this function call, always end with a semicolon (;). Leaving off the semicolon is a common mistake for beginners and experts alike, so until it becomes second nature, be sure to double-check your statements.

Since C is a "free-format" language, several statements can share a single line in the source file, like this:

puts("Hello!"); puts("Welcome to my program!"); puts("I hope you enjoy it.");

There are several kinds of statements. You've already seen some of them, such as the function call. A substantial portion of this book deals with statement construction.

Comments

A comment is some explanatory text in your code that is ignored by the compiler. Comments in C come in two forms:

// Single-line comments (also called C++-style comments)

and

/* Multi-line
comments
(only form of comments supported prior to C99) */

Comments can be used for any purpose, though they are most commonly used to explain why a section of code does something the way it does, when the reason would not be obvious to the reader.

Whitespace

Whitespace refers to the tab, space and newline characters that separate the text characters that make up the source code.
Like many things in life, it's hard to appreciate whitespace until it's gone. To a C compiler, the source code

  puts("Hello world"); return 0;

is the same as

puts("Hello world");
    return 0;

which is also the same as

    puts (
 "Hello world") ;



      return   0;

The compiler simply ignores most whitespace. However, it is common practice to use spaces (or tabs) to organize source code for human readability.

Blocks

Back to our discussion of blocks. In C, blocks contain statements, which will be run in order. Blocks begin with an opening brace { and end with a closing brace }. Blocks can contain other blocks which can contain their own blocks, and so on.

Let's look at a block example.

Example
#include <stdio.h>

int main(void)
{
    /* this is a 'block' */
    puts("Hello from inside the function");

    {
        /* this is also a 'block', nested inside the outer block */
        puts("Hello from a nested block");
    }

    return 0;
}

Outputs:

Hello from inside the function
Hello from a nested block

In this example, while the contents of main are a block, we create another block nested inside that block. Creating blocks on their own is not very useful in most cases. However, most features of C that you will learn—functions, decision-making, looping, and more—use blocks to delineate which statements should be included.

Literals

A literal is a value that is hardcoded into your program. It may be a number, a string, a single character, or another type of data.

Let's take a look at two lines from our program:

    puts("Hello, World!");
    return 0;

We used two literals here: the string "Hello, World!" in the first line, and the integer 0 in the second. These have two different types, which we'll cover next.

Basic data types

All data that your program deals with exists as bytes. Each piece of data has a type, which is how those bytes should be interpreted.

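C is a statically typed programming language, which means every value (a literal, a variable, or any other expression) has a type that is known when the program is compiled, and the type of some data must be compatible with the type of the variable it is to be stored in. If the types aren't compatible, the compiler will report a problem instead of quietly accepting the program. For example, a variable meant for storing integers can't store a string. C does convert between some closely related types automatically, such as between the different number types.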

C has a few kinds of basic, or primitive, data types.

Whole numbers

The int type stores whole numbers, or integers. Examples of literals that are whole numbers (integers) are 1, 2, 3, 10, 100...

On most computers, int is 32 bits (4 bytes) wide, which can store any whole number (integer) between -2,147,483,648 and 2,147,483,647. That is one number out of 4,294,967,296 ($2^{32}$) possibilities.

Variants of int

long and short are modifiers that make it possible for an integer data type to use either more or less memory; the resulting types, long int and short int, are usually written simply as long and short. On most computers, a short takes 16 bits (2 bytes) with a range of -32,768 to 32,767. Whether or not long differs from int depends on the computer, but long long is usually 64 bits (8 bytes) wide with a range of -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807.

In all of the types described above, one bit is used to indicate the sign (positive or negative) of a value. If a value will never be negative, its type can be declared unsigned, which uses that one bit for storing other data. This roughly doubles the largest value that can be stored, at the cost of no longer being able to store negative values.

Characters

The char type stores characters, which are letters, numbers, spaces, punctuation, or other symbols in a certain character set. On most computers, char is 8 bits (1 byte) wide, which can distinguish between 256 values. This is enough to support a single, simple character set, such as the ASCII character set for English, but not enough to represent characters in a writing system with more than 256 characters, such as Chinese, and also not enough to support multiple writing systems simultaneously.

Examples of character literals are 'a', 'b', and '1', as well as some special characters such as '\n', which represents a new line. Note that the value must be enclosed within single quotation marks. One important thing to mention is that the characters for numerals are represented differently from their corresponding numbers, i.e. '1' (stored as the value 49 in ASCII) is not equal to the integer 1.

Limitations

C was designed before multi-byte international text was widely supported in computers, at a time when it was reasonable to assume that a single byte would always be equivalent to a single character. However, UTF-8, the most common text encoding in modern times, is a variable-width encoding, which means that different characters might take up different numbers of bytes in the same piece of text. In C, there are two solutions for this problem:

  • Use a wider, fixed-width character type. C provides wchar_t, char16_t, and char32_t, but only the last one is guaranteed to have enough space for each character.
  • Manually process the string in a variable-width way.

Both of these are advanced topics and are not covered here. Throughout the book, we'll use the standard char despite these shortcomings. This is because most of the C standard library is designed with these historical constraints in mind.

String literals

There is another literal related to characters: the string literal. A string is a series of characters, usually intended to be displayed. String literals are surrounded by double quotation marks (" ", not ' '). An example of a string literal is the "Hello, World!" in the "Hello, World" example.

Note: Strings are not their own type; they are arrays of char. Arrays are described later.

Floating-point numbers

double is short for double-precision floating-point number. On most computers, it is 64 bits (8 bytes) wide. It stores inexact representations of real numbers, both integer and non-integer values. It can be used with numbers that are much larger, or much closer to zero, than an integer type can hold, ranging in magnitude from roughly $10^{-308}$ to $10^{308}$ on typical computers. Literals have a decimal point, like 1.5, -3.7, and 2.0.

Floating-point numbers are inexact. Some numbers like 0.1 cannot be represented exactly but will have a small error. Very large and very small numbers will have less precision, and arithmetic operations are sometimes not associative or distributive because of a lack of precision. Nonetheless, floating-point numbers are most commonly used for approximating real numbers, and operations on them are efficient on modern computers.

Smaller, single-precision floating-point numbers are available as float. On most computers, it is 32 bits (4 bytes) wide. It is similar to double, but has a smaller range and less precision. float literals must be suffixed with F or f. Examples are: 3.1415926f, 4.0f, 6.022e+23f.

Note: There is also long double, which might have better range and precision than double, but whether this is the case, and to which degree, depends on the computer.

Booleans

Booleans, declared with bool, are binary, meaning they are only ever one of two values: true or false.

Note: This type also goes by another name: _Bool. Before C23, programmers had to put #include <stdbool.h> at the beginnings of their files if they wanted to use bool, true, and false in place of _Bool, 1, and 0.

Basics of using functions

Functions are a big part of programming. A function is a special kind of block that performs a well-defined task. If a function is well-designed, it can enable a programmer to perform a task without knowing anything about how the function works. The act of requesting a function to perform its task is called a function call. Many functions require the function call to hand them certain pieces of data needed to perform their task; these are called arguments. Many functions also return a value to the function call when they're finished; this is called a return value (the return value in the above program is 0).

The things you need to know before calling a function are:

  • What the function does
  • The data type (discussed above) of the arguments and what they mean
  • The data type of the return value and what it means

Many functions use the return value for the result of a computation. Some functions use the return value to indicate whether they successfully completed their work. As you have seen in the intro exercise, the main function uses a return value to provide an exit status to the operating system.

All code other than global data definitions and declarations needs to be a part of a function.

Usually, you're free to call a function whenever you wish to. The only restriction is that every executable program needs to have one, and only one, main function, which is where the program begins executing.

We will discuss functions in more detail in a later chapter, C Programming/Procedures and functions.

The standard library

In 1983, when C was in the process of becoming standardized, the American National Standards Institute (ANSI) formed a committee to establish a standard specification of C known as "ANSI C". That standard specification created a basic set of functions common to each implementation of C, which is referred to as the Standard Library. The Standard Library provides functions for tasks such as input/output, string manipulation, mathematics, files, and memory allocation. The Standard Library does not provide functions that are dependent on specific hardware or operating systems, like graphics, sound, or networking. The "Hello, World" program uses one Standard Library function, puts, which writes a line of text to the standard output stream.