User:Jimregan/C Primer chapter 2


C Variables, Operators, & Preprocessor Directives

C supports a flexible set of variable types and structures, as well as common arithmetic and math functions along with a few interesting operators that are unique to C. This chapter explains them in detail, and ends with a short discussion of preprocessor commands.

C Variables, Declarations, and Constants

C includes the following fundamental data types:

  ____________________________________________________________________
  type            use          size         range
  ____________________________________________________________________
  char            character     8 bits   -128 to 127
  unsigned char   character     8 bits   0 to 255
  short           integer      16 bits   -32,768 to 32,767
  unsigned short  integer      16 bits   0 to 65,535
  int             integer      32 bits   -2,147,483,648 to 2,147,483,647
  unsigned int    integer      32 bits   0 to 4,294,967,295
  long            integer      32 bits   -2,147,483,648 to 2,147,483,647
  unsigned long   integer      32 bits   0 to 4,294,967,295
  float           real         32 bits   1.2E-38 to 3.4E+38
  double          real         64 bits   2.2E-308 to 1.8E+308
  long double     real        128 bits   3.4E-4932 to 1.2E+4932
  ____________________________________________________________________

These are representative values. The definitions tend to vary from system to system. For example, in some systems an "int" is 16 bits, and a "long double" could be 64 bits. The only thing that is guaranteed is the relative ordering of the sizes:

  short <= int <= long
  float <= double <= long double

One peculiarity of C that can lead to maddening problems is that while there is an "unsigned char" data type, many library functions that deal with individual characters take or return "int" values instead. For example, "getchar()" returns an "int" so that it can report the special end-of-file value "EOF" as well as every possible character.

Declarations are of the form:

  int myval, tmp1, tmp2;
  unsigned int marker1 = 1, marker2 = 10;
  float magnitude, phase;

Variable names can be at least 31 characters long, though modern compilers will invariably support longer. Variable names can be made up of letters, digits, and the "_" (underscore) character; the first character cannot be a digit. While you can use uppercase letters in variable names, conventional C usage reserves uppercase for constant names. A leading "_" is legal, but is generally reserved for marking internal library names.

C allows several variables to be declared in the same statement, with commas separating the declarations. The variables can be initialized when declared. Constant values for declarations can be declared in various formats:

  128:       decimal int
  256u:      decimal unsigned int
  512l:      decimal long int
  0xAF:      hex int
  0173:      octal int
  0.243:     double
  0.1732f:   float
  15.75E2:   double
  'a':       character
  "giday":   string

There are a number of special characters defined in C:

  '\a':    alarm (beep) character
  '\b':    backspace
  '\f':    formfeed
  '\n':    newline
  '\r':    carriage return
  '\t':    horizontal tab
  '\v':    vertical tab
  '\\':    backslash
  '\?':    question mark
  '\'':    single quote
  '\"':    double quote
  '\0NN':  character code in octal
  '\xNN':  character code in hex
  '\0':    null character

You can specify "symbolic constants" using the "define" C preprocessor declaration:

  #define PI 3.141592654

There is also a "const" declaration that defines a read-only variable, such as a memory location in ROM:

  const int a;

Arrays can be declared and initialized:

  int myarray[10];
  unsigned int list[5] = { 10, 15, 12, 19, 23 };
  float rdata[128], grid[5][5];

All C arrays have a starting index of 0, so "list" has the indexes 0 through 4. Elements in "rdata" would be accessed as follows:

  for( i = 0; i <= 127; i = i + 1 )
  {
     printf ( "%f\n", rdata[i] );
  }

C does not perform rigorous bounds checking on array access. You can easily overrun the bounds of the array if you're not careful, and never realize it except for the fact that you are getting some very strange data.

Of particular importance are arrays of characters, which are used to store strings:

  char s[128];
  strcpy( s, "This is a test!");

The string "This is a test!" is used to initialize "s" through the "strcpy()" function, discussed in a later chapter. The stored string will contain a terminating "null" character (the character with ASCII code 0, represented by '\0'). The null is used by C functions that manipulate strings to determine where the end of the string is, and it is important to remember the null is there.

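The curious reader may wonder why the "strcpy()" function is needed to set the string. It might seem to be easier to do:

  char s[128] = "This is a test!";

In fact, this is legal as an initializer in a declaration: the compiler reserves the 128-byte array and copies the string into it. What C does not allow is assigning a string to an array with "=" after the array has been declared. To explain why, the concept of "pointers" must be introduced.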

C allows you to define pointers that contain the address of a variable or an array. You could, for example, define a pointer named:

  int *ptr;

-- that holds the address of an "int" variable, rather than the variable itself. Once the pointer has been set to the address of an actual variable, you can put a value into that location with the statement:

  *ptr = 345;

In an inverse fashion, you can use "&" to get the address of a variable:

  int tmp;
  somefunc( &tmp );

This is confusing, so to sum up:

   * A pointer is declared in the form: "*myptr".
   * If "myvar" is a variable, then "&myvar" is a pointer to that variable.
   * If "myptr" is a pointer, then "*myptr" gives the variable data for that pointer. 

Pointers are useful because they allow a function to return a value through a parameter variable. Otherwise, the function will simply get the data the variable contains and have no access to the variable itself.

One peculiar aspect of C is that the name of an array actually specifies a pointer to the first element in the array. For example, if you declare:

  char s[256];

-- then if you perform:

  somefunc( s );

-- you have actually passed the address of the character array to the function, and the function will be able to modify it. However:

  s[12]

-- gives the value of the array element with index 12. Remember that this is the 13th element, since indexes always start at 0.

There are more peculiarities to strings in C. Another interesting point is that a string literal actually evaluates to a pointer to the string it defines. This means that if you perform the following operation:

  char *p;
  p = "Life, the Universe, & Everything!";

-- then "p" would be a pointer to the memory in which the C compiler stored the string literal, and "p[0]" would evaluate to "L". In a similar sense, you could also perform the following operation:

  char ch;
  ch = "Life, the Universe, & Everything!"[0];

-- and get the character "L" into the variable "ch".

This is very well and good, but why care? The reason to care is that this explains the difference between a character array and a character pointer. The declaration:

  char s[128] = "This is a test!";

-- tells the C compiler to reserve 128 bytes of memory named "s" and to copy the string into the start of that block. The array belongs to the program and can be modified freely, as long as no more than 128 bytes are stored in it. However, "s" always refers to that same block. It cannot be pointed at a different string later, which is why assigning a new string to "s" with "=" after the declaration is rejected by the compiler.

A "char" pointer set to a string literal behaves differently. The program is then accessing the memory in which the compiler stored the literal itself, which for "This is a test!" is only 16 bytes (15 characters plus the terminating null). Modifying that memory is undefined behavior in C and can crash the program, and storing more bytes than the literal occupies overruns it. Since C is poor about bounds checking, this may cause all kinds of trouble.

This is why "strcpy()" is necessary to change the contents of a character array once it has been declared. If you simply want to refer to a string that will not be modified, you can perform:

  const char *p;
  p = "Life, the Universe, & Everything!";

This is particularly tricky when passing strings as parameters to functions. The following example shows how to get around the pitfalls:

  /* strparm.c */
  #include <stdio.h>
  #include <string.h>
  char *strtest( char *a, char *b );
  
  int main( void )
  {
    char a[256], 
         b[256], 
         c[256]; 
    strcpy( a, "STRING A: ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789" );
    strcpy( b, "STRING B: ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789" );
    strcpy( c, "STRING C: ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789" );
    printf( "Initial values of strings:\n" );
    printf( "\n" );
    printf( "   a = %s\n", a );
    printf( "   b = %s\n", b );
    printf( "   c = %s\n", c );
    printf( "\n" );
    strcpy( c, strtest( a, b ));
  
    printf( "Final values of strings:\n" );
    printf( "\n" );
    printf( "   a = %s\n", a );
    printf( "   b = %s\n", b );
    printf( "   c = %s\n", c );
    printf( "\n" );
    
  }
  
  char *strtest( char *x, char *y )
  {
    printf( "Values passed to function:\n" );
    printf( "\n" );
    printf( "   x = %s\n", x );
    printf( "   y = %s\n", y );
    printf( "\n" );
  
    strcpy( y, "NEWSTRING B: abcdefghijklmnopqrstuvwxyz0123456789" );
    return( "NEWSTRING C: abcdefghijklmnopqrstuvwxyz0123456789" );
  }

You can define "structures" in C, which are collections of different data elements:

  /* struct.c */
  #include <stdio.h>
  #include <string.h>
  
  struct person                              /* Define structure type. */
  {
     char name[50];
     int age;
     float wage;
  };
  
  void display( struct person );
  
  int main( void )
  {
    struct person m;                         /* Declare an instance of it. */
    strcpy( m.name, "Coyote, Wile E." );     /* Initialize it. */
    m.age = 41;
    m.wage = 25.50f;
    display( m );
  }
  
  void display( struct person p )
  {
    printf( "Name: %s\n", p.name );
    printf( "Age:  %d\n", p.age );
    printf( "Wage: %4.2f\n", p.wage );
  }

This program has a few interesting features:

   * The structure has to be defined by a "struct" declaration before you can declare any structures themselves. In this case we define a struct of type "person".
   * Instances of the struct ("m") are then declared by giving the structure type ("struct person").
   * Elements of the structure are accessed with a "dot" notation ("m.name", "m.age", and "m.wage"). 

You can copy a structure to another structure with a single assignment statement, as long as the structures are of the same type:

  struct person m, n;
  ...
  m = n;

You can also declare arrays of structures:

  struct person group[10];
  ...
  strcpy( group[5].name, "McQuack, Launchpad" );

-- or even embed structures inside structure declarations:

  struct trip_rec
  {
     struct person traveler;
     char dest[50];
     int date[3];
     int duration;
     float cost;
  };

-- in which case the nested structure would be accessed as follows:

  struct trip_rec t1;
  ...
  strcpy( t1.traveler.name, "Martian, Marvin" );

The name of a structure defines a variable, not an address. If you pass the name of a structure to a function, the function works only on its local copy of the structure. If you want to return values, you have to provide an address:

  setstruct( &mystruct );

There is a shorthand way to get at the elements of a structure if you have the pointer to the structure instead of the structure itself. If "sptr" is a pointer to a structure of type "person", you could access its fields as follows:

  strcpy( sptr->name, "Leghorn, Foghorn" );
  sptr->age = 50;
  sptr->wage = 12.98;

C contains a concept similar to a structure known as a "union". A union is declared in much the same way as a structure. For example:

  union usample 
  {
    char ch;
    int x;
  };

The difference is that the union can store either of these values, but not both at the same time: an instance of the union defined above can hold a "char" value or an "int" value. Only enough memory is allocated for the union to store the value of the biggest declared item in it, and that same memory is used to store data for all the declared items. Unions are not often used and will not be mentioned further.

The following example program shows a practical use of structures. It tests a set of functions that perform operations on three-dimensional vectors:

  vadd():     Add two vectors.
  vsub():     Subtract two vectors.
  vdot():     Vector dot product.
  vcross():   Vector cross product.
  vnorm():    Norm (magnitude) of vector.
  vangle():   Angle between two vectors.
  vprint():   Print out vector.

The program follows:

  /* vector.c */
  #include <stdio.h>
  #include <math.h>
  
  #define PI 3.141592654
  
  struct v
  {
     double i, j, k;
  };
  
  void vadd( struct v, struct v, struct v* );
  void vprint( struct v );
  void vsub( struct v, struct v, struct v* );
  double vnorm( struct v );
  double vdot( struct v, struct v );
  double vangle( struct v, struct v );
  void vcross( struct v, struct v, struct v* );
  
  int main( void )
  {
    struct v v1 = { 1, 2, 3 }, v2 = { 30, 50, 100 }, v3;
    double a;
  
    printf( "Sample Vector 1: " );
    vprint( v1 );
    printf( "Sample Vector 2: " );
    vprint( v2 );
  
    vadd( v1, v2, &v3 );
    printf( "Vector Add:      " );
    vprint( v3 );
  
    vsub( v1, v2, &v3 );
    printf( "Vector Subtract: " );
    vprint( v3 );
  
    vcross( v1, v2, &v3 );
    printf( "Cross Product:   " );
    vprint( v3 );
  
    printf( "\n" );
    printf( "Vector 1 Norm:  %f\n", vnorm( v1 ) );
    printf( "Vector 2 Norm:  %f\n", vnorm( v2 ) );
    printf( "Dot Product:    %f\n", vdot( v1, v2 ) );
    a = 180 * vangle( v1, v2) / PI ;
    printf( "Angle:          %3f degrees.\n", a );
  
  } 
  
  void vadd( struct v a, struct v b, struct v *c )  /* Add vectors. */
  {
    c->i = a.i + b.i;
    c->j = a.j + b.j;
    c->k = a.k + b.k;
  }
  
  double vangle( struct v a, struct v b )  /* Get angle between vectors. */
  {
    double c;
    c = vdot( a, b ) / ( vnorm( a ) * vnorm( b ) );
    return( acos( c ) );  /* acos() also handles angles beyond 90 degrees */
  }
  
  void vcross( struct v a, struct v b, struct v *c )  /* Cross product. */
  {
    c->i = a.j * b.k - a.k * b.j;
    c->j = a.k * b.i - a.i * b.k;
    c->k = a.i * b.j - a.j * b.i;
  }
  
  double vdot( struct v a, struct v b ) /* Dot product of vectors. */
  {
    return( a.i * b.i + a.j * b.j + a.k * b.k );
  }
  
  double vnorm ( struct v a )  /* Norm of vectors. */
  {
    return( sqrt( ( a.i * a.i ) + ( a.j * a.j ) + ( a.k * a.k ) ) );
  }
  
  void vprint ( struct v a )  /* Print vector. */
  {
    printf( " I = %6.2f   J = %6.2f   K = %6.2f\n", a.i, a.j, a.k );
  }
  
  void vsub ( struct v a, struct v b, struct v *c )  /* Subtract vectors. */
  {
    c->i = a.i - b.i;
    c->j = a.j - b.j;
    c->k = a.k - b.k;
  }

You should be familiar with the concept of local and global variables by now. You can also declare a local variable as "static", meaning it retains its value from one invocation of the function to the next. For example:

  #include <stdio.h>
  void testfunc( void );
  int main( void )
  {
    int ctr;
    for( ctr = 1; ctr <= 8; ++ctr )
    {
      testfunc();
    }
  }
  
  void testfunc( void )
  {
    static int v;
    printf( "%d\n", 2*v );
    ++v;
  }

This prints:

  0
  2
  4
  6
  8 
  10
  12
  14

-- since a static integer is initialized to 0 by default. Ordinary (non-static) local variables have no default value at all, so it is not a good idea to rely on a default value!

There are two other variable declarations that you should recognize though you should have little reason to use them: "register", which declares that a variable should be assigned to a CPU register, and "volatile", which tells the compiler that the contents of the variable may change spontaneously.

There is more and less than meets the eye to these declarations. The "register" declaration is discretionary: the variable will be loaded into a CPU register if it can, and if not it will be loaded into memory as normal. Since a good optimizing compiler will try to make the best use of CPU registers anyway, this is not in general all that useful a thing to do.

The "volatile" declaration appears ridiculous at first sight, something like one of those "joke" computer commands like "halt and catch fire". Actually, it's used to describe a hardware register that can change independently of a program, such as the register for a realtime clock.

C is fairly flexible in conversions between data types. In many cases, the type conversion will happen transparently. If you convert from a "char" to a "short" data type, or from an "int" to a "long" data type, for example, the converted data type can easily accommodate any value in the original data type.

Converting from a bigger to a smaller data type can lead to odd errors. The same is true for conversions between signed and unsigned data types. For this reason, type conversions should be handled carefully, and it is usually preferable to do them explicitly, using a "cast" operation. For example:

  int a;
  float b;
  ...
  b = (float)a;

-- demonstrates a cast conversion from an "int" value to a "float" value.

You can define your own "enumerated" types in C. For example:

 enum day
 {
    saturday, sunday, monday, tuesday, wednesday, thursday, friday
 };

-- defines enumerated type "day" to consist of the values of the days of the week. In practice, the values are merely text constants associated to a set of consecutive integer values. By default, the set begins at 0 and counts up, so here "saturday" has the value 0, "sunday" has the value "1", and so on.

You can, however, specify your own set of values if you like:

  enum temps
  {
    zero = 0, freeze = 32, boil = 212
  };

Obviously you could do similar things through sets of "#define" directives, but this is a much cleaner solution. Once you define the type, for example, you can declare variables of that type as follows:

  enum day today = wednesday;

The variable "today" will act as an "int" variable and will allow the operations valid for "int" variables. Once more, remember that C doesn't do much in the way of bounds checking, and you should not rely on the C compiler to give you warnings if you are careless.

Finally, you can use the "typedef" declaration to define your own data types:

  typedef char str[128];

Then you could declare variables of this type as follows:

  str name;

C Operators

C supports the following arithmetic operators:

  c = a * b;   /* multiplication */
  c = a / b;   /* division */
  c = a % b;   /* mod (remainder division) */
  c = a + b;   /* addition */
  c = a - b;   /* subtraction */

It also supports the following useful (but cryptic) increment and decrement operators:

  ++a;   /* increment */
  --a;   /* decrement */

These operators can also be expressed as "a++" and "a--". If all you want to do is increment or decrement, the distinction between the two forms is irrelevant. However, if you are incrementing or decrementing a variable as a component of some expression, then "++a" means "increment the variable first, then get its value", while "a++" means "get the value of the variable first, then increment it". Confusing these things can lead to subtle programming errors.

C supports a set of bitwise operations:

  a = ~a;       /* bit complement */
  a = b << c;   /* shift b left by number of bits stored in c */
  a = b >> c;   /* shift b right by number of bits stored in c */
  a = b & c;    /* b AND c */
  a = b ^ c;    /* b XOR c */
  a = b | c;    /* b OR c */

C allows you to perform all these operations in a shortcut fashion:

  a = a * b;   a *= b;
  a = a / b;   a /= b;
  a = a % b;   a %= b;
  a = a + b;   a += b;
  a = a - b;   a -= b;
  a = a << b;  a <<= b;
  a = a >> b;  a >>= b;
  a = a & b;   a &= b;
  a = a ^ b;   a ^= b;
  a = a | b;   a |= b;

The C relational operations were discussed in the previous chapter and are repeated here for completeness:

  a == b:   equals
  a != b:   not equals
  a < b:    less than
  a > b:    greater than
  a <= b:   less than or equals
  a >= b:   greater than or equals

These are actually math operations that yield "1" if true and "0" if false. You could, for example, have an operation as follows:

  a = b * ( b < 2 ) + 10 * ( b >= 2 );

This would give "a" the value "b" if "b" is less than 2, and the value "10" otherwise. This is cute, but not recommended. It's cryptic; may impair portability to other languages; and in this case at least can be done much more effectively with the conditional operator discussed in the previous chapter:

  a = ( b < 2 ) ? b : 10; 

This conditional operator is also known as the "ternary" operator.

There are similar logical operators:

  !:    logical NOT
  &&:   logical AND
  ||:   logical OR

Remember that these are logical operations, not bitwise operations -- don't confuse "&&" and "||" with "&" and "|". The distinction is that while the bitwise operators perform the operations on a bit-by-bit basis, the logical operations simply assess the values of their operands to be either "0" or "1" (any nonzero operand value evaluates to "1" in such comparisons) and return either a "0" or a "1":

  if(( A == 5 ) && ( B == 10 ))
  {
     ...
  }

Finally, there is a "sizeof" operator that returns the size of a particular operand in bytes:

  int tvar;
  ...
  printf ( "Size = %d\n", (int)sizeof( int ) );

This comes in handy for some mass storage operations. You can provide "sizeof()" with a data type name or the name of a variable, and the variable can be an array, in which case "sizeof" gives the size of the entire array.

The precedence of these operators in expressions -- that is, which ones are evaluated before others -- is defined as follows, reading from the highest precedence to the lowest:

  ()     []     ->     .
  !      ~      ++     --     (cast)   *      &      sizeof   - (minus prefix)
  *      /      %
  +      -
  <<     >>
  <      <=     >      >=
  ==     !=
  &
  ^
  |
  &&
  ||
  ?:
  =      +=     -=     *=     /=     %=     >>=     <<=     &=
  ^=     |=
  , 

Of course, parentheses can be used to control precedence. If you have any doubts about the order of evaluation of an expression, add more parentheses. They won't cause you any trouble, and might save you some.

Advanced math operations are available as library functions. These will be discussed in a later chapter.

C Preprocessor Directives

We've already seen the "#include" and "#define" preprocessor directives. The C preprocessor supports several other directives as well. All such directives start with a "#" to allow them to be distinguished from C language commands.

As explained in the first chapter, the "#include" directive allows you to insert the contents of other files in your C source code:

  #include <stdio.h>

Observe that the standard header file "stdio.h" is specified in angle brackets. This tells the C preprocessor that the file can be found in the standard directories designated by the C compiler for header files. If you want to include a file from a nonstandard directory, you use double quotes:

  #include "/home/mydefs.h"

Include files can be nested: one include file can itself include other files.

Also as explained in the first chapter, the "#define" directive can be used to specify symbols to be substituted for specific strings of text:

  #define PI 3.141592654
  ...
  a = PI * b;

In this case, the preprocessor does a simple text substitution on PI throughout the source listing. The C compiler proper not only does not know what PI is, it never even sees it.

The "#define" directive can be used to create function-like macros that allow parameter substitution. For example:

  #define ABS(value)  ( (value) >=0 ? (value) : -(value) )

This macro could then be used in an expression as follows:

  printf( "Absolute value of x = %d\n", ABS(x) );

Beware that such function-like macros don't behave exactly like true functions. For example, suppose you used "x++" as an argument for the macro above:

  val = ABS(x++);

This would result in "x" being incremented twice. Although "x++" is substituted into the expression three times, only the test and one of the two branches are evaluated:

  val = ( (x++) >=0 ? (x++) : -(x++) );

Along with the "#define" directive, there is also an "#undef" directive that allows you to undefine a constant that has been previously defined:

  #undef PI

Another feature supported by the C preprocessor is conditional compilation, using the following directives:

  #if
  #else
  #elif
  #endif

These directives can test the values of defined constants to define which blocks of code are passed on to the C compiler proper:

  #if WIN == 1
    #include "WIN.H"
  #elif MAC == 1
    #include "MAC.H"
  #else
    #include "LINUX.H"
  #endif

You can nest these directives if needed. The "#if" and "#elif" can also test to see if a constant has been defined at all, using the "defined" operator:

  #if defined( DEBUG )
     printf( "Debug mode!\n" );
  #endif

-- or test to see if a constant has not been defined:

  #if !defined( DEBUG )
     printf( "Not debug mode!\n" );
  #endif

Finally, there is a "#pragma" directive, a catch-all used to implement compiler-specific commands that are not part of the C language proper. Such "pragmas" vary from compiler to compiler.

v2.0.7 / 2 of 7 / 01 feb 02 / greg goebel / public domain