C Programming/Particularities of C

From Wikibooks, open books for an open world


C is an extremely efficient, minimalist language, but some of its particularities can become weaknesses for the less attentive programmer. Some of these can be offset by using another language in parallel for added flexibility and power, like the combination of Emacs Lisp and C used for Emacs. Others can be worked around, at the cost of speed and increased complexity, by using special constructs that guarantee function, security and performance.

Below are several of these particularities of ANSI C (that sometimes also form the foundations of its strengths), some minor and some major:

Lack of differentiation between arrays and pointers 
The very first C (around 1973) did not have arrays at all; in modern implementations, arrays are contiguous areas of memory accessed with pointer arithmetic (note: a declared array cannot be assigned to like a pointer), which served as an escape from the previous need to declare arrays with a constant size. This ability, however, can cause problems with careless use.
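A minimal sketch of where arrays and pointers differ in practice. The function names are hypothetical and chosen only for illustration; the point is that an array passed to a function arrives as a pointer, so sizeof no longer reports the size of the whole array.

```c
#include <stddef.h>

/* Hypothetical helper. A parameter written as "int values[10]" would be
   adjusted by the compiler to "int *values", so the callee only ever sees
   a pointer and sizeof reports the size of that pointer. */
size_t size_seen_by_callee(int *values)
{
    return sizeof values;            /* sizeof(int *) */
}

/* Hypothetical helper. In the scope where the array is declared, sizeof
   still sees the complete array object. */
size_t size_seen_by_owner(void)
{
    int values[10];
    return sizeof values;            /* 10 * sizeof(int) */
}
```

Calling size_seen_by_callee(arr) with int arr[10]; compiles without complaint because arr converts to a pointer to its first element, whereas an assignment such as arr = other_pointer; is rejected by the compiler.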
Arrays do not track their length 
A consequence of the above. This means that the length must be tracked manually. Unless a function is passed an array of a fixed size, there is no way for it to discover the length of the array it was given. The function must therefore be given the length, perhaps in a separate variable or struct, or else hope that whoever allocated the array made it the right length. Because of this, most implementations do not provide automatic bounds checking, and manual bounds checking is error-prone.

It is extremely easy for a C (or C++) programmer to write code that fails to check the bounds of every array every time something is added to it, leading to a buffer overflow vulnerability. Buffer overflow bugs are among the most common security vulnerabilities in software. Most other languages provide automatic bounds checking, and so they are nearly immune to such bugs. [2] [3] [4] [5] [6]
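One common way to track the length manually is to keep it in a struct next to the data and check it on every write. The sketch below assumes a hypothetical int_buffer type and int_buffer_append function; neither comes from any standard library.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical buffer type: capacity and current length travel with the data. */
struct int_buffer {
    int    *data;
    size_t  capacity;   /* number of elements data can hold */
    size_t  length;     /* number of elements currently in use */
};

/* Appends value if there is room. Returns false instead of writing past
   the end of the underlying array. */
bool int_buffer_append(struct int_buffer *buf, int value)
{
    if (buf->length >= buf->capacity) {
        return false;
    }
    buf->data[buf->length++] = value;
    return true;
}
```

The check is only as reliable as the capacity field: if the caller records a capacity larger than the real array, the overflow comes back.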

Variable Length Arrays 
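A VLA (variable-length array) can only be declared as a function parameter or an automatic variable. An object declared with a VLA type cannot have static or allocated storage duration, although a pointer to a VLA type may point to memory obtained from malloc(). A VLA cannot be a member of a structure; the closest equivalent is the C99 flexible array member, which may appear only as the last member of the structure. As a result, it's not possible to define a structure that corresponds to the standard Forth dictionary definition (which has 2 variable-length parts), except as an undifferentiated array of char.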
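A sketch of the "last item in the structure" case, using a C99 flexible array member. The name_record type and name_record_create function are hypothetical; the structure carries exactly one variable-length part, which is the most the language allows.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical record with a single variable-length part. */
struct name_record {
    size_t length;
    char   name[];      /* flexible array member: must be the last member */
};

/* Allocates a record large enough for the header plus the string and its
   terminating null. Returns NULL if allocation fails; the caller frees
   the result with free(). */
struct name_record *name_record_create(const char *name)
{
    size_t length = strlen(name);
    struct name_record *rec = malloc(sizeof *rec + length + 1);
    if (rec == NULL) {
        return NULL;
    }
    rec->length = length;
    memcpy(rec->name, name, length + 1);
    return rec;
}
```

A second variable-length field cannot be added after name, which is why a layout with two variable-length parts has to fall back to a raw array of char with offsets computed by hand.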
Arbitrary-size built-in 2D or 3D arrays not widely supported 
C99 variable-length arrays solve this, although support is not universal: C11 made VLAs an optional feature, and some C compilers do not implement them. Without VLAs, there is no way for a function to accept built-in 2D or 3D arrays of arbitrary size. In particular, it's impossible to define a function that accepts int a[5][4][3]; in one call and int b[10][10][10]; in another. Instead of using the built-in 2D or 3D array data type, C programmers then use some other data type to hold (mathematical) 2D or 3D arrays of arbitrary size (multi-dimensional arrays) -- see C Programming/Common practices#Dynamic multidimensional arrays for details.
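With a compiler that supports VLAs, a single function can accept built-in 3D arrays of different sizes by taking the dimensions as earlier parameters. The function name sum3d is hypothetical, and the sketch assumes C99 or a later implementation with VLA support.

```c
#include <stddef.h>

/* Sums every element of a 3D array whose dimensions are known only at run
   time. The dimensions must be declared before the array parameter that
   uses them. */
long sum3d(size_t d1, size_t d2, size_t d3, int a[d1][d2][d3])
{
    long total = 0;
    for (size_t i = 0; i < d1; i++) {
        for (size_t j = 0; j < d2; j++) {
            for (size_t k = 0; k < d3; k++) {
                total += a[i][j][k];
            }
        }
    }
    return total;
}
```

The same function can then be called as sum3d(5, 4, 3, a) for int a[5][4][3]; and as sum3d(10, 10, 10, b) for int b[10][10][10];.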
No formal String data type 
Strings are character arrays (lacking any abstraction) and inherit all their constraints (structs can provide an abstraction, to an extent).
Weak type safety 
C is not very type-safe. The memory management functions operate on untyped pointers, there is no built-in run-time type enforcement, and the type system can be circumvented with pointers and casts. Additionally, typedef does not create a new type but only an alias, thus it serves solely for code legibility. However, it's possible to use single-member structs to enforce type safety.
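A sketch of the single-member struct technique. The meters and seconds types and the speed function are hypothetical names used only for illustration.

```c
/* Two distinct types that both wrap a double. Unlike "typedef double meters;",
   which would only create an alias, these are not interchangeable. */
typedef struct { double value; } meters;
typedef struct { double value; } seconds;

/* Passing the arguments in the wrong order, e.g. speed(duration, distance),
   is a compile-time error because the struct types are incompatible. */
double speed(meters distance, seconds duration)
{
    return distance.value / duration.value;
}
```

The cost is some verbosity: values must be wrapped, as in meters d = { 100.0 };, and unwrapped with .value wherever arithmetic is needed.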
No garbage collection 
As a low-level language designed for minimum overhead, C features only manual memory management, which can allow simple memory leaks to go on unchecked.
Local variables are uninitialized upon declaration 
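Automatic local variables (but not global or static ones) must be initialized manually; until then, they contain whatever was already in memory at the time. This is not terribly unusual, but the C standard does not forbid access to uninitialized variables (which is unusual): compilers are not required to reject code that reads a variable before it has been assigned, and the value read is indeterminate.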
Unwieldy function pointer syntax 
A function pointer is declared as [return type] (*[name])([arg1 type], [arg2 type]), which makes function pointers somewhat difficult to use. Typedefs can alleviate this burdensome syntax. For example, typedef int (*fn)(int i); makes fn a name for the type "pointer to a function that takes an int and returns an int". See C Programming/Pointers and arrays#Pointers to Functions for more details.
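A short sketch comparing the raw declaration with a typedef. The names add, multiply, binary_op and apply are hypothetical.

```c
int add(int a, int b)      { return a + b; }
int multiply(int a, int b) { return a * b; }

/* Without a typedef, a variable holding either function is declared as:
       int (*op)(int, int) = add;
   The typedef below gives that pointer type a readable name. */
typedef int (*binary_op)(int, int);

/* Calls whichever function the caller passes in. */
int apply(binary_op op, int a, int b)
{
    return op(a, b);
}
```

With the typedef in place, apply(add, 3, 4) and apply(multiply, 3, 4) read like ordinary calls, and the parameter list of apply stays legible.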
No reflection 
It is not possible for a C program -- at runtime -- to evaluate a string as if it were a source C code statement.
Nested functions not standard
However, many C compilers do support nested functions, including GNU C.[1]
No formal exception handling 
Some standard functions return special values that must be manually handled. For example, malloc() returns a null pointer upon failure. Likewise, one must store the return value of getchar() in an int (not, as one might expect, in a char) in order to reliably detect the end-of-file -- see EOF pitfall. Too many programs don't include error handling. Such programs seem to work fine most of the time, but crash or otherwise malfunction when exceptional cases occur. POSIX systems often use signal() to handle some kinds of exceptions. Some programs use setjmp(), longjmp() or goto to manually handle some kinds of exceptions. (See C Programming/Control#One last thing: goto and C Programming/Coroutines for details).
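A minimal sketch of manual exception handling with setjmp() and longjmp(). The function names and the single global recovery point are assumptions made for illustration; real programs usually need a more careful design, since longjmp() skips any cleanup code between the jump and the recovery point.

```c
#include <setjmp.h>

/* Hypothetical single recovery point shared by the functions below. */
static jmp_buf error_context;

/* Hypothetical operation that "throws" by jumping back to the recovery
   point instead of returning an error code. */
static int checked_divide(int numerator, int denominator)
{
    if (denominator == 0) {
        longjmp(error_context, 1);   /* does not return */
    }
    return numerator / denominator;
}

/* Returns 0 and stores the quotient on success. Returns -1, leaving
   *quotient untouched, if checked_divide jumped out. */
int try_divide(int numerator, int denominator, int *quotient)
{
    if (setjmp(error_context) != 0) {
        return -1;                   /* reached via longjmp */
    }
    *quotient = checked_divide(numerator, denominator);
    return 0;
}
```

Here setjmp() plays the role of a "try" and longjmp() the role of a "throw"; the caller still has to check the return value of try_divide, which is the manual handling described above.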
No anonymous function definitions


  1. "A GNU Manual": "Extensions to the C Language: Nested Functions"