C Programming/Particularities of C

From Wikibooks, open books for an open world
Jump to navigation Jump to search

C is an efficient, minimalist language that has some peculiarities that a programmer must be aware of. To address these, sometimes a good solution is to combine another language with C for added flexibility and power, like the combination of Emacs-LISP and C used for Emacs. Sometimes they can be addressed at the cost of slower speed and increased complexity by using special constructs that will guarantee function and security. Mostly however, through practice, C programmers have no trouble with the things mentioned here, and prefer using a language that closely models the general purpose, Von Neumann hardware architecture.

Below are several of these particularities of ANSI C (that sometimes are also its strengths), some minor and some major:

Lack of differentiation between arrays and pointers
The very first C (around 1973) did not have arrays at all; modern implementations are contiguous areas in memory accessed with pointer arithmetic (note: a declared array cannot be assigned to like a pointer), which circumvents the need to declare arrays with a fixed size. This ability, however, can cause buffer overflow errors with careless use.
Arrays do not store their length
A consequence of the above feature. This means that the program might need to explicitly perform a bounds check before accessing an array. Unless a function is passed an array of a fixed size, there is no way for it to discover the length of the array it was given: So the function must be given the length, perhaps passed to the function as a separate variable or in a structure. Because of this, most implementations do not provide automatic array bounds checking, and manual bounds checking is error prone.
If a C (or C++) program attempts to access an array element outside of the actual allocated memory, then a buffer overflow occurs, typically crashing the program. Buffer overflow bugs are a common security vulnerability too. Many other computer languages provide automatic bounds checking, and so they are nearly immune to such bugs. [1][2][3][4][5]
Variable Length Arrays
A VLA ‒ variable length array ‒ can only be used for function parameters and auto variables. VLAs cannot be used inside a structure (except as the last item in the structure). It's not possible to define a structure that corresponds to the standard Forth dictionary definition (which has 2 variable-length parts), except as an undifferentiated array of char.
Arbitrary-size built-in 2D or 3D arrays are not widely supported
This feature has been added starting with the C99 specification for variable-length arrays, although many C compilers still do not support it. Without VLAs, there is no way for a function to accept 2D or 3D arrays of arbitrary size. In particular, it's impossible to define a function that accepts int a[5][4][3]; on one call, and later accepts int b[10][10][10]; in a later call. Instead of using the built-in 2D or 3D array data type, C programmers use some other data type to hold (mathematical) 2D or 3D arrays of arbitrary size (multi-dimensional arrays) -- see C Programming/Common practices#Dynamic multidimensional arrays for details.
No formal String data type
Strings are character arrays (lacking any abstraction) and inherit all their constraints (structs can provide an abstraction, to an extent).
Weak type safety
C is not very type-safe. The memory management functions operate on untyped pointers, there is no built-in run-time type enforcement, and the type system can be circumvented with pointers and casts. Additionally, typedef does not create a new type but only an alias, thus it serves solely for code legibility. However, it is possible to use single member structs to enforce type safety.
No garbage collection
As a low-level language designed for minimum overhead, C features only manual memory management, which can allow simple memory leaks to go on unchecked.
Local variables are uninitialized upon declaration
Local (but not global) variables must be initialized manually; before this, they contain whatever was already in memory at the time. This is not unusual, but the C standard does not forbid access to uninitialized variables (which is).
Unwieldy function pointer syntax
Function pointers take the form of [return type] [name]([arg1 type])([arg2 type]), making them somewhat difficult to use. Typedefs can alleviate this burdensome syntax. For example, typedef int fn(int i);. See C Programming/Pointers and arrays#Pointers to Functions for more details.
No reflection
It is not possible for a C program -- at runtime -- to evaluate a string as if it were a source C code statement.
Nested functions are not standard
However, many C compilers do support nested functions, including GNU C.[6]
No formal exception handling
Some standard functions return special values that must be handled manually. For example, malloc() returns null upon failure. For example, one must store the return value of getchar() in an int (not, as one might expect, in a char) in order to reliably detect the end-of-file -- see EOF pitfall. Programs that do not include appropriate error handling might work fine most of the time, but can crash or otherwise malfunction when exceptional cases occur. POSIX systems often use signal() to handle some kinds of exceptions. (See C Programming/Error handling#Signals for details). Some programs use setjmp(), longjmp() or goto to manually handle some kinds of exceptions. (See C Programming/Control#One last thing: goto and C Programming/Coroutines for details).
No anonymous function definitions

References[edit | edit source]