C Programming/C trigraph

From Wikibooks, open books for an open world

Trigraphs

C was designed around the English character set (essentially ASCII), and its syntax relies on characters such as {, }, [, ], #, \ and |. Some national character sets, such as the national variants of ISO 646 once common in Europe, use those code positions for letters with diacritics instead, so programmers working with them may have no way to type the characters C requires. To solve this problem, the 1989 ANSI C standard defined a set of trigraph sequences (section 5.2.1.1 in the ISO C90 numbering, which C99 keeps) that can stand in for these characters anywhere in a source file. Trigraph replacement is part of the first translation phase of compilation (section 5.1.1.2), so it happens before the source is split into tokens, and therefore also inside string literals, character constants and comments.

The following nine sequences are the only trigraphs. A question mark that does not begin one of them is left unchanged.

Sequence Replacement
======== ===========
  ??=         #
  ??(         [
  ??/         \
  ??)         ]
  ??'         ^
  ??<         {
  ??!         |
  ??>         }
  ??-         ~

Because the replacement is made before the compiler even recognizes string literals, statements such as

printf ("Eh???/n");

will, once the trigraph ??/ has been replaced by a backslash, be equivalent to

printf ("Eh?\n");

Amendment 1 to the C standard (ISO/IEC 9899/AMD1:1995, often called C95) added six punctuators, sometimes called digraphs, which the 1999 C standard describes in section 6.4.6. They are equivalent to the following tokens except for their spelling:

Digraph Equivalent
======= ==========
   <:       [
   :>       ]
   <%       {
   %>       }
   %:       #
  %:%:      ##

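In other words, they behave differently only when stringized by the # (or %:) operator in a macro replacement, where the digraph keeps its own spelling; otherwise they can be freely interchanged with the tokens they stand for. Unlike trigraphs, digraphs are tokens rather than a textual substitution, so they have no effect inside string literals, character constants or comments.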