Introduction to Programming Languages/Syntax Directed Type Checking

From Wikibooks, open books for an open world
Jump to navigation Jump to search

Syntax Directed Type Checking[edit | edit source]

Compilers and interpreters can rely on grammars to implement many different forms of program verification. A well-known static verification that compilers of statically typed languages perform is type checking. Before generating machine code for a program written in a statically typed language, the compiler must ensure that the source program abides by the typing discipline imposed by that language. The type checker verifies, for instance, if the operands have the type expected by the operator. A important step of this kind of verification is to determine the type of arithmetic expressions. In our language of arithmetic expressions every program is an integer number. However, below we show a slightly modified version of that very language, in which numbers can either be integers or floating point.

expr --> mulexp, [+], expr.
expr --> mulexp.

mulexp --> rootexp, [*], mulexp.
mulexp --> rootexp.

rootexp --> ['('], expr, [')'].
rootexp --> dec_number.

dec_number --> number, [.], number.
dec_number --> number.

number --> digit.
number --> digit, number.

digit(N) --> [0] ; [1] ; [2] ; [3] ; [4] ; [5] ; [6] ; [7] ; [8] ; [9].

Many programming languages allow the intermixing of integer and floating point data types in arithmetic expressions. C is one of these languages. The type of a sum involving an integer and a floating point number is a floating point number. However, there are languages that do not allow this kind of mixture. In Standard ML, for instance, we can only sum up together two integers, or two real numbers. Lets adopt the C approach, for the sake of this example. Thus, the attribute grammar below not only implements an interpreter for our new language of arithmetic expressions, but also computes the type of an expression.

meet(integer, integer, integer).
meet(_, float, float).
meet(float, _, float).

expr(N, T) --> mulexp(N1, T1), [+], expr(N2, T2), {
  N is N1 + N2,
  meet(T1, T2, T)
}.
expr(N, T) --> mulexp(N, T).
mulexp(N, T) --> rootexp(N1, T1), [*], mulexp(N2, T2), {
  N is N1 * N2,
  meet(T1, T2, T)
}.
mulexp(N, T) --> rootexp(N, T).

rootexp(N, T) --> ['('], expr(N, T), [')'].
rootexp(N, T) --> dec_number(N, T).

dec_number(N, float) --> number(N1, _), [.], number(N2, C2), {
  CC is C2 * 10,
  N is N1 + N2 / CC} .
dec_number(N, integer) --> number(N, _).

number(N, 1) --> digit(N).
number(N, C) --> digit(ND), number(NN, C1), {
  C is C1 * 10,
  N is ND * C + NN
}.

digit(N) --> [0], {N is 0}
           ; [1], {N is 1}
           ; [2], {N is 2}
           ; [3], {N is 3}
           ; [4], {N is 4}
           ; [5], {N is 5}
           ; [6], {N is 6}
           ; [7], {N is 7}
           ; [8], {N is 8}
           ; [9], {N is 9}.

If we save the grammar above in a file called type_checker.pl, we can use it as in the execution section below. As we can see, expressions involving integers and floats have the floating point type. We have defined a meet operator that combines the different data types that we have. This name, meet, is a term commonly found in the jargon used in lattice theory.

?- consult(type_checker).
% type_checker compiled 0.00 sec, 5,544 bytes
true.

?- expr(N, T, [1, 2], []).
N = 12,
T = integer ;
false.

?- expr(N, T, [1, 2, +, 3, '.', 1, 4], []).
N = 15.14,
T = float ;
false.

Syntax Directed Translation · Compiled Programs