Write Yourself a Scheme in 48 Hours/Evaluation, Part 1
Beginning the Evaluator
Currently, we've just been printing out whether or not we recognize the given program fragment. We're about to take the first steps towards a working Scheme interpreter: assigning values to program fragments. We'll be starting with baby steps, but fairly soon you'll be progressing to doing working computations.
Let's start by telling Haskell how to print out a string representation of the various possible
showVal :: LispVal -> String showVal (String contents) = "\"" ++ contents ++ "\"" showVal (Atom name) = name showVal (Number contents) = show contents showVal (Bool True) = "#t" showVal (Bool False) = "#f"
This is our first real introduction to pattern matching. Pattern matching is a way of destructuring an algebraic data type, selecting a code clause based on its constructor and then binding the components to variables. Any constructor can appear in a pattern; that pattern matches a value if the tag is the same as the value's tag and all subpatterns match their corresponding components. Patterns can be nested arbitrarily deep, with matching proceeding in an inside → outside, left → right order. The clauses of a function definition are tried in textual order, until one of the patterns matches. If this is confusing, you'll see some examples of deeply-nested patterns when we get further into the evaluator.
For now, you only need to know that each clause of the above definition matches one of the constructors of
LispVal, and the right-hand side tells what to do for a value of that constructor.
DottedList clauses work similarly, but we need to define a helper function
unwordsList to convert the contained list into a string:
showVal (List contents) = "(" ++ unwordsList contents ++ ")" showVal (DottedList head tail) = "(" ++ unwordsList head ++ " . " ++ showVal tail ++ ")"
unwordsList function works like the Haskell Prelude's
unwords function, which glues together a list of words with spaces. Since we're dealing with a list of
LispVals instead of words, we define a function that first converts the
LispVals into their string representations and then applies
unwords to it:
unwordsList :: [LispVal] -> String unwordsList = unwords . map showVal
Our definition of unwordsList doesn't include any arguments. This is an example of point-free style: writing definitions purely in terms of function composition and partial application, without regard to individual values or "points". Instead, we define it as the composition of a couple of built-in functions. First, we partially-apply
showVal, which creates a function that takes a list of
LispVals and returns a list of their string representations. Haskell functions are curried: this means that a function of two arguments, like
map, is really a function that returns a function of one argument. As a result, if you supply only a single argument, you get back a function of one argument that you can pass around, compose, and apply later. In this case, we compose it with unwords:
map showVal converts a list of
LispVals to a list of their string representations, and then
unwords joins the result together with spaces.
We used the function
show above. This standard Haskell function lets you convert any type that's an instance of the class
Show into a string. We'd like to be able to do the same with
LispVal, so we make it into a member of the class
Show, defining its
show method as
instance Show LispVal where show = showVal
Let's try things out by changing our readExpr function so it returns the string representation of the value actually parsed, instead of just "Found value":
readExpr input = case parse parseExpr "lisp" input of Left err -> "No match: " ++ show err Right val -> "Found " ++ show val
And compile and run…
$ ghc -package parsec -o parser listing4.1.hs $ ./parser "(1 2 2)" Found (1 2 2) $ ./parser "'(1 3 (\"this\" \"one\"))" Found (quote (1 3 ("this" "one")))
Beginnings of an evaluator: Primitives
Now, we start with the beginnings of an evaluator. The purpose of an evaluator is to map some "code" data type into some "data" data type, the result of the evaluation. In Lisp, the data types for both code and data are the same, so our evaluator will return a
LispVal. Other languages often have more complicated code structures, with a variety of syntactic forms.
Evaluating numbers, strings, booleans, and quoted lists is fairly simple: return the datum itself.
eval :: LispVal -> LispVal eval val@(String _) = val eval val@(Number _) = val eval val@(Bool _) = val eval (List [Atom "quote", val]) = val
This introduces a new type of pattern. The notation
val@(String _) matches against any
LispVal that's a string and then binds
val to the whole
LispVal, and not just the contents of the
String constructor. The result has type
LispVal instead of type
String. The underbar is the "don't care" variable, matching any value yet not binding it to a variable. It can be used in any pattern, but is especially useful with @-patterns (where you bind the variable to the whole pattern) and with simple constructor-tests where you're just interested in the tag of the constructor.
The last clause is our first introduction to nested patterns. The type of data contained by
[LispVal], a list of
LispVals. We match that against the specific two-element list
[Atom "quote", val], a list where the first element is the symbol "quote" and the second element can be anything. Then we return that second element.
eval into our existing code. Start by changing
readExpr back so it returns the expression instead of a string representation of the expression:
readExpr :: String -> LispVal readExpr input = case parse parseExpr "lisp" input of Left err -> String $ "No match: " ++ show err Right val -> val
And then change our main function to read an expression, evaluate it, convert it to a string, and print it out. Now that we know about the
>>= monad sequencing operator and the function composition operator, let's use them to make this a bit more concise:
main :: IO () main = getArgs >>= print . eval . readExpr . head
Here, we take the result of the
getArgs action (a list of strings) and pass it into the composition of:
- take the first value (
- parse it (
- evaluate it (
- convert it to a string and print it (
Compile and run the code the normal way:
$ ghc -package parsec -o eval listing4.2.hs $ ./eval "'atom" atom $ ./eval 2 2 $ ./eval "\"a string\"" "a string" $ ./eval "(+ 2 2)" Fail: listing6.hs:83: Non-exhaustive patterns in function eval
We still can't do all that much useful with the program (witness the failed
(+ 2 2) call), but the basic skeleton is in place. Soon, we'll be extending it with some functions to make it useful.
Adding basic primitives
Next, we'll improve our Scheme so we can use it as a simple calculator. It's still not yet a "programming language", but it's getting close.
Begin by adding a clause to eval to handle function application. Remember that all clauses of a function definition must be placed together and are evaluated in textual order, so this should go after the other eval clauses:
eval (List (Atom func : args)) = apply func $ map eval args
This is another nested pattern, but this time we match against the cons operator ":" instead of a literal list. Lists in Haskell are really syntactic sugar for a chain of cons applications and the empty list:
[1, 2, 3, 4] = 1:(2:(3:(4:))). By pattern-matching against cons itself instead of a literal list, we're saying "give me the rest of the list" instead of "give me the second element of the list". For example, if we passed
(+ 2 2) to eval,
func would be bound to "+" and
args would be bound to
[Number 2, Number 2].
The rest of the clause consists of a couple of functions we've seen before and one we haven't defined yet. We have to recursively evaluate each argument, so we map
eval over the args. This is what lets us write compound expressions like
(+ 2 (- 3 1) (* 5 4)). Then we take the resulting list of evaluated arguments, and pass it and the original function to apply:
apply :: String -> [LispVal] -> LispVal apply func args = maybe (Bool False) ($ args) $ lookup func primitives
The built-in function
lookup looks up a key (its first argument) in a list of pairs. However, lookup will fail if no pair in the list contains the matching key. To express this, it returns an instance of the built-in type
Maybe. We use the function
maybe to specify what to do in case of either success or failure. If the function isn't found, we return a
Bool False value, equivalent to
#f (we'll add more robust error-checking later). If it is found, we apply it to the arguments using
($ args), an operator section of the function application operator.
Next, we define the list of primitives that we support:
primitives :: [(String, [LispVal] -> LispVal)] primitives = [("+", numericBinop (+)), ("-", numericBinop (-)), ("*", numericBinop (*)), ("/", numericBinop div), ("mod", numericBinop mod), ("quotient", numericBinop quot), ("remainder", numericBinop rem)]
Look at the type of
primitives. It is a list of pairs, just like
lookup expects, but the values of the pairs are functions from
LispVal. In Haskell, you can easily store functions in other data structures, though the functions must all have the same type.
Also, the functions that we store are themselves the result of a function,
numericBinop, which we haven't defined yet. This takes a primitive Haskell function (often an operator section) and wraps it with code to unpack an argument list, apply the function to it, and wrap the result up in our
numericBinop :: (Integer -> Integer -> Integer) -> [LispVal] -> LispVal numericBinop op params = Number $ foldl1 op $ map unpackNum params unpackNum :: LispVal -> Integer unpackNum (Number n) = n unpackNum (String n) = let parsed = reads n :: [(Integer, String)] in if null parsed then 0 else fst $ parsed !! 0 unpackNum (List [n]) = unpackNum n unpackNum _ = 0
As with R5RS Scheme, we don't limit ourselves to only two arguments. Our numeric operations can work on a list of any length, so
(+ 2 3 4) = 2 + 3 + 4, and
(- 15 5 3 2) = 15 - 5 - 3 - 2. We use the built-in function
foldl1 to do this. It essentially changes every cons operator in the list to the binary function we supply,
Unlike R5RS Scheme, we're implementing a form of weak typing. That means that if a value can be interpreted as a number (like the string "2"), we'll use it as one, even if it's tagged as a string. We do this by adding a couple extra clauses to
unpackNum. If we're unpacking a string, attempt to parse it with Haskell's built-in
reads function, which returns a list of pairs of (parsed value, remaining string).
For lists, we pattern-match against the one-element list and try to unpack that. Anything else falls through to the next case.
If we can't parse the number, for any reason, we'll return 0 for now. We'll fix this shortly so that it signals an error.
Compile and run this the normal way. Note how we get nested expressions "for free" because we call eval on each of the arguments of a function:
$ ghc -package parsec -o eval listing7.hs $ ./eval "(+ 2 2)" 4 $ ./eval "(+ 2 (-4 1))" 2 $ ./eval "(+ 2 (- 4 1))" 5 $ ./eval "(- (+ 4 6 3) 3 5 2)" 3