Write Yourself a Scheme in 48 Hours/Adding Variables and Assignment
Finally, we get to the good stuff: variables. A variable lets us save the result of an expression and refer to it later. In Scheme, a variable can also be reset to new values, so that its value changes as the program executes. This presents a complication for Haskell, because the execution model is built upon functions that return values, but never change them.
Nevertheless, there are several ways to simulate state in Haskell, all involving monads. The simplest is probably the state monad, which lets you hide arbitrary state within the monad and pass it around behind the scenes. You specify the state type as a parameter to the monad (e.g. if a function returns an integer but modifies a list of string pairs, it would have type
State [(String, String)] Integer), and access it via the get and put functions, usually within a do-block. You'd specify the initial state via the
runState myStateAction initialList, which returns a pair containing the return value and the final state.
Unfortunately, the state monad doesn't work well for us, because the type of data we need to store is fairly complex. For a simple top-level environment, we could get away with
[(String, LispVal)], storing mappings from variable names to values. However, when we start dealing with function calls, these mappings become a stack of nested environments, arbitrarily deep. And when we add closures, environments might get saved in an arbitrary Function value, and passed around throughout the program. In fact, they might be saved in a variable and passed out of the
runState monad entirely, something we're not allowed to do.
Instead, we use a feature called state threads, letting Haskell manage the aggregate state for us. This lets us treat mutable variables as we would in any other programming language, using functions to get or set variables. There are two flavors of state threads: the
ST monad creates a stateful computation that can be executed as a unit, without the state escaping to the rest of the program. The
IORef module lets you use stateful variables within the IO monad. Since our state has to be interleaved with IO anyway (it persists between lines in the REPL, and we will eventually have IO functions within the language itself), we'll be using
We can start out by importing
Data.IORef and defining a type for our environments:
import Data.IORef type Env = IORef [(String, IORef LispVal)]
This declares an
Env as an
IORef holding a list that maps
Strings to mutable
LispVals. We need
IORefs for both the list itself and for individual values because there are two ways that the program can mutate the environment. It might use
set! to change the value of an individual variable, a change visible to any function that shares that environment (Scheme allows nested scopes, so a variable in an outer scope is visible to all inner scopes). Or it might use
define to add a new variable, which should be visible on all subsequent statements.
IORefs can only be used within the IO monad, we'll want a helper action to create an empty environment. We can't just use the empty list
 because all accesses to
IORefs must be sequenced, and so the type of our null environment is
IO Env instead of just plain
nullEnv :: IO Env nullEnv = newIORef 
From here, things get a bit more complicated, because we'll be simultaneously dealing with two monads. Remember, we also need an
Error monad to handle things like unbound variables. The parts that need IO functionality and the parts that may throw exceptions are interleaved, so we can't just catch all the exceptions and return only normal values to the IO monad.
Haskell provides a mechanism known as monad transformers that lets you combine the functionality of multiple monads. We'll be using one of these,
ErrorT, which lets us layer error-handling functionality on top of the IO monad. Our first step is to create a type synonym for our combined monad:
type IOThrowsError = ErrorT LispError IO
IOThrowsError is really a type constructor: we've left off the last argument, the return type of the function. However,
ErrorT takes one more argument than plain old
Either: we have to specify the type of monad that we're layering our error-handling functionality over. We've created a monad that may contain IO actions that throw a
We have a mix of
IOThrowsError functions, but actions of different types cannot be contained within the same do-block, even if they provide essentially the same functionality. Haskell already provides a mechanism, lifting, to bring values of the lower type (
IO) into the combined monad. Unfortunately, there's no similar support to bring a value of the untransformed upper type into the combined monad, so we need to write it ourselves:
liftThrows :: ThrowsError a -> IOThrowsError a liftThrows (Left err) = throwError err liftThrows (Right val) = return val
This destructures the
Either type and either re-throws the error type or returns the ordinary value. Methods in typeclasses resolve based on the type of the expression, so
return (members of
Monad, respectively) take on their
IOThrowsError definitions. Incidentally, the type signature provided here is not fully general: if we'd left it off, the compiler would have inferred
liftThrows :: (MonadError m a) => Either e a -> m a.
We'll also want a helper function to run the whole top-level
IOThrowsError action, returning an IO action. We can't escape from the IO monad, because a function that performs IO has an effect on the outside world, and you don't want that in a lazily-evaluated pure function. But you can run the error computation and catch the errors.
runIOThrows :: IOThrowsError String -> IO String runIOThrows action = runErrorT (trapError action) >>= return . extractValue
This uses our previously-defined
trapError function to take any error values and convert them to their string representations, then runs the whole computation via
runErrorT. The result is passed into
extractValue and returned as a value in the IO monad.
Now we're ready to return to environment handling. We'll start with a function to determine if a given variable is already bound in the environment, necessary for proper handling of
isBound :: Env -> String -> IO Bool isBound envRef var = readIORef envRef >>= return . maybe False (const True) . lookup var
This first extracts the actual environment value from its
readIORef. Then we pass it to lookup to search for the particular variable we're interested in. lookup returns a
Maybe value, so we return
False if that value was
True otherwise (we need to use the
const function because
maybe expects a function to perform on the result and not just a value). Finally, we use return to lift that value into the IO monad. Since we're just interested in a true/false value, we don't need to deal with the actual
IORef that lookup returns.
Next, we'll want to define a function to retrieve the current value of a variable:
getVar :: Env -> String -> IOThrowsError LispVal getVar envRef var = do env <- liftIO $ readIORef envRef maybe (throwError $ UnboundVar "Getting an unbound variable" var) (liftIO . readIORef) (lookup var env)
Like the previous function, this begins by retrieving the actual environment from the
getVar uses the
IOThrowsError monad, because it also needs to do some error handling. As a result, we need to use the
liftIO function to lift the
readIORef action into the combined monad. Similarly, when we return the value, we use
liftIO . readIORef to generate an
IOThrowsError action that reads the returned
IORef. We don't need to use
liftIO to throw an error, however, because
throwError is a defined for the
MonadError typeclass, of which
ErrorT is an instance.
Now we create a function to set values:
setVar :: Env -> String -> LispVal -> IOThrowsError LispVal setVar envRef var value = do env <- liftIO $ readIORef envRef maybe (throwError $ UnboundVar "Setting an unbound variable" var) (liftIO . (flip writeIORef value)) (lookup var env) return value
Again, we first read the environment out of its
IORef and run a lookup on it. This time, however, we want to change the variable instead of just reading it. The
writeIORef action provides a means for this, but takes its arguments in the wrong order (ref -> value instead of value -> ref). So we use the built-in function
flip to switch the arguments of
writeIORef around, and then pass it the value. Finally, we return the value we just set, for convenience.
We'll want a function to handle the special behavior of
define, which sets a variable if already bound or creates a new one if not. Since we've already defined a function to set values, we can use it in the former case:
defineVar :: Env -> String -> LispVal -> IOThrowsError LispVal defineVar envRef var value = do alreadyDefined <- liftIO $ isBound envRef var if alreadyDefined then setVar envRef var value >> return value else liftIO $ do valueRef <- newIORef value env <- readIORef envRef writeIORef envRef ((var, valueRef) : env) return value
It's the latter case that's interesting, where the variable is unbound. We create an IO action (via do-notation) that creates a new
IORef to hold the new variable, reads the current value of the environment, then writes a new list back to that variable consisting of the new (key, variable) pair added to the front of the list. Then we lift that whole do-block into the
IOThrowsError monad with
There's one more useful environment function: being able to bind a whole bunch of variables at once, as happens when a function is invoked. We might as well build that functionality now, though we won't be using it until the next section:
bindVars :: Env -> [(String, LispVal)] -> IO Env bindVars envRef bindings = readIORef envRef >>= extendEnv bindings >>= newIORef where extendEnv bindings env = liftM (++ env) (mapM addBinding bindings) addBinding (var, value) = do ref <- newIORef value return (var, ref)
This is perhaps more complicated than the other functions, since it uses a monadic pipeline (rather than do-notation) and a pair of helper functions to do the work. It's best to start with the helper functions. addBinding takes a variable name and value, creates an
IORef to hold the new variable, and then returns the name–value pair.
addBinding on each member of bindings (
mapM) to create a list of
(String, IORef LispVal) pairs, and then appends the current environment to the end of that
(++ env). Finally, the whole function wires these functions up in a pipeline, starting by reading the existing environment out of its
IORef, then passing the result to
extendEnv, then returning a new
IORef with the extended environment.
Now that we have all our environment functions, we need to start using them in the evaluator. Since Haskell has no global variables, we'll have to thread the environment through the evaluator as a parameter. While we're at it, we might as well add the
define special forms.
eval :: Env -> LispVal -> IOThrowsError LispVal eval env val@(String _) = return val eval env val@(Number _) = return val eval env val@(Bool _) = return val eval env (Atom id) = getVar env id eval env (List [Atom "quote", val]) = return val eval env (List [Atom "if", pred, conseq, alt]) = do result <- eval env pred case result of Bool False -> eval env alt otherwise -> eval env conseq eval env (List [Atom "set!", Atom var, form]) = eval env form >>= setVar env var eval env (List [Atom "define", Atom var, form]) = eval env form >>= defineVar env var eval env (List (Atom func : args)) = mapM (eval env) args >>= liftThrows . apply func eval env badForm = throwError $ BadSpecialForm "Unrecognized special form" badForm
Since a single environment gets threaded through a whole interactive session, we need to change a few of our IO functions to take an environment.
evalAndPrint :: Env -> String -> IO () evalAndPrint env expr = evalString env expr >>= putStrLn evalString :: Env -> String -> IO String evalString env expr = runIOThrows $ liftM show $ (liftThrows $ readExpr expr) >>= eval env
We need the
runIOThrows in evalString because the type of the monad has changed from
IOThrowsError. Similarly, we need a
liftThrows to bring
readExpr into the
Next, we initialize the environment with a null variable before starting the program:
runOne :: String -> IO () runOne expr = nullEnv >>= flip evalAndPrint expr runRepl :: IO () runRepl = nullEnv >>= until_ (== "quit") (readPrompt "Lisp>>> ") . evalAndPrint
We've created an additional helper function runOne to handle the single-expression case, since it's now somewhat more involved than just running
evalAndPrint. The changes to
runRepl are a bit more subtle: notice how we added a function composition operator before evalAndPrint. That's because
evalAndPrint now takes an additional
Env parameter, fed from
nullEnv. The function composition tells
until_ that instead of taking plain old
evalAndPrint as an action, it ought to apply it first to whatever's coming down the monadic pipeline, in this case the result of
nullEnv. Thus, the actual function that gets applied to each line of input is
(evalAndPrint env), just as we want it.
Finally, we need to change our main function to call
runOne instead of evaluating
main :: IO () main = do args <- getArgs case length args of 0 -> runRepl 1 -> runOne $ args !! 0 otherwise -> putStrLn "Program takes only 0 or 1 argument"
And we can compile and test our program:
$ ghc -package parsec -o lisp [../code/listing8.hs listing8.hs] $ ./lisp Lisp>>> (define x 3) 3 Lisp>>> (+ x 2) 5 Lisp>>> (+ y 2) Getting an unbound variable: y Lisp>>> (define y 5) 5 Lisp>>> (+ x (- y 2)) 6 Lisp>>> (define str "A string") "A string" Lisp>>> (< str "The string") Invalid type: expected number, found "A string" Lisp>>> (string<? str "The string") #t