# Haskell/YAHT/Monads

The most difficult concept to master while learning Haskell is that of understanding and using monads. We can distinguish two subcomponents here: (1) learning how to use existing monads and (2) learning how to write new ones. If you want to use Haskell, you must learn to use existing monads. On the other hand, you will only need to learn to write your own monads if you want to become a "super Haskell guru." Still, if you can grasp writing your own monads, programming in Haskell will be much more pleasant.

So far we've seen two uses of monads. The first use was IO actions: We've seen that, by using monads, we can get away from the problems plaguing the RealWorld solution to IO presented in the chapter IO. The second use was representing different types of computations in the section on Classes-computations. In both cases, we needed a way to sequence operations and saw that a sufficient definition (at least for computations) was:

class Computation c where success :: a -> c a failure :: String -> c a augment :: c a -> (a -> c b) -> c b combine :: c a -> c a -> c a

Let's see if this definition will enable us to also perform IO. Essentially, we need a way to represent taking a value out of an action and performing some new operation on it (as in the example from the section on Functions-io, rephrased slightly):

main = do s <- readFile "somefile" putStrLn (show (f s))

But this is exactly what `augment`

does. Using `augment`

, we can write the above code as:

main = -- note the lack of a "do" readFile "somefile" `augment` \s -> putStrLn (show (f s))

This certainly seems to be sufficient. And, in fact, it turns out to be more than sufficient.

The definition of a monad is a slightly trimmed-down version of our `Computation`

class. The `Monad`

class has four methods (but the fourth method can be defined in terms of the third):

class Monad m where return :: a -> m a fail :: String -> m a (>>=) :: m a -> (a -> m b) -> m b (>>) :: m a -> m b -> m b

In this definition, `return`

is equivalent to our `success`

; `fail`

is equivalent to our `failure`

; and `>>=`

(read: "bind" ) is equivalent to our `augment`

. The `>>`

(read: "then" ) method is simply a version of `>>=`

that ignores the `a`

. This will turn out to be useful; although, as mentioned before, it can be defined in terms of `>>=`

:

a >> x = a >>= \_ -> x

## Contents

## Do Notation[edit]

We have hinted that there is a connection between monads and the ` do` notation. Here, we make that relationship concrete. There is actually nothing magic about the

`notation – it is simply "syntactic sugar" for monadic operations.`

**do**As we mentioned earlier, using our `Computation`

class, we could define our above program as:

main = readFile "somefile" `augment` \s -> putStrLn (show (f s))

But we now know that `augment`

is called `>>=`

in the monadic world. Thus, this program really reads:

main = readFile "somefile" >>= \s -> putStrLn (show (f s))

And this is completely valid Haskell at this point: if you defined a function `f :: Show a => String -> a`

, you could compile and run this program)

This suggests that we can translate:

x <- f g x

into `f >>= \x -> g x`

. This is exactly what the compiler does. Talking about ` do` becomes easier if we do not use implicit layout (see the section on Layout for how to do this). There are four translation rules:

`do {e}`

→`e`

`do {e; es}`

→`e >> do {es}`

`do {let decls; es}`

→`let decls in do {es}`

`do {p <- e; es}`

→`let ok p = do {es} ; ok _ = fail "..." in e >>= ok`

Again, we will elaborate on these one at a time:

### Translation Rule 1[edit]

The first translation rule, `do {e}`

→ `e`

, states (as we have stated before) that when performing a single action, having a ` do` or not is irrelevant. This is essentially the base case for an inductive definition of

`. The base case has one action (namely`

**do**`e`

here); the other three translation rules handle the cases where there is more than one action.### Translation Rule 2[edit]

This states that `do {e; es}`

→ `e >> do {es}`

. This tells us what to do if we have an action (`e`

) followed by a list of actions (`es`

). Here, we make use of the `>>`

function, defined earlier. This rule simply states that to **do**`{e; es}`

, we first perform the action `e`

, throw away the result, and then **do**`es`

.

For instance, if `e`

is `putStrLn s`

for some string `s`

, then the translation of `do {e; es}`

is to perform `e`

(i.e., print the string) and then **do**`es`

. This is clearly what we want.

### Translation Rule 3[edit]

This states that `do {let decls; es}`

→ `let decls in do {es}`

. This rule tells us how to deal with ` let`s inside of a

`statement. We lift the declarations within the`

**do**`out and`

**let**`whatever comes after the declarations.`

**do**### Translation Rule 4[edit]

This states that `do {p <- e; es}`

→ `let ok p = do {es} ; ok _ = fail "..." in e >>= ok`

. Again, it is not exactly obvious what is going on here. However, an alternate formulation of this rule, which is roughly equivalent, is: `do {p <- e; es}`

→ `e >>= \p -> es`

. Here, it is clear what is happening. We run the action `e`

, and then send the results into `es`

, but first give the result the name `p`

.

The reason for the complex definition is that `p`

doesn't need to simply be a variable; it could be some complex pattern. For instance, the following is valid code:

foo = do ('a':'b':'c':x:xs) <- getLine putStrLn (x:xs)

In this, we're assuming that the results of the action `getLine`

will begin with the string "abc" and will have at least one more character. The question becomes what should happen if this pattern match fails. The compiler could simply throw an error, like usual, for failed pattern matches. However, since we're within a monad, we have access to a special `fail`

function, and we'd prefer to fail using that function, rather than the "catch all" `error`

function. Thus, the translation, as defined, allows the compiler to fill in the `...`

with an appropriate error message about the pattern matching having failed. Apart from this, the two definitions are equivalent.

## Definition[edit]

There are three rules that all monads must obey called the "Monad Laws" (and it is up to *you* to ensure that your monads obey these rules) :

`return a >>= f`

≡`f a`

`f >>= return`

≡`f`

`f >>= (\x -> g x >>= h)`

≡`(f >>= g) >>= h`

Let's look at each of these individually:

### Law 1[edit]

This states that `return a >>= f`

≡ `f a`

. Suppose we think about monads as computations. This means that if we create a trivial computation that simply returns the value `a`

regardless of anything else (this is the `return a`

part); and then bind it together with some other computation `f`

, then this is equivalent to simply performing the computation `f`

on `a`

directly.

For example, suppose `f`

is the function `putStrLn`

and `a`

is the string "Hello World." This rule states that binding a computation whose result is "Hello World" to `putStrLn`

is the same as simply printing it to the screen. This seems to make sense.

In ` do` notation, this law states that the following two programs are equivalent:

law1a = do x <- return a f x law1b = do f a

### Law 2[edit]

The second monad law states that `f >>= return`

≡ `f`

for some computation `f`

. In other words, the law states that if we perform the computation `f`

and then pass the result on to the trivial `return`

function, then all we have done is to perform the computation.

That this law must hold should be obvious. To see this, think of `f`

as `getLine`

(reads a string from the keyboard). This law states that reading a string and then returning the value read is exactly the same as just reading the string.

In ` do` notation, the law states that the following two programs are equivalent:

law2a = do x <- f return x law2b = do f

### Law 3[edit]

This states that `f >>= (\x -> g x >>= h)`

≡ `(f >>= g) >>= h`

. At first glance, this law is not as easy to grasp as the other two. It is essentially an associativity law for monads.

*Note*

Outside the world of monads, a function is associative if . For instance, `+`

and `*`

are associative, since bracketing on these functions doesn't make a difference. On the other hand, `-`

and `/`

are not associative since, for example, .

If we throw away the messiness with the lambdas, we see that this law states: `f >>= (g >>= h)`

≡ `(f >>= g) >>= h`

. The intuition behind this law is that when we string together actions, it doesn't matter how we group them.

For a concrete example, take `f`

to be `getLine`

. Take `g`

to be an action which takes a value as input, prints it to the screen, reads another string via `getLine`

, and then returns that newly read string. Take `h`

to be `putStrLn`

.

Let's consider what `(\x -> g x >>= h)`

does. It takes a value called `x`

, and runs `g`

on it, feeding the results into `h`

. In this instance, this means that it's going to take a value, print it, read another value and then print that. Thus, the entire left hand side of the law first reads a string and then does what we've just described.

On the other hand, consider `(f >>= g)`

. This action reads a string from the keyboard, prints it, and then reads another string, returning that newly read string as a result. When we bind this with `h`

as on the right hand side of the law, we get an action that does the action described by `(f >>= g)`

, and then prints the results.

Clearly, these two actions are the same.

While this explanation is quite complicated, and the text of the law is also quite complicated, the actual meaning is simple: if we have three actions, and we compose them in the same order, it doesn't matter where we put the parentheses. The rest is just notation.

In ` do` notation, the law says that the following two programs are equivalent:

law3a = do x <- f do y <- g x h y law3b = do y <- do x <- f g x h y

## A Simple State Monad[edit]

One of the simplest monads that we can craft is a state-passing monad. In Haskell, all state information usually must be passed to functions explicitly as arguments. Using monads, we can effectively hide some state information.

Suppose we have a function `f`

of type `a -> b`

, and we need to add state to this function. In general, if state is of type `state`

, we can encode it by changing the type of `f`

to `a -> state -> (state, b)`

. That is, the new version of `f`

takes the original parameter of type `a`

and a new state parameter. And, in addition to returning the value of type `b`

, it also returns an updated state, encoded in a tuple.

For instance, suppose we have a binary tree defined as:

data Tree a = Leaf a | Branch (Tree a) (Tree a)

Now, we can write a simple map function to apply some function to each value in the leaves:

mapTree :: (a -> b) -> Tree a -> Tree b mapTree f (Leaf a) = Leaf (f a) mapTree f (Branch lhs rhs) = Branch (mapTree f lhs) (mapTree f rhs)

This works fine until we need to write a function that numbers the leaves left to right. In a sense, we need to add state, which keeps track of how many leaves we've numbered so far, to the `mapTree`

function. We can augment the function to something like:

mapTreeState :: (a -> state -> (state, b)) -> Tree a -> state -> (state, Tree b) mapTreeState f (Leaf a) state = let (state', b) = f a state in (state', Leaf b) mapTreeState f (Branch lhs rhs) state = let (state' , lhs') = mapTreeState f lhs state (state'', rhs') = mapTreeState f rhs state' in (state'', Branch lhs' rhs')

This is beginning to get a bit unwieldy, and the type signature is getting harder and harder to understand. What we want to do is abstract away the state passing part. That is, the differences between `mapTree`

and `mapTreeState`

are: (1) the augmented `f`

type, (2) we replaced the type `-> Tree b`

with `-> state -> (state, Tree b)`

. Notice that both types changed in exactly the same way. We can abstract this away with a type synonym declaration:

type State st a = st -> (st, a)

To go along with this type, we write two functions:

returnState :: a -> State st a returnState a = \st -> (st, a) bindState :: State st a -> (a -> State st b) -> State st b bindState m k = \st -> let (st', a) = m st m' = k a in m' st'

Let's examine each of these in turn. The first function, `returnState`

, takes a value of type `a`

and creates something of type `State st a`

. If we think of the `st`

as the state, and the value of type `a`

as the value, then this is a function that doesn't change the state and returns the value `a`

.

The `bindState`

function looks distinctly like the interior ` let` declarations in

`mapTreeState`

. It takes two arguments. The first argument is an action that returns something of type `a`

with state `st`

. The second is a function that takes this `a`

and produces something of type `b`

also with the same state. The result of `bindState`

is essentially the result of transforming the `a`

into a `b`

.The definition of `bindState`

takes an initial state, `st`

. It first applies this to the `State st a`

argument called `m`

. This gives back a new state `st'`

and a value `a`

. It then lets the function `k`

act on `a`

, producing something of type `State st b`

, called `m'`

. We finally run `m'`

with the new state `st'`

.

We write a new function, `mapTreeStateM`

and give it the type:

mapTreeStateM :: (a -> State st b) -> Tree a -> State st (Tree b)

Using these "plumbing" functions (`returnState`

and `bindState`

) we can write this function without ever having to explicitly talk about the state:

mapTreeStateM f (Leaf a) = f a `bindState` \b -> returnState (Leaf b) mapTreeStateM f (Branch lhs rhs) = mapTreeStateM f lhs `bindState` \lhs' -> mapTreeStateM f rhs `bindState` \rhs' -> returnState (Branch lhs' rhs')

In the `Leaf`

case, we apply `f`

to `a`

and then *bind* the result to a function that takes the result and returns a `Leaf`

with the new value.

In the `Branch`

case, we recurse on the left-hand-side, binding the result to a function that recurses on the right-hand-side, binding that to a simple function that returns the newly created `Branch`

.

As you have probably guessed by this point, `State st`

is a monad, `returnState`

is analogous to the overloaded `return`

method, and `bindState`

is analogous to the overloaded `>>=`

method. In fact, we can verify that `State st a`

obeys the monad laws:

*Law 1* states: `return a >>= f`

≡ `f a`

. Let's calculate on the left hand side, substituting our names:

returnState a `bindState` f ==> \st -> let (st', a) = (returnState a) st m' = f a in m' st' ==> \st -> let (st', a) = (\st -> (st, a)) st in (f a) st' ==> \st -> let (st', a) = (st, a) in (f a) st' ==> \st -> (f a) st ==> f a

In the first step, we simply substitute the definition of `bindState`

. In the second step, we simplify the last two lines and substitute the definition of `returnState`

. In the third step, we apply `st`

to the lambda function. In the fourth step, we rename `st'`

to `st`

and remove the ` let`. In the last step, we eta reduce.

Moving on to *Law 2*, we need to show that `f >>= return`

≡ `f`

. This is shown as follows:

f `bindState` returnState ==> \st -> let (st', a) = f st in (returnState a) st' ==> \st -> let (st', a) = f st in (\st -> (st, a)) st' ==> \st -> let (st', a) = f st in (st', a) ==> \st -> f st ==> f

Finally, we need to show that `State`

obeys the third law: `f >>= (\x -> g x >>= h)`

≡ `(f >>= g) >>= h`

. This is much more involved to show, so we will only sketch the proof here. Notice that we can write the left-hand-side as:

\st -> let (st', a) = f st in (\x -> g x `bindState` h) a st' ==> \st -> let (st', a) = f st in (g a `bindState` h) st' ==> \st -> let (st', a) = f st in (\st' -> let (st'', b) = g a in h b st'') st' ==> \st -> let (st' , a) = f st (st'', b) = g a st' (st''',c) = h b st'' in (st''',c)

The interesting thing to note here is that we have both action applications on the same ` let` level. Since

`is associative, this means that we can put whichever bracketing we prefer and the results will not change. Of course, this is an informal, "hand waving" argument and it would take us a few more derivations to actually prove, but this gives the general idea.`

**let**Now that we know that `State st`

is actually a monad, we'd like to make it an instance of the `Monad`

class. Unfortunately, the straightforward way of doing this doesn't work. We can't write:

instance Monad (State st) where { ... }

This is because you cannot make instances out of non-fully-applied type synonyms. Instead, what we need to do instead is convert the type synonym into a ` newtype`, as:

newtype State st a = State (st -> (st, a))

Unfortunately, this means that we need to do some packing and unpacking of the `State`

constructor in the `Monad`

instance declaration, but it's not terribly difficult:

instance Monad (State state) where return a = State (\state -> (state, a)) State run >>= action = State run' where run' st = let (st', a) = run st State run'' = action a in run'' st'

Now, we can write our `mapTreeM`

function as:

mapTreeM :: (a -> State state b) -> Tree a -> State state (Tree b) mapTreeM f (Leaf a) = do b <- f a return (Leaf b) mapTreeM f (Branch lhs rhs) = do lhs' <- mapTreeM f lhs rhs' <- mapTreeM f rhs return (Branch lhs' rhs')

which is significantly cleaner than before. In fact, if we remove the type signature, we get the more general type:

mapTreeM :: Monad m => (a -> m b) -> Tree a -> m (Tree b)

That is, `mapTreeM`

can be run in *any* monad, not just our `State`

monad.

Now, the nice thing about encapsulating the stateful aspect of the computation like this is that we can provide functions to get and change the current state. These look like:

getState :: State state state getState = State (\state -> (state, state)) putState :: state -> State state () putState new = State (\_ -> (new, ()))

Here, `getState`

is a monadic operation that takes the current state, passes it through unchanged, and then returns it as the value. The `putState`

function takes a new state and produces an action that ignores the current state and inserts the new one.

Now, we can write our `numberTree`

function as:

numberTree :: Tree a -> State Int (Tree (a, Int)) numberTree tree = mapTreeM number tree where number v = do cur <- getState putState (cur+1) return (v,cur)

Finally, we need to be able to run the action by providing an initial state:

runStateM :: State state a -> state -> a runStateM (State f) st = snd (f st)

Now, we can provide an example Tree:

testTree = Branch (Branch (Leaf 'a') (Branch (Leaf 'b') (Leaf 'c'))) (Branch (Leaf 'd') (Leaf 'e'))

and number it:

**Example:**

State> runStateM (numberTree testTree) 1 Branch (Branch (Leaf ('a',1)) (Branch (Leaf ('b',2)) (Leaf ('c',3)))) (Branch (Leaf ('d',4)) (Leaf ('e',5)))

This may seem like a large amount of work to do something simple. However, note the new power of `mapTreeM`

. We can also print out the leaves of the tree in a left-to-right fashion as:

**Example:**

State> mapTreeM print testTree 'a' 'b' 'c' 'd' 'e'

This crucially relies on the fact that `mapTreeM`

has the more general type involving arbitrary monads -- not just the state monad. Furthermore, we can write an action that will make each leaf value equal to its old value as well as all the values preceding:

fluffLeaves tree = mapTreeM fluff tree where fluff v = do cur <- getState putState (v:cur) return (v:cur)

and can see it in action:

**Example:**

State> runStateM (fluffLeaves testTree) [] Branch (Branch (Leaf "a") (Branch (Leaf "ba") (Leaf "cba"))) (Branch (Leaf "dcba") (Leaf "edcba"))

In fact, you don't even need to write your own monad instance and datatype. All this is built in to the `Control.Monad.State`

module. There, our `runStateM`

is called `evalState`

; our `getState`

is called `get`

; and our `putState`

is called `put`

.

This module also contains a *state transformer monad*, which we will discuss in the section on Transformer.

## Common Monads[edit]

It turns out that many of our favorite datatypes are actually monads themselves. Consider, for instance, lists. They have a monad definition that looks something like:

instance Monad [] where return x = [x] l >>= f = concatMap f l fail _ = []

This enables us to use lists in do notation. For instance, given the definition:

cross l1 l2 = do x <- l1 y <- l2 return (x,y)

we get a cross-product function:

**Example:**

Monads> cross "ab" "def" [('a','d'),('a','e'),('a','f'),('b','d'),('b','e'), ('b','f')]

It is not a coincidence that this looks very much like the list comprehension form:

**Example:**

Prelude> [(x,y) | x <- "ab", y <- "def"] [('a','d'),('a','e'),('a','f'),('b','d'),('b','e'), ('b','f')]

List comprehension form is simply an abbreviated form of a monadic statement using lists. In fact, in older versions of Haskell, the list comprehension form could be used for *any* monad -- not just lists. However, in the current version of Haskell, this is no longer allowed.

The `Maybe`

type is also a monad, with failure being represented as `Nothing`

and with success as `Just`

. We get the following instance declaration:

instance Monad Maybe where return a = Just a Nothing >>= f = Nothing Just x >>= f = f x fail _ = Nothing

We can use the *same* cross product function that we did for lists on `Maybe`

s. This is because the ` do` notation works for any monad, and there's nothing specific to lists about the

`cross`

function.**Example:**

Monads> cross (Just 'a') (Just 'b') Just ('a','b') Monads> cross (Nothing :: Maybe Char) (Just 'b') Nothing Monads> cross (Just 'a') (Nothing :: Maybe Char) Nothing Monads> cross (Nothing :: Maybe Char) (Nothing :: Maybe Char) Nothing

What this means is that if we write a function (like `searchAll`

from the section on Classes) only in terms of monadic operators, we can use it with any monad, depending on what we mean. Using real monadic functions (not ` do` notation), the

`searchAll`

function looks something like:searchAll g@(Graph vl el) src dst | src == dst = return [src] | otherwise = search' el where search' [] = fail "no path" search' ((u,v,_):es) | src == u = searchAll g v dst >>= \path -> return (u:path) | otherwise = search' es

The type of this function is `Monad m => Graph v e -> Int -> Int -> m [Int]`

. This means that no matter what monad we're using at the moment, this function will perform the calculation. Suppose we have the following graph:

gr = Graph [(0, 'a'), (1, 'b'), (2, 'c'), (3, 'd')] [(0,1,'l'), (0,2,'m'), (1,3,'n'), (2,3,'m')]

This represents a graph with four nodes, labelled ` a,b,c` and

`. There is an edge from`

**d**`to both`

**a**`and`

**b**`. There is also an edge from both`

**c**`and`

**b**`to`

**c**`. Using the`

**d**`Maybe`

monad, we can compute the path from `to`

**a**`:`

**d****Example:**

Monads> searchAll gr 0 3 :: Maybe [Int] Just [0,1,3]

We provide the type signature, so that the interpreter knows what monad we're using. If we try to search in the opposite direction, there is no path. The inability to find a path is represented as `Nothing`

in the `Maybe`

monad:

**Example:**

Monads> searchAll gr 3 0 :: Maybe [Int] Nothing

Note that the string "no path" has disappeared since there's no way for the `Maybe`

monad to record this.

If we perform the same impossible search in the list monad, we get the empty list, indicating no path:

**Example:**

Monads> searchAll gr 3 0 :: [[Int]] []

If we perform the possible search, we get back a list containing the first path:

**Example:**

Monads> searchAll gr 0 3 :: [[Int]] [[0,1,3]]

You may have expected this function call to return *all* paths, but, as coded, it does not. See the section on Plus for more about using lists to represent nondeterminism.

If we use the IO monad, we can actually get at the error message, since IO knows how to keep track of error messages:

**Example:**

Monads> searchAll gr 0 3 :: IO [Int] Monads> it [0,1,3] Monads> searchAll gr 3 0 :: IO [Int] *** Exception: user error Reason: no path

In the first case, we needed to type `it`

to get GHCi to actually evaluate the search.

There is one problem with this implementation of `searchAll`

: if it finds an edge that does not lead to a solution, it won't be able to backtrack. This has to do with the recursive call to `searchAll`

inside of `search'`

. Consider, for instance, what happens if `searchAll g v dst`

doesn't find a path. There's no way for this implementation to recover. For instance, if we remove the edge from node {\bf b} to node {\bf d}, we should still be able to find a path from {\bf a} to {\bf d}, but this algorithm can't find it. We define:

gr2 = Graph [(0, 'a'), (1, 'b'), (2, 'c'), (3, 'd')] [(0,1,'l'), (0,2,'m'), (2,3,'m')]

and then try to search:

**Example:**

Monads> searchAll gr2 0 3 *** Exception: user error Reason: no path

To fix this, we need a function like `combine`

from our `Computation`

class. We will see how to do this in the section on Plus.

Exercises |
---|

Verify that `Maybe` obeys the three monad laws. |

Exercises |
---|

The type Hint: Your instance declaration should begin: `(Either String) where` . |

## Monadic Combinators[edit]

The `Monad`

/`Control.Monad`

library contains a few very useful monadic combinators, which haven't yet been thoroughly discussed. The ones we will discuss in this section, together with their types, are:

`(=<<) :: (a -> m b) -> m a -> m b`

`mapM :: (a -> m b) -> [a] -> m [b]`

`mapM_ :: (a -> m b) -> [a] -> m ()`

`filterM :: (a -> m Bool) -> [a] -> m [a]`

`foldM :: (a -> b -> m a) -> a -> [b] -> m a`

`sequence :: [m a] -> m [a]`

`sequence_ :: [m a] -> m ()`

`liftM :: (a -> b) -> m a -> m b`

`when :: Bool -> m () -> m ()`

`join :: m (m a) -> m a`

In the above, `m`

is always assumed to be an instance of `Monad`

.

In general, functions with an underscore at the end are equivalent to the ones without, except that they do not return any value.

The `=<<`

function is exactly the same as `>>=`

, except it takes its arguments in the opposite order. For instance, in the IO monad, we can write either of the following:

**Example:**

Monads> writeFile "foo" "hello world!" >> (readFile "foo" >>= putStrLn) hello world! Monads> writeFile "foo" "hello world!" >> (putStrLn =<< readFile "foo") hello world!

The `mapM`

, `filterM`

and `foldM`

are our old friends `map`

, `filter`

and `foldl`

wrapped up inside of monads. These functions are incredibly useful (particularly `foldM`

) when working with monads. We can use `mapM_`

, for instance, to print a list of things to the screen:

**Example:**

Monads> mapM_ print [1,2,3,4,5] 1 2 3 4 5

We can use `foldM`

to sum a list and print the intermediate sum at each step:

**Example:**

Monads> foldM (\a b -> putStrLn (show a ++ "+" ++ show b ++ "=" ++ show (a+b)) >> return (a+b)) 0 [1..5] 0+1=1 1+2=3 3+3=6 6+4=10 10+5=15 Monads> it 15

The `sequence`

and `sequence_`

functions simply "execute" a list of actions. For instance:

**Example:**

Monads> sequence [print 1, print 2, print 'a'] 1 2 'a' Monads> it [(),(),()] Monads> sequence_ [print 1, print 2, print 'a'] 1 2 'a' Monads> it ()

We can see that the underscored version doesn't return each value, while the non-underscored version returns the list of the return values.

The `liftM`

function "lifts" a non-monadic function to a monadic function. (Do not confuse this with the `lift`

function used for monad transformers in the section on Transformer.) This is useful for shortening code (among other things). For instance, we might want to write a function that prepends each line in a file with its line number. We can do this with:

numberFile :: FilePath -> IO () numberFile fp = do text <- readFile fp let l = lines text let n = zipWith (\n t -> show n ++ ' ' : t) [1..] l mapM_ putStrLn n

However, we can shorten this using `liftM`

:

numberFile :: FilePath -> IO () numberFile fp = do l <- lines `liftM` readFile fp let n = zipWith (\n t -> show n ++ ' ' : t) [1..] l mapM_ putStrLn n

In fact, you can apply any sort of (pure) processing to a file using `liftM`

. For instance, perhaps we also want to split lines into words; we can do this with:

... w <- (map words . lines) `liftM` readFile fp ...

Note that the parentheses are required, since the `(.)`

function has the same fixity has ``liftM``

.

Lifting pure functions into monads is also useful in other monads. For instance `liftM`

can be used to apply function inside of `Just`

. For instance:

Monads> liftM (+1) (Just 5) Just 6 Monads> liftM (+1) Nothing Nothing

The `when`

function executes a monadic action only if a condition is met. So, if we only want to print non-empty lines:

**Example:**

Monads> mapM_ (\l -> when (not $ null l) (putStrLn l)) ["","abc","def","","","ghi"] abc def ghi

Of course, the same could be accomplished with `filter`

, but sometimes `when`

is more convenient.

Finally, the `join`

function is the monadic equivalent of `concat`

on lists. In fact, when `m`

is the list monad, `join`

is exactly `concat`

. In other monads, it accomplishes a similar task:

**Example:**

Monads> join (Just (Just 'a')) Just 'a' Monads> join (Just (Nothing :: Maybe Char)) Nothing Monads> join (Nothing :: Maybe (Maybe Char)) Nothing Monads> join (return (putStrLn "hello")) hello Monads> return (putStrLn "hello") Monads> join [[1,2,3],[4,5]] [1,2,3,4,5]

These functions will turn out to be even more useful as we move on to more advanced topics in the chapter Io advanced.

## MonadPlus[edit]

Given only the `>>=`

and `return`

functions, it is impossible to write a function like `combine`

with type `c a -> c a -> c a`

. However, such a function is so generally useful that it exists in another class called `MonadPlus`

. In addition to having a `combine`

function, instances of `MonadPlus`

also have a "zero" element that is the identity under the "plus" (i.e., combine) action. The definition is:

class Monad m => MonadPlus m where mzero :: m a mplus :: m a -> m a -> m a

In order to gain access to `MonadPlus`

, you need to import the `Monad`

module (or `Control.Monad`

in the hierarchical libraries).

In the section on Common, we showed that `Maybe`

and list are both monads. In fact, they are also both instances of `MonadPlus`

. In the case of `Maybe`

, the zero element is `Nothing`

; in the case of lists, it is the empty list. The `mplus`

operation on `Maybe`

is `Nothing`

, if both elements are `Nothing`

; otherwise, it is the first `Just`

value. For lists, `mplus`

is the same as `++`

.

That is, the instance declarations look like:

instance MonadPlus Maybe where mzero = Nothing mplus Nothing y = y mplus x _ = x instance MonadPlus [] where mzero = [] mplus x y = x ++ y

We can use this class to reimplement the search function we've been exploring, such that it will explore all possible paths. The new function looks like:

searchAll2 g@(Graph vl el) src dst | src == dst = return [src] | otherwise = search' el where search' [] = fail "no path" search' ((u,v,_):es) | src == u = (searchAll2 g v dst >>= \path -> return (u:path)) `mplus` search' es | otherwise = search' es

Now, when we're going through the edge list in `search'`

, and we come across a matching edge, not only do we explore this path, but we also continue to explore the out-edges of the current node in the recursive call to `search'`

.

The IO monad is not an instance of `MonadPlus`

; we we're not able to execute the search with this monad. We can see that when using lists as the monad, we (a) get all possible paths in `gr`

and (b) get a path in `gr2`

.

**Example:**

MPlus> searchAll2 gr 0 3 :: [[Int]] [[0,1,3],[0,2,3]] MPlus> searchAll2 gr2 0 3 :: [[Int]] [[0,2,3]]

You might be tempted to implement this as:

searchAll2 g@(Graph vl el) src dst | src == dst = return [src] | otherwise = search' el where search' [] = fail "no path" search' ((u,v,_):es) | src == u = do path <- searchAll2 g v dst rest <- search' es return ((u:path) `mplus` rest) | otherwise = search' es

But note that this doesn't do what we want. Here, if the recursive call to `searchAll2`

fails, we don't try to continue and execute `search' es`

. The call to `mplus`

must be at the top level in order for it to work.

Exercises |
---|

Suppose that we changed the order of arguments to search' es `mplus` (searchAll2 g v dst >>= \path -> return (u:path)) How would you expect this to change the results when using the list monad on`gr` ? Why? |

## Monad Transformers[edit]

Often we want to "piggyback" monads on top of each other. For instance, there might be a case where you need access to both IO operations through the IO monad and state functions through some state monad. In order to accomplish this, we introduce a `MonadTrans`

class, which essentially "lifts" the operations of one monad into another. You can think of this as *stacking* monads on top of each other. This class has a simple method: `lift`

. The class declaration for `MonadTrans`

is:

class MonadTrans t where lift :: Monad m => m a -> t m a

The idea here is that `t`

is the outer monad and that `m`

lives inside of it. In order to execute a command of type `Monad m => m a`

, we first `lift`

it into the transformer.

The simplest example of a transformer (and arguably the most useful) is the state transformer monad, which is a state monad wrapped around an arbitrary monad. Before, we defined a state monad as:

newtype State state a = State (state -> (state, a))

Now, instead of using a function of type `state -> (state, a)`

as the monad, we assume there's some other monad `m`

and make the internal action into something of type `state -> m (state, a)`

. This gives rise to the following definition for a state transformer:

newtype StateT state m a = StateT (state -> m (state, a))

For instance, we can think of `m`

as IO. In this case, our state transformer monad is able to execute actions in the IO monad. First, we make this an instance of `MonadTrans`

:

instance MonadTrans (StateT state) where lift m = StateT (\s -> do a <- m return (s,a))

Here, lifting a function from the realm of `m`

to the realm of `StateT state`

simply involves keeping the state (the `s`

value) constant and executing the action.

Of course, we also need to make `StateT`

a monad, itself. This is relatively straightforward, provided that `m`

is already a monad:

instance Monad m => Monad (StateT state m) where return a = StateT (\s -> return (s,a)) StateT m >>= k = StateT (\s -> do (s', a) <- m s let StateT m' = k a m' s') fail s = StateT (\_ -> fail s)

The idea behind the definition of `return`

is that we keep the state constant and simply return the state/a pair in the enclosed monad. Note that the use of `return`

in the definition of `return`

refers to the enclosed monad, not the state transformer.

In the definition of bind, we create a new `StateT`

that takes a state `s`

as an argument. First, it applies this state to the first action (`StateT m`

) and gets the new state and answer as a result. It then runs the `k`

action on this new state and gets a new transformer. It finally applies the new state to this transformer. This definition is nearly identical to the definition of bind for the standard (non-transformer) `State`

monad described in the section on State.

The `fail`

function passes on the call to `fail`

in the enclosed monad, since state transformers don't natively know how to deal with failure.

Of course, in order to actually use this monad, we need to provide function `getT`

, `putT`

and `evalStateT`

. These are analogous to `getState`

, `putState`

and `runStateM`

from the section on State:

getT :: Monad m => StateT s m s getT = StateT (\s -> return (s, s)) putT :: Monad m => s -> StateT s m () putT s = StateT (\_ -> return (s, ())) evalStateT :: Monad m => StateT s m a -> s -> m a evalStateT (StateT m) state = do (s', a) <- m state return a

These functions should be straightforward. Note, however, that the result of `evalStateT`

is actually a monadic action in the enclosed monad. This is typical of monad transformers: they don't know how to actually run things in their enclosed monad (they only know how to `lift`

actions). Thus, what you get out is a monadic action in the inside monad (in our case, IO), which you then need to run yourself.

We can use state transformers to reimplement a version of our `mapTreeM`

function from the section on State. The only change here is that when we get to a leaf, we print out the value of the leaf; when we get to a branch, we just print out "Branch."

mapTreeM action (Leaf a) = do lift (putStrLn ("Leaf " ++ show a)) b <- action a return (Leaf b) mapTreeM action (Branch lhs rhs) = do lift (putStrLn "Branch") lhs' <- mapTreeM action lhs rhs' <- mapTreeM action rhs return (Branch lhs' rhs')

The only difference between this function and the one from the section on State is the calls to `lift (putStrLn ...)`

as the first line. The `lift`

tells us that we're going to be executing a command in an enclosed monad. In this case, the enclosed monad is `IO`

, since the command lifted is `putStrLn`

.

The type of this function is relatively complex:

mapTreeM :: (MonadTrans t, Monad (t IO), Show a) => (a -> t IO a1) -> Tree a -> t IO (Tree a1)

Ignoring, for a second, the class constraints, this says that `mapTreeM`

takes an action and a tree and returns a tree. This just as before. In this, we require that `t`

is a monad transformer (since we apply `lift`

in it); we require that `t IO`

is a monad, since we use `putStrLn`

we know that the enclosed monad is `IO`

; finally, we require that `a`

is an instance of show -- this is simply because we use `show`

to show the value of leaves.

Now, we simply change `numberTree`

to use this version of `mapTreeM`

, and the new versions of `get`

and `put`

, and we end up with:

numberTree tree = mapTreeM number tree where number v = do cur <- getT putT (cur+1) return (v,cur)

Using this, we can run our monad:

**Example:**

MTrans> evalStateT (numberTree testTree) 0 Branch Branch Leaf 'a' Branch Leaf 'b' Leaf 'c' Branch Leaf 'd' Leaf 'e' *MTrans> it Branch (Branch (Leaf ('a',0)) (Branch (Leaf ('b',1)) (Leaf ('c',2)))) (Branch (Leaf ('d',3)) (Leaf ('e',4)))

One problem not specified in our discussion of `MonadPlus`

is that our search algorithm will fail to terminate on graphs with cycles. Consider:

gr3 = Graph [(0, 'a'), (1, 'b'), (2, 'c'), (3, 'd')] [(0,1,'l'), (1,0,'m'), (0,2,'n'), (1,3,'o'), (2,3,'p')]

In this graph, there is a back edge from node *b* back to node *a*. If we attempt to run `searchAll2`

, regardless of what monad we use, it will fail to terminate. Moreover, if we move this erroneous edge to the end of the list (and call this `gr4`

), the result of `searchAll2 gr4 0 3`

will contain an infinite number of paths: presumably we only want paths that don't contain cycles.

In order to get around this problem, we need to introduce state. Namely, we need to keep track of which nodes we have visited, so that we don't visit them again.

We can do this as follows:

searchAll5 g@(Graph vl el) src dst | src == dst = do visited <- getT putT (src:visited) return [src] | otherwise = do visited <- getT putT (src:visited) if src `elem` visited then mzero else search' el where search' [] = mzero search' ((u,v,_):es) | src == u = (do path <- searchAll5 g v dst return (u:path)) `mplus` search' es | otherwise = search' es

Here, we implicitly use a state transformer (see the calls to `getT`

and `putT`

) to keep track of visited states. We only continue to recurse, when we encounter a state we haven't yet visited. Furthermore, when we recurse, we add the current state to our set of visited states.

Now, we can run the state transformer and get out only the correct paths, even on the cyclic graphs:

**Example:**

MTrans> evalStateT (searchAll5 gr3 0 3) [] :: [[Int]] [[0,1,3],[0,2,3]] MTrans> evalStateT (searchAll5 gr4 0 3) [] :: [[Int]] [[0,1,3],[0,2,3]]

Here, the empty list provided as an argument to `evalStateT`

is the initial state (i.e., the initial visited list). In our case, it is empty.

We can also provide an `execStateT`

method that, instead of returning a result, returns the final state. This function looks like:

execStateT :: Monad m => StateT s m a -> s -> m s execStateT (StateT m) state = do (s', a) <- m state return s'

This is not so useful in our case, as it will return exactly the reverse of `evalStateT`

(try it and find out!), but can be useful in general (if, for instance, we need to know how many numbers are used in `numberTree`

).

Exercises |
---|

Write a function
Exploring 0 -> 3 Exploring 1 -> 3 Exploring 3 -> 3 Exploring 2 -> 3 Exploring 3 -> 3 MTrans> it [[0,1,3],[0,2,3]] In order to do this, you will have to define your own list monad transformer and make appropriate instances of it. |

Exercises |
---|

Combine the |

## Parsing Monads[edit]

It turns out that a certain class of parsers are all monads. This makes the construction of parsing libraries in Haskell very clean. In this chapter, we begin by building our own (small) parsing library in the section on A Simple Parsing Monad and then, in the final section, introduce the Parsec parsing library.

### A Simple Parsing Monad[edit]

Consider the task of parsing. A simple parsing monad is much like a state monad, where the state is the unparsed string. We can represent this exactly as:

newtype Parser a = Parser { runParser :: String -> Either String (String, a) }

We again use `Left err`

to be an error condition. This yields standard instances of `Monad`

and `MonadPlus`

:

instance Monad Parser where return a = Parser (\xl -> Right (xl,a)) fail s = Parser (\xl -> Left s) Parser m >>= k = Parser $ \xl -> case m xl of Left s -> Left s Right (xl', a) -> let Parser n = k a in n xl' instance MonadPlus Parser where mzero = Parser (\xl -> Left "mzero") Parser p `mplus` Parser q = Parser $ \xl -> case p xl of Right a -> Right a Left err -> case q xl of Right a -> Right a Left _ -> Left err

Now, we want to build up a library of parsing "primitives." The most basic primitive is a parser that will read a specific character. This function looks like:

char :: Char -> Parser Char char c = Parser char' where char' [] = Left ("expecting " ++ show c ++ " got EOF") char' (x:xs) | x == c = Right (xs, c) | otherwise = Left ("expecting " ++ show c ++ " got " ++ show x)

Here, the parser succeeds only if the first character of the input is the expected character.

We can use this parser to build up a parser for the string "Hello":

helloParser :: Parser String helloParser = do char 'H' char 'e' char 'l' char 'l' char 'o' return "Hello"

This shows how easy it is to combine these parsers. We don't need to worry about the underlying string -- the monad takes care of that for us. All we need to do is combine these parser primitives. We can test this parser by using `runParser`

and by supplying input:

**Example:**

Parsing> runParser helloParser "Hello" Right ("","Hello") Parsing> runParser helloParser "Hello World!" Right (" World!","Hello") Parsing> runParser helloParser "hello World!" Left "expecting 'H' got 'h'"

We can have a slightly more general function, which will match any character fitting a description:

matchChar :: (Char -> Bool) -> Parser Char matchChar c = Parser matchChar' where matchChar' [] = Left ("expecting char, got EOF") matchChar' (x:xs) | c x = Right (xs, x) | otherwise = Left ("expecting char, got " ++ show x)

Using this, we can write a case-insensitive "Hello" parser:

ciHelloParser = do c1 <- matchChar (`elem` "Hh") c2 <- matchChar (`elem` "Ee") c3 <- matchChar (`elem` "Ll") c4 <- matchChar (`elem` "Ll") c5 <- matchChar (`elem` "Oo") return [c1,c2,c3,c4,c5]

Of course, we could have used something like `matchChar ((=='h') . toLower)`

, but the above implementation works just as well. We can test this function:

**Example:**

Parsing> runParser ciHelloParser "hELlO world!" Right (" world!","hELlO")

Finally, we can have a function, which will match any character:

anyChar :: Parser Char anyChar = Parser anyChar' where anyChar' [] = Left ("expecting character, got EOF") anyChar' (x:xs) = Right (xs, x)

On top of these primitives, we usually build some combinators. The `many`

combinator, for instance, will take a parser that parses entities of type `a`

and will make it into a parser that parses entities of type `[a]`

(this is a Kleene-star operator):

many :: Parser a -> Parser [a] many (Parser p) = Parser many' where many' xl = case p xl of Left err -> Right (xl, []) Right (xl',a) -> let Right (xl'', rest) = many' xl' in Right (xl'', a:rest)

The idea here is that first we try to apply the given parser, `p`

. If this fails, we *succeed* but return the empty list. If `p`

succeeds, we recurse and keep trying to apply `p`

until it fails. We then return the list of successes we've accumulated.

In general, there would be many more functions of this sort, and they would be hidden away in a library, so that users couldn't actually look inside the `Parser`

type. However, using them, you could build up, for instance, a parser that parses (non-negative) integers:

int :: Parser Int int = do t1 <- matchChar isDigit tr <- many (matchChar isDigit) return (read (t1:tr))

In this function, we first match a digit (the `isDigit`

function comes from the module `Char`

/`Data.Char`

) and then match as many more digits as we can. We then `read`

the result and return it. We can test this parser as before:

**Example:**

Parsing> runParser int "54" Right ("",54) *Parsing> runParser int "54abc" Right ("abc",54) *Parsing> runParser int "a54abc" Left "expecting char, got 'a'"

Now, suppose we want to parse a Haskell-style list of `Int`

s. This becomes somewhat difficult because, at some point, we're either going to parse a comma or a close brace, but we don't know when this will happen. This is where the fact that `Parser`

is an instance of `MonadPlus`

comes in handy: first we try one, then we try the other.

Consider the following code:

intList :: Parser [Int] intList = do char '[' intList' `mplus` (char ']' >> return []) where intList' = do i <- int r <- (char ',' >> intList') `mplus` (char ']' >> return []) return (i:r)

The first thing this code does is parse and open brace. Then, using `mplus`

, it tries one of two things: parsing using `intList'`

, or parsing a close brace and returning an empty list.

The `intList'`

function assumes that we're not yet at the end of the list, and so it first parses an int. It then parses the rest of the list. However, it doesn't know whether we're at the end yet, so it again uses `mplus`

. On the one hand, it tries to parse a comma and then recurse; on the other, it parses a close brace and returns the empty list. Either way, it simply prepends the int it parsed itself to the beginning.

One thing that you should be careful of is the order in which you supply arguments to `mplus`

. Consider the following parser:

tricky = mplus (string "Hal") (string "Hall")

You might expect this parser to parse both the words "Hal" and "Hall;" however, it only parses the former. You can see this with:

**Example:**

Parsing> runParser tricky "Hal" Right ("","Hal") Parsing> runParser tricky "Hall" Right ("l","Hal")

This is because it tries to parse "Hal," which succeeds, and then it doesn't bother trying to parse "Hall."

You can attempt to fix this by providing a parser primitive, which detects end-of-file (really, end-of-string) as:

eof :: Parser () eof = Parser eof' where eof' [] = Right ([], ()) eof' xl = Left ("Expecting EOF, got " ++ show (take 10 xl))

You might then rewrite `tricky`

using `eof`

as:

tricky2 = do s <- mplus (string "Hal") (string "Hall") eof return s

But this also doesn't work, as we can easily see:

**Example:**

Parsing> runParser tricky2 "Hal" Right ("",()) Parsing> runParser tricky2 "Hall" Left "Expecting EOF, got \"l\""

This is because, again, the `mplus`

doesn't know that it needs to parse the whole input. So, when you provide it with "Hall," it parses just "Hal" and leaves the last "l" lying around to be parsed later. This causes `eof`

to produce an error message.

The correct way to implement this is:

tricky3 = mplus (do s <- string "Hal" eof return s) (do s <- string "Hall" eof return s)

We can see that this works:

**Example:**

Parsing> runParser tricky3 "Hal" Right ("","Hal") Parsing> runParser tricky3 "Hall" Right ("","Hall")

This works precisely because each side of the `mplus`

knows that it must read the end.

In this case, fixing the parser to accept both "Hal" and "Hall" was fairly simple, due to the fact that we assumed we would be reading an end-of-file immediately afterwards. Unfortunately, if we cannot disambiguate immediately, life becomes significantly more complicated. This is a general problem in parsing, and has little to do with monadic parsing. The solution most parser libraries (e.g., Parsec, see the section on Parsec) have adopted is to only recognize "LL(1)" grammars: that means that you must be able to disambiguate the input with a one token look-ahead.

Exercises |
---|

Write a parser |

Given this monadic parser, it is fairly easy to add information regarding source position. For instance, if we're parsing a large file, it might be helpful to report the line number on which an error occurred. We could do this simply by extending the `Parser`

type and by modifying the instances and the primitives:

newtype Parser a = Parser { runParser :: Int -> String -> Either String (Int, String, a) } instance Monad Parser where return a = Parser (\n xl -> Right (n,xl,a)) fail s = Parser (\n xl -> Left (show n ++ ": " ++ s)) Parser m >>= k = Parser $ \n xl -> case m n xl of Left s -> Left s Right (n', xl', a) -> let Parser m2 = k a in m2 n' xl' instance MonadPlus Parser where mzero = Parser (\n xl -> Left "mzero") Parser p `mplus` Parser q = Parser $ \n xl -> case p n xl of Right a -> Right a Left err -> case q n xl of Right a -> Right a Left _ -> Left err matchChar :: (Char -> Bool) -> Parser Char matchChar c = Parser matchChar' where matchChar' n [] = Left ("expecting char, got EOF") matchChar' n (x:xs) | c x = Right (n+if x=='\n' then 1 else 0 , xs, x) | otherwise = Left ("expecting char, got " ++ show x)

The definitions for `char`

and `anyChar`

are not given, since they can be written in terms of `matchChar`

. The `many`

function needs to be modified only to include the new state.

Now, when we run a parser and there is an error, it will tell us which line number contains the error:

**Example:**

Parsing2> runParser helloParser 1 "Hello" Right (1,"","Hello") Parsing2> runParser int 1 "a54" Left "1: expecting char, got 'a'" Parsing2> runParser intList 1 "[1,2,3,a]" Left "1: expecting ']' got '1'"

We can use the `intListSpace`

parser from the prior exercise to see that this does in fact work:

**Example:**

Parsing2> runParser intListSpace 1 "[1 ,2 , 4 \n\n ,a\n]" Left "3: expecting char, got 'a'" Parsing2> runParser intListSpace 1 "[1 ,2 , 4 \n\n\n ,a\n]" Left "4: expecting char, got 'a'" Parsing2> runParser intListSpace 1 "[1 ,\n2 , 4 \n\n\n ,a\n]" Left "5: expecting char, got 'a'"

We can see that the line number, on which the error occurs, increases as we add additional newlines before the erroneous "a".

### Parsec[edit]

As you continue developing your parser, you might want to add more and more features. Luckily, Graham Hutton and Daan Leijen have already done this for us in the Parsec library. This section is intended to be an introduction to the Parsec library; it by no means covers the whole library, but it should be enough to get you started.

Like our library, Parsec provides a few basic functions to build parsers from characters. These are: `char`

, which is the same as our `char`

; `anyChar`

, which is the same as our `anyChar`

; `satisfy`

, which is the same as our `matchChar`

; `oneOf`

, which takes a list of `Char`

s and matches any of them; and `noneOf`

, which is the opposite of `oneOf`

.

The primary function Parsec uses to run a parser is `parse`

. However, in addition to a parser, this function takes a string that represents the name of the file you're parsing. This is so it can give better error messages. We can try parsing with the above functions:

**Example:**

ParsecI> parse (char 'a') "stdin" "a" Right 'a' ParsecI> parse (char 'a') "stdin" "ab" Right 'a' ParsecI> parse (char 'a') "stdin" "b" Left "stdin" (line 1, column 1): unexpected "b" expecting "a" ParsecI> parse (char 'H' >> char 'a' >> char 'l') "stdin" "Hal" Right 'l' ParsecI> parse (char 'H' >> char 'a' >> char 'l') "stdin" "Hap" Left "stdin" (line 1, column 3): unexpected "p" expecting "l"

Here, we can see a few differences between our parser and Parsec: first, the rest of the string isn't returned when we run `parse`

. Second, the error messages produced are much better.

In addition to the basic character parsing functions, Parsec provides primitives for: `spaces`

, which is the same as ours; `space`

which parses a single space; `letter`

, which parses a letter; `digit`

, which parses a digit; `string`

, which is the same as ours; and a few others.

We can write our `int`

and `intList`

functions in Parsec as:

int :: CharParser st Int int = do i1 <- digit ir <- many digit return (read (i1:ir)) intList :: CharParser st [Int] intList = do char '[' intList' `mplus` (char ']' >> return []) where intList' = do i <- int r <- (char ',' >> intList') `mplus` (char ']' >> return []) return (i:r)

First, note the type signatures. The `st`

type variable is simply a state variable that we are not using. In the `int`

function, we use the `many`

function (built in to Parsec) together with the `digit`

function (also built in to Parsec). The `intList`

function is actually identical to the one we wrote before.

Note, however, that using `mplus`

explicitly is not the preferred method of combining parsers: Parsec provides a `<|>`

function that is a synonym of `mplus`

, but that looks nicer:

intList :: CharParser st [Int] intList = do char '[' intList' <|> (char ']' >> return []) where intList' = do i <- int r <- (char ',' >> intList') <|> (char ']' >> return []) return (i:r)

We can test this:

**Example:**

ParsecI> parse intList "stdin" "[3,5,2,10]" Right [3,5,2,10] ParsecI> parse intList "stdin" "[3,5,a,10]" Left "stdin" (line 1, column 6): unexpected "a" expecting digit

In addition to these basic combinators, Parsec provides a few other useful ones:

`choice`

takes a list of parsers and performs an*or*operation (`<|>`

) between all of them.

`option`

takes a default value of type`a`

and a parser that returns something of type`a`

. It then tries to parse with the parser, but it uses the default value as the return, if the parsing fails.

`optional`

takes a parser that returns`()`

and optionally runs it.

`between`

takes three parsers: an open parser, a close parser and a between parser. It runs them in order and returns the value of the between parser. This can be used, for instance, to take care of the brackets on our`intList`

parser.

`notFollowedBy`

takes a parser and returns one that succeeds only if the given parser would have failed.

Suppose we want to parse a simple calculator language that includes only plus and times. Furthermore, for simplicity, assume each embedded expression must be enclosed in parentheses. We can give a datatype for this language as:

data Expr = Value Int | Expr :+: Expr | Expr :*: Expr deriving (Eq, Ord, Show)

And then write a parser for this language as:

parseExpr :: Parser Expr parseExpr = choice [ do i <- int; return (Value i) , between (char '(') (char ')') $ do e1 <- parseExpr op <- oneOf "+*" e2 <- parseExpr case op of '+' -> return (e1 :+: e2) '*' -> return (e1 :*: e2) ]

Here, the parser alternates between two options (we could have used `<|>`

, but I wanted to show the `choice`

combinator in action). The first simply parses an int and then wraps it up in the `Value`

constructor. The second option uses `between`

to parse text between parentheses. What it parses is first an expression, then one of plus or times, then another expression. Depending on what the operator is, it returns either `e1 :+: e2`

or `e1 :*: e2`

.

We can modify this parser, so that instead of computing an `Expr`

, it simply computes the value:

parseValue :: Parser Int parseValue = choice [int ,between (char '(') (char ')') $ do e1 <- parseValue op <- oneOf "+*" e2 <- parseValue case op of '+' -> return (e1 + e2) '*' -> return (e1 * e2) ]

We can use this as:

**Example:**

ParsecI> parse parseValue "stdin" "(3*(4+3))" Right 21

Now, suppose we want to introduce bindings into our language. That is, we want to also be able to say "let x = 5 in" inside of our expressions and then use the variables we've defined. In order to do this, we need to use the `getState`

and `setState`

(or `updateState`

) functions built in to Parsec.

parseValueLet :: CharParser (FiniteMap Char Int) Int parseValueLet = choice [ int , do string "let " c <- letter char '=' e <- parseValueLet string " in " updateState (\fm -> addToFM fm c e) parseValueLet , do c <- letter fm <- getState case lookupFM fm c of Nothing -> unexpected ("variable " ++ show c ++ " unbound") Just i -> return i , between (char '(') (char ')') $ do e1 <- parseValueLet op <- oneOf "+*" e2 <- parseValueLet case op of '+' -> return (e1 + e2) '*' -> return (e1 * e2) ]

The `int`

and recursive cases remain the same. We add two more cases, one to deal with let-bindings, the other to deal with usages.

In the let-bindings case, we first parse a "let" string, followed by the character we're binding (the `letter`

function is a Parsec primitive that parses alphabetic characters), followed by its value (a `parseValueLet`

). Then, we parse the " in " and update the state to include this binding. Finally, we continue and parse the rest.

In the usage case, we simply parse the character and then look it up in the state. However, if it doesn't exist, we use the Parsec primitive `unexpected`

to report an error.

We can see this parser in action using the `runParser`

command, which enables us to provide an initial state:

**Example:**

ParsecI> runParser parseValueLet emptyFM "stdin" "let c=5 in ((5+4)*c)" Right 45 *ParsecI> runParser parseValueLet emptyFM "stdin" "let c=5 in ((5+4)*let x=2 in (c+x))" Right 63 *ParsecI> runParser parseValueLet emptyFM "stdin" "((let x=2 in 3+4)*x)" Right 14

Note that the bracketing does not affect the definitions of the variables. For instance, in the last example, the use of "x" is, in some sense, outside the scope of the definition. However, our parser doesn't notice this, since it operates in a strictly left-to-right fashion. In order to fix this omission, bindings would have to be removed (see the exercises).

Exercises |
---|

Modify the |