Haskell/do Notation

From Wikibooks, open books for an open world

Among the initial examples of monads, there were some which used an alternative syntax with do blocks for chaining computations. Those examples, however, were not the first time we have seen do: back in Simple input and output, we saw that code for doing input-output was written in exactly the same way. That is no coincidence: what we have been calling IO actions are just computations in a monad, namely the IO monad. We will revisit IO soon; for now, though, let us consider exactly how do notation translates into regular monadic code. Since the following examples all involve IO, we will refer to the computations/monadic values as actions, as in the earlier parts of the book. Still, do works with any monad; nothing about it is specific to IO.

Translating the then operator

The (>>) (then) operator is easy to translate between do notation and plain code, so we will look at it first. For example, suppose we have a chain of actions like the following one:

putStr "Hello" >> 
putStr " " >> 
putStr "world!" >> 
putStr "\n"

We can rewrite it in do notation as follows:

do putStr "Hello"
   putStr " "
   putStr "world!"
   putStr "\n"

This sequence of instructions closely resembles what you would write in an imperative language such as C. The actions being chained could be anything, as long as all of them are in the same monad. In the context of IO, for instance, an action might be writing to a file, opening a network connection or asking the user for input. In general, do notation is translated into standard Haskell code one step at a time:

do action
   other_action
   yet_another_action

which becomes

action >>
do other_action
   yet_another_action

and so on, until the do block contains a single action, at which point do action is just action.
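As a small check of this rule, the sketch below unfolds a four-step do block one step at a time. So that the results can be compared with ==, it does not use IO. Instead it assumes the pair type (String, a), which base (version 4.9 and later) treats as a simple logging monad whose (>>) concatenates the first components. The helper say and the greet names are hypothetical, introduced only for this illustration.

```haskell
-- A hypothetical logging action: it "outputs" a string by placing it in
-- the first component of a pair. For pairs whose first component is a
-- Monoid, (>>) appends the logs from left to right.
say :: String -> (String, ())
say s = (s, ())

-- The original do block.
greetDo :: (String, ())
greetDo = do
  say "Hello"
  say " "
  say "world!"
  say "\n"

-- One translation step: the first action is pulled out in front of (>>).
greetStep :: (String, ())
greetStep =
  say "Hello" >>
  do say " "
     say "world!"
     say "\n"

-- All steps applied: no do block is left.
greetThen :: (String, ())
greetThen = say "Hello" >> (say " " >> (say "world!" >> say "\n"))

main :: IO ()
main = do
  putStr (fst greetDo)
  print (greetDo == greetStep && greetStep == greetThen)
```

Running main should print Hello world! followed by True. Note that the fully translated version nests to the right, whereas an unparenthesized chain of (>>) groups to the left, because the operator is declared infixl; the monad laws guarantee that both groupings behave the same.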

Translating the bind operator

The (>>=) (bind) operator is a bit more difficult to translate from and to do notation, essentially because it involves passing a value downstream in the binding sequence. Such a value is given a name with the <- notation, and that name can then be used further down in the do block.

do result         <- action
   another_result <- another_action
   (action_based_on_previous_results result another_result)

This is translated back into monadic code as follows:

action >>= f
where f result = do another_result <- another_action
                    (action_based_on_previous_results result another_result)
      f _      = fail "..."

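In words, the action brought outside of the do block is bound to a function, which is defined to take an argument (to make it easy to identify, we named it result, just as in the complete do block). If the pattern matching is unsuccessful, the monad's implementation of fail will be called. With a plain variable such as result the match always succeeds, so the second clause only matters when the left-hand side of <- is a pattern that can fail to match, such as (x:xs).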

Notice that the variables to the left of <- in the do block are bound to the values produced by the actions, not to the actions themselves: if action has type IO String, for instance, then result has type String.
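The translation can also be tried out in a monad other than IO. The following sketch uses Maybe, chosen only because its results are easy to inspect; phoneBook and the function names are hypothetical. It shows a do block next to its hand translation, and then a pattern that can fail to match, where the fallback clause calls fail (which yields Nothing in Maybe).

```haskell
-- A hypothetical phone book, used only for this illustration.
phoneBook :: [(String, String)]
phoneBook = [("alice", "555-0100"), ("bob", "555-0199")]

-- do notation: x and y have type String, not Maybe String.
pairDo :: String -> String -> Maybe (String, String)
pairDo a b = do
  x <- lookup a phoneBook
  y <- lookup b phoneBook
  return (x, y)

-- The same computation translated by hand, one (>>=) per (<-).
pairBind :: String -> String -> Maybe (String, String)
pairBind a b = lookup a phoneBook >>= f
  where
    f x = lookup b phoneBook >>= g
      where
        g y = return (x, y)

-- A pattern that can fail to match on the left of (<-).
firstCharDo :: Maybe String -> Maybe Char
firstCharDo m = do
  (c:_) <- m
  return c

-- Hand translation: the fallback clause calls fail.
firstCharBind :: Maybe String -> Maybe Char
firstCharBind m = m >>= f
  where
    f (c:_) = return c
    f _     = fail "pattern match failure"

main :: IO ()
main = do
  print (pairDo "alice" "bob")
  print (pairBind "alice" "carol")
  print (firstCharDo (Just ""), firstCharBind (Just "abc"))
```

Looking up a name that is missing, or matching (c:_) against an empty string, gives Nothing in both the do version and the translated version.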

Example: user-interactive program

Note: we are going to interact with the user, so we will use putStr and getLine alternately. To avoid unexpected results in the output, remember to disable output buffering by importing System.IO and putting hSetBuffering stdout NoBuffering at the start of your program. Alternatively, you can explicitly flush the output buffer before each interaction with the user (namely, each getLine) using hFlush stdout. If you are testing this code with ghci, you do not have such problems.
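The two options mentioned in the note might look as follows. This is only a sketch: disableBuffering and prompt are hypothetical helper names, and the isEOF check is an extra precaution (not required by the note) so that the program does not crash when no input is available.

```haskell
import System.IO (BufferMode (NoBuffering), hFlush, hSetBuffering, isEOF, stdout)

-- Option 1: turn off output buffering once, at the start of the program.
disableBuffering :: IO ()
disableBuffering = hSetBuffering stdout NoBuffering

-- Option 2: a hypothetical helper that flushes stdout before reading.
prompt :: String -> IO String
prompt question = do
  putStr question
  hFlush stdout              -- make the question visible before waiting
  eof <- isEOF               -- precaution: stdin may be closed or empty
  if eof then return "" else getLine

main :: IO ()
main = do
  disableBuffering           -- either option alone would be enough
  answer <- prompt "What is your first name? "
  putStrLn ("Hello, " ++ answer ++ "!")
```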

Consider this simple program that asks the user for his or her first and last names:

nameDo :: IO ()
nameDo = do putStr "What is your first name? "
            first <- getLine
            putStr "And your last name? "
            last <- getLine
            let full = first++" "++last
            putStrLn ("Pleased to meet you, "++full++"!")

The code in do notation is quite readable, and it is easy to see where it is going. The <- notation makes it possible to refer to the first and last names as if they were pure values, although they can never be obtained purely: getLine is not a pure function but an IO action, which can give a different result every time it is run.

If we were to translate the code into standard monadic code, the result would be:

name :: IO ()
name = putStr "What is your first name? " >>
       getLine >>= f
       where
       f first = putStr "And your last name? " >>
                 getLine >>= g
                 where
                 g last = putStrLn ("Pleased to meet you, "++full++"!")
                          where
                          full = first++" "++last

The advantage of the do notation should now be apparent: the code in nameDo is much more readable, and does not run off the right edge of the screen.

The indentation increase is mainly caused by where clauses related to (>>=) operators, and by the fact that we cannot simply extract a value from the IO monad but must define new functions instead, and take advantage of pattern matching. This explains why the do notation is so popular when dealing with the IO monad, which is often used to obtain values (user input, reading files, etc.) that cannot, by construction, be taken out of the monad.

To avoid the indentation increase and mimic the do-notation closely, you could also use lambdas (anonymous functions) like so (compare this version with the original do-version):

nameLambda :: IO ()
nameLambda = putStr "What is your first name? " >>
             getLine >>=
             \first -> putStr "And your last name? " >>
             getLine >>=
             \last -> let full = first++" "++last
                      in  putStrLn ("Pleased to meet you, "++full++"!")
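The lambda version needs no extra parentheses because the body of a lambda extends as far to the right as possible, so each lambda encloses everything that follows it. The sketch below makes this explicit on a smaller example in the Maybe monad (chosen only so that results can be compared with ==); the names addDo, addLambda and addParens are hypothetical.

```haskell
-- The do version.
addDo :: Maybe Int -> Maybe Int -> Maybe Int
addDo mx my = do
  x <- mx
  y <- my
  let total = x + y
  return total

-- Lambda style, laid out to mirror the do block line by line.
addLambda :: Maybe Int -> Maybe Int -> Maybe Int
addLambda mx my =
  mx >>= \x ->
  my >>= \y ->
  let total = x + y
  in  return total

-- The same expression with the implicit parentheses written out:
-- x is still in scope in the innermost body because that body lies
-- inside the first lambda.
addParens :: Maybe Int -> Maybe Int -> Maybe Int
addParens mx my =
  mx >>= (\x -> (my >>= (\y -> (let total = x + y in return total))))

main :: IO ()
main = do
  print (addDo (Just 2) (Just 3))
  print (addLambda (Just 2) (Just 3))
  print (addParens (Just 2) Nothing)
```

This nesting is also why first remains available when full is computed in nameLambda: the second lambda sits inside the body of the first.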

Returning values

The last statement in a do block determines the result of the whole block. In the previous example, the result had type IO (), that is, an action in the IO monad that produces only the empty value ().

Suppose that we want to rewrite the example so that it returns an IO String with the acquired name. All we need to do is add a return instruction:

nameReturn :: IO String
nameReturn = do putStr "What is your first name? "
                first <- getLine
                putStr "And your last name? "
                last <- getLine
                let full = first++" "++last
                putStrLn ("Pleased to meet you, "++full++"!")
                return full

This example will "return" the full name as a string inside the IO monad, which can then be used downstream. This kind of code is probably the reason it is so easy to misunderstand the nature of return: not only does it share a name with C's keyword, it also seems to serve the same purpose here.

However, check this code now:

nameReturn' = do putStr "What is your first name? "
                 first <- getLine
                 putStr "And your last name? "
                 last <- getLine
                 let full = first++" "++last
                 putStrLn ("Pleased to meet you, "++full++"!")
                 return full
                 putStrLn "I am not finished yet!"

The last string will be printed out, meaning that a return is not a final statement interrupting the flow, as it is in C and other languages. Indeed, the type of nameReturn' is IO (), meaning that the IO String created by the return full instruction has been completely discarded: the result of the do block is now the result of the final putStrLn action, which has type IO ().
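The behaviour is less surprising once the block is translated: a return in the middle of a do block is just one more action joined with (>>), and (>>) ignores the value produced by the action on its left. A minimal sketch, with the hypothetical names notFinalDo and notFinalThen:

```haskell
-- return in the middle of a do block does not end the block.
notFinalDo :: IO String
notFinalDo = do
  return "early"                      -- produces a value that nobody uses
  putStrLn "I am not finished yet!"   -- still runs
  return "late"                       -- last action: the block's result

-- The same block translated by hand.
notFinalThen :: IO String
notFinalThen =
  return "early" >>
  putStrLn "I am not finished yet!" >>
  return "late"

main :: IO ()
main = do
  a <- notFinalDo
  b <- notFinalThen
  print (a, b)
```

Both versions print the message and produce "late"; the string "early" is never seen. If later actions really should be skipped, they have to be placed inside a conditional such as if ... then ... else, since return by itself never skips them.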