Haskell/do Notation

From Wikibooks, the open-content textbooks collection

The do notation is an alternative way of writing monadic code. It is especially useful with the IO monad, which by design offers no way to extract a pure value from it; by contrast, you can extract pure values from Maybe or lists using pattern matching or appropriate functions.

Translating the then operator

The (>>) (then) operator is the easiest to translate between do notation and plain code, so we will look at it first. For example, suppose we have a chain of actions like the following:

putStr "Hello" >> 
putStr " " >> 
putStr "world!" >> 
putStr "\n"

We can rewrite it in do notation as follows:

do putStr "Hello"
   putStr " "
   putStr "world!"
   putStr "\n"

This sequence of instructions is very similar to what you would see in any imperative language such as C.

Since the do notation is used especially with input-output, monadic values are often called actions in this context; an action could be writing to a file, opening a network connection or asking the user for input. The general way we translate these actions from do notation to standard Haskell code is:

do action
   other_action
   yet_another_action

which becomes

 action >>
 do other_action
    yet_another_action

and so on, until only one action is left in the do block; a do block containing a single action is just that action.
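
As a rough, checkable sketch of the translation above, the following uses the pair type (String, a) as a stand-in for IO; this type has a Monad instance in base that concatenates the strings. The helper say is hypothetical (it is not part of the original example): it records its text instead of printing it, so that the three forms can be compared with ==.

```haskell
-- Hypothetical stand-in for putStr: records its text instead of printing it.
say :: String -> (String, ())
say s = (s, ())

-- The whole chain in do notation.
helloDo :: (String, ())
helloDo = do
  say "Hello"
  say " "
  say "world!"
  say "\n"

-- One translation step: the first action is moved out of the do block.
helloStep :: (String, ())
helloStep =
  say "Hello" >> do
    say " "
    say "world!"
    say "\n"

-- No do block left. The chain is grouped differently from the
-- step-by-step translation, but the monad laws make both equal.
helloThen :: (String, ())
helloThen = say "Hello" >> say " " >> say "world!" >> say "\n"
```

Loaded in GHCi, helloDo == helloStep and helloStep == helloThen should both evaluate to True, and each value is ("Hello world!\n", ()).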

Translating the bind operator

The (>>=) operator is a bit more difficult to translate to and from do notation, because it passes a value downstream in the binding sequence. These values are named using the <- notation and can then be used further down in the do block.

do result         <- action
   another_result <- another_action
   (action_based_on_previous_results result another_result)

This is translated back into monadic code as follows:

action >>= f
where f result = do another_result <- another_action
                    (action_based_on_previous_results result another_result)
      f _      = fail "..."

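In words, the action brought outside of the do block is bound to a function, which is defined to take an argument (to make it easy to identify, we named it result, just as in the complete do block). If the pattern matching is unsuccessful, the monad's implementation of fail will be called. With a plain variable such as result the match always succeeds, so the second clause only matters when the left-hand side of <- is a pattern that can fail to match.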
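
To make this translation concrete, here is a small sketch in the Maybe monad, where the results can be inspected directly. The helper safeDiv and the names calcDo, calcStep, calcBind and firstElem are assumptions made up for this illustration. The last definition uses a pattern that can fail to match, which is the case where fail comes into play.

```haskell
-- Hypothetical helper: division that yields Nothing instead of crashing.
safeDiv :: Int -> Int -> Maybe Int
safeDiv _ 0 = Nothing
safeDiv x y = Just (x `div` y)

-- do notation, following the shape of the schematic block above.
calcDo :: Maybe Int
calcDo = do
  result <- safeDiv 100 5
  anotherResult <- safeDiv 20 2
  return (result + anotherResult)

-- One translation step: the first action is bound to a named function.
calcStep :: Maybe Int
calcStep = safeDiv 100 5 >>= f
  where
    f result = do
      anotherResult <- safeDiv 20 2
      return (result + anotherResult)

-- Fully translated, with lambdas in place of named functions.
calcBind :: Maybe Int
calcBind =
  safeDiv 100 5 >>= \result ->
  safeDiv 20 2 >>= \anotherResult ->
  return (result + anotherResult)

-- A pattern that can fail: for Just [] the match on (x:_) fails,
-- and Maybe's implementation of fail produces Nothing.
firstElem :: Maybe [Int] -> Maybe Int
firstElem m = do
  (x:_) <- m
  return x
```

All three calc definitions should evaluate to Just 30. firstElem (Just [7, 8]) gives Just 7, while firstElem (Just []) gives Nothing rather than a runtime error.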

Notice that the variables to the left of <- in the do block are bound to the values produced by the actions, not to the actions themselves: if action has type IO String, for example, the type of result will be String.
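
A minimal sketch of this point about types follows. The action fakeGetLine is a hypothetical stand-in for getLine that needs no user input, and shout is likewise a name invented here. The let line only type-checks because line is a plain String.

```haskell
import Data.Char (toUpper)

-- Hypothetical stand-in for getLine, so the sketch runs without input.
fakeGetLine :: IO String
fakeGetLine = return "hello"

shout :: IO String
shout = do
  line <- fakeGetLine            -- fakeGetLine :: IO String, so line :: String
  let loud = map toUpper line    -- an ordinary String function applies to line
  return loud                    -- wrap the String back into IO
```

Writing map toUpper fakeGetLine instead would be rejected by the type checker, since fakeGetLine is an IO String and not a String.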

Example: user-interactive program

Consider this simple program that asks the user for his or her first and last names:

nameDo :: IO ()
nameDo = do putStr "What is your first name? "
            first <- getLine
            putStr "And your last name? "
            last <- getLine
            let full = first++" "++last
            putStrLn ("Pleased to meet you, "++full++"!")

The code in do notation is quite readable, and it is easy to see where it is going. The <- notation makes it possible to use the first and last names as if they were pure values, though they can never truly be so: getLine is not a pure function, because it can give a different result every time it is run (in fact, it would be of very little use if it did not).

If we were to translate the code into standard monadic code, the result would be:

name :: IO ()
name = putStr "What is your first name? " >>
       getLine >>= f
       where
       f first = putStr "And your last name? " >>
                 getLine >>= g
                 where
                 g last = putStrLn ("Pleased to meet you, "++full++"!")
                          where
                          full = first++" "++last

The advantage of the do notation should now be apparent: the code in nameDo is much more readable, and does not run off the right edge of the screen.

The indentation increase is mainly caused by the where clauses attached to the (>>=) operators: since we cannot simply extract a value from the IO monad, we must instead define new functions that receive the value as an argument. This explains why the do notation is so popular when dealing with the IO monad, which is often used to obtain values (user input, file contents, etc.) that cannot, by construction, be taken out of the monad.
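
The where clauses are not the only way to write the translation. As an alternative sketch (the name nameLambda is invented here), each named function can be replaced by a lambda, which keeps the code flat but arguably still reads less naturally than the do block:

```haskell
nameLambda :: IO ()
nameLambda =
  putStr "What is your first name? " >>
  getLine >>= \first ->
  putStr "And your last name? " >>
  getLine >>= \last ->
  let full = first ++ " " ++ last
  in putStrLn ("Pleased to meet you, " ++ full ++ "!")
```

Each lambda body extends as far to the right as possible, so first is still in scope when full is computed, just as it is in the do block.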

Returning values

The last statement in a do block determines the result of the whole block. In the previous example, the result had type IO (), that is, an action in the IO monad that produces only the empty value ().

Suppose that we want to rewrite the example so that it returns an IO String with the acquired name. All we need to do is add a return statement:

nameReturn :: IO String
nameReturn = do putStr "What is your first name? "
                first <- getLine
                putStr "And your last name? "
                last <- getLine
                let full = first++" "++last
                putStrLn ("Pleased to meet you, "++full++"!")
                return full

This example will "return" the full name as a string inside the IO monad, which can then be used downstream. This kind of code is probably the reason it is so easy to misunderstand the nature of return: not only does it share a name with C's keyword, it also seems to do the same job here.

However, check this code now:

nameReturn' :: IO ()
nameReturn' = do putStr "What is your first name? "
                 first <- getLine
                 putStr "And your last name? "
                 last <- getLine
                 let full = first++" "++last
                 putStrLn ("Pleased to meet you, "++full++"!")
                 return full
                 putStrLn "I am not finished yet!"

The last string will be printed, which shows that return is not a final statement that interrupts the flow, as it is in C and other languages. Indeed, the type of nameReturn' is IO (): the value produced by return full is simply discarded, and the result of the do block is now the result of the final putStrLn action, which has type IO ().
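
The same behaviour can be seen in a smaller sketch that needs no user input. The names twoReturns and twoReturnsThen are invented for this illustration. Translating the do block by the rule for (>>) shows why the first return has no lasting effect.

```haskell
-- A return in the middle of a do block does not end the block.
twoReturns :: IO String
twoReturns = do
  return "first"     -- produces a value, which is then discarded
  return "second"    -- last statement: this is the result of the block

-- The same block translated with (>>): the value of the first
-- action is thrown away and the second action decides the result.
twoReturnsThen :: IO String
twoReturnsThen = return "first" >> return "second"
```

Running either action in GHCi yields "second".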