# Haskell/Arrow tutorial

Arrows provide an alternative to the usual way of structuring computations with the basic functor classes.
This chapter provides a hands-on tutorial about them, while the next one,
Understanding arrows, complements it with a conceptual overview.
We recommend you to start with the tutorial, so that you get to taste what programming with arrows feels like.
You can of course switch back and forth between the tutorial and the first part of *Understanding arrows*
if you prefer going at a slower pace. Be sure to follow along every step of the tutorial on GHC(i).

## Stephen's Arrow Tutorial

[edit | edit source]In this tutorial, I will create my own arrow, show how to use the arrow `proc`

notation,
and show how `ArrowChoice`

works.
We will end up with a simple game of Hangman.

First, we give a language pragma (omitted) to enable the arrow do notation in the compiler. And then, some imports:

```
module Main where
import Control.Arrow
import Control.Monad
import qualified Control.Category as Cat
import Data.List
import Data.Maybe
import System.Random
```

Any Haskell function can behave as an arrow, because there is an Arrow instance for the
function type constructor `(->)`

. In this tutorial I will build a more interesting arrow
than this, with the ability to maintain state (something that a plain Haskell
function arrow cannot do). Arrows can produce all sorts of effects, including I/O,
but we'll just explore some simple examples.

We'll call our new arrow `Circuit`

to suggest that we can
visualize arrows as circuits.^{[1]}

## Type definition for `Circuit`

[edit | edit source]A plain Haskell function treated as an arrow has type `a -> b`

.
Our `Circuit`

arrow has two distinguishing features: First, we wrap it in a `newtype`

declaration to cleanly define an Arrow instance. Second, in order for the circuit to maintain its own internal state,
our arrow returns a replacement for itself along with the normal `b`

output value.

```
newtype Circuit a b = Circuit { unCircuit :: a -> (Circuit a b, b) }
```

To make this an arrow, we need to make it an instance of both `Category`

and `Arrow`

.
Throughout these definitions, we always replace each `Circuit`

with the new version
of itself that it has returned.

```
instance Cat.Category Circuit where
id = Circuit $ \a -> (Cat.id, a)
(.) = dot
where
(Circuit cir2) `dot` (Circuit cir1) = Circuit $ \a ->
let (cir1', b) = cir1 a
(cir2', c) = cir2 b
in (cir2' `dot` cir1', c)
```

The Cat.id function replaces itself with a copy of itself without maintaining any state. The
purpose of the `(.)`

function is to chain two arrows together from right to left. `(>>>)`

and `(<<<)`

are
based on `(.)`

. It needs to replace itself with the `dot` of the two replacements returned by
the execution of the argument Circuits.

```
instance Arrow Circuit where
arr f = Circuit $ \a -> (arr f, f a)
first (Circuit cir) = Circuit $ \(b, d) ->
let (cir', c) = cir b
in (first cir', (c, d))
```

`arr`

lifts a plain Haskell function as an arrow. Like with `id`

, the replacement it
gives is just itself, since a plain Haskell function can't maintain state.

Now we need a function to run a circuit:

```
runCircuit :: Circuit a b -> [a] -> [b]
runCircuit _ [] = []
runCircuit cir (x:xs) =
let (cir',x') = unCircuit cir x
in x' : runCircuit cir' xs
```

For `mapAccumL`

fans like me, this can alternatively be written as

```
runCircuit :: Circuit a b -> [a] -> [b]
runCircuit cir inputs =
snd $ mapAccumL (\cir x -> unCircuit cir x) cir inputs
```

or, after eta-reduction, simply as:

```
runCircuit :: Circuit a b -> [a] -> [b]
runCircuit cir = snd . mapAccumL unCircuit cir
```

`Circuit`

primitives

[edit | edit source]Let's define a generalized accumulator to be the basis for our later work. `accum'`

is a less general version of `accum`

.

```
-- | Accumulator that outputs a value determined by the supplied function.
accum :: acc -> (a -> acc -> (b, acc)) -> Circuit a b
accum acc f = Circuit $ \input ->
let (output, acc') = input `f` acc
in (accum acc' f, output)
-- | Accumulator that outputs the accumulator value.
accum' :: b -> (a -> b -> b) -> Circuit a b
accum' acc f = accum acc (\a b -> let b' = a `f` b in (b', b'))
```

Here is a useful concrete accumulator which keeps a running total of all the numbers passed to it as inputs.

```
total :: Num a => Circuit a a
total = accum' 0 (+)
```

We can run this circuit, like this:

```
*Main> runCircuit total [1,0,1,0,0,2]
[1,1,2,2,2,4]
*Main>
```

## Arrow `proc`

notation

[edit | edit source]Here is a statistical mean function:

```
mean1 :: Fractional a => Circuit a a
mean1 = (total &&& (const 1 ^>> total)) >>> arr (uncurry (/))
```

It maintains two accumulator cells, one for the sum,
and one for the number of elements. It splits the input using the "fanout" operator
`&&&`

and before the input of the second stream, it discards the input value and
replaces it with 1.

`const 1 ^>> total`

is shorthand for `arr (const 1) >>> total`

.
The first stream is the sum of the inputs. The second stream is the sum of 1 for
each input (i.e. a count of the number of inputs). Then, it merges the two streams with the `(/)`

operator.

Here is the same function, but written using arrow `proc`

notation:

```
mean2 :: Fractional a => Circuit a a
mean2 = proc value -> do
t <- total -< value
n <- total -< 1
returnA -< t / n
```

The `proc`

notation describes the same relationship between the arrows, but in a totally different way.
Instead of explicitly describing the wiring, you glue the arrows together using variable bindings
and pure Haskell expressions, and the compiler works out all the `arr, (>>>), (&&&)`

stuff
for you. Arrow `proc`

notation also contains a pure 'let' statement exactly like the monadic `do`

one.

`proc`

is the keyword that introduces arrow notation, and it binds the arrow input to
a pattern (`value`

in this example). Arrow statements in a `do`

block take one of these forms:

*variable binding pattern*<-*arrow*-<*pure expression giving arrow input**arrow*-<*pure expression giving arrow input*

Like with monads, the `do`

keyword is needed only to combine multiple lines using the variable binding patterns
with `<-`

. As with monads, the last line isn't allowed to have a variable binding pattern, and the
output value of the last line is the output value of the arrow. `returnA`

is an arrow just like
'total' is (in fact, `returnA`

is just the identity arrow, defined as `arr id`

).

Also like with monads, lines other than the last line may have no variable binding, and you get
the effect only, discarding the return value. In `Circuit`

, there would never be a point
in doing this (since no state can escape except through the return value), but in many arrows
there would be.

As you can see, for this example the `proc`

notation makes the code much more readable. Let's try them:

```
*Main> runCircuit mean1 [0,10,7,8]
[0.0,5.0,5.666666666666667,6.25]
*Main> runCircuit mean2 [0,10,7,8]
[0.0,5.0,5.666666666666667,6.25]
*Main>
```

## Hangman: Pick a word

[edit | edit source]Now for our Hangman game. Let's pick a word from a dictionary:

```
generator :: Random a => (a, a) -> StdGen -> Circuit () a
generator range rng = accum rng $ \() rng -> randomR range rng
dictionary = ["dog", "cat", "bird"]
pickWord :: StdGen -> Circuit () String
pickWord rng = proc () -> do
idx <- generator (0, length dictionary-1) rng -< ()
returnA -< dictionary !! idx
```

With `generator`

, we're using the accumulator functionality to hold our random number generator.
`pickWord`

doesn't introduce anything new, except that the generator arrow is constructed by
a Haskell function that takes arguments.
Here is the output:

```
*Main> rng <- getStdGen
*Main> runCircuit (pickWord rng) [(), (), ()]
["dog","bird","dog"]
*Main>
```

We will use these little arrows in a minute. The first returns `True`

the first time, then `False`

forever afterwards:

```
oneShot :: Circuit () Bool
oneShot = accum True $ \_ acc -> (acc, False)
```

```
*Main> runCircuit oneShot [(), (), (), (), ()]
[True,False,False,False,False]
```

The second stores a value and returns it, when it gets a new one:

```
delayedEcho :: a -> Circuit a a
delayedEcho acc = accum acc (\a b -> (b,a))
```

which can be shortened to:

```
delayedEcho :: a -> Circuit a a
delayedEcho acc = accum acc (flip (,))
```

```
*Main> runCircuit (delayedEcho False) [True, False, False, False, True]
[False,True,False,False,False]
```

The game's main arrow will be executed repeatedly, and we would like to pick the
word only once on the first iteration, and have it remember it for the rest of
the game. Rather than just mask its output on subsequent loops, we'd prefer to
actually run `pickWord`

only once (since in a real implementation it could be very slow).
However, as it stands, the data flow in a Circuit **must** go down
**all** the paths of component arrows. In order to allow the data flow to go down
one path and not another, we need to make our arrow an instance of `ArrowChoice`

.
Here's the minimal definition:

```
instance ArrowChoice Circuit where
left orig@(Circuit cir) = Circuit $ \ebd -> case ebd of
Left b -> let (cir', c) = cir b
in (left cir', Left c)
Right d -> (left orig, Right d)
getWord :: StdGen -> Circuit () String
getWord rng = proc () -> do
-- If this is the first game loop, run pickWord. mPicked becomes Just <word>.
-- On subsequent loops, mPicked is Nothing.
firstTime <- oneShot -< ()
mPicked <- if firstTime
then do
picked <- pickWord rng -< ()
returnA -< Just picked
else returnA -< Nothing
-- An accumulator that retains the last 'Just' value.
mWord <- accum' Nothing mplus -< mPicked
returnA -< fromJust mWord
```

Because `ArrowChoice`

is defined, the compiler now allows us to put an `if`

after `<-`

, and thus
choose which arrow to execute (either run `pickWord`

, or skip it). Note that this is not a normal Haskell
`if`

: The compiler implements this using `ArrowChoice`

.
The compiler also implements `case`

here in the same way.

It is important to understand that none of the local name bindings, including the `proc`

argument, is
in scope between `<-`

and `-<`

except in the condition of an `if`

or `case`

.
For example, this is illegal:

{- proc rng -> do idx <- generator (0, length dictionary-1) rng -< () -- ILLEGAL returnA -< dictionary !! idx -}

The arrow to execute, here `generator (0, length dictionary -1) rng`

, is evaluated
in the scope that exists outside the 'proc' statement. `rng`

does not exist in this scope.
If you think about it, this makes sense, because the arrow is constructed at the beginning
only (outside `proc`

). If it were constructed for each execution of the arrow, how would it
keep its state?

Let's try `getWord`

:

```
*Main> rng <- getStdGen
*Main> runCircuit (getWord rng) [(), (), (), (), (), ()]
["dog","dog","dog","dog","dog","dog"]
*Main>
```

## Hangman: Main program

[edit | edit source]Now here is the game:

```
attempts :: Int
attempts = 5
livesLeft :: Int -> String
livesLeft hung = "Lives: ["
++ replicate (attempts - hung) '#'
++ replicate hung ' '
++ "]"
hangman :: StdGen -> Circuit String (Bool, [String])
hangman rng = proc userInput -> do
word <- getWord rng -< ()
let letter = listToMaybe userInput
guessed <- updateGuess -< (word, letter)
hung <- updateHung -< (word, letter)
end <- delayedEcho True -< not (word == guessed || hung >= attempts)
let result = if word == guessed
then [guessed, "You won!"]
else if hung >= attempts
then [guessed, livesLeft hung, "You died!"]
else [guessed, livesLeft hung]
returnA -< (end, result)
where
updateGuess :: Circuit (String, Maybe Char) String
updateGuess = accum' (repeat '_') $ \(word, letter) guess ->
case letter of
Just l -> map (\(w, g) -> if w == l then w else g) (zip word guess)
Nothing -> take (length word) guess
updateHung :: Circuit (String, Maybe Char) Int
updateHung = proc (word, letter) -> do
total -< case letter of
Just l -> if l `elem` word then 0 else 1
Nothing -> 0
main :: IO ()
main = do
rng <- getStdGen
interact $ unlines -- Concatenate lines out output
. ("Welcome to Arrow Hangman":) -- Prepend a greeting to the output
. concat . map snd . takeWhile fst -- Take the [String]s as long as the first element of the tuples is True
. runCircuit (hangman rng) -- Process the input lazily
. ("":) -- Act as if the user pressed ENTER once at the start
. lines -- Split input into lines
```

And here's an example session. For best results, compile the game and run it from a terminal rather than from GHCi:

Welcome to Arrow Hangman ___ Lives: [#####] a ___ Lives: [#### ] g __g Lives: [#### ] d d_g Lives: [#### ] o dog You won!

## Advanced stuff

[edit | edit source]In this section I will complete the coverage of arrow notation.

### Combining arrow commands with a function

[edit | edit source]We implemented `mean2`

like this:

```
mean2 :: Fractional a => Circuit a a
mean2 = proc value -> do
t <- total -< value
n <- total -< 1
returnA -< t / n
```

GHC defines a banana bracket syntax for combining arrow statements with a function that operates
on arrows. (In Ross Paterson's paper
^{[2]}
a `form`

keyword is used, but GHC adopted the banana bracket
instead.) Although there's no real reason to, we can write `mean`

like this:

```
mean3 :: Fractional a => Circuit a a
mean3 = proc value -> do
(t, n) <- (| (&&&) (total -< value) (total -< 1) |)
returnA -< t / n
```

The first item inside the `(| ... |)`

is a function that takes any number of arrows as input and
returns an arrow. Infix notation cannot be used here. It is followed by the arguments, which are in the
form of proc statements. These statements may contain `do`

and bindings with `<-`

if you like. Each argument
is translated into an arrow and given as an argument to the function `(&&&)`

.

You may ask, what is the point of this? We can combine arrows quite happily without the `proc`

notation.
Well, the point is that you get the convenience of using local variable bindings in the statements.

The banana brackets are in fact not required. The compiler is intelligent enough to assume that this is
what you mean when you write it like this (note that infix notation *is* allowed here):

```
mean4 :: Fractional a => Circuit a a
mean4 = proc value -> do
(t, n) <- (total -< value) &&& (total -< 1)
returnA -< t / n
```

So why do we need the banana brackets? For situations where this plainer syntax is
ambiguous. The reason is that the arrow part of a `proc`

command is *not an ordinary Haskell expression*.
Recall that for arrows specified in proc statements, the following things hold true:

- Local variable bindings are only allowed in the input expression after
`-<`

, and for the`if`

and`case`

condition. The arrow itself is interpreted in the scope that exists outside`proc`

. `if`

and`case`

statements are not plain Haskell. They are implemented using`ArrowChoice`

.- Functions used to combine arrows are not normal Haskell either. They are shorthand for banana bracket notation.

### Recursive bindings

[edit | edit source]At the risk of wearing out the `mean`

example, here is yet another way to implement it using recursive
bindings. In order for this to work, we'll need an arrow that delays its input by one step:

```
delay :: a -> Circuit a a
delay last = Circuit $ \this -> (delay this, last)
```

Here is what delay does:

```
*Main> runCircuit (delay 0) [5,6,7]
[0,5,6]
*Main>
```

Here is our recursive version of `mean`

:

```
mean5 :: Fractional a => Circuit a a
mean5 = proc value -> do
rec
(lastTot, lastN) <- delay (0,0) -< (tot, n)
let (tot, n) = (lastTot + value, lastN + 1)
let mean = tot / n
returnA -< mean
```

The `rec`

block resembles a `do`

' block, except that

- The last line can be, and usually is, a variable binding. It doesn't matter whether it's a
`let`

or a`do`

-block binding with`<-`

. - The
`rec`

block doesn't have a return value.`var <- rec ...`

is illegal, and`rec`

is not allowed to be the last element in a`do`

block. - The use of variables is expected to form a cycle (otherwise there is no point in
`rec`

).

The machinery of `rec`

is handled by the `loop`

function of the `ArrowLoop`

class,
which we define for Circuit like this:

```
instance ArrowLoop Circuit where
loop (Circuit cir) = Circuit $ \b ->
let (cir', (c,d)) = cir (b,d)
in (loop cir', c)
```

Behind the scenes, the way it works is this:

- Any variables defined in
`rec`

that are forward referenced in`rec`

are looped around by passing them through the second tuple element of`loop`

. Effectively the variable bindings and references to them can be in any order (but the order of arrow statements is significant in terms of effects). - Any variables defined in
`rec`

that are referenced from outside`rec`

are returned in the first tuple element of`loop`

.

It is important to understand that `loop`

(and therefore `rec`

) simply binds variables. It doesn't hold onto values
and pass them back in the next invocation - `delay`

does this part. The cycle formed by the variable
references must be broken by some sort of delay arrow or lazy evaluation, otherwise
the code would die in an infinite loop as if you had written `let a = a+1`

in
plain Haskell.

### ArrowApply

[edit | edit source]As mentioned before, the arrow part of an arrow statement (before `-<`

) can't contain any variables
bound inside 'proc'. There is an alternative operator, `-<<`

which removes this restriction. It
requires the arrow to implement the `ArrowApply`

typeclass.

## Notes

- ↑ This interpretation of arrows-as-circuits is loosely based on the Yampa functional reactive programming library.
- ↑ Ross Paterson's Paper specifying arrow
`proc`

notation