# Haskell/Other data structures

## Trees[edit]

Now let's look at one of the most important data structures: trees. A tree is an example of a recursive datatype. While there are several different kinds of trees, for this discussion we will adopt the following definition, which captures the essential features of a tree that will concern us here:

data Tree a = Leaf a | Branch (Tree a) (Tree a)

As you can see, it's parametrised, so we can have trees of `Int`

s, trees of `String`

s, trees of `Maybe Int`

s, even trees of `(Int, String)`

pairs, if you really want. What makes it special is that `Tree`

appears in the definition of itself. We will see how this works by using an already known example: the list.

### Lists as Trees[edit]

As we have seen in More about lists and List Processing, we break lists down into two cases: An empty list (denoted by `[]`), and an element of the specified type, with another list (denoted by `(x:xs)`). This gives us valuable insight about the definition of lists:

data [a] = [] | (a:[a]) -- Pseudo-Haskell, will not work properly.

Which is sometimes written as (for Lisp-inclined people):

data List a = Nil | Cons a (List a)

As you can see this is also recursive, like the tree we had. Here, the constructor functions are `[]`

and `(:)`

. They represent what we have called `Leaf`

and `Branch`

. We can use these in pattern matching, just as we did with the empty list and the `(x:xs)`

:

### Maps and Folds[edit]

We already know about maps and folds for lists. With our realisation that a list is some sort of tree, we can try to write map and fold functions for our own type `Tree`

. To recap:

data Tree a = Leaf a | Branch (Tree a) (Tree a) deriving (Show) data [a] = [] | (:) a [a] -- (:) a [a] is the same as (a:[a]) with prefix instead of infix notation.

We consider map first, then folds.

*Note*

Deriving is explained later on in the section Class Declarations. For now, understand it as telling Haskell (and by extension your interpreter) how to display a Tree instance.

#### Map[edit]

Let's take a look at the definition of `map`

for lists:

map :: (a -> b) -> [a] -> [b] map _ [] = [] map f (x:xs) = f x : map f xs

First, if we were to write `treeMap`

, what would its type be? Defining the function is easier if you have an idea of what its type should be.

We want it to work on a `Tree`

of some type, and it should return another `Tree`

of some type. What `treeMap`

does is applying a function on each element of the tree, so we also need a function. In short:

treeMap :: (a -> b) -> Tree a -> Tree b

See how this is similar to the list example?

Next, we should start with the easiest case. When talking about a `Tree`

, this is obviously the case of a `Leaf`

. A `Leaf`

only contains a single value, so all we have to do is apply the function to that value and then return a `Leaf`

with the altered value:

treeMap :: (a -> b) -> Tree a -> Tree b treeMap f (Leaf x) = Leaf (f x)

Also, this looks a lot like the empty list case with `map`

. Now if we have a `Branch`

, it will include two subtrees; what do we do with them? When looking at the list-`map`

, you can see it uses a call to itself on the tail of the list. We also shall do that with the two subtrees. The complete definition of `treeMap`

is as follows:

treeMap :: (a -> b) -> Tree a -> Tree b treeMap f (Leaf x) = Leaf (f x) treeMap f (Branch left right) = Branch (treeMap f left) (treeMap f right)

We can make this a bit more readable by noting that `treeMap f`

is itself a function with type `Tree a -> Tree b`

. This gives us the following revised definition:

treeMap :: (a -> b) -> Tree a -> Tree b treeMap f = g where g (Leaf x) = Leaf (f x) g (Branch left right) = Branch (g left) (g right)

If you don't understand it just now, re-read it. Especially the use of pattern matching may seem weird at first, but it is essential to the use of datatypes. The most important thing to remember is that pattern matching happens on constructor functions.

If you understand it, read on for folds.

#### Fold[edit]

Now we've had the `treeMap`

, let's try to write a `treeFold`

. Again, let's take a look at the definition of `foldr`

for lists, as it is easier to understand.

foldr :: (a -> b -> b) -> b -> [a] -> b foldr f z [] = z foldr f z (x:xs) = f x (foldr f z xs)

Recall that lists have two constructors:

(:) :: a -> [a] -> [a] -- two arguments [] :: [a] -- zero arguments

Thus `foldr`

takes two arguments corresponding to the two constructors:

f :: a -> b -> b -- a two-argument function z :: b -- like a zero-argument function

We'll use the same strategy to find a definition for `treeFold`

as we did for `treeMap`

. First, the type. We want `treeFold`

to transform a tree of some type into a value of some other type; so in place of `[a] -> b`

we will have `Tree a -> b`

. How do we specify the transformation? First note that `Tree a`

has two constructors:

Branch :: Tree a -> Tree a -> Tree a Leaf :: a -> Tree a

So `treeFold`

will have two arguments corresponding to the two constructors:

fbranch :: b -> b -> b fleaf :: a -> b

Putting it all together we get the following type definition:

treeFold :: (b -> b -> b) -> (a -> b) -> Tree a -> b

That is, the first argument, of type `(b -> b -> b)`

, is a function specifying how to combine subtrees; the second argument, of type `a -> b`

, is a function specifying what to do with leaves; and the third argument, of type `Tree a`

, is the tree we want to fold.

As with `treeMap`

, we'll avoid repeating the arguments `fbranch`

and `fleaf`

by introducing a local function `g`

:

treeFold :: (b -> b -> b) -> (a -> b) -> Tree a -> b treeFold fbranch fleaf = g where -- definition of g goes here

The argument `fleaf`

tells us what to do with `Leaf`

subtrees:

g (Leaf x) = fleaf x

The argument `fbranch`

tells us how to combine the results of "folding" two subtrees:

g (Branch left right) = fbranch (g left) (g right)

Our full definition becomes:

treeFold :: (b -> b -> b) -> (a -> b) -> Tree a -> b treeFold fbranch fleaf = g where g (Leaf x) = fleaf x g (Branch left right) = fbranch (g left) (g right)

For examples of how these work, copy the `Tree`

data definition and the `treeMap`

and `treeFold`

functions to a Haskell file, along with the following:

tree1 :: Tree Integer tree1 = Branch (Branch (Branch (Leaf 1) (Branch (Leaf 2) (Leaf 3))) (Branch (Leaf 4) (Branch (Leaf 5) (Leaf 6)))) (Branch (Branch (Leaf 7) (Leaf 8)) (Leaf 9)) doubleTree = treeMap (*2) -- doubles each value in tree sumTree = treeFold (+) id -- sum of the leaf values in tree fringeTree = treeFold (++) (: []) -- list of the leaves of tree

Then load it into your favourite Haskell interpreter, and evaluate:

doubleTree tree1 sumTree tree1 fringeTree tree1

## Other datatypes[edit]

Map and fold functions can be defined for any kind of data type. In order to generalize the strategy applied for lists and trees, in this final section we will work out a map and a fold for a very strange, intentionally contrived, datatype:

data Weird a b = First a | Second b | Third [(a,b)] | Fourth (Weird a b)

It can be a useful exercise to write the functions as you follow the examples, trying to keep the coding one step ahead of your reading.

### General Map[edit]

Again, we will begin with `weirdMap`

. The first important difference in working with this `Weird`

type is that it has *two* type parameters. For that reason, we will want the map function to take two functions as arguments, one to be applied on the elements of type `a` and another for the elements of type `b`. With that accounted for, we can write the type signature of `weirdMap`

:

weirdMap :: (a -> c) -> (b -> d) -> Weird a b -> Weird c d

Next step is writing the definitions for `weirdMap`

. The key point is that maps preserve the *structure* of a datatype, so the function must evaluate to a `Weird`

which uses the same constructor as the one used for the original `Weird`

. For that reason, we need one definition to handle each constructor, and these constructors are used as patterns for writing them. As before, to avoid repeating the `weirdMap`

argument list over and over again a **where** clause comes in handy. A sketch of the function is below:

weirdMap :: (a -> c) -> (b -> d) -> Weird a b -> Weird c d weirdMap fa fb = g where g (First x) = --More to follow g (Second y) = --More to follow g (Third z) = --More to follow g (Fourth w) = --More to follow

The first two cases are fairly straightforward, as there is just a single element of `a`

or `b`

type inside the `Weird`

.

weirdMap :: (a -> c) -> (b -> d) -> Weird a b -> Weird c d weirdMap fa fb = g where g (First x) = First (fa x) g (Second y) = Second (fb y) g (Third z) = --More to follow g (Fourth w) = --More to follow

`Third`

is trickier because it contains another data structure (a list) whose elements are themselves data structures (the tuples). So we need to navigate the nested data structures, apply `fa`

and `fb`

on all elements of type `a`

and `b`

inside it and eventually (as a map must preserve structure) produce a list of tuples – `[(c,d)]`

– to be used with the constructor. The simplest approach might seem to be just breaking down the list inside the `Weird`

and playing with the patterns:

g (Third []) = Third [] g (Third ((x,y):zs)) = Third ( (fa x, fb y) : ( (\(Third z) -> z) (g (Third zs)) ) )

This appears to be written as a typical recursive function for lists. We start by applying the functions of interest to the first element in order to obtain the head of the new list, `(fa x, fb y)`

. But what we will cons it to? As `g`

requires a `Weird`

argument we need to make a `Weird`

using the list tail in order to make the recursive call. But then `g`

will give a `Weird`

and not a list, so we have to retrieve the modified list from that – that's the role of the lambda function. And finally, there is also the empty list base case to be defined as well.

After all of that, we are left with a messy function. Every recursive call of `g`

requires wrapping `zs`

into a `Weird`

, while what we really wanted to do was to build a list with `(fa x, fb y)`

and the modified `xs`

. The problem with this solution is that `g`

can (thanks to pattern matching) act directly on the list head but (due to its type signature) can't be called directly on the list tail. For that reason, it would be better to apply `fa`

and `fb`

without breaking down the list with pattern matching (as far as `g`

is directly concerned, at least). But there *was* a way to directly modify a list element-by-element...

g (Third z) = Third ( map (\(x, y) -> (fa x, fb y) ) z)

...our good old `map`

function, which modifies all tuples in the list `z`

using a lambda function. In fact, the first attempt at writing the definition looked just like an application of the list map except for the spurious `Weird`

packing and unpacking. We got rid of these by having the pattern splitting of `z`

done by `map`

, which works directly with regular lists. You could find it useful to expand the map definition inside `g`

for seeing a clearer picture of that difference. Finally, you may prefer to write this new version in an alternative, very clean way using list comprehension syntax:

g (Third z) = Third [ (fa x, fb y) | (x,y) <- z ]

Adding the `Third`

function, we only have the `Fourth`

one left to do:

weirdMap :: (a -> c) -> (b -> d) -> Weird a b -> Weird c d weirdMap fa fb = g where g (First x) = First (fa x) g (Second y) = Second (fb y) g (Third z) = Third ( map (\(x, y) -> (fa x, fb y) ) z) g (Fourth w) = --More to follow

Dealing with the recursive `Fourth`

constructor is actually really easy. Just apply `g`

recursively!

weirdMap :: (a -> c) -> (b -> d) -> Weird a b -> Weird c d weirdMap fa fb = g where g (First x) = First (fa x) g (Second y) = Second (fb y) g (Third z) = Third ( map (\(x, y) -> (fa x, fb y) ) z) g (Fourth w) = Fourth (g w)

### General Fold[edit]

While we were able to define a map by specifying as arguments a function for every separate type, this isn't enough for a fold. For a fold, we'll need a function for every constructor function. This is also the case with lists! Remember the constructors of a list are `[]`

and `(:)`

. The `z`

argument in the `foldr`

function corresponds to the `[]`

constructor. The `f`

argument in the `foldr`

function corresponds to the `(:)`

constructor. The `Weird`

datatype has four constructors, so we need four functions – one for handling the internal structure of the datatype specified by each constructor. Next, we have an argument of the `Weird a b`

type, and finally we want the whole fold function to evaluate to a value of some other, arbitrary, type. Additionally, each individual function we pass to `weirdFold`

must evaluate to the same type `weirdFold`

does. That allows us to make a mock type signature and sketch the definition:

weirdFold :: (something1 -> c) -> (something2 -> c) -> (something3 -> c) -> (something4 -> c) -> Weird a b -> c weirdFold f1 f2 f3 f4 = g where g (First x) = --Something of type c here g (Second y) = --Something of type c here g (Third z) = --Something of type c here g (Fourth w) = --Something of type c here

Now we need to figure out to which types `something1`

, `something2`

, `something3`

and `something4`

correspond to. That is done by analysing the constructors, since the functions must take as arguments the elements of the datatype (whose types are specified by the constructor type signature). Again, the types and definitions of the first two functions are easy to find. The third one isn't difficult either, as for the purposes of folding the list of `(a,b)`

tuples is no different from a simple type – unlike in the map example, its *internal* structure does not concern us now. The fourth constructor, however, is recursive, and we have to watch out. As in the case of `weirdMap`

, we also need to recursively call the `g`

function. This brings us to the following, final, definition:

weirdFold :: (a -> c) -> (b -> c) -> ([(a,b)] -> c) -> (c -> c) -> Weird a b -> c weirdFold f1 f2 f3 f4 = g where g (First x) = f1 x g (Second y) = f2 y g (Third z) = f3 z g (Fourth w) = f4 (g w)

*Note*

If you were expecting very complex expressions in the `weirdFold`

above and are surprised by the immediacy of the solution, it might be helpful to have a look on what the common `foldr`

would look like if we wrote it in this style and didn't have the special square-bracket syntax of lists to distract us:

-- List a is [a], Cons is (:) and Nil is [] data List a = Cons a (List a) | Nil listFoldr :: (a -> b -> b) -> (b) -> List a -> b listFoldr fCons fNil = g where g (Cons x xs) = fCons x (g xs) g Nil = fNil

Now it is easier to see the parallels. The extra complications are that `Cons`

(that is, `(:)`

) takes two arguments (and, for that reason, so does `fCons`

) and is recursive, requiring a call to `g`

. Also, `fNil`

is of course not really a function, as it takes no arguments.

#### Folds on recursive datatypes[edit]

As far as folds are concerned `Weird`

was a fairly nice datatype to deal with. Just one recursive constructor, which isn't even nested inside other structures. What would happen if we added a truly complicated fifth constructor?

Fifth [Weird a b] a (Weird a a, Maybe (Weird a b))

A valid, and tricky, question. In general, the following rules apply:

- A function to be supplied to a fold has the same number of arguments as the corresponding constructor.
- The type of the arguments of such a function match the types of the constructor arguments,
*except*if the constructor is recursive (that is, takes an argument of its own type). - If a constructor is recursive, any recursive argument of the constructor will correspond to an argument of the type the fold evaluates to.
^{[1]} - If a constructor is recursive, the complete fold function should be (recursively) applied to the recursive constructor arguments.
- If a recursive element appears inside another data structure, the appropriate map function for that data structure should be used to apply the fold function to it.

So `f5`

would have the type:

f5 :: [c] -> a -> (Weird a a, Maybe c) -> c

as the type of `Fifth`

is:

Fifth :: [Weird a b] -> a -> (Weird a a, Maybe (Weird a b)) -> Weird a b

The definition of `g`

for the `Fifth`

constructor will be:

g (Fifth list x (waa, mc)) = f5 (map g list) x (waa, maybeMap g mc) where maybeMap f Nothing = Nothing maybeMap f (Just w) = Just (f w)

Now note that nothing strange happens with the `Weird a a`

part. No `g`

gets called. What's up? This is a recursion, right? Well... not really. `Weird a a`

and `Weird a b`

are different types, so it isn't a real recursion. It isn't guaranteed that, for example, `f2`

will work with something of type 'a', where it expects a type 'b'. It can be true for some cases, but not for everything.

Also look at the definition of `maybeMap`

. Verify that it is indeed a map function as:

- It preserves structure.
- Only types are changed.

## Notes[edit]

- ↑ This sort of recursiveness, in which the function used for folding can take the result of another fold as an argument, is what confers the folds of data structures such as lists and trees their "accumulating" functionality.