Haskell/More about lists
By now we have already met the basic tools for working with lists. We can build lists up from the cons operator (:) and the empty list [], and we can take them apart by using a combination of recursion and pattern matching. In this chapter and the next we will consider in more depth techniques for list processing and, while doing so, discover a bit of new notation and have a first taste of characteristic Haskell features such as infinite lists, list comprehensions and higher-order functions.
Note
Throughout this chapter you will read and write functions which sum, subtract and multiply elements of lists. For simplicity's sake we will pretend the list elements have to be of type Integer. However, by recalling the discussions on the Type basics II chapter, you will realize that assumption is not necessary at all! Therefore, we suggest that, as an exercise of sorts, you figure out what would the type signatures of such functions look like if we made them polymorphic, allowing for the list elements to have any type in the class Num. To check your signatures, just omit them temporarily, load the functions into GHCi, use :t and let type inference guide you.
Rebuilding lists [edit]
We'll start by writing and analysing a function to double every element of a list of integers. For our current purposes, it will be enough that its type is so that it takes a list of Integers and evaluates to another list of Integers:
doubleList :: [Integer] -> [Integer]
Then we must write the function definition. As usual in such cases, we will go for a recursive definition:
doubleList [] = [] doubleList (n:ns) = (2 * n) : doubleList ns
Here, the base case is for an empty list; and it just evaluates to an empty list. As for the general case, doubleList builds up a new list by using (:). The first element of this new list is twice the head of the argument, and the rest of the result is obtained by a recursive call – the application of doubleList to the tail. If the tail happens to be an empty list, the base case will be invoked and recursion stops[1].
By understanding the recursive definition we can picture what actually happens when we evaluate an expression such as
doubleList [1,2,3,4]
We can work it out longhand by substituting the argument into the function definition, just like schoolbook algebra:
doubleList 1:[2,3,4] = (1*2) : doubleList (2 : [3,4]) = (1*2) : (2*2) : doubleList (3 : [4]) = (1*2) : (2*2) : (3*2) : doubleList (4 : []) = (1*2) : (2*2) : (3*2) : (4*2) : doubleList [] = (1*2) : (2*2) : (3*2) : (4*2) : [] = 2 : 4 : 6 : 8 : [] = [2, 4, 6, 8]
The bottom line is that, in effect, we rebuilt the original list replacing every element by its double.
One important thing to notice is that in this longhand evaluation exercise the moment at which we choose to evaluate the multiplications does not affect the result[2]. Had we done them immediately after each recursive call of doubleList it would have made no difference. This reflects an important property of Haskell: it is a pure functional programming language. Because evaluation order can never change the result, it is mostly left to the compiler to decide when to actually evaluate things. Since Haskell is also a lazy evaluation language, evaluation is usually deferred until the value is really needed, but the compiler is free to evaluate things sooner if this will improve efficiency. From the programmer's point of view evaluation order rarely matters [3].
Generalizing [edit]
Suppose that we needed, while solving a given problem, not only a function to double a list but also one that tripled it. In principle, we could follow the same strategy and define:
tripleList :: [Integer] -> [Integer] tripleList [] = [] tripleList (n:ns) = (3 * n) : tripleList ns
Both doubleList and tripleList have very limited applicability. Every time we needed multiplying the elements of a list by 4, 8, 17 etc. we would need to write a new list-multiplying function, and all of them would do nearly the same thing. An obvious improvement would be generalizing our function to allow multiplication by any number. Doing so requires a function that takes an Integer multiplicand as well as a list of Integers. Here is a way to define it:
multiplyList :: Integer -> [Integer] -> [Integer] multiplyList _ [] = [] multiplyList m (n:ns) = (m*n) : multiplyList m ns
This example deploys _ as a "don't care" pattern. The multiplicand is not used for the base case, so instead of giving it a name (like m, n or ns) it is explicitly ignored.
multiplyList solves our current problem for any integer number:
Prelude> multiplyList 17 [1,2,3,4] [17,34,51,68]
In particular, it is trivial to rewrite our earlier functions in terms of multiplyList:
doubleList :: [Integer] -> [Integer] doubleList xs = multiplyList 2 xs tripleList :: [Integer] -> [Integer] tripleList xs = multiplyList 3 xs
| Exercises |
|---|
|
Write the following functions and test them out. Don't forget the type signatures.
take, drop, and sum. |
Generalizing even further [edit]
We just wrote a function which can multiply the elements of a list by any Integer. Not bad, but can we do any better?
Let us begin by rewriting multiplyList in a rather artificial manner:
multiplyList :: Integer -> [Integer] -> [Integer] multiplyList _ [] = [] multiplyList m (n:ns) = (multiplyByM n) : multiplyList m ns where multiplyByM x = m * x
The point of this rewrite is making it clear that while multiplyList is much more useful than doubleList it still resembles its predecessor in one aspect. Early on, the problem was being constrained to multiplying the elements by 2 or some other hard-coded number; now, we could justifiably complain about being constrained to apply multiplyByM to the list elements. What if we wanted to do sumWithM instead; or, for that matter, something entirely different (say, computing the square of each element)? We would be back to square one, having to rewrite the recursive function.
A key functionality of Haskell will save the day. Since the solution can be quite surprising, we will approach it in a somewhat roundabout way. Consider the type signature of multiplyList:
multiplyList :: Integer -> [Integer] -> [Integer]
The first novelty is that the -> arrow in type signatures is right associative. That means we can read this signature as:
multiplyList :: Integer -> ( [Integer] -> [Integer] )
How should we understand that? It tells us multiplyList is a function that takes one Integer argument and evaluates to another function, which in turn takes a list of Integers and returns another list of Integers. In practice, that means we can rewrite the following definition...
evens = multiplyList 2 [1,2,3,4] -- [2,4,6,8]
...like this:
evens = (multiplyList 2) [1,2,3,4]
The expression within parentheses, (multiplyList 2), is just an [Integer] -> [Integer] function which is then applied to [1,2,3,4]. Now, (multiplyList 2) is of course the same as doubleList; and to drive the point home we can redefine that as:
doubleList :: [Integer] -> [Integer] doubleList = multiplyList 2
The expression on the right side is perfectly well-formed and stands on its own, so writing the second argument of multiplyList, and thus the argument of doubleList, is strictly unnecessary[4].
The trickery above illustrates that functions in Haskell behave much like any other value - we can have them returned from other functions, and even define them without mentioning their arguments, as if they were a normal constant. What if we could have functions as arguments as well? That way, we could, based on the recursive skeleton of multiplyList, write a function that took, instead of the argument m, a function to replace multiplyByM. A function like this one:
applyToIntegers :: (Integer -> Integer) -> [Integer] -> [Integer] applyToIntegers _ [] = [] applyToIntegers f (n:ns) = (f n) : applyToIntegers f ns
This function does indeed work, and so we can apply any Integer -> Integer function to the elements of a list of Integers. Now, since the multiplication operator (*) is just a function with two arguments, the same tricks apply to it; in fact, had we defined the following function it would do exactly what its name suggests:
multiplyByM m = ((*) m)
And, finally, we can rewrite multiplyList in terms of applyToIntegers:
multiplyList :: Integer -> [Integer] -> [Integer] multiplyList m = applyToIntegers ((*) m)
Or, equivalently:
multiplyList :: Integer -> [Integer] -> [Integer] multiplyList m list = applyToIntegers ((*) m) list
The map function [edit]
While applyToIntegers has type (Integer -> Integer) -> [Integer] -> [Integer], there is nothing specific to integers in its algorithm. Therefore, we could define versions such as applyToChars, applyToStrings and applyToLists just by changing the type signature. That would be horribly wasteful, though: we did not climb all the way up to this point just to need a different function for each type! Furthermore, nothing prevents us from changing the signature to, for instance, (Integer -> String) -> [Integer] -> [String]; thus giving a function that takes a function Integer -> String and returns a function [Integer] -> [String] which applies the function originally passed as argument to each element of an Integer list.
The final step of generalization, then, is to make a fully polymorphic version of applyToIntegers, with signature (a -> b) -> [a] -> [b]. Such a function already exists in Prelude: it is called map and defined as:
map :: (a -> b) -> [a] -> [b] map _ [] = [] map f (x:xs) = (f x) : map f xs
Using it, we can effortlessly implement functions as different as...
multiplyList :: Integer -> [Integer] -> [Integer] multiplyList m = map ((*) m)
... and...
heads :: [[a]] -> [a] heads = map head
Prelude> heads [[1,2,3,4],[4,3,2,1],[5,10,15]] [1,4,5]
map solves the problem of applying a function to each element of a list[5] in a completely general way by allowing us to choose which function to use. Functions like map which take other functions as arguments are called higher-order functions, and they are extremely useful. In particular, we will meet some other important higher-order functions used for list processing in the next chapter.
| Exercises |
|---|
|
Tips and Tricks [edit]
Before we carry on with more list processing, let us make a few miscellaneous useful observations about lists in Haskell.
Dot Dot Notation [edit]
Haskell has a convenient shorthand for writing ordered lists of regularly-spaced integers. Some examples to illustrate it:
Code Result ---- ------ [1..10] [1,2,3,4,5,6,7,8,9,10] [2,4..10] [2,4,6,8,10] [5,4..1] [5,4,3,2,1] [1,3..10] [1,3,5,7,9]
The same notation can be used with characters as well, and even with floating point numbers - though the latter is not necessarily a good idea due to rounding errors. Try this:
[0,0.1 .. 1]
Additionally, the notation only works with sequences with fixed differences between consecutive elements. For instance, you should not write...
[0,1,1,2,3,5,8..100]
... and expect to magically get back the rest of the Fibonacci series[6].
Infinite Lists [edit]
One of the most mind-bending things about Haskell lists is that they are allowed to be infinite. For example, the following generates the infinite list of integers starting with 1:
[1..]
(If you try this in GHCi, remember you can stop an evaluation with Ctrl-c).
The same effect could be achieved with a recursive function:
intsFrom n = n : intsFrom (n+1) -- note there is no base case! positiveInts = intsFrom 1
This works because Haskell uses lazy evaluation: it never actually evaluates more than it needs at any given moment. In most cases an infinite list can be treated just like an ordinary one. The program will only go into an infinite loop when evaluation would actually require all the values in the list. Examples of this include sorting or printing the entire list. However:
evens = doubleList [1..]
will define "evens" to be the infinite list [2,4,6,8..]. And you can pass "evens" into other functions, and unless they actually need to evaluate the end of the list it will all just work.
Infinite lists are quite useful in Haskell. Often it's more convenient to define an infinite list and then take the first few items than to create a finite list. Functions that process two lists in parallel generally stop with the shortest, so making the second one infinite avoids having to find the length of the first. An infinite list is often a handy alternative to the traditional endless loop at the top level of an interactive program.
A note about head and tail [edit]
Given the choice of using either the ( : ) pattern or head/tail to split lists, there is little reason to prefer one in favour of the other in principle - which style is more convenient changes from case to case. While pattern matching can be a more immediate solution in some cases - say, for implementing recursive functions, head/tail can be more versatile in scenarios involving function composition.
Another thing to account for when using head and tail is that they will fail on empty lists - and runtime crashes are never a good thing. One sane way to check for empty lists if need be is with the Prelude function null :: [a] -> Bool, which returns True for empty lists and False otherwise.
| Exercises |
|---|
|
Notes [edit]
- ↑ Note that, had we forgotten the base case, once the recursion got to an empty list the (x:xs) pattern match would fail, and we would get an error.
- ↑ As long as none of them results in an error or nontermination. But let us sidestep that issue for now.
- ↑ One exception is the case of infinite lists (!) which we will consider in a short while
- ↑ While this style of definition may look exotic, it is actually commonplace in Haskell code. It is called point-free style.
- ↑ Our original
doubleListproblem was a very specific instance of that. - ↑ http://en.wikipedia.org/wiki/Fibonacci_number