Haskell/More about lists
Earlier, we learned that Haskell builds lists via the cons operator
(:) and the empty list
. We saw how we can work on lists bit by bit using a combination of recursion and pattern matching. In this chapter and the next, we will consider more in-depth techniques for list processing and discover some new notation. We will get our first taste of Haskell features like infinite lists, list comprehensions, and higher-order functions.
Here's a function that doubles every element from a list of integers:
doubleList :: [Integer] -> [Integer] doubleList  =  doubleList (n:ns) = (2 * n) : doubleList ns
Here, the base case is the empty list which evaluates to an empty list. Otherwise, doubleList builds up a new list by using (:). The first element of this new list is twice the head of the argument, and we obtain the rest of the result by recursively calling doubleList on the tail of the argument. When the tail gets to an empty list, the base case will be invoked and recursion will stop.
Let's study the evaluation of an example expression:
We can work it out longhand by substituting the argument into the function definition, just like schoolbook algebra:
doubleList 1:[2,3,4] = (1*2) : doubleList (2 : [3,4]) = (1*2) : (2*2) : doubleList (3 : ) = (1*2) : (2*2) : (3*2) : doubleList (4 : ) = (1*2) : (2*2) : (3*2) : (4*2) : doubleList  = (1*2) : (2*2) : (3*2) : (4*2) :  = 2 : 4 : 6 : 8 :  = [2, 4, 6, 8]
Thus, we rebuilt the original list replacing every element by its double.
In this longhand evaluation exercise, the moment at which we choose to evaluate the multiplications does not affect the result. We could just as well have evaluated the doublings immediately after each recursive call of doubleList.
Haskell uses this flexibility on evaluation order in some important ways. As a pure functional programming language, the compiler makes most of the decisions about when to actually evaluate things. As a lazy language, Haskell usually defers evaluation until a final value is needed (which may sometimes never occur). From the programmer's point of view, evaluation order rarely matters.
To add triple a list, we could follow the same strategy as with code>doubleList:
tripleList :: [Integer] -> [Integer] tripleList  =  tripleList (n:ns) = (3 * n) : tripleList ns
But we don't want to write a new list-multiplying function for every different multiplier (such as multiplying the elements of a list by 4, 8, 17 etc.). So, let's make a general function to allow multiplication by any number. Our new function will take two arguments: the multiplicand as well as a list of
Integers to multiply:
multiplyList :: Integer -> [Integer] -> [Integer] multiplyList _  =  multiplyList m (n:ns) = (m * n) : multiplyList m ns
This example deploys
_ as a "don't care" pattern. The multiplicand is not used for the base case, so we ignore that argument instead of giving it a name (like
We can test
multiplyList to see that it works as expected:
Prelude> multiplyList 17 [1,2,3,4] [17,34,51,68]
Write the following functions and test them out. Don't forget the type signatures.
Generalizing even further
In this chapter, we started with a function constrained to multiplying the elements by
2. Then, we recognized that we could avoid hard-coding a new function for each multiplicand by making
multiplyList to easily use any
Integer. Now, what if we wanted a different operator such as addition or to compute the square of each element?
We can generalize still further using a key functionality of Haskell. However, because the solution can seem surprising, we will approach it in a somewhat roundabout way. Consider the type signature of
multiplyList :: Integer -> [Integer] -> [Integer]
The first thing to know is that the
-> arrow in type signatures is right associative. That means we can read this signature as:
multiplyList :: Integer -> ([Integer] -> [Integer])
How should we understand that? It tells us that
multiplyList is a function that takes one
Integer argument and evaluates to another function. The function it returns happens to take a list of
Integers and return another list of
Consider our old
doubleList function redefined in terms of
doubleList :: [Integer] -> [Integer] doubleList xs = multiplyList 2 xs
Writing this way, we can clearly cancel out the `xs`:
doubleList = multiplyList 2
This definition style (with no argument variables) is called "point-free" style. Our definition now says that applying only one argument to
multiplyList doesn't fail to evaluate, rather it gives us a more specific function of type
[Integer] -> [Integer] instead of finishing with a final
We now see that functions in Haskell behave much like any other value. Functions can return other functions, and functions can stand alone as objects without mentioning their arguments. Functions seem almost like normal constants. Can we use functions themselves as arguments even? Yes, and that's the key to the next step in generalizing
multiplyList. We need a function that takes any other appropriate function and applies the given function to the elements of a list:
applyToIntegers :: (Integer -> Integer) -> [Integer] -> [Integer] applyToIntegers _  =  applyToIntegers f (n:ns) = (f n) : applyToIntegers f ns
applyToIntegers, we can take any
Integer -> Integer function and apply it to the elements of a list of
Integers. We can, of course, use this generalized function to redefine
multiplyList. Specifically, we just use the
(*) function as the first argument for applyToIntegers:
multiplyList :: Integer -> [Integer] -> [Integer] multiplyList x = applyToIntegers ((*) x)
If all this abstraction confuses you, consider a concrete example: When we multiply 5 * 7 in Haskell, the
(*) function doesn't just take two arguments at once, it actually first takes the 5 and returns a new 5* function; and that new function then takes a second argument and multiplies that by 5. So, for our example, we then give the 7 as an argument to the 5* function, and that returns us our final evaluated number (35).
So, all functions in Haskell really take only one argument. However, when we know how many intermediate functions we will generate to reach a final result, we can treat functions as if they take many arguments. The number of arguments we generally talk about functions taking is actually the number of one-argument functions we get between the first argument and a final, non-functional result value.
The process of creating intermediate functions when feeding arguments into a complex function is called currying (named after Haskell Curry, also the namesake of the Haskell programming language).
applyToIntegers has type
(Integer -> Integer) -> [Integer] -> [Integer], the definition itself contains nothing specific to integers. To use the same logic with other types of lists, we could define versions such as
applyToStrings and so on. They would all have the same definition but different type signatures. We can avoid all that redundancy with the final step in generalizing: making a fully polymorphic version with signature
(a -> b) -> [a] -> [b]. Prelude already has this function, and it is called
map :: (a -> b) -> [a] -> [b] map _  =  map f (x:xs) = (f x) : map f xs
map, we can effortlessly implement functions as different as...
multiplyList :: Integer -> [Integer] -> [Integer] multiplyList x = map ((*) x)
heads :: [[a]] -> [a] heads = map head
Prelude> heads [[1,2,3,4],[4,3,2,1],[5,10,15]] [1,4,5]
map is the general solution for applying a function to each and every element of a list. Our original
doubleList problem was simply a specific version of
map. Functions like
map which take other functions as arguments are called higher-order functions. We will learn about more higher-order functions for list processing in the next chapter.
Tips and Tricks
A few miscellaneous notes about lists in Haskell:
Dot Dot Notation
Haskell has a convenient shorthand for writing ordered lists of regularly-spaced integers. Some examples to illustrate it:
Code Result ---- ------ [1..10] [1,2,3,4,5,6,7,8,9,10] [2,4..10] [2,4,6,8,10] [5,4..1] [5,4,3,2,1] [1,3..10] [1,3,5,7,9]
The same notation works with characters and even with floating point numbers. Unfortunately, floating-point numbers are problematic due to rounding errors. Try this:
[0,0.1 .. 1]
Thanks to lazy evaluation, Haskell lists can be infinite. For example, the following generates the infinite list of integers starting with 1:
(If you try this in GHCi, remember you can stop an evaluation with Ctrl-c).
The same effect could be achieved with a recursive function:
intsFrom n = n : intsFrom (n + 1) -- note there is no base case! positiveInts = intsFrom 1
Infinite lists are useful in practice because Haskell's lazy evaluation never actually evaluates more than it needs at any given moment. In most cases, we c treat an infinite list like an ordinary one. The program will only go into an infinite loop when evaluation requires all the values in the list. So, we can't sort or print an infinite list, but:
evens = doubleList [1..]
will define "evens" to be the infinite list [2,4,6,8..], and we can then pass "evens" into other functions that only need to evaluate part of the list for their final result. Haskell will know to only use the portion of the infinite list needed in the end.
Compared to hard-coding a long finite list, it's often more convenient to define an infinite list and then take the first few items. An infinite list can also be a handy alternative to the traditional endless loop at the top level of an interactive program.
A note about
Given the choice of using either the
( : ) pattern or
tail to split lists, pattern matching is almost always preferable. It may be tempting to use
tail due to simplicity and terseness, but it is too easy to forget that they fail on empty lists (and runtime crashes are never good). We do have a Prelude function,
null :: [a] -> Bool, which returns
True for empty lists and
False otherwise, so that provides a sane way of checking for empty lists without pattern matching; but matching an empty list tends to be cleaner and clearer than the corresponding if-then-else expression using
- Had we forgotten the base case, once the recursion got to an empty list the (x:xs) pattern match would fail, and we would get an error.
- …as long as none of the calculations result in an error or nontermination, which are not problems in this case.
- The compiler may soemtimes evaluate things sooner in order to improve efficiency.
- One exception is the case of infinite lists (!) which we will consider in a short while.