Haskell/Lists II

From Wikibooks, open books for an open world
Jump to navigation Jump to search

Earlier, we learned that Haskell builds lists via the cons operator (:) and the empty list []. We saw how we can work on lists bit by bit using a combination of recursion and pattern matching. In this chapter and the next, we will consider more in-depth techniques for list processing and discover some new notation. We will get our first taste of Haskell features like infinite lists, list comprehensions, and higher-order functions.

Note

Throughout this chapter, you will read and write functions which sum, subtract, and multiply elements of lists. For simplicity's sake, we will pretend that list elements are of type Integer. However, as you will recall from the discussions on Type basics II, there are many different types within the Num typeclass. As an exercise of sorts, you could figure out what the type signatures of such functions would be if we made them polymorphic, allowing for the list elements to have any type in the class Num. To check your signatures, just omit them temporarily, load the functions into GHCi, use :t and let type inference guide you.


Rebuilding lists[edit | edit source]

Here's a function that doubles every element from a list of integers:

doubleList :: [Integer] -> [Integer]
doubleList [] = []
doubleList (n:ns) = (2 * n) : doubleList ns

Here, the base case is the empty list which evaluates to an empty list. In the recursive case, doubleList builds up a new list by using (:). The first element of this new list is twice the head of the argument, and we obtain the rest of the result by recursively calling doubleList on the tail of the argument. When the tail gets to an empty list, the base case will be invoked and recursion will stop.[1]

Let's study the evaluation of an example expression:

doubleList [1,2,3,4]

We can work it out longhand by substituting the argument into the function definition, just like schoolbook algebra:

doubleList 1:[2,3,4] = (1*2) : doubleList (2 : [3,4])
                     = (1*2) : (2*2) : doubleList (3 : [4])
                     = (1*2) : (2*2) : (3*2) : doubleList (4 : [])
                     = (1*2) : (2*2) : (3*2) : (4*2) : doubleList []
                     = (1*2) : (2*2) : (3*2) : (4*2) : []
                     = 2 : 4 : 6 : 8 : []
                     = [2, 4, 6, 8]

Thus, we rebuilt the original list replacing every element by its double.

In this longhand evaluation exercise, the moment at which we choose to evaluate the multiplications does not affect the result. We could just as well have evaluated the doublings immediately after each recursive call of doubleList.[2]

Haskell uses this flexibility on evaluation order in some important ways. As a pure functional programming language, the compiler makes most of the decisions about when to actually evaluate things. As a lazy language, Haskell usually defers evaluation until a final value is needed (which may sometimes never occur).[3] From the programmer's point of view, evaluation order rarely matters.[4]

Generalizing[edit | edit source]

To triple each element in a list, we could follow the same strategy as with doubleList:

tripleList :: [Integer] -> [Integer]
tripleList [] = []
tripleList (n:ns) = (3 * n) : tripleList ns

But we don't want to write a new list-multiplying function for every different multiplier (such as multiplying the elements of a list by 4, 8, 17 etc.). So, let's make a general function to allow multiplication by any number. Our new function will take two arguments: the multiplicand as well as a list of Integers to multiply:

multiplyList :: Integer -> [Integer] -> [Integer]
multiplyList _ [] = []
multiplyList m (n:ns) = (m * n) : multiplyList m ns

This example deploys _ as a "don't care" pattern. The multiplicand is not used for the base case, so we ignore that argument instead of giving it a name (like m, n, or ns).

We can test multiplyList to see that it works as expected:

Prelude> multiplyList 17 [1,2,3,4]
[17,34,51,68]
Exercises

Write the following functions and test them out. Don't forget the type signatures.

  1. takeInt returns the first n items in a list. So, takeInt 4 [11,21,31,41,51,61] returns [11,21,31,41].
  2. dropInt drops the first n items in a list and returns the rest. So, dropInt 3 [11,21,31,41,51] returns [41,51].
  3. sumInt returns the sum of the items in a list.
  4. scanSum adds the items in a list and returns a list of the running totals. So, scanSum [2,3,4,5] returns [2,5,9,14].
  5. diffs returns a list of the differences between adjacent items. So, diffs [3,5,6,8] returns [2,1,2]. (Hints: one solution involves writing an auxiliary function which takes two lists and calculates the difference between corresponding elements. Alternatively, you might explore the fact that lists with at least two elements can be matched to a (x:y:ys) pattern.)
The first three functions are in Prelude under the names take, drop, and sum.

Generalizing even further[edit | edit source]

In this chapter, we started with a function constrained to multiplying the elements by 2. Then, we recognized that we could avoid hard-coding a new function for each multiplicand by making multiplyList to easily use any Integer. Now, what if we wanted a different operator such as addition or to compute the square of each element?

We can generalize still further using a key functionality of Haskell. However, because the solution can seem surprising, we will approach it in a somewhat roundabout way. Consider the type signature of multiplyList:

multiplyList :: Integer -> [Integer] -> [Integer]

The first thing to know is that the -> arrow in type signatures is right associative. That means we can read this signature as:

multiplyList :: Integer -> ([Integer] -> [Integer])

How should we understand that? It tells us that multiplyList is a function that takes one Integer argument and evaluates to another function. The function thus returned, then, takes a list of Integers and returns another list of Integers.

Consider our old doubleList function redefined in terms of multiplyList:

doubleList :: [Integer] -> [Integer]
doubleList xs = multiplyList 2 xs

Writing this way, we can clearly cancel out the `xs`:

doubleList = multiplyList 2

This definition style (with no argument variables) is called point-free style. Our definition now says that applying only one argument to multiplyList doesn't fail to evaluate, rather it gives us a more specific function of type [Integer] -> [Integer] instead of finishing with a final [Integer] value.

We now see that functions in Haskell behave much like any other value. Functions can return other functions, and functions can stand alone as objects without mentioning their arguments. Functions seem almost like normal constants. Can we use functions themselves as arguments even? Yes, and that's the key to the next step in generalizing multiplyList. We need a function that takes any other appropriate function and applies the given function to the elements of a list:

applyToIntegers :: (Integer -> Integer) -> [Integer] -> [Integer]
applyToIntegers _ [] = []
applyToIntegers f (n:ns) = (f n) : applyToIntegers f ns

With applyToIntegers, we can take any Integer -> Integer function and apply it to the elements of a list of Integers. We can thus use this generalized function to redefine multiplyList:

multiplyList :: Integer -> [Integer] -> [Integer]
multiplyList m = applyToIntegers ((*) m)

That uses the (*) function with a single initial argument to create a new function which is ready to take one more argument (which, in this use case, will come from the numbers in a given list).

Currying[edit | edit source]

If all this abstraction confuses you, consider a concrete example: When we multiply 5 * 7 in Haskell, the (*) function doesn't just take two arguments at once, it actually first takes the 5 and returns a new 5* function; and that new function then takes a second argument and multiplies that by 5. So, for our example, we then give the 7 as an argument to the 5* function, and that returns us our final evaluated number (35).

So, all functions in Haskell really take only one argument. However, when we know how many intermediate functions we will generate to reach a final result, we can treat functions as if they take many arguments. The number of arguments we generally talk about functions taking is actually the number of one-argument functions we get between the first argument and a final, non-functional result value.

The process of creating intermediate functions when feeding arguments into a complex function is called currying (named after Haskell Curry, also the namesake of the Haskell programming language).

The map function[edit | edit source]

While applyToIntegers has type (Integer -> Integer) -> [Integer] -> [Integer], the definition itself contains nothing specific to integers. To use the same logic with other types of lists, we could define versions such as applyToChars, applyToStrings and so on. They would all have the same definition but different type signatures. We can avoid all that redundancy with the final step in generalizing: making a fully polymorphic version with signature (a -> b) -> [a] -> [b]. Prelude already has this function, and it is called map:

map :: (a -> b) -> [a] -> [b]
map _ [] = []
map f (x:xs) = (f x) : map f xs

With map, we can effortlessly implement functions as different as...

multiplyList :: Integer -> [Integer] -> [Integer]
multiplyList m = map ((*) m)

... and...

heads :: [[a]] -> [a]
heads = map head
Prelude> heads [[1,2,3,4],[4,3,2,1],[5,10,15]]
[1,4,5]

map is the general solution for applying a function to each and every element of a list. Our original doubleList problem was simply a specific version of map. Functions like map which take other functions as arguments are called higher-order functions. We will learn about more higher-order functions for list processing in the next chapter.

Exercises
  1. Use map to build functions that, given a list xs of Ints, return:
    • A list that is the element-wise negation of xs.
    • A list of lists of Ints xss that, for each element of xs, contains the divisors of xs. You can use the following function to get the divisors:
      divisors p = [ f | f <- [1..p], p `mod` f == 0 ]
    • The element-wise negation of xss.
  2. Implement a Run Length Encoding (RLE) encoder and decoder.
    • The idea of RLE is simple; given some input:
      "aaaabbaaa"
      compress it by taking the length of each run of characters:(4,'a'), (2, 'b'), (3, 'a')
    • The concat and group functions might be helpful. In order to use group, import the Data.List module by typing :m Data.List at the ghci prompt or by adding import Data.List to your Haskell source code file.
    • What is the type of your encode and decode functions?

Tips and Tricks[edit | edit source]

A few miscellaneous notes about lists in Haskell:

Dot Dot Notation[edit | edit source]

Haskell has a convenient shorthand for writing ordered lists of regularly-spaced integers. Some examples to illustrate it:

Code             Result
----             ------
[1..10]          [1,2,3,4,5,6,7,8,9,10]
[2,4..10]        [2,4,6,8,10]
[5,4..1]         [5,4,3,2,1]
[1,3..10]        [1,3,5,7,9]

The same notation works with characters and even with floating point numbers. Unfortunately, floating-point numbers are problematic due to rounding errors. Try this:

[0,0.1 .. 1]

Note

The .. notation only works with sequences with fixed differences between consecutive elements. For instance, you cannot write...

[0,1,1,2,3,5,8..100]

... and expect to magically get back the rest of the Fibonacci series.[5]


Infinite Lists[edit | edit source]

Thanks to lazy evaluation, Haskell lists can be infinite. For example, the following generates the infinite list of integers starting with 1:

[1..]

(If you try this in GHCi, remember you can stop an evaluation with Ctrl-c).

The same effect could be achieved with a recursive function:

intsFrom n = n : intsFrom (n + 1) -- note there is no base case!
positiveInts = intsFrom 1

Infinite lists are useful in practice because Haskell's lazy evaluation never actually evaluates more than it needs at any given moment. In most cases, we can treat an infinite list like an ordinary one. The program will only go into an infinite loop when evaluation requires all the values in the list. So, we can't sort or print an infinite list, but:

evens = doubleList [1..]

will define "evens" to be the infinite list [2,4,6,8..], and we can then pass "evens" into other functions that only need to evaluate part of the list for their final result. Haskell will know to only use the portion of the infinite list needed in the end.

Compared to hard-coding a long finite list, it's often more convenient to define an infinite list and then take the first few items. An infinite list can also be a handy alternative to the traditional endless loop at the top level of an interactive program.

A note about head and tail[edit | edit source]

Given the choice of using either the ( : ) pattern or head/tail to split lists, pattern matching is almost always preferable. It may be tempting to use head and tail due to simplicity and terseness, but it is too easy to forget that they fail on empty lists (and runtime crashes are never good). We do have a Prelude function, null :: [a] -> Bool, which returns True for empty lists and False otherwise, so that provides a sane way of checking for empty lists without pattern matching; but matching an empty list tends to be cleaner and clearer than the corresponding if-then-else expression using null.

Exercises
  1. With respect to your solutions to the first set of exercises in this chapter, is there any difference between scanSum (takeInt 10 [1..]) and takeInt 10 (scanSum [1..])?
  2. Write functions that, when applied to lists, give the last element of the list and the list with the last element dropped (without using reverse).
    This functionality is provided by Prelude through the last and init functions. Like head and tail, they blow up when given empty lists.

Notes

  1. Had we forgotten the base case, once the recursion got to an empty list the (x:xs) pattern match would fail, and we would get an error.
  2. …as long as none of the calculations result in an error or nontermination, which are not problems in this case.
  3. The compiler may sometimes evaluate things sooner in order to improve efficiency.
  4. One exception is the case of infinite lists (!) which we will consider in a short while.
  5. http://en.wikipedia.org/wiki/Fibonacci_number