Haskell/List processing
From Wikibooks, the open-content textbooks collection
Because lists are such a fundamental data type in Haskell, there is a large collection of standard functions for processing them. These are mostly to be found in a library module called the 'Standard Prelude' which is automatically imported in all Haskell programs. There are also additional list-processing functions to be found in the Data.List module.
[edit] Map
This module will explain one particularly important function, called map, and then describe some of the other list processing functions that work in similar ways.
Recall the multiplyList function from a couple of chapters ago.
multiplyList :: Integer -> [Integer] -> [Integer] multiplyList _ [] = [] multiplyList m (n:ns) = (m*n) : multiplyList m ns
This works on a list of integers, multiplying each item by a constant. But Haskell allows us to pass functions around just as easily as we can pass integers. So instead of passing a multiplier m we could pass a function f, like this:
mapList1 :: (Integer -> Integer) -> [Integer] -> [Integer] mapList1 _ [] = [] mapList1 f (n:ns) = (f n) : mapList1 f ns
Take a minute to compare the two functions. The difference is in the first parameter. Instead of being just an Integer it is now a function. This function parameter has the type (Integer -> Integer), meaning that it is a function from one integer to another. The second line says that if this is applied to an empty list then the result is itself an empty list, and the third line says that for a non-empty list the result is f applied to the first item in the list, followed by a recursive call to mapList1 for the rest of the list.
Remember that (*) has type Integer -> Integer -> Integer. So if we write (2*) then this returns a new function that doubles its argument and has type Integer -> Integer. But that is exactly the type of functions that can be passed to mapList1. So now we can write doubleList like this:
doubleList = mapList1 (2*)
We could also write it like this, making all the arguments explicit:
doubleList ns = mapList1 (2*) ns
(The two are equivalent because if we pass just one argument to mapList1 we get back a new function. The second version is more natural for newcomers to Haskell, but experts often favour the first, known as 'point free' style.)
Obviously this idea is not limited to just integers. We could just as easily write
mapListString :: (String -> String) -> [String] -> [String] mapListString _ [] = [] mapListString f (n:ns) = (f n) : mapListString f ns
and have a function that does this for strings. But this is horribly wasteful: the code is exactly the same for both strings and integers. What is needed is a way to say that mapList works for both Integers, Strings, and any other type we might want to put in a list. In fact there is no reason why the input list should be the same type as the output list: we might very well want to convert a list of integers into a list of their string representations, or vice versa. And indeed Haskell provides a way to do this. The Standard Prelude contains the following definition of map:
map :: (a -> b) -> [a] -> [b] map _ [] = [] map f (x:xs) = (f x) : map f xs
Instead of constant types like String or Integer this definition uses type variables. These start with lower case letters (as opposed to type constants that start with upper case) and otherwise follow the same lexical rules as normal variables. However the convention is to start with "a" and go up the alphabet. Even the most complicated functions rarely get beyond "d".
So what this says is that map takes two parameters:
- A function from a thing of type
ato a thing of typeb. - A list of things of type
a.
Then it returns a new list containing things of type b, constructed by applying the function to each element of the input list.
| Exercises |
|---|
|
[edit] Folds
A fold applies a function to a list in a way similar to map, but accumulates a single result instead of a list.
Take for example, a function like sum, which might be implemented as follows:
or product:
or concat, which takes a list of lists and joins (concatenates) them into one:
There is a certain pattern of recursion common to all of these examples. This pattern is known as a fold, possibly from the idea that a list is being "folded up" into a single value, or that a function is being "folded between" the elements of the list.
The Standard Prelude defines four fold functions: foldr, foldl, foldr1 and foldl1.
[edit] foldr
The most natural and commonly used of these in a lazy language like Haskell is the right-associative foldr:
foldr :: (a -> b -> b) -> b -> [a] -> b foldr f z [] = z foldr f z (x:xs) = f x (foldr f z xs)
The first argument is a function with two arguments, the second is a "zero" value for the accumulator, and the third is the list to be folded.
For example, in sum, f is (+), and z is 0, and in concat, f is (++) and z is []. In many cases, like all of our examples so far, the function passed to a fold will have both its arguments be of the same type, but this is not necessarily the case in general.
What foldr f z xs does is to replace each cons (:) in the list xs with the function f, and the empty list at the end with z. That is,
a : b : c : []
becomes
f a (f b (f c z))
This is perhaps most elegantly seen by picturing the list data structure as a tree:
: f
/ \ / \
a : foldr f z a f
/ \ -------------> / \
b : b f
/ \ / \
c [] c z
It is fairly easy to see with this picture that foldr (:) [] is just the identity function on lists.
[edit] foldl
The left-associative foldl processes the list in the opposite direction:
foldl :: (a -> b -> a) -> a -> [b] -> a foldl f z [] = z foldl f z (x:xs) = foldl f (f z x) xs
So brackets in the resulting expression accumulate on the left. Our list above, after being transformed by foldl f z becomes:
f (f (f z a) b) c
The corresponding trees look like:
: f
/ \ / \
a : foldl f z f c
/ \ -------------> / \
b : f b
/ \ / \
c [] z a
Technical Note: The left associative fold is tail-recursive, that is, it recurs immediately, calling itself. For this reason the compiler will optimise it to a simple loop, and it will then be much more efficient than foldr. However, Haskell is a lazy language, and so the calls to f will by default be left unevaluated, building up an expression in memory whose size is linear in the length of the list, exactly what we hoped to avoid in the first place. To get back this efficiency, there is a version of foldl which is strict, that is, it forces the evaluation of f immediately, called foldl'. Note the single quote character: this is pronounced "fold-ell-tick". A tick is a valid character in Haskell identifiers. foldl' can be found in the library Data.List. As a rule you should use foldr on lists that might be infinite or where the fold is building up a data structure, and foldl' if the list is known to be finite and comes down to a single value. foldl (without the tick) should rarely be used at all.
[edit] foldr1 and foldl1
As previously noted, the type declaration for foldr makes it quite possible for the list elements and result to be of different types. For example, "read" is a function that takes a string and converts it into some type (the type system is smart enough to figure out which one). In this case we convert it into a float.
Example: The list elements and results can have different types
addStr :: String -> Float -> Float addStr str x = read str + x sumStr :: [String] -> Float sumStr = foldr addStr 0.0
If you substitute the types Float and String for the type variables a and b in the type of foldr you will see that this is type correct.
There is also a variant called foldr1 ("fold - arr - one") which dispenses with an explicit zero by taking the last element of the list instead:
foldr1 :: (a -> a -> a) -> [a] -> a foldr1 f [x] = x foldr1 f (x:xs) = f x (foldr1 f xs) foldr1 _ [] = error "Prelude.foldr1: empty list"
And foldl1 as well:
foldl1 :: (a -> a -> a) -> [a] -> a foldl1 f (x:xs) = foldl f x xs foldl1 _ [] = error "Prelude.foldl1: empty list"
Note: There is additionally a strict version of foldl1 called foldl1' in the Data.List library.
Notice that in this case all the types have to be the same, and that an empty list is an error. These variants are occasionally useful, especially when there is no obvious candidate for z, but you need to be sure that the list is not going to be empty. If in doubt, use foldr or foldl'.
[edit] folds and laziness
One good reason that right-associative folds are more natural to use in Haskell than left-associative ones is that right folds can operate on infinite lists, which are not so uncommon in Haskell programming. If the input function f only needs its first parameter to produce the first part of the output, then everything works just fine. However, a left fold must call itself recursively until it reaches the end of the input list; this is inevitable, because the recursive call is not made in an argument to f. Needless to say, this never happens if the input list is infinite, and the program will spin endlessly in an infinite loop.
As a toy example of how this can work, consider a function echoes taking a list of integers, and producing a list where if the number n occurs in the input list, then n replicated n times will occur in the output list. We will make use of the prelude function replicate: replicate n x is a list of length n with x the value of every element.
We can write echoes as a foldr quite handily:
echoes = foldr (\x xs -> (replicate x x) ++ xs) []
or, equally handily, as a foldl:
echoes = foldl (\xs x -> xs ++ (replicate x x)) []
but only the first definition works on an infinite list like [1..]. Try it! (If you try this in GHCi, remember you can stop an evaluation with Ctrl-c).
Note the syntax in the above example: the \xs x -> means that xs is set to the first argument outside the parentheses (in this case, []), and x is set to the second (will end up being the argument of echoes when it is called).
As a final example, another thing that you might notice is that map itself is patterned as a fold:
map f = foldr (\x xs -> f x : xs) []
Folding takes a little time to get used to, but it is a fundamental pattern in functional programming, and eventually becomes very natural. Any time you want to traverse a list and build up a result from its members, you likely want a fold.
| Exercises |
|---|
|
Define the following functions recursively (like the definitions for
Define the following functions using
|
[edit] Scans
A "scan" is much like a cross between a map and a fold. Folding a list accumulates a single return value, whereas mapping puts each item through a function with no accumulation. A scan does both: it accumulates a value like a fold, but instead of returning a final value it returns a list of all the intermediate values.
The Standard Prelude contains four scan functions:
scanl :: (a -> b -> a) -> a -> [b] -> [a]
This accumulates the list from the left, and the second argument becomes the first item in the resulting list. So scanl (+) 0 [1,2,3] = [0,1,3,6]
scanl1 :: (a -> a -> a) -> [a] -> [a]
This is the same as scanl, but uses the first item of the list as a zero parameter. It is what you would typically use if the input and output items are the same type. Notice the difference in the type signatures. scanl1 (+) [1,2,3] = [1,3,6].
scanr :: (a -> b -> b) -> b -> [a] -> [b] scanr1 :: (a -> a -> a) -> [a] -> [a]
These two functions are the exact counterparts of scanl and scanl1. They accumulate the totals from the right. So:
scanr (+) 0 [1,2,3] = [6,5,3,0] scanr1 (+) [1,2,3] = [6,5,3]
| Exercises |
|---|
|
Define the following functions:
|