Haskell/List processing

From Wikibooks, open books for an open world
Jump to: navigation, search



Folds[edit]

A fold applies a function to a list in a way similar to map, but accumulates a single result instead of a list.

Take, for example, a function like sum, which might be implemented as follows:

Example: sum

 sum :: [Integer] -> Integer
 sum []     = 0
 sum (x:xs) = x + sum xs

or product:

Example: product

 product :: [Integer] -> Integer
 product []     = 1
 product (x:xs) = x * product xs

or concat, which takes a list of lists and joins (concatenates) them into one:

Example: concat

 concat :: [[a]] -> [a]
 concat []     = []
 concat (x:xs) = x ++ concat xs

There is a certain pattern of recursion common to all of these examples. This pattern is known as a fold, possibly from the idea that a list is being "folded up" into a single value, or that a function is being "folded between" the elements of the list.

The Standard Prelude defines four fold functions: foldr, foldl, foldr1 and foldl1.

foldr[edit]

The right-associative foldr folds up a list from the right, that is, it walks from the last to the first element of the list and applies the given function to each of the elements and the accumulator, the initial value of which has to be set:

foldr            :: (a -> b -> b) -> b -> [a] -> b
foldr f acc []     = acc
foldr f acc (x:xs) = f x (foldr f acc xs)

The first argument is a function with two arguments, the second is a "zero" value for the accumulator, and the third is the list to be folded.

For example, in sum, f is (+), and acc is 0, and in concat, f is (++) and acc is []. In many cases, like all of our examples so far, the function passed to a fold will have both its arguments be of the same type, but this is not necessarily the case in general.

What foldr f acc xs does is to replace each cons (:) in the list xs with the function f, and the empty list at the end with acc. That is,

a : b : c : []

becomes

f a (f b (f c acc))

This is perhaps most elegantly seen by picturing the list data structure as a tree:

  :                         f
 / \                       / \
a   :       foldr f acc   a   f
   / \    ------------->     / \
  b   :                     b   f
     / \                       / \
    c  []                     c   acc

It is fairly easy to see with this picture that foldr (:) [] is just the identity function on lists (that is, the function which returns its argument unmodified).

foldl[edit]

The left-associative foldl processes the list in the opposite direction, starting at the left side with the first element, and proceeding to the last one step by step:

foldl            :: (a -> b -> a) -> a -> [b] -> a
foldl f acc []     =  acc
foldl f acc (x:xs) =  foldl f (f acc x) xs

So brackets in the resulting expression accumulate on the left. Our list above, after being transformed by foldl f z becomes:

f (f (f acc a) b) c

The corresponding trees look like:

  :                            f
 / \                          / \
a   :       foldl f acc      f   c
   / \    ------------->    / \
  b   :                    f   b 
     / \                  / \
    c  []                acc a

Note

Technical Note: The left associative fold is tail-recursive, that is, it recurses immediately, calling itself. For this reason the compiler will optimise it to a simple loop, and it will then be much more efficient than foldr. However, Haskell is a lazy language, and so the calls to f will by default be left unevaluated, building up an expression in memory whose size is linear in the length of the list, exactly what we hoped to avoid in the first place. To get back this efficiency, there is a version of foldl which is strict, that is, it forces the evaluation of f immediately, called foldl'. Note the single quote character: this is pronounced "fold-ell-tick". A tick is a valid character in Haskell identifiers. foldl' can be found in the library Data.List (which can be imported by adding import Data.List to the beginning of a source file). As a rule of thumb you should use foldr on lists that might be infinite or where the fold is building up a data structure, and foldl' if the list is known to be finite and comes down to a single value. foldl (without the tick) should rarely be used at all, unless the list is not too long, or memory usage isn't a problem.


foldr1 and foldl1[edit]

As previously noted, the type declaration for foldr makes it quite possible for the list elements and result to be of different types. For example, "read" is a function that takes a string and converts it into some type (the type system is smart enough to figure out which one). In this case we convert it into a float.

Example: The list elements and results can have different types

addStr :: String -> Float -> Float
addStr str x = read str + x
 
sumStr :: [String] -> Float
sumStr = foldr addStr 0.0

If you substitute the types Float and String for the type variables a and b in the type of foldr you will see that this is type correct.

There is also a variant called foldr1 ("fold - arr - one") which dispenses with an explicit zero by taking the last element of the list instead:

foldr1           :: (a -> a -> a) -> [a] -> a
foldr1 f [x]     =  x
foldr1 f (x:xs)  =  f x (foldr1 f xs)
foldr1 _ []      =  error "Prelude.foldr1: empty list"

And foldl1 as well:

foldl1           :: (a -> a -> a) -> [a] -> a
foldl1 f (x:xs)  =  foldl f x xs
foldl1 _ []      =  error "Prelude.foldl1: empty list"

Note: There is additionally a strict version of foldl1 called foldl1' in the Data.List library.

Notice that in this case all the types have to be the same, and that an empty list is an error. These variants are occasionally useful, especially when there is no obvious candidate for z, but you need to be sure that the list is not going to be empty. If in doubt, use foldr or foldl'.

folds and laziness[edit]

One good reason that right-associative folds are more natural to use in Haskell than left-associative ones is that right folds can operate on infinite lists, which are not so uncommon in Haskell programming. If the input function f only needs its first parameter to produce the first part of the output, then everything works just fine. However, a left fold must call itself recursively until it reaches the end of the input list; this is inevitable, because the recursive call is not made in an argument to f. Needless to say, this never happens if the input list is infinite and the program will spin endlessly in an infinite loop.

As a toy example of how this can work, consider a function echoes taking a list of integers and producing a list where if the number n occurs in the input list, then n replicated n times will occur in the output list. We will make use of the prelude function replicate: replicate n x is a list of length n with x the value of every element.

We can write echoes as a foldr quite handily:

echoes = foldr (\x xs -> (replicate x x) ++ xs) []

(Note: This is very compact thanks to the  \x xs ->  syntax. Instead of defining a function somewhere else and passing it to foldr we provided the definition in situ; x and xs being the arguments and the right-hand side of the definition being what is after the ->)

or, equally handily, as a foldl:

echoes = foldl (\xs x -> xs ++ (replicate x x)) []

but only the first definition works on an infinite list like [1..]. Try it! (If you try this in GHCi, remember you can stop an evaluation with Ctrl-c, but you have to be quick and keep an eye on the system monitor or your memory will be consumed in no time and your system will hang.)

As a final example, another thing that you might notice is that map itself can be implemented as a fold:

map f = foldr (\x xs -> f x : xs) []

Folding takes a little time to get used to, but it is a fundamental pattern in functional programming and eventually becomes very natural. Any time you want to traverse a list and build up a result from its members, you likely want a fold.

Exercises
  1. Define the following functions recursively (like the definitions for sum, product and concat above), then turn them into a fold:
    • and :: [Bool] -> Bool, which returns True if a list of Bools are all True, and False otherwise.
    • or :: [Bool] -> Bool, which returns True if any of a list of Bools are True, and False otherwise.
  2. Define the following functions using foldl1 or foldr1:
    • maximum :: Ord a => [a] -> a, which returns the maximum element of a list (hint: max :: Ord a => a -> a -> a returns the maximum of two values).
    • minimum :: Ord a => [a] -> a, which returns the minimum element of a list (hint: min :: Ord a => a -> a -> a returns the minimum of two values).
  3. Use a fold (which one?) to define reverse :: [a] -> [a], which returns a list with the elements in reverse order.
Note that all of these are Prelude functions, so they will be always close at hand when you need them. (Also, that means you will want to use slightly different names for testing your answers in GHCi.)

Scans[edit]

A "scan" is much like a cross between a map and a fold. Folding a list accumulates a single return value, whereas mapping puts each item through a function with no accumulation. A scan does both: it accumulates a value like a fold, but instead of returning a final value it returns a list of all the intermediate values.

The Standard Prelude contains four scan functions:

scanl   :: (a -> b -> a) -> a -> [b] -> [a]

This accumulates the list from the left, and the second argument becomes the first item in the resulting list. So scanl (+) 0 [1,2,3] = [0,1,3,6].

scanl1  :: (a -> a -> a) -> [a] -> [a]

This is the same as scanl, but uses the first item of the list as a zero parameter. It is what you would typically use if the input and output items are the same type. Notice the difference in the type signatures. scanl1 (+) [1,2,3] = [1,3,6].

scanr   :: (a -> b -> b) -> b -> [a] -> [b]
scanr1  :: (a -> a -> a) -> [a] -> [a]

These two functions are the exact counterparts of scanl and scanl1. They accumulate the totals from the right. So:

scanr (+) 0 [1,2,3] = [6,5,3,0]
scanr1 (+) [1,2,3] = [6,5,3]
Exercises
  1. Write your own definition of scanr, first using recursion, and then using foldr. Do the same for scanl first using recursion then foldl.
  2. Define the following functions:
    • factList :: Integer -> [Integer], which returns a list of factorials from 1 up to its argument. For example, factList 4 = [1,2,6,24].
More to be added

filter[edit]

A very common operation performed on lists is filtering, which is generating a new list composed only of elements of the first list that meet a certain condition. One simple example of that would be taking a list of integers and making from it a list which only retains its even numbers.

retainEven :: [Int] -> [Int]
retainEven [] = []
retainEven (n:ns) =
-- mod n 2 computes the remainder for the integer division of n by 2, so if it is zero the number is even
  if ((mod n 2) == 0)
    then n : (retainEven ns)
    else retainEven ns

That works fine, but it is a slightly verbose solution. It would be nice to have a more concise way to write the filter function. Also, it would certainly be very useful to be able to generalize the filtering operation – that is, make it capable of filtering a list using any boolean condition we'd like. To help us with both of these issues Prelude provides a filter function. filter has the following type signature:

filter :: (a -> Bool) -> [a] -> [a]

That means it evaluates to a list when given two arguments, namely an (a -> Bool) function which carries the actual test of the condition for the elements of the list and the list to be filtered. In order to write retainEven using filter, we need to state the condition as an auxiliary (a -> Bool) function, like this one:

isEven :: Int -> Bool 
isEven n = ((mod n 2) == 0)

And then retainEven becomes simply:

retainEven ns = filter isEven ns

It can be made even more terse by writing it in point-free style:

retainEven = filter isEven

This is just like what we demonstrated before for map and the folds. Like filter, those take another function as argument; and using them point-free emphasizes this "functions-of-functions" aspect.

List comprehensions[edit]

An additional tool for list processing is the list comprehension, a powerful, concise and expressive syntactic construct. One simple way we can use list comprehensions is as syntactic sugar for filtering. So, instead of using the Prelude filter, we could write retainEven like this:

retainEven es = [ n | n <- es , isEven n ]

This compact syntax may look a bit intimidating at first, but it is simple to break down. One possible way to read it would be:

  • (Starting from the middle) Take the list es and draw (the "<-") each of its elements as a value n.
  • (After the comma) For each drawn n test the boolean condition isEven n.
  • (Before the vertical bar) If (and only if) the boolean condition is satisfied, prepend n to the new list being created (note the square brackets around the whole expression).

Thus if es is equal to [1,2,3,4], then we would get back the list [2,4]. 1 and 3 were not drawn because (isEven n) == False .

The real power of list comprehensions, though, comes from the fact they are easily extensible. Firstly, we can use as many tests as we wish (even zero!). Multiple conditions are written as a comma-separated list of expressions (which should evaluate to a Boolean, of course). For a simple example, suppose we want to modify retainEven so that only numbers larger than 100 are retained:

retainLargeEvens :: [Int] -> [Int]
retainLargeEvens es = [ n | n <- es , isEven n, n > 100 ]

Furthermore, we are not limited to using n as the element to be prepended when generating a new list. Instead, we could place any expression before the vertical bar (if it is compatible with the type of the list, of course). For instance, if we wanted to subtract one from every even number, all it would take is:

evensMinusOne es = [ n - 1 | n <- es , isEven n ]

In effect, that means the list comprehension syntax incorporates the functionality of map as well as of filter. Now that is conciseness! (and conciseness that does not sacrifice readability, in that.)

To further sweeten things, the left arrow notation in list comprehensions can be combined with pattern matching. For example, suppose we had a list of (Int, Int) tuples and we would like to construct a list with the first element of every tuple whose second element is even. Using list comprehensions, we might write it as follows:

firstForEvenSeconds :: [(Int, Int)] -> [Int]
firstForEvenSeconds ps = [ fst p | p <- ps, isEven (snd p) ] -- here, p is for pairs.

Patterns can make what the function is doing more obvious:

firstForEvenSeconds ps = [ x | (x,y) <- ps, isEven y ]

As in other cases, arbitrary expressions may be used before the |. If we wanted a list with the double of those first elements:

doubleOfFirstForEvenSeconds :: [(Int, Int)] -> [Int]
doubleOfFirstForEvenSeconds ps = [ 2 * x | (x,y) <- ps, isEven y ]

Note that the function code is actually shorter than its descriptive name.

There are even more possible tricks:

allPairs :: [(Int, Int)]
allPairs = [ (x, y) | x <- [1..4], y <- [5..8] ]

This comprehension draws from two lists, and generates all 4 * 4 = 16 possible (x, y) pairs with the first element in [1..4] and the second in [5..8]. In the final list of pairs, the first elements will be those generated with the first element of the first list (here, 1), then those with the second element of the first list, and so on. In this example, the full list is (linebreaks added for clarity):

Prelude> [(x,y)|x<-[1..4],y<-[5..8]]
[(1,5),(1,6),(1,7),(1,8),
 (2,5),(2,6),(2,7),(2,8),
 (3,5),(3,6),(3,7),(3,8),
 (4,5),(4,6),(4,7),(4,8)]

Note that we didn't do any filtering here; but we could easily add a condition to restrict the combinations that go into the final list:

somePairs = [ (x, y) | x <- [1..4], y <- [5..8], x + y > 8 ]

This lists only has the pairs with the sum of elements larger than 8; starting with (1,8), then (2,7) and so forth.

Exercises
  1. Write a returnDivisible :: Int -> [Int] -> [Int] function which filters a list of integers retaining only the numbers divisible by the integer passed as first argument. For integers x and n, x is divisible by n if (mod x n) == 0 (note that the test for evenness is a specific case of that).
    • Write - using list comprehension syntax, a single function definition and no if, case and similar constructs - a [[Int]] -> [[Int]] which, from a list of lists of Int, returns a list of the tails of those lists using, as filtering condition, that the head of each [Int] must be larger than 5. Also, your function must not trigger an error when it meets an empty [Int], so you'll need to add an additional test to detect emptiness.
    • Does order matter when listing the conditions for list comprehension? (You can find it out by playing with the function you wrote for the first part of the exercise.)
  2. Over this section we've seen how list comprehensions are essentially syntactic sugar for filter and map. Now work in the opposite direction and define alternative versions of the filter and map using the list comprehension syntax.
  3. Rewrite doubleOfFirstForEvenSeconds using filter and map instead of list comprehension.