Haskell/List processing
Folds[edit]
A fold is a higher order function that, like map, takes a function and a list. However, instead of applying the function element by element, the fold uses it to combine the list elements into a result value.
Let's start looking at a few concrete examples. A function like sum might be implemented as follows:
Example: sum
sum :: [Integer] > Integer
sum [] = 0
sum (x:xs) = x + sum xs
or product:
Example: product
product :: [Integer] > Integer
product [] = 1
product (x:xs) = x * product xs
or concat, which takes a list of lists and joins (concatenates) them into one:
Example: concat
concat :: [[a]] > [a]
concat [] = []
concat (x:xs) = x ++ concat xs
There is a certain pattern of recursion common to all of these examples. This pattern is known as a fold, possibly from the idea that a list is being "folded up" into a single value, or that a function is being "folded between" the elements of the list.
The Standard Prelude defines four fold
functions: foldr
, foldl
, foldr1
and foldl1
.
foldr[edit]
The rightassociative foldr
folds up a list from the right to left. As it proceeds, foldr uses the given function to combine each of the element with the running value called the accumulator. When calling foldr, the initial value of the accumulator is set as an argument.
foldr :: (a > b > b) > b > [a] > b
foldr f acc [] = acc
foldr f acc (x:xs) = f x (foldr f acc xs)
The first argument is a function with two arguments, the second is a "zero" value for the accumulator, and the third is the list to be folded.
In sum
, f
is (+)
, and acc
is 0
, and in concat
, f
is (++)
and acc
is []
. In many cases, like all of our examples so far, the function passed to a fold will have both its arguments be of the same type, but this is not necessarily the case in general.
What foldr f acc xs does is to replace each cons (:) in the xs
list with the function f
, and the empty list at the end with acc
.
a : b : c : []
becomes
f a (f b (f c acc))
Note how the parentheses nest around the right end of the list.
An elegant visualisation is given by picturing the list data structure as a tree:
: f / \ / \ a : foldr f acc a f / \ > / \ b : b f / \ / \ c [] c acc
We can see here that foldr (:) [] would return the list completely unchanged. That sort of function that has no effect is called an identity function. You should start building a habit of looking for identity functions in different cases, and we'll discuss them more later when we learn about monoids.
foldl[edit]
The leftassociative foldl processes the list in the opposite direction, starting at the left side with the first element.
foldl :: (a > b > a) > a > [b] > a
foldl f acc [] = acc
foldl f acc (x:xs) = foldl f (f acc x) xs
So brackets in the resulting expression accumulate around the left end of the list. Our list above, after being transformed by foldl f acc becomes:
f (f (f acc a) b) c
The corresponding trees look like:
: f / \ / \ a : foldl f acc f c / \ > / \ b : f b / \ / \ c [] acc a
Because all folds include both left and right elements, beginners can get confused by the names. You could think of foldr as short for foldrighttoleft and foldl as foldlefttoright. The names refer to where the fold starts.
Note
Technical Note: foldl is tailrecursive, that is, it recurses immediately, calling itself. For this reason the compiler will optimise it to a simple loop, which is a good thing performancewise. However, Haskell is a lazy language, and so the calls to f will be left unevaluated by default, thus building up an unevaluated expression in memory that includes the entire length of the list. To avoid running out of memory, there is a version of foldl called foldl' that is strict in that it forces the evaluation of f immediately at each step.
An apostrophe at the end of a function name is pronounced "tick" as in "foldLtick". A tick is a valid character in Haskell identifiers. foldl' can be found in the Data.List
library module (which can be imported by adding import Data.List to the beginning of a source file). As a rule of thumb, you should use foldr on lists that might be infinite or where the fold is building up a data structure and use foldl' if the list is known to be finite and comes down to a single value. There is almost never a good reason to use foldl
(without the tick), though it might just work if the lists fed to it are not too long.
foldr1 and foldl1[edit]
As previously noted, the type declaration for foldr makes it quite possible for the list elements and result to be of different types. For example, "read" is a function that takes a string and converts it into some type (the type system is smart enough to figure out which one). In this case we convert it into a float.
Example: The list elements and results can have different types
addStr :: String > Float > Float
addStr str x = read str + x
sumStr :: [String] > Float
sumStr = foldr addStr 0.0
If you substitute the types Float and String for the type variables a
and b
in the type of foldr you will see that this is type correct.
There is also a variant called foldr1 ("fold  R  one") which dispenses with an explicit zero by taking the last element of the list instead:
foldr1 :: (a > a > a) > [a] > a
foldr1 f [x] = x
foldr1 f (x:xs) = f x (foldr1 f xs)
foldr1 _ [] = error "Prelude.foldr1: empty list"
And foldl1 as well:
foldl1 :: (a > a > a) > [a] > a
foldl1 f (x:xs) = foldl f x xs
foldl1 _ [] = error "Prelude.foldl1: empty list"
Note: Just like for foldl, the Data.List library includes foldl1' as a strict version of foldl1.
With foldl1 and foldr1, all the types have to be the same and an empty list is an error. These variants are useful when there is no obvious candidate for the initial accumulator value and we are sure that the list is not going to be empty. When in doubt, stick with foldr or foldl'.
folds and laziness[edit]
One reason that rightassociative folds are more natural to use in Haskell than leftassociative ones is that right folds can operate on infinite lists. A fold that returns an infinite list is perfectly usable in a larger context that doesn't need to access the entire infinite result. In that case, foldr can move along as much as is needed and the compiler will not do any extra. However, a left fold necessarily calls itself recursively until it reaches the end of the input list (because the recursive call is not made in an argument to f). Needless to say, no end will be reached if an input list to foldl is infinite.
As a toy example, consider a function echoes
that takes a list of integers and produces a list such that wherever the number n occurs in the input list, it is replicated n times in the output list. To create our echoes function, we will use the prelude function replicate
in which replicate n x is a list of length n with x the value of every element.
We can write echoes as a foldr quite handily:
echoes = foldr (\ x xs > (replicate x x) ++ xs) []
(Note: This definition is very compact thanks to the \ x xs >
syntax. Instead of defining a function somewhere else and passing it to foldr we provided the definition in situ. x
and xs
being the arguments and the righthand side of the definition being what is after the >
.)
Or, equally handily, as a foldl:
echoes = foldl (\xs x > xs ++ (replicate x x)) []
but only the foldr version works on an infinite list like [1..]
. Try it! (If you try this in GHCi, remember you can stop an evaluation with Ctrlc, but you have to be quick and keep an eye on the system monitor or your memory will be consumed in no time and your system will hang.)
As a final example, another thing that you might notice is that map
itself can be implemented as a fold:
map f = foldr (\x xs > f x : xs) []
Folding takes a little time to get used to, but it is a fundamental pattern in functional programming and eventually becomes very natural. Any time you want to traverse a list and build up a result from its members, you likely want a fold.
Exercises 


Scans[edit]
A "scan" is much like a cross between a map
and a fold. Folding a list accumulates a single return value, whereas mapping puts each item through a function with no accumulation. A scan does both: it accumulates a value like a fold, but instead of returning a final value it returns a list of all the intermediate values.
The Standard Prelude contains four scan functions:
scanl :: (a > b > a) > a > [b] > [a]
This accumulates the list from the left, and the second argument becomes the first item in the resulting list. So scanl (+) 0 [1,2,3] = [0,1,3,6]
.
scanl1 :: (a > a > a) > [a] > [a]
This is the same as scanl
, but uses the first item of the list as a zero parameter. It is what you would typically use if the input and output items are the same type. Notice the difference in the type signatures. scanl1 (+) [1,2,3] = [1,3,6]
.
scanr :: (a > b > b) > b > [a] > [b]
scanr1 :: (a > a > a) > [a] > [a]
These two functions are the exact counterparts of scanl
and scanl1
. They accumulate the totals from the right. So:
scanr (+) 0 [1,2,3] = [6,5,3,0]
scanr1 (+) [1,2,3] = [6,5,3]
Exercises 


filter[edit]
A very common operation performed on lists is filtering, which means generating a new list composed only of elements of the first list that meet a certain condition. One simple example of that would be taking a list of integers and making from it a list which only retains its even numbers.
retainEven :: [Int] > [Int]
retainEven [] = []
retainEven (n:ns) =
 mod n 2 computes the remainder for the integer division of n by 2, so if it is zero the number is even
if (mod n 2) == 0
then n : (retainEven ns)
else retainEven ns
That works fine, but it is a slightly verbose solution. It would be nice to have a more concise and general way to write the filter function. This is already provided by Prelude as filter with the following type signature:
filter :: (a > Bool) > [a] > [a]
So, a (a > Bool) function tests an elements for the condition, there is a list to be filtered, and it returns the filtered list. In order to write retainEven using filter, we need to state the condition as an auxiliary (a > Bool) function, like this one:
isEven :: Int > Bool
isEven n = (mod n 2) == 0
And then retainEven becomes simply:
retainEven ns = filter isEven ns
We used ns instead of xs just because it is sensible to indicate that we know these are numbers and not just anything, but we can ignore that and use a more terse pointfree definition:
retainEven = filter isEven
This is just like what we demonstrated before for map and the folds. Like filter, those take another function as argument; and using them pointfree emphasizes this "functionsoffunctions" aspect.
List comprehensions[edit]
A powerful, concise, and expressive syntactic tool for list processing is the list comprehension. Among other things, list comprehensions can be syntactic sugar for filtering. Instead of using the Prelude filter, we could write retainEven like this:
retainEven es = [n  n < es, isEven n]
This compact syntax may look a bit intimidating at first, but it is simple to break down. One interpretation is:
 (Starting from the middle) Take the list es and draw (the "<") each of its elements as a value n.
 (After the comma) For each drawn n test the boolean condition
isEven n
.  (Before the vertical bar) If (and only if) the boolean condition is satisfied, append n to the new list being created (note the square brackets around the whole expression).
Thus, if es
is equal to [1,2,3,4], then we would get back the list [2,4]. 1 and 3 were not drawn because (isEven n) == False
.
The power of list comprehensions comes from being easily extensible. Firstly, we can use as many tests as we wish (even zero!). Multiple conditions are written as a commaseparated list of expressions (which should evaluate to a Boolean, of course). For a simple example, suppose we want to modify retainEven so that only numbers larger than 100 are retained:
retainLargeEvens :: [Int] > [Int]
retainLargeEvens es = [n  n < es, isEven n, n > 100]
Furthermore, we are not limited to using n as the element to be appended when generating a new list. Instead, we could place any expression before the vertical bar (if it is compatible with the type of the list, of course). For instance, if we wanted to subtract one from every even number, all it would take is:
evensMinusOne es = [n  1  n < es, isEven n]
In effect, that means the list comprehension syntax incorporates the functionality of map as well as of filter. Now that is concise (and still readable too)!
To further sweeten things, the left arrow notation in list comprehensions can be combined with pattern matching. For example, suppose we had a list of (Int, Int)
tuples and we would like to construct a list with the first element of every tuple whose second element is even. Using list comprehensions, we might write it as follows:
firstForEvenSeconds :: [(Int, Int)] > [Int]
firstForEvenSeconds ps = [fst p  p < ps, isEven (snd p)]  here, p is for pairs.
Patterns can make it much more readable:
firstForEvenSeconds ps = [x  (x, y) < ps, isEven y]
As in other cases, arbitrary expressions may be used before the . If we wanted a list with the double of those first elements:
doubleOfFirstForEvenSeconds :: [(Int, Int)] > [Int]
doubleOfFirstForEvenSeconds ps = [2 * x  (x, y) < ps, isEven y]
Not counting spaces, that function code is shorter than its descriptive name!
There are even more possible tricks:
allPairs :: [(Int, Int)]
allPairs = [(x, y)  x < [1..4], y < [5..8]]
This comprehension draws from two lists, and generates all possible (x, y)
pairs with the first element in [1..4]
and the second in [5..8]
. In the final list of pairs, the first elements will be those generated with the first element of the first list (here, 1
), then those with the second element of the first list, and so on. In this example, the full list is (linebreaks added for clarity):
Prelude> [(x, y)  x < [1..4], y < [5..8]] [(1,5),(1,6),(1,7),(1,8), (2,5),(2,6),(2,7),(2,8), (3,5),(3,6),(3,7),(3,8), (4,5),(4,6),(4,7),(4,8)]
Note that we didn't do any filtering here; but we could easily add a condition to restrict the combinations that go into the final list:
somePairs = [(x, y)  x < [1..4], y < [5..8], x + y > 8]
This list only has the pairs with the sum of elements larger than 8; starting with (1,8)
, then (2,7)
and so forth.
Exercises 

