Haskell/Pattern matching

From Wikibooks, open books for an open world
Jump to: navigation, search

Over the previous modules of this book we occasionally mentioned pattern matching, pointing out some common situations in which it was involved without providing a clear definition. Now that you have some familiarity with the fundamental language constructs, it is a proper time for a more formal discussion of pattern matching. Let us start with a possible one-line definition:

Pattern matching is a convenient way to bind variables to different parts of a given value.

Note

Pattern matching on what?

Some languages like Perl and Python use the term pattern matching for matching regular expressions against strings. The pattern matching we are referring to in this chapter is something completely different. In fact, you're probably best off forgetting what you know about pattern matching for now.[1] Here, pattern matching is used in the same way as in other ML-like languages: to deconstruct values according to their type specification.


[edit] What is pattern matching?

Actually, you have seen many examples of pattern matching already – it is virtually everywhere. For instance, pick a function definition like that of map:

map _ []     = []
map f (x:xs) = f x : map f xs

Here there are four different patterns involved, two per equation. Let's explore each one in turn:

  • f is a pattern which matches anything at all, and binds the f variable to that something.
  • (x:xs) is a pattern that matches a non-empty list which is formed by something (which gets bound to the x variable) which was cons'd (by the function (:)) onto something else (which gets bound to xs). xs can be an empty list.
  • [] is a pattern that matches the empty list. It doesn't bind any variables.
  • _ is the pattern which matches anything at all, but doesn't do any binding.

So pattern matching is a way of:

  • recognizing values. For instance, when map is called and the second argument matches [] the first definition of map is used instead of the second one.
  • binding variables to the recognized values. In this case, the variables f, x and xs are assigned to the values passed as arguments to map when the second definition is used. Not all patterns do binding – for example, _ is used as a "whatever/don't care" wildcard exactly because it doesn't bind variables, and so it is not possible to refer to any value it matches.
  • breaking down values into parts, as the (x:xs) pattern does by assigning two variables to parts (head and tail) of a single argument (the list).

At this point, the process of using a function to break down a value by effectively undoing its effects may look a little bit too magical. It is not, however, a universally deployable technique. For example, one might think of, by analogy to how (:) is used to break down a list, define a function which uses (++) to chop off the first three elements of a list:

dropThree ([x,y,z] ++ xs) = xs

However, that will not work. The function (++) isn't allowed in patterns, and in fact most other functions that act on lists wouldn't be allowed either. So which functions are allowed?

The one-word answer is constructors – the functions used to build values of algebraic data types. Let us consider a generic example:

data Foo = Bar | Baz Int

Here Bar and Baz are constructors for the type Foo. And so you can use them for pattern matching Foo values, and bind variables to the Int value contained in a Foo constructed with Baz:

f :: Foo -> Int
f Bar     = 1
f (Baz x) = x - 1

That was exactly what was going on back when we defined showAnniversary and showDate in the Type declarations module. For instance:

data Date = Date Int Int Int   -- Year, Month, Day
showDate :: Date -> String
showDate (Date y m d) = show y ++ "-" ++ show m ++ "-" ++ show d

The (Date y m d) pattern in the left-hand side of the showDate definition matches a Date (built with the Date constructor) and binds the variables y, m and d to the contents of the Date value.

As for lists, they are not different from data-defined algebraic data types as far as pattern matching is concerned. In effect, it's like lists were defined with this data declaration (note that the following isn't actually valid syntax: lists are in reality deeply ingrained into Haskell):

data [a] = [] | a : [a]

So the empty list, [], and the (:) function, are in reality constructors of the list datatype, so you can pattern match with them. [] takes no arguments, and therefore no variables are bound when it is used for pattern matching. (:) takes two arguments, the list head and tail, which can then be bound to variables when the pattern is recognized.

Furthermore, since [x, y, z] is just syntactic sugar for x:y:z:[], there is a pattern matching solution for the dropThree problem above:

dropThree (_:_:_:xs) = xs

[edit] Introduction to records

For constructors with many elements, records provide useful syntactical assistance. Briefly, they are a way of naming values in a datatype, using the following syntax:

data Foo2 = Bar2 | Baz2 {barNumber::Int, barName::String}

Using records allows doing matching and binding only for the variables relevant to the function we're writing, making code much clearer:

h :: Foo2 -> Int
h Baz2 {barName=name} = length name
h Bar2 {} = 0

Also, the {} pattern can be used for matching a constructor regardless of the datatype elements even if you don't use records in the data declaration:

data Foo = Bar | Baz Int
g :: Foo -> Bool
g Bar {} = True
g Baz {} = False

The function g does not have to be changed if we modify the number or the type of elements of the constructors Bar or Baz.

The record syntax will be covered with more detail later in the book.

[edit] As-patterns

Sometimes, when matching a pattern with a value, it may be useful to bind a name also to the whole value being matched. As-patterns allow exactly this: they are of the form var@pattern and have the additional effect to bind the name var to the whole value being matched by pattern.

For instance, the following code snippet:

other_map f [] = []
other_map f list@(x:xs) = f x list : other_map f xs

creates a variant of map, called other_map, which passes to the parameter function f also the original list. Writing it without as-patterns would have been more cumbersome, because you must recreate the original value, i.e. x:xs:

other_map f [] = []
other_map f (x:xs) = f x (x:xs) : other_map f xs

The difference would be more notable with a more complex pattern.

[edit] Literal values

A simple piece-wise function definition like this one

f :: Int -> Int
f 0 = 1
f 1 = 5
f 2 = 2
f _ = -1

is performing pattern matching as well, matching the argument of f with the Int literals 0, 1 and 2. In general, numeric and character literals can be used in pattern matching. They can also be used together with constructors. For instance, this function

g :: [Int] -> Bool
g (0:[]) = False
g (0:xs) = True
g _ = False

will evaluate to False for the [0] list, to True if the list has 0 as first element and a non-empty tail and to False in all other cases. Also, lists with literal elements like [1,2,3], or even "abc" (which is equivalent to ['a','b','c']) can be used for pattern matching as well, since these forms are only syntactic sugar for the (:) constructor.

It costs nothing to emphasize that the above considerations are only valid for literal values, so the following will not work:

k = 1
--again, this won't work as expected
h :: Int -> Bool
h k = True
h _ = False
Exercises
Test the flawed h function above in GHCi, with arguments equal to and different from 1. Then, try to explain what goes wrong. (Hint: re-read the detailed description of the pattern matching used in the map definition at the start of the module.)

[edit] n+k patterns

There is one exception to the rule that you can only pattern match with constructor functions. It is known as n+k patterns, which make it valid Haskell 98 to write something like:

pred :: Int -> Int
pred (n+1) = n

However, this exception is generally considered unsightly and has now been removed in the Haskell 2010 standard.

[edit] Where you can use it

The short answer is that wherever you can bind variables, you can pattern match. Let's have a look at that more precisely.

[edit] Equations

The first place is in the left-hand side of function definition equations, which were the subject of our examples so far.

map _ []     = []
map f (x:xs) = f x : map f xs

In the map definition we're binding, and doing pattern-matching, on the left hand side of both of these equations.

[edit] Let expressions

Let expressions are a way of doing local variable bindings. As such, you can also use pattern matching in them. A simple example:

y =
  let 
    (x:xs) = map (2*) [1,2,3]
  in x + 5

Here, x will be bound to the first element of map (2*) [1,2,3]. y, therefore, will evaluate to 2 + 5 = 7.

[edit] Case expressions

Another typical usage of pattern binding is on the left hand side of case branches:

describeList :: (Show a) => [a] -> String
describeList list = 
  case list of
   []     -> "The list was empty"
   (x:xs) -> "The list wasn't empty: the first element was " ++ show x ++ ", and " ++
             "there were " ++ show (length xs) ++ " more elements in the list."

[edit] List comprehensions

After the | in list comprehensions you can pattern match. This is actually extremely useful, and adds a lot to the expressiveness of comprehensions. Let's see how that works with a slightly more sophisticated example. Prelude provides a Maybe type which has the following constructors:

data Maybe a = Nothing | Just a

It is typically used to hold values resulting from an operation which may or may not succeed, such as a lookup in a list (if the operation succeeds, the Just constructor is used and the value is passed to it; otherwise Nothing is used). The utility function catMaybes (which is available from Data.Maybe library module) takes a list of Maybes (which may contain both "Just" and "Nothing" Maybes), and retrieves the contained values by filtering all the Just x and getting rid of the Nothing wrappers. Writing it with list comprehensions is very straightforward:

catMaybes :: [Maybe a] -> [a]
catMaybes ms = [ x | Just x <- ms ]

Another nice thing about using a list comprehension for this task is that if the pattern match fails (that is, it meets a Nothing) it just moves on to the next element in ms,[2] thus avoiding the need of explicitly handling each constructor with alternate function definitions.

[edit] Other places

There are a few other situations in which patterns can be used that we'll meet later on. Here's a list in case you're very eager already:

  • where clauses, which are an alternative to let bindings;
  • lambdas, which are anonymous function definitions;
  • In p <- x assignments in do-blocks, p can be a pattern.
  • Similarly, with let bindings in do-blocks, you can pattern match analogously to 'real' let bindings.

[edit] Notes

  1. If you accidentally came here looking for regex pattern matching, you might be interested in looking at the Haskell Text.Regex library wrapper.
  2. Advanced note for the curious: More formally, as list comprehensions are just the list monad, a failed pattern match invokes fail, which is the empty list in this case, and so gets ignored.
Personal tools
Namespaces
Variants
Actions
Navigation
Community
Toolbox
Sister projects
Print/export