Haskell/Next steps

From Wikibooks, open books for an open world
< Haskell
Jump to: navigation, search

This chapter has two main goals. We will dedicate the bulk of the text to introducing pattern matching, which is an absolutely fundamental feature of Haskell. Besides, we will also mention two new pieces of syntax: if expressions and let bindings, which will prove handy in upcoming Simple input and output chapter. In our context, though, they can be safely thought of as minor points; for, even though if and let are convenient sometimes, there is little in what they do that can't be achieved with the syntax we already know. There is plenty of time for you to master them; and furthermore there will be more examples a little ahead in the book.

if / then / else[edit]

Haskell syntax supports garden-variety conditional expressions of the form if... then... (else ...). For instance, consider a function that returns (-1) if its argument is less than 0; 0 if its argument is 0; and 1 if its argument is greater than 0. There is a predefined function which does that job already (it is called signum); for the sake of illustration, though, let's define a version of our own:

Example: The signum function.

mySignum x =
    if x < 0 
        then -1
        else if x > 0
            then 1
            else 0

You can experiment with this as:

*Main> mySignum 5
1
*Main> mySignum 0
0
*Main> mySignum (5-10)
-1
*Main> mySignum (-1)
-1

The parentheses around "-1" in the last example are required; if missing, the system will think you are trying to subtract 1 from mySignum, which is ill-typed.

In an if/then/else construct, first the condition (in this case x < 0) is evaluated. If it results True, the whole construct evaluates to the then expression; otherwise (if the condition is False), the construct evaluates to the else expression. All of that is pretty intuitive. If you have programmed in an imperative language before, however, it might seem surprising to know that we always need to have both a then and an else clause. That must be so because the construct has to result in a value in both cases - and a value of the same type in both cases, by the way.

if / then / else function definitions like the one above can be easily rewritten with the guards syntax presented in past modules:

Example: From if to guards

mySignum x
    | x < 0     = -1
    | x > 0     = 1
    | otherwise = 0

And conversely, the absolute value function in Truth values can be rendered as:

Example: From guards to if

abs x =
    if x < 0 
        then -x
        else x

Introducing pattern matching[edit]

Suppose we are writing a program which tracks statistics from a racing competition in which racers receive points based on their classification in each race, the scoring rules being:

  • 10 points for the winner of the race;
  • 6 for the second-placed;
  • 4 for the third-placed;
  • 3 for the fourth-placed;
  • 2 for the fifth-placed;
  • 1 for the sixth-placed; and
  • no points for other racers.

We can write a simple function which takes a classification (represented by an integer number: 1 for first place, etc.[1]) and returns how many points were earned. One possible solution uses if/then/else:

Example: Computing points with if/then/else

pts :: Int -> Int
pts x =
    if x == 1
        then 10
        else if x == 2
            then 6
            else if x == 3
                then 4
                else if x == 4
                    then 3
                    else if x == 5
                        then 2
                        else if x == 6
                            then 1
                            else 0

Yuck! Admittedly, it wouldn't look this hideous had we used guards instead of if/then/else, but it still would be tedious to write (and read!) all those equality tests. We can do better, though:

Example: Computing points with a piece-wise definition

pts :: Int -> Int
pts 1 = 10
pts 2 = 6
pts 3 = 4
pts 4 = 3
pts 5 = 2
pts 6 = 1
pts _ = 0

Much better. However, even though defining pts in this style (which we will arbitrarily call piece-wise definition from now on) shows to a reader of the code what the function does in an awesomely clear way, the syntax looks odd given what we have seen of Haskell so far. Why are there seven equations for pts? What are those numbers doing in their left-hand sides? Where has the x gone?

The example above is our first encounter with a key feature of Haskell called pattern matching. When we call the second version of pts, the argument is matched against the numbers on the left side of each of the equations, which in turn are the patterns. This matching is done in the order we wrote the equations; so first of all the argument is matched against the 1 in the first equation. If the argument is indeed 1, we have a match and the first equation is used; and so pts 1 evaluates to 10 as expected. Otherwise, the other equations are tried in order following the same procedure. The final one, though, is rather different: the _ is a special pattern that might be read as "whatever": it matches with anything; and therefore if the argument doesn't match any of the previous patterns pts will return zero.

As for why there is no x or any other variable standing for the argument, it is simply because we don't need that to write the definitions. All possible return values are constants; and since the point of specifying a variable name on the left side is using it to write the right side, the x is unnecessary in our function.

There is, however, an obvious way to make pts even more concise. The points given to a racer decrease regularly from third place to sixth place, at a rate of one point per position. After noticing that, we can eliminate three of the seven equations as follows:

Example: Mixing styles

pts :: Int -> Int
pts 1 = 10
pts 2 = 6
pts x
    | x <= 6    = 7 - x
    | otherwise = 0

The first thing to point out here is that we can mix both styles of definitions. In fact, when we write pts x in the left side of an equation we are using pattern matching too! As a pattern, the x (or any other variable name) matches anything just like _; the only difference being that it also gives us a name to use on the right side (which, in this case, is necessary to write 7 - x).

Exercises
We cheated a little when moving from the second version of pts to the third one: they do not do exactly the same thing. Can you spot what the difference is?

Please do the exercise above before continuing: it is quick to do and really important.

Beyond integers, pattern matching works with values of various other types. One handy example are booleans. For instance, the (||) logical-or operator we met in Truth values could be defined as:

Example: (||)

(||) :: Bool -> Bool -> Bool
False || False = False
_     || _     = True

Or:

Example: (||), done another way

(||) :: Bool -> Bool -> Bool
True  || _ = True
False || y = y

When matching two or more arguments at once, the equation will only be used if all of them match.

To conclude this section, let us discuss a few things that might go wrong when using pattern matching:

  • If we put a pattern which matches anything (such as the final patterns in each of the pts example) before the more specific ones the latter will be ignored. GHC(i) will typically warn us that "Pattern match(es) are overlapped" in such cases.
  • If no patterns match, an error will be triggered. Generally, it is a good idea to ensure the patterns cover all cases, in the same way that the otherwise guard is not mandatory but highly recommended.
  • Finally, while you can play around with various ways of (re)defining (&&)[2], here is one version that will not work:
(&&) :: Bool -> Bool -> Bool
x && x = x -- oops!
_ && _ = False
The program won't test whether the arguments are equal just because we happened to use the same name for both. As far as the matching goes, we could just as well have written _ && _ in the first case. And even worse: since we gave the same name to both arguments, GHC(i) will refuse the function due to "Conflicting definitions for `x'".

Tuple and list patterns[edit]

While the examples above show that pattern matching helps in writing more elegant code, that does not explain why it is so important. We will begin to grasp why it matters so much by considering the problem of writing a definition for fst, the function which extracts the first element of a pair. At this point, that appears to be an impossible task, as the only way of accessing the first value of the pair is by using fst itself... The following function, however, does the same thing as fst (confirm it in GHCi):

Example: A definition for fst

fst' :: (a, b) -> a
fst' (x, _) = x

It's magic! Instead of using a regular variable in the left side of the equation, we specified the argument with the pattern of the 2-tuple - that is, (,) - filled with a variable and the _ pattern. Then the variable was automatically associated with the first component of the tuple, and we used it to write the right side of the equation. The definition of snd is, of course, analogous.

Furthermore, the trick demonstrated above can be done with lists as well. Here are the actual definitions of head and tail:

Example: head, tail and patterns

head             :: [a] -> a
head (x:_)       =  x
head []          =  error "Prelude.head: empty list"
 
tail             :: [a] -> [a]
tail (_:xs)      =  xs
tail []          =  error "Prelude.tail: empty list"

The only essential change in relation to the previous example was replacing (,) with the pattern of the cons operator (:). These functions also have an equation using the pattern of the empty list, []; however, since empty lists have no head or tail there is nothing to do other than use error to print a prettier error message.

Summarizing, the real power of pattern matching comes from how it can be used to access the parts of a complex value. Pattern matching on lists, in particular, will be extensively deployed in Recursion and the chapters that follow it. Later on, we will also dedicate a whole chapter to explore what is behind such a seemingly magical feature.

let bindings[edit]

To conclude this chapter, a brief word about let bindings, which are an alternative to where clauses for making local declarations. For instance, take the problem of finding the roots of a polynomial of the form ax^2+bx+c (or, in other words, the solution to a second degree equation - think back of your middle school math courses). Its solutions are given by:

x = \frac {-b \pm \sqrt{b^2-4ac}} {2a}.

We could write the following function to compute the two values of x:

roots a b c =
    ((-b + sqrt(b*b - 4*a*c)) / (2*a),
     (-b - sqrt(b*b - 4*a*c)) / (2*a))

Writing the sqrt(b*b - 4*a*c) term in both cases is annoying, though; we can use a local binding instead, using either where or, as will be demonstrated below, a let declaration:

roots a b c =
    let sdisc = sqrt (b*b - 4*a*c)
    in  ((-b + sdisc) / (2*a),
         (-b - sdisc) / (2*a))

We put the let keyword before the declaration, and then use in to signal we are returning to the "main" body of the function. It is possible to put multiple declarations inside a single let...in block - just make sure they are indented the same amount, otherwise there will be syntax errors:

roots a b c =
    let sdisc = sqrt (b*b - 4*a*c)
        twice_a = 2*a
    in  ((-b + sdisc) / twice_a,
         (-b - sdisc) / twice_a)

Note

Still on indentation, the Indentation chapter has a full account of indentation rules; so if doubts about that arise as you code you might want to give it a glance.


Notes[edit]

  1. Here we will not be much worried about what happens if a nonsensical value (say, (-4)) is passed to the function. In general, however, it is a good idea to give some thought to such "strange" cases.
  2. If you are going to experiment with it in GHCi, call your version something else to avoid a name clash; say, (&!&).