User:Duplode/Haskell leftovers

Old VarFun section on where

So far we have seen two sorts of r in the examples: a variable, which stores some value of our choice, and a parameter of a function (say, area r). But what would happen if we use both at the same time? Say I type something in like this (don't type it in yet):

Prelude> let r = 0
Prelude> let area r = pi * r ^ 2
Prelude> area 5

What do you think that should happen here? Will area work as expected or are we in for an unpleasant surprise? Think about an answer, then test it, see if it was what you expected and then follow on with the text...

(...)

Scope and parameters

Warning: this section contains spoilers to the previous "exercise"

Fortunately, the following fragment of code does not contain any unpleasant surprises:

Prelude> let r = 0
Prelude> let area r = pi * r ^ 2
Prelude> area 5
78.53981633974483

An unpleasant surprise here would have been getting the value 0 because of the let r = 0 definition getting in the way. That does not happen because when you defined r the second time is that you are talking about a different r. This is something that happens in real life as well. How many people do you know that have the name John? What's interesting about people named John is that most of the time, you can talk about "John" to your friends, and depending on the context, your friends will know which John you are referring to. Programming has something similar to context, called scope. We won't explain the technicalities behind scope (at least not now), but Haskell's lexical scope is the magic that lets us define two different r and always get the right one back depending on context.

Thanks to scope, the value of a parameter is strictly what you pass in when you call the function. Informally, we could say the r in let r = 0 is not the same r as the one inside our defined function area - the r inside area overrides the other r; you can think of it as Haskell picking the most specific version of r there is. If you have many friends all named John, you go with the one which just makes more sense and is specific to the context; similarly, what value of r we get depends on the scope.

where clauses

Readers: don't worry about this missing section. As of now where clauses are explained later on, and there is no need to rush over there to check them. This is just a placeholder for when they are moved to this section in the near future.

Old VarFun/Type basics introduction to types

Let us have another look at the circumference examples in Variables and functions. While playing around with the expressions you might be tempted to try storing a plain value such as 5 or 25 and using it in other expressions. Let's see what happens:

Prelude> let r = 25
Prelude> 2 * pi * r

<interactive>:1:9:
    Couldn't match `Double' against `Integer'
      Expected type: Double
      Inferred type: Integer
    In the second argument of `(*)', namely `r'
    In the definition of `it': it = 2 * pi * r

Whoops! You've just run into a programming concept known as types. Types are a feature of many programming languages which are designed to catch some of your programming errors early on so that you find out about them before it's too late. One simple analogy for us to start with are plugs and connectors. For example, many of the plugs on the back of your computer are designed to have different shapes and sizes for a purpose. This is partly so that you don't inadvertently plug the wrong bits of your computer together and blow something up. Types serve a similar purpose, but in this particular example, well, types aren't so helpful.

The tricky bit here is that numbers like 25 can be interpreted as being Integer or Double (Double is one of several types for representing real numbers). For lack of other information, Haskell has "guessed" that its type must be Integer (which cannot be multiplied with a Double). To work around this, we simply insist that it is to be treated as a Double

Prelude> let r = 25 :: Double
Prelude> 2 * pi * r
157.07963267948966

Note that Haskell only has this "guessing" behaviour in contexts where it does not have enough information to infer the type of something. As we will see below, most of the time, the surrounding context gives us all of the information that is needed to determine, say, if a number is to be treated as an Integer or not.

Note

There is actually a little bit more subtlety behind this problem. It involves a language feature known as the monomorphism restriction. You don't actually need to know about this for now, so you can skip over this note if you just want to keep a brisk pace. Instead of specifying the type Double, you also could have given it a polymorphic type, like Num a => a, which means "any type a which belongs in the class Num". The corresponding code looks like this and works just as seamlessly as before:

Prelude> let r :: Num a => a ; r = 25
Prelude> 2 * pi * r
157.07963267948966

Haskell could in theory assign such polymorphic types systematically, instead of defaulting to some potentially incorrect guess, like Integer. But in the real world, this could lead to values being needlessly duplicated or recomputed. To avoid this potential trap, the designers of the Haskell language opted for a more prudent "monomorphism restriction". It means that values may only have a polymorphic type if it can be inferred from the context, or if you explicitly give it one. Otherwise, the compiler is forced to choose a default monomorphic (i.e. non-polymorphic) type. This feature is somewhat controversial. It can even be disabled with the GHC flag (-fno-monomorphism-restriction), but it comes with some risk for inefficiency. Besides, in most cases, it is just as easy to specify the type explicitly.

Reworked sections of Type basics

Boolean values

One of the other types found in most languages is called a Boolean, or Bool for short. This has two values: true and false. This turns out to be very useful. For example, consider a program that would ask the user for a name then look that name up in a spreadsheet. It might be useful to have a function, nameExists, which indicates whether or not the name of the user exists in the spreadsheet. If it does exist, you could say that it is true that the name exists, and if not, you could say that it is false that the name exists. So we've come across Bools. The two values of bools are, as we've mentioned, true and false. In Haskell, boolean values are capitalized (for reasons that will later become clear):

Example: Exploring the types of True and False in GHCi

Prelude> :t True
True :: Bool
Prelude> :t False
False :: Bool

This shouldn't need too much explaining at this point. The values True and False are categorized as Booleans, that is to say, they have type Bool.

Numeric types

If you've been playing around with typing :t on all the familiar values you've come across, perhaps you've run into the following complication:

Prelude> :t 5
5 :: (Num t) => t

We'll defer the full explanation of this until later. The short version of the story is that there are many different types of numbers (fractions, whole numbers, etc) and 5 can be any one of them. This weird-looking type relates to a Haskell feature called type classes, which we will be playing with later in this book.

Type signatures in code

As of this revision of Type Basics.

Readers: As of now there are a few forward references in this section. We are in the process of sorting out some module sequencing issues, which means that problem will be solved soon. In the meantime, you might prefer to go through the next module, "Lists and tuples" and then return here to finish this one.

Now we've explored the basic theory behind types as well as how they apply to Haskell, let's look at how they appear in actual code. The first thing to point out is that it is a good practice to annotate every function with its associated type. For an applied example, let us have a look at a very small module (modules in Haskell are just a neat way of making a library by packing functions for usage in other programs. No need to worry with details now).

Example: Module without type signatures

module StringManip where

import Data.Char

uppercase = map toUpper
lowercase = map toLower
capitalise x = 
  let capWord []     = []
      capWord (x:xs) = toUpper x : xs
  in unwords (map capWord (words x))

This tiny library provides two handy string manipulation functions. uppercase converts a string to uppercase, lowercase to lowercase, and capitalize capitalizes the first letter of every word. Each of these functions takes a String as argument and evaluates to another String. For now, don't worry with how the code actually works, as it is obviously full of features we haven't discussed yet. Anyway - and specially because we don't understand the implementation yet - stating the type for these functions explicitly would make their roles more obvious. For example, most Haskellers would write the above module something like the following:

Example: Module with type signatures

module StringManip where

import Data.Char

uppercase, lowercase :: String -> String
uppercase = map toUpper
lowercase = map toLower

capitalise :: String -> String
capitalise x = 
  let capWord []     = []
      capWord (x:xs) = toUpper x : xs
  in unwords (map capWord (words x))

The added signatures inform the type of the functions to both the interpreter/compiler and the human readers. Note that you can group type signatures together into a single type signature (like ours for uppercase and lowercase above) if the two functions share the same type.

Type inference

We just said that type signatures tell the interpreter (or compiler) what the function type is. However, in the modules before this one you wrote perfectly good Haskell code without any signatures, and it has been accepted by GHC/GHCi. That means it's not strictly necessary to add type signatures. But that doesn't mean that if you don't add type signatures Haskell simply forgets about typing altogether! Indeed, when you didn't tell Haskell the types of your functions and variables, it figured them out. This is a process called type inference, in which the compiler starts with the types of things it knows, then works out the types of the rest of the things. Type inference for Haskell is decidable, which means that the compiler can always work out the types, even if you never write them in ^[1]. Let's look at some examples to see how the compiler works out types.

Example: Simple type inference

-- We're deliberately not providing a type signature for this function
isL c = c == 'l'

This function takes a character and sees if it is an 'l' character. The compiler derives the type for isL through a process like the following:

Example: A typing derivation

(==)  :: a -> a -> Bool
'l'   :: Char
Replacing the second ''a'' in the signature for (==) with the type of 'l':
(==)  :: Char -> Char -> Bool
isL   :: Char -> Bool

The first line indicates that the type of the function (==), which tests for equality, is a -> a -> Bool (you can think of the a as a placeholder which stands for any type - (==) will accept any two arguments as long as they are of the same type. That is called a polymorphic function) ^[2]. The compiler also knows that something in single quotes has type Char, so clearly the literal 'l' has type Char. Next, the compiler starts replacing the type variables in the signature for (==) with the types it knows. Note that in one step, we went from a -> a -> Bool to Char -> Char -> Bool, because the type variable a was used in both the first and second argument, so they need to be the same. And so we arrive at a function that takes a single argument (whose type we don't know yet, but hold on!) and applies it as the first argument to (==). We have a particular instance of the polymorphic type of (==), that is, here, we're talking about (==) :: Char -> Char -> Bool because we know that we're comparing Chars. Therefore, as (==) :: Char -> Char -> Bool and we're feeding the parameter into the first argument to (==), we know that the parameter has the type of Char. Phew!

But wait, we're not finished yet... What's the return type of the function? Thankfully, this bit is a bit easier. We've fed two Chars into a function which (in this case) has type Char -> Char -> Bool, so we must have a Bool. Note that the return value from the call to (==) becomes the return value of our isL function.

So, let's put it all together. isL is a function that takes a single argument. We discovered that this argument must be of type Char. Finally, we derived that we return a Bool. So, we can confidently say that isL has the type:

Example: isL with a type

isL :: Char -> Bool
isL c = c == 'l'

And, indeed, if you miss out the type signature, the Haskell compiler will discover this on its own, using exactly the same method we've just run through.

So if type signatures are optional, why bother with them at all? Here are a few reasons:

Documentation: the most prominent reason is that it makes your code easier to read. With most functions, the name of the function along with the type of the function is sufficient to guess at what the function does. Of course, you should always comment your code properly too, but having the types clearly stated helps a lot anyway.
Debugging: if you annotate a function with a type, then make a typo in the body of the function, the compiler will tell you at compile-time that your function is wrong. Leaving off the type signature could have the effect of allowing your function to compile, and the compiler would assign it an erroneous type. You wouldn't know until you ran your program that it was wrong. In fact, since this is such a crucial point we will explore it some more.

Types prevent errors

Imagine you have a few functions set up like the following:

Example: Type inference at work

fiveOrSix :: Bool -> Int
fiveOrSix True  = 5
fiveOrSix False = 6

pairToInt :: (Bool, String) -> Int
pairToInt x = fiveOrSix (fst x)

Our function fiveOrSix takes a Bool. When pairToInt receives its arguments, it knows, because of the type signature we've annotated it with, that the first element of the pair is a Bool. So, we could extract this using fst and pass that into fiveOrSix, and this would work, because the type of the first element of the pair and the type of the argument to fiveOrSix are the same.

This is really central to typed languages. When passing expressions around you have to make sure the types match up like they did here. If they don't, you'll get type errors when you try to compile; your program won't typecheck. This is really how types help you to keep your programs bug-free. To take a very trivial example:

Example: A non-typechecking program

"hello" + " world"

Having that line as part of your program will make it fail to compile, because you can't add two strings together! In all likelihood the intention was to use the similar-looking string concatenation operator, which joins two strings together into a single one:

Example: Our erroneous program, fixed

"hello" ++ " world"

An easy typo to make, but because you use Haskell, it was caught when you tried to compile. You didn't have to wait until you ran the program for the bug to become apparent.

This was only a simple example. However, the idea of types being a system to catch mistakes works on a much larger scale too. In general, when you make a change to your program, you'll change the type of one of the elements. If this change isn't something that you intended, or has unforeseen consequences, then it will show up immediately. A lot of Haskell programmers remark that once they have fixed all the type errors in their programs, and their programs compile, that they tend to "just work": run flawlessly at the first time, with only minor problems. Run-time errors, where your program goes wrong when you run it rather than when you compile it, are much rarer in Haskell than in other languages. This is a huge advantage of having a strong type system like Haskell does.

Exercises

Infer the types of following functions:

f x y = uppercase (x ++ y)
g (x,y) = fiveOrSix (isL x) - ord y
h x y = pairToInt (fst x,y) + snd x + length y

↑ Some of the newer type system extensions to GHC do break this, however, so you're better off just always putting down types anyway.
↑ This is a slight lie. That type signature would mean that you can compare two values of any type whatsoever, but this clearly isn't true: how can you see if two functions are equal? Haskell includes a kind of 'restricted polymorphism' that allows type variables to range over some, but not all types. Haskell implements this using type classes, which we'll learn about later. In this case, the correct type of (==) is Eq a => a -> a -> Bool.

[1] Some of the newer type system extensions to GHC do break this, however, so you're better off just always putting down types anyway.

[2] This is a slight lie. That type signature would mean that you can compare two values of any type whatsoever, but this clearly isn't true: how can you see if two functions are equal? Haskell includes a kind of 'restricted polymorphism' that allows type variables to range over some, but not all types. Haskell implements this using type classes, which we'll learn about later. In this case, the correct type of (==) is Eq a => a -> a -> Bool.

[1]

[2]