Haskell/Type basics

From Wikibooks, open books for an open world
Jump to: navigation, search

In programming, Types are used to group similar values into categories. In Haskell, the type system is a powerful way of reducing the number of mistakes in your code.

Introduction[edit]

Programming deals with different sorts of entities. For example, consider adding two numbers together:

2 + 3

What are 2 and 3? Well, they are numbers. What about the plus sign in the middle? That's certainly not a number, but it stands for an operation which we can do with two numbers – namely, addition.

Similarly, consider a program that asks you for your name and then greets you with a "Hello" message. Neither your name nor the word Hello are numbers. What are they then? We might refer to all words and sentences and so forth as text. It's normal in programming to use a slightly more esoteric word: String, which is short for "string of characters".

In Haskell, the rule is that all type names have to begin with a capital letter. We shall adhere to this convention henceforth.

Databases illustrate clearly the concept of types. For example, say we had a table in a database to store details about a person's contacts; a kind of personal telephone book. The contents might look like this:

First Name Last Name Telephone number Address
Sherlock Holmes 743756 221B Baker Street London
Bob Jones 655523 99 Long Road Street Villestown

The fields in each entry contain values. Sherlock is a value as is 99 Long Road Street Villestown as well as 655523. As we've said, types are a way of categorizing data, so let's classify the values in this example. The first three fields seem straightforward enough. "First Name" and "Last Name" contain text, so we say that the values are of type String, while "Telephone Number" is clearly a number.

At first glance one may be tempted to classify address as a String. However, the semantics behind an innocent address are quite complex. There are a whole lot of human conventions that dictate how we interpret it. For example, if the beginning of the address text contains a number it is likely the number of the house. If not, then it's probably the name of the house – except if it starts with "PO Box", in which case it's just a postal box address and doesn't indicate where the person lives at all.

Clearly, there's more going on here than just text, as each part of the address has its own meaning. In principle, there is nothing wrong with saying addresses are Strings, but that doesn't capture many important features of addresses. When we describe something as a String, all that we are saying is that it is a sequence of letters, numbers, etc. Recognizing something as a specialized type is far more meaningful. If we know something is an Address, we instantly know much more about the piece of data – for instance, that we can interpret it using the "human conventions" that give meaning to addresses.

In retrospect, we might also apply this rationale to the telephone numbers. It could be a good idea to speak in terms of a TelephoneNumber type. Then, if we were to come across some arbitrary sequence of digits which happened to be of type TelephoneNumber we would have access to a lot more information than if it were just a Number – for instance, we could start looking for things such as area and country codes on the initial digits.

Another reason not to consider the telephone numbers as just Numbers is that doing arithmetics with them makes no sense. What is the meaning and expected effect of, say, multiplying a TelephoneNumber by 100? It would not allow calling anyone by phone. That's a good reason for using a more specialized type than Number. Also, each digit comprising a telephone number is important; it's not acceptable to lose some of them by rounding it or even by omitting leading zeroes.

Why types are useful[edit]

So far, it seems that all what we've done was to describe and categorize things, and it may not be obvious why all of this talk would be so important for writing actual programs. Starting with this module, we will explore how Haskell uses types to the programmer's benefit, allowing us to incorporate the semantics behind, say, an address or a telephone number seamlessly in the code.

Using the interactive :type command[edit]

The best way to explore how types work in Haskell is from GHCi. The type of any expression can be checked with the immensely useful :type (or shortened to :t) command. Let us test it on the boolean values from the previous module:

Example: Exploring the types of boolean values in GHCi

Prelude> :type True
True :: Bool
Prelude> :type False
False :: Bool
Prelude> :t (3 < 5)
(3 < 5) :: Bool

Usage of :type is straightforward: enter the command into the prompt followed by whatever you want to find the type of. On the third example, we use :t, which we will be using from now on. GHCi will then print the type of the expression. The symbol ::, which will appear in a couple other places, can be read as simply "is of type", and indicates a type signature.

:type reveals that truth values in Haskell are of type Bool, as illustrated above for the two possible values, True and False, as well as for a sample expression that will evaluate to one of them. It is worthy to note at this point that boolean values are not just for value comparisons. Bool captures in a very simple way the semantics of a yes/no answer, and so it can be useful to represent any information of such kind – say, whether a name was found in a spreadsheet, or whether a user has toggled an on/off option.

Characters and strings[edit]

Now let's try :t on something new. Literal characters are entered by enclosing them with single quotation marks. For instance, this is the single letter H:

Example: Using the :type command in GHCi on a literal character

Prelude> :t 'H'
'H' :: Char

So, literal character values have type Char (short for "character"). Single quotation marks, however, only work for individual characters. If we need to enter actual text – that is, a string of characters – we use double quotation marks instead:

Example: Using the :t command in GHCi on a literal string

Prelude> :t "Hello World"
"Hello World" :: [Char]

Why did we get Char again? The difference is in the square brackets. [Char] means a number of characters chained together, forming a list. That is what text strings are in Haskell – lists of characters.[1]

Exercises
  1. Try using :type on the literal value "H" (notice the double quotes). What happens? Why?
  2. Try using :type on the literal value 'Hello World' (notice the single quotes). What happens? Why?

Incidentally, Haskell allows for type synonyms, which work pretty much like synonyms in human languages (words that mean the same thing – say, 'big' and 'large'). In Haskell, type synonyms are alternative names for types. For instance, String is defined as a synonym of [Char], and so we can freely substitute one with the other. Therefore, to say:

"Hello World" :: String

is also perfectly valid, and in many cases a lot more readable. From here on we'll mostly refer to text values as String, rather than [Char].

Functional types[edit]

So far, we have seen how values (strings, booleans, characters, etc.) have types and how these types help us to categorize and describe them. Now, the big twist that makes Haskell's type system truly powerful: Functions have types as well.[2] Let's look at some examples to see how that works.

Example: not[edit]

We can negate boolean values with not (e.g. not True evaluates to False and vice-versa). To figure out the type of a function, we consider two things: the type of values it takes as its input and the type of value it returns. In this example, things are easy. not takes a Bool (the Bool to be negated), and returns a Bool (the negated Bool). The notation for writing that down is:

Example: Type signature for not

not :: Bool -> Bool

You can read this as "not is a function from things of type Bool to things of type Bool".

Using :t on a function will work just as expected:

Prelude> :t not
not :: Bool -> Bool

The description of a function's type is in terms of the types of argument(s) it takes and gives.

Example: chr and ord[edit]

Text presents a problem to computers. Once everything is reduced to its lowest level, all a computer knows how to deal with are 1s and 0s: computers work in binary. As working with binary numbers isn't at all convenient, humans have come up with ways of making computers store text. Every character is first converted to a number, then that number is converted to binary and stored. That's how a piece of text (which is just a sequence of characters) is encoded into binary. Normally, we're only interested in how to encode characters into their numerical representations, because the computer generally takes care of the conversion to binary numbers without our intervention.

The easiest way of converting characters to numbers is simply to write all the possible characters down, then number them. For example, we might decide that 'a' corresponds to 1, then 'b' to 2, and so on. This is exactly what something called the ASCII standard is: take 128 commonly-used characters and number them. Of course, it would be a bore to sit down and look up a character in a big lookup table every time we wanted to encode it, so we've got two functions that do it for us, chr (pronounced 'char') and ord[3]:

Example: Type signatures for chr and ord

chr :: Int  -> Char
ord :: Char -> Int

We already know what Char means. The new type on the signatures above, Int, amounts to integer numbers, and is one of quite a few different types of numbers.[4] The type signature of chr tells us that it takes an argument of type Int, an integer number, and evaluates to a result of type Char. The converse is the case with ord: It takes things of type Char and returns things of type Int. With the info from the type signatures, it becomes immediately clear which of the functions encodes a character into a numeric code (ord) and which does the decoding back to a character (chr).

To make things more concrete, here are a few examples of function calls to chr and ord. Notice that the two functions aren't available by default; so before trying them in GHCi you need to use the :module Data.Char (or :m Data.Char) command to load the Data.Char module, where they are defined.

Example: Function calls to chr and ord

Prelude> :m Data.Char
Prelude Data.Char> chr 97
'a'
Prelude Data.Char> chr 98
'b'
Prelude Data.Char> ord 'c'
99

Functions with more than one argument[edit]

The style of type signatures we have been using works well enough for functions of one argument, but what would be the type of a function like this one?

Example: A function with more than one argument

xor p q = (p || q) && not (p && q)

(xor is the exclusive-or function, which evaluates to True if either one or the other argument is True, but not both; and False otherwise.)

The general technique for forming the type of a function that accepts more than one argument is simply to write down all the types of the arguments in a row, in order (so in this case p first then q), then link them all with ->. Finally, add the type of the result to the end of the row and stick a final -> in just before it.[5] In this example, we have:


  1. Write down the types of the arguments. In this case, the use of (||) and (&&) gives away that p and q have to be of type Bool:
    Bool                   Bool
    ^^ p is a Bool         ^^ q is a Bool as well
    
  2. Fill in the gaps with ->:
    Bool -> Bool
    
  3. Add in the result type and a final ->. In our case, we're just doing some basic boolean operations so the result remains a Bool.
    Bool -> Bool -> Bool
                     ^^ We're returning a Bool
                 ^^ This is the extra -> that got added in 
    

The final signature, then, is:

Example: The signature of xor

xor :: Bool -> Bool -> Bool

Real world example: openWindow[edit]

A library is a collection of common code used by many programs.

As you'll learn in the Haskell in Practice section of the course, one popular group of Haskell libraries are the GUI (Graphical User Interface) ones. These provide functions for dealing with the visual things computer users are familiar with: menus and buttons, application windows, moving the mouse around, etc. One of the functions from one of these libraries is called openWindow, and you can use it to open a new window in your application. For example, say you're writing a word processor, and the user has clicked on the 'Options' button. You need to open a new window which contains all the options that they can change. Let's look at the type signature for this function[6]:

Example: openWindow

openWindow :: WindowTitle -> WindowSize -> Window

Don't panic! Here are a few more types you haven't come across yet. But don't worry, they're quite simple. All three of the types there, WindowTitle, WindowSize and Window are defined by the GUI library that provides openWindow. As we saw when constructing the types above, because there are two arrows, the first two types are the types of the parameters, and the last is the type of the result. WindowTitle holds the title of the window (which typically appears in a title bar at the very top of the window), and WindowSize specifies how big the window should be. The function then returns a value of type Window which represents the actual window.

One key point illustrated by this example, as well as the chr/ord one is that, even if you have never seen the function or don't know how it actually works, a type signature can immediately give you a good general idea of what the function is supposed to do. For that reason, a very useful habit to acquire is testing every new function you meet with :t. If you start doing that now, you'll not only learn about the standard library Haskell functions quite a bit quicker but also develop a useful kind of intuition. Curiosity pays off. :)

Exercises

Finding types for functions is a basic Haskell skill worth mastering. What are the types of the following functions?

  1. The negate function, which takes an Int and returns that Int with its sign swapped. For example, negate 4 = -4, and negate (-2) = 2
  2. The (||) function, pronounced 'or', that takes two Bools and returns a third Bool which is True if either of the arguments were, and False otherwise.
  3. A monthLength function which takes a Bool which is True if we are considering a leap year and False otherwise, and an Int which is the number of a month; and returns another Int which is the number of days in that month.

For any functions hereafter involving numbers, you can just pretend the numbers are Ints.

  1. f x y = not x && y
  2. g x = (2*x - 1)^2

Type signatures in code[edit]

We have explored the basic theory behind types and how they apply to Haskell. Now, we will see how type signatures are used for annotating functions in source files. Let's see what that looks like for xor function from an earlier example:

Example: A function with its signature

xor :: Bool -> Bool -> Bool
xor p q = (p || q) && not (p && q)

That is all we have to do, really. Signatures are placed just before the corresponding functions, for maximum clarity.

The signatures we add in this way serve a dual role. They inform the type of the functions both to human readers of the code and to the compiler/interpreter.

Type inference[edit]

We just said that type signatures tell the interpreter (or compiler) what the function type is. Yet we have been writing perfectly good Haskell code without any signatures so far, and it was accepted by GHC/GHCi. Indeed, it is not mandatory to add type signatures. That doesn't mean, however, that when they are missing Haskell simply forgets about types altogether! Instead, when you don't tell Haskell the types of your functions and variables it figures them out through a process called type inference. In essence, the compiler performs inference by starting with the types of things it knows and then working out the types of the rest of the values. Let's see how that works with a general example.

Example: Simple type inference

-- We're deliberately not providing a type signature for this function
isL c = c == 'l'

isL is a function that takes an argument c and returns the result of evaluating c == 'l'. Without a type signature, the type of c and the type of the result are not specified. In the expression c == 'l', however, the compiler knows that 'l' is a Char. Since c and 'l' are being compared with equality with (==) and both arguments of (==) must have the same type,[7] it follows that c must be a Char. Finally, since isL c is the result of (==) it must be a Bool. And thus we have a signature for the function:

Example: isL with a type

isL :: Char -> Bool
isL c = c == 'l'

Indeed, if you leave out the type signature, the Haskell compiler will discover it through this process. You can verify that by using :t on isL with or without a signature.

So, if type signatures are optional in most cases,[8] why should we care so much about them? Here are a few reasons:

  • Documentation: type signatures make your code easier to read. With most functions, the name of the function along with the type of the function is sufficient to guess what the function does. Of course, you should always comment your code properly too, but having the types clearly stated helps a lot, too.
  • Debugging: when you annotate a function with a type signature and then make a typo in the body of the function which changes the type of a variable, the compiler will tell you, at compile-time, that your function is wrong. Leaving off the type signature might allow your erroneous function to compile, and the compiler would assign it the wrong type. You wouldn't know until you ran your program that you made this mistake.

Types and readability[edit]

To understand better how signatures can help documentation, here is a somewhat more realistic example. The piece of code quoted below is a tiny module (modules are the typical way of preparing a library), and this way of organizing code is not too different from what you might find when reading source code for the libraries bundled with GHC.

Note


Do not go crazy trying to understand how the functions here actually work; that is beside the point as we still have not covered many of the features being used. Just keep reading and play along.

Example: Module with type signatures

module StringManip where
 
import Data.Char
 
uppercase, lowercase :: String -> String
uppercase = map toUpper
lowercase = map toLower
 
capitalize :: String -> String
capitalize x =
  let capWord []     = []
      capWord (x:xs) = toUpper x : xs
  in unwords (map capWord (words x))

This tiny library provides three string manipulation functions. uppercase converts a string to upper case, lowercase to lower case, and capitalize capitalizes the first letter of every word. Each of these functions takes a String as argument and evaluates to another String. What is relevant to our discussion here is that, even if we do not understand how these functions work, looking at the type signatures allows us to immediately know the types of the arguments and return values. That information, when paired with sensible function names, can make it a lot easier to figure out how we can use the functions.

Note that when functions have the same type we have the option of writing just one signature for all of them, by separating their names with commas, as it was done with uppercase and lowercase.

Types prevent errors[edit]

The role of types in preventing errors is central to typed languages. When passing expressions around you have to make sure the types match up like they did here. If they don't, you'll get type errors when you try to compile; your program won't pass the typecheck. That is how types help you to keep your programs bug-free. To take a very trivial example:

Example: A non-typechecking program

"hello" + " world"

Having that line as part of your program will make it fail to compile, because you can't add two strings together! In all likelihood the intention was to use the similar-looking string concatenation operator, which joins two strings together into a single one:

Example: Our erroneous program, fixed

"hello" ++ " world"

An easy typo to make, but because you use Haskell, it was caught when you tried to compile. You didn't have to wait until you ran the program for the bug to become apparent.

That was only a simple example, but the idea of types forming a system to catch mistakes works on a much larger scale too. In general, when you make a change to your program, you'll change the type of one of the elements. If this change isn't something that you intended, or has unforeseen consequences, then it will show up immediately. A lot of Haskell programmers remark that once they have fixed all the type errors in their programs, and their programs compile, that they tend to "just work": run the first time with only minor problems. Run-time errors, where your program goes wrong when you run it rather than when you compile it, are much rarer in Haskell than in other languages. This is a huge advantage of having a strong type system like Haskell does.


Notes[edit]

  1. Lists, be they of characters or of other things, are very important entities in Haskell, and we will cover them in more detail in a little while.
  2. The deeper truth is that functions are values, just like all the others.
  3. This isn't quite what chr and ord do, but that description fits our purposes well, and it's close enough.
  4. In fact, it is not even the only type for integers! We will meet its relatives in a short while.
  5. This method might seem just a trivial hack by now, but actually there are very deep reasons behind it, which we'll cover in the chapter on Currying.
  6. This has been somewhat simplified to fit our purposes. Don't worry, the essence of the function is there.
  7. As we discussed in "Truth values". That fact is actually stated by the type signature of (==) – if you are curious you can check it, although you will have to wait a little bit more for a full explanation of the notation used in it.
  8. There are a few situations in which the compiler lacks information to infer the type, and so the signature becomes obligatory; and, in some other cases, we can influence to a certain extent the final type of a function or value with a signature. That needn't concern us for the moment, however.