Haskell/More on datatypes

From Wikibooks, open books for an open world

Enumerations

One special case of the data declaration is the enumeration, a data type in which none of the constructor functions take any arguments:

data Month = January | February | March | April | May | June | July
           | August | September | October | November | December

You can mix constructors that do and do not have arguments, but then the result is not called an enumeration. The following example is not an enumeration because the last constructor takes three arguments:

data Colour = Black | Red | Green | Blue | Cyan
            | Yellow | Magenta | White | RGB Int Int Int

As you will see further on when we discuss classes and derivation, there are practical reasons to distinguish between what is and isn't an enumeration; for example, an Enum instance can be derived automatically only for enumerations.
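As a preview of that later discussion, here is a minimal sketch, assuming the Month type above is given a deriving clause (added here purely for illustration). Because every constructor is nullary, GHC can derive Enum and Bounded, which gives you ranges and successor functions for free. The names allMonths and nextMonth are hypothetical helpers, not part of any library.

```haskell
-- Month from the text, with a deriving clause added for this sketch.
data Month = January | February | March | April | May | June | July
           | August | September | October | November | December
  deriving (Show, Eq, Ord, Enum, Bounded)

-- Every month in declaration order, built from the derived
-- Bounded (minBound/maxBound) and Enum (range syntax) instances.
allMonths :: [Month]
allMonths = [minBound .. maxBound]

-- The following month, wrapping around after December.
-- The derived succ alone would raise an error on the last constructor.
nextMonth :: Month -> Month
nextMonth m
  | m == maxBound = minBound
  | otherwise     = succ m
```

Adding Enum to a deriving clause on Colour, by contrast, would be rejected by the compiler, because the RGB constructor takes arguments.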

Incidentally, the Bool datatype is an enumeration:

data Bool = False | True
    deriving (Bounded, Enum, Eq, Ord, Read, Show)

Named Fields (Record Syntax)

Consider a datatype whose purpose is to hold configuration settings. Usually, when you extract members from this type, you really only care about one or two of the many settings. Moreover, if many of the settings have the same type, you might often find yourself wondering "wait, was this the fourth or fifth element?" One way to clarify is to write accessor functions. Consider the following made-up configuration type for a terminal program:

data Configuration = Configuration
    String   -- User name
    String   -- Local host
    String   -- Remote host
    Bool     -- Is guest?
    Bool     -- Is superuser?
    String   -- Current directory
    String   -- Home directory
    Integer  -- Time connected
  deriving (Eq, Show)

You could then write accessor functions, such as:

getUserName (Configuration un _ _ _ _ _ _ _) = un
getLocalHost (Configuration _ lh _ _ _ _ _ _) = lh
getRemoteHost (Configuration _ _ rh _ _ _ _ _) = rh
getIsGuest (Configuration _ _ _ ig _ _ _ _) = ig
-- And so on...

You could also write update functions to update a single element. But if you later add or remove an element in the configuration, every one of these functions has to be rewritten, because its pattern must match the new number of fields. This is quite annoying and is an easy place for bugs to slip in. Thankfully, there's a solution: we simply give names to the fields in the datatype declaration, as follows:

data Configuration = Configuration
    { username      :: String
    , localHost     :: String
    , remoteHost    :: String
    , isGuest       :: Bool
    , isSuperuser   :: Bool
    , currentDir    :: String
    , homeDir       :: String
    , timeConnected :: Integer
    }

This will automatically generate the following accessor functions for us:

username :: Configuration -> String
localHost :: Configuration -> String
-- etc.
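To see the generated accessors in action, here is a small sketch that reuses the record declaration above (with a deriving clause added). The value exampleCfg and the helpers describeUser and userNames are made up for illustration. Each field name is an ordinary top-level function, so it can be passed to map like any other function.

```haskell
data Configuration = Configuration
    { username      :: String
    , localHost     :: String
    , remoteHost    :: String
    , isGuest       :: Bool
    , isSuperuser   :: Bool
    , currentDir    :: String
    , homeDir       :: String
    , timeConnected :: Integer
    }
  deriving (Eq, Show)

-- A hypothetical configuration value used only for this example.
exampleCfg :: Configuration
exampleCfg = Configuration "alice" "localhost" "example.org" False True "/home/alice" "/home/alice" 42

-- The accessors are plain functions of type Configuration -> fieldType.
describeUser :: Configuration -> String
describeUser cfg =
    username cfg ++ "@" ++ localHost cfg
        ++ (if isSuperuser cfg then " (superuser)" else "")

-- Because accessors are functions, they combine with map and friends.
userNames :: [Configuration] -> [String]
userNames = map username
```

Here describeUser exampleCfg evaluates to "alice@localhost (superuser)".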

This also gives us a convenient update method. Here is a short example of "post working directory" and "change directory" functions, in the spirit of the shell commands pwd and cd, that work on Configurations:

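changeDir :: Configuration -> String -> Configuration
changeDir cfg newDir =
    -- directoryExists :: String -> Bool is a hypothetical helper, not defined here;
    -- a real check needs IO (e.g. System.Directory.doesDirectoryExist).
    if directoryExists newDir
        then cfg { currentDir = newDir }  -- a copy of cfg with only currentDir changed
        else error "Directory does not exist"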

postWorkingDir :: Configuration -> String
postWorkingDir cfg = currentDir cfg

So, in general, to update the field x in a value y to z, you write y { x = z }. You can change several fields at once by separating the assignments with commas, for instance y { x = z, a = b, c = d }.
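A minimal sketch of updating two fields at once, reusing the record declaration above; loggedIn is a made-up name. The update builds a new value and leaves initCFG untouched: after defining loggedIn, evaluating username initCFG still gives "nobody".

```haskell
data Configuration = Configuration
    { username      :: String
    , localHost     :: String
    , remoteHost    :: String
    , isGuest       :: Bool
    , isSuperuser   :: Bool
    , currentDir    :: String
    , homeDir       :: String
    , timeConnected :: Integer
    }
  deriving (Eq, Show)

initCFG :: Configuration
initCFG = Configuration "nobody" "nowhere" "nowhere" False False "/" "/" 0

-- Two fields are replaced; every other field is copied from initCFG.
loggedIn :: Configuration
loggedIn = initCFG { username = "alice", timeConnected = 1 }
```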

Note

Those of you familiar with object-oriented languages might be tempted, after all of this talk about "accessor functions" and "update methods", to think of the y{x=z} construct as a setter method, which modifies the value of x in a pre-existing y. It is not like that: remember that in Haskell variables are immutable. Therefore, using the example above, if you write something like conf2 = changeDir conf1 "/opt/foo/bar", then conf2 will be defined as a Configuration that is just like conf1 except that its currentDir is "/opt/foo/bar", while conf1 remains unchanged.


It's only sugar

You can, of course, continue to pattern match against Configurations as you did before. The named fields are simply syntactic sugar; you can still write something like:

getUserName (Configuration un _ _ _ _ _ _ _) = un

But there is no need to do this.

You can also pattern match against named fields, as in:

getHostData (Configuration { localHost = lh, remoteHost = rh }) = (lh, rh)

This matches the variable lh against the localHost field in the Configuration and the variable rh against the remoteHost field. Since every Configuration has both fields, these matches always succeed. You could also constrain the matches by putting values instead of variable names in these positions, as you would for standard datatypes.
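For instance, here is a minimal sketch of constraining a named-field match with a literal value; guestGreeting is a hypothetical function. The first clause only matches configurations whose isGuest field is True, binding the user name at the same time; every other configuration falls through to the second clause.

```haskell
data Configuration = Configuration
    { username      :: String
    , localHost     :: String
    , remoteHost    :: String
    , isGuest       :: Bool
    , isSuperuser   :: Bool
    , currentDir    :: String
    , homeDir       :: String
    , timeConnected :: Integer
    }

-- First clause: matches only when isGuest is True, and binds username to un.
-- Second clause: matches any remaining Configuration.
guestGreeting :: Configuration -> String
guestGreeting (Configuration { isGuest = True, username = un }) = "Welcome, guest " ++ un
guestGreeting (Configuration { username = un })                 = "Welcome back, " ++ un
```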

If you are using GHC with the NamedFieldPuns language extension enabled, you can also write:

getHostData (Configuration { localHost, remoteHost }) = (localHost, remoteHost)

It can be mixed with the normal form like this:

getHostData (Configuration { localHost, remoteHost = rh }) = (localHost, rh)

(To use this language extension, enter :set -XNamedFieldPuns in the interpreter, or use the {-# LANGUAGE NamedFieldPuns #-} pragma at the beginning of a source file, or pass the -XNamedFieldPuns command-line flag to the compiler.)
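Putting the pieces together, here is a sketch of a complete source file that enables the extension with the pragma and defines both variants from above (the primed name getHostData' is ours, so the two can coexist). Inside each function body, the punned name localHost refers to the bound String, shadowing the accessor function of the same name.

```haskell
{-# LANGUAGE NamedFieldPuns #-}

data Configuration = Configuration
    { username      :: String
    , localHost     :: String
    , remoteHost    :: String
    , isGuest       :: Bool
    , isSuperuser   :: Bool
    , currentDir    :: String
    , homeDir       :: String
    , timeConnected :: Integer
    }

-- { localHost, remoteHost } is shorthand for
-- { localHost = localHost, remoteHost = remoteHost }.
getHostData :: Configuration -> (String, String)
getHostData (Configuration { localHost, remoteHost }) = (localHost, remoteHost)

-- Puns and ordinary field patterns can be mixed in one pattern.
getHostData' :: Configuration -> (String, String)
getHostData' (Configuration { localHost, remoteHost = rh }) = (localHost, rh)
```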

You can create values of Configuration the old way, with positional arguments, as in the first definition below, or with named-field syntax, as in the second:

initCFG = Configuration "nobody" "nowhere" "nowhere" False False "/" "/" 0

initCFG' = Configuration
    { username      = "nobody"
    , localHost     = "nowhere"
    , remoteHost    = "nowhere"
    , isGuest       = False
    , isSuperuser   = False
    , currentDir    = "/"
    , homeDir       = "/"
    , timeConnected = 0
    }

The first way is much shorter, but the second is much clearer.

WARNING: The second style also lets you omit fields, and the code will still compile (GHC reports a missing-fields warning by default, but not an error), as in:

cfgFoo = Configuration { username = "Foo" }
cfgBar = Configuration { localHost = "Bar", remoteHost = "Baz" }
cfgUndef = Configuration {}

Trying to evaluate the unspecified fields will then result in a runtime error!
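The following sketch makes that runtime error visible. It builds cfgFoo as above, reads the one field that was supplied (which works, thanks to laziness), and then forces an omitted field inside Control.Exception's try, so the exception comes back as a Left instead of crashing the program. The names fooName and probeHomeDir are hypothetical.

```haskell
import Control.Exception (SomeException, evaluate, try)

data Configuration = Configuration
    { username      :: String
    , localHost     :: String
    , remoteHost    :: String
    , isGuest       :: Bool
    , isSuperuser   :: Bool
    , currentDir    :: String
    , homeDir       :: String
    , timeConnected :: Integer
    }

-- Compiles (with a missing-fields warning); only username is defined.
cfgFoo :: Configuration
cfgFoo = Configuration { username = "Foo" }

-- Reading a field that was supplied works as usual.
fooName :: String
fooName = username cfgFoo

-- Forcing an omitted field throws at run time; try turns the exception into a Left.
probeHomeDir :: IO (Either SomeException String)
probeHomeDir = try (evaluate (homeDir cfgFoo))
```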

Parameterized Types

Parameterized types are similar to "generic" or "template" types in other languages. A parameterized type takes one or more type parameters. For example, the Standard Prelude type Maybe is defined as follows:

data Maybe a = Nothing | Just a

This says that the type Maybe takes a type parameter a. You can use this to declare, for example:

lookupBirthday :: [Anniversary] -> String -> Maybe Anniversary

The lookupBirthday function takes a list of birthday records and a string and returns a Maybe Anniversary. The usual interpretation of such a type is that if the name given through the string is found in the list of anniversaries the result will be Just the corresponding record; otherwise, it will be Nothing. Maybe is the simplest and most common way of indicating failure in Haskell. It is also sometimes seen in the types of function arguments, as a way to make them optional (the intent being that passing Nothing amounts to omitting the argument).
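Here is one possible implementation, as a sketch. It assumes a simple Anniversary type along the following lines (a birthday holds a name and a date; a wedding holds two names and a date). Treat that definition as an assumption made for this example. The helper describeBirthday shows how a caller handles both the Just and the Nothing outcome.

```haskell
-- Assumed for this sketch: dates are stored as year, month, day.
data Anniversary = Birthday String Int Int Int
                 | Wedding String String Int Int Int
  deriving (Eq, Show)

-- Just the first birthday whose name matches, or Nothing if there is none.
lookupBirthday :: [Anniversary] -> String -> Maybe Anniversary
lookupBirthday [] _ = Nothing
lookupBirthday (a@(Birthday name _ _ _) : rest) wanted
    | name == wanted = Just a
    | otherwise      = lookupBirthday rest wanted
lookupBirthday (_ : rest) wanted = lookupBirthday rest wanted  -- skip weddings

-- Callers must deal with both outcomes, typically with a case expression.
describeBirthday :: [Anniversary] -> String -> String
describeBirthday anns name =
    case lookupBirthday anns name of
        Nothing  -> "No birthday recorded for " ++ name
        Just ann -> "Found: " ++ show ann
```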

You can parameterize type and newtype declarations in exactly the same way. Furthermore, you can combine parameterized types in arbitrary ways to construct new types.
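A short sketch of both points: a parameterized type synonym, a parameterized newtype, and a function whose type combines them with Maybe. All the names here (Pair, Stack, push, pop, safeDivPair) are invented for illustration.

```haskell
-- A parameterized type synonym: Pair a is just another name for (a, a).
type Pair a = (a, a)

-- A parameterized newtype wrapping a list of values of any type a.
newtype Stack a = Stack [a]
  deriving (Eq, Show)

push :: a -> Stack a -> Stack a
push x (Stack xs) = Stack (x : xs)

-- Maybe combined with a tuple: Nothing for an empty stack.
pop :: Stack a -> Maybe (a, Stack a)
pop (Stack [])       = Nothing
pop (Stack (x : xs)) = Just (x, Stack xs)

-- Combining parameterized types: a pair of possibly-missing results.
safeDivPair :: Int -> Pair Int -> Pair (Maybe Int)
safeDivPair n (a, b) = (safeDiv n a, safeDiv n b)
  where
    safeDiv _ 0 = Nothing
    safeDiv x y = Just (x `div` y)
```

For example, safeDivPair 10 (2, 0) evaluates to (Just 5, Nothing).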

More than one type parameter

We can also have more than one type parameter. An example of this is the Either type:

data Either a b = Left a | Right b

For example:

pairOff :: Int -> Either String Int
pairOff people
    | people < 0  = Left "Can't pair off a negative number of people."
    | people > 30 = Left "Too many people for this activity."
    | even people = Right (people `div` 2)
    | otherwise   = Left "Can't pair off an odd number of people."

groupPeople :: Int -> String
groupPeople people =
    case pairOff people of
        Right groups -> "We have " ++ show groups ++ " group(s)."
        Left problem -> "Problem! " ++ problem

In this example pairOff indicates how many groups you would have if you paired off a certain number of people for your activity. It can also let you know if you have too many people for your activity or if somebody will be left out. So pairOff will return either an Int representing the number of groups you will have, or a String describing the reason why you can't create your groups.
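As an alternative to the case expression in groupPeople, the Prelude function either :: (a -> c) -> (b -> c) -> Either a b -> c applies its first argument to a Left value and its second to a Right value. The sketch below repeats pairOff and defines a hypothetical groupPeople' in that style; it should behave the same as groupPeople.

```haskell
pairOff :: Int -> Either String Int
pairOff people
    | people < 0  = Left "Can't pair off a negative number of people."
    | people > 30 = Left "Too many people for this activity."
    | even people = Right (people `div` 2)
    | otherwise   = Left "Can't pair off an odd number of people."

-- either handles the Left (problem) and Right (groups) cases in one expression.
groupPeople' :: Int -> String
groupPeople' =
    either ("Problem! " ++)
           (\groups -> "We have " ++ show groups ++ " group(s).")
    . pairOff
```

Here groupPeople' 10 gives "We have 5 group(s).", while groupPeople' 7 gives "Problem! Can't pair off an odd number of people."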

Kind Errors

The flexibility of Haskell parameterized types can lead to errors in type declarations that are somewhat like type errors, except that they occur in the type declarations rather than in the program proper. Errors in these "types of types" are known as "kind" errors. You don't program with kinds: the compiler infers them for itself. But if you get parameterized types wrong then the compiler will report a kind error.
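A small sketch of where kinds show up. Maybe on its own is not a type of values; it needs one type argument. In GHCi, :kind Maybe prints something like Maybe :: * -> * (some GHC versions write Type instead of *), while :kind Maybe Int reports a plain type. The commented-out signature below uses Maybe without its argument and would be rejected with a kind error. The names firstEven and Wrap are hypothetical.

```haskell
-- Accepted: Maybe is applied to Int, giving an ordinary type.
firstEven :: [Int] -> Maybe Int
firstEven xs = case filter even xs of
    []      -> Nothing
    (x : _) -> Just x

-- Rejected with a kind error if uncommented: Maybe is missing its argument.
-- firstEvenBad :: [Int] -> Maybe

-- A type parameter can itself expect an argument: f must be something like
-- Maybe or [], so Wrap Maybe Int is fine, but Wrap Int Int is a kind error.
newtype Wrap f a = Wrap (f a)
```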