Haskell/More on datatypes

From Wikibooks, open books for an open world

Enumerations

One special case of the data declaration is the enumeration. This is simply a data type in which none of the constructors take any arguments:

data Month = January | February | March | April | May | June | July
             | August | September | October | November | December

You can mix constructors that do and do not have arguments, but it's only an enumeration if none of the constructors have arguments. For instance:

data Colour = Black | Red | Green | Blue | Cyan
            | Yellow | Magenta | White | RGB Int Int Int

The last constructor takes three arguments, so Colour is not an enumeration. As you will see further on when we discuss classes and derivation, this distinction is not only conceptual: for example, the compiler can automatically derive an Enum instance only for an enumeration.
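To see what this means in practice, here is a minimal sketch. It assumes Month is given a deriving clause, which the declaration above does not have, so that list ranges, succ, minBound and maxBound work on months. The helper names allMonths and nextMonth are made up for this example.

```haskell
-- Month from above, with a deriving clause ADDED for this sketch.
data Month = January | February | March | April | May | June | July
           | August | September | October | November | December
  deriving (Show, Eq, Ord, Enum, Bounded)

-- Every month in declaration order, via the derived Enum and Bounded instances.
allMonths :: [Month]
allMonths = [minBound .. maxBound]

-- The following month (hypothetical helper), wrapping December around to January.
nextMonth :: Month -> Month
nextMonth m
  | m == maxBound = minBound
  | otherwise     = succ m

-- By contrast, adding "deriving Enum" to Colour is rejected by the compiler,
-- because its RGB constructor takes arguments.
```

A range such as [March .. May] also works, because the derived Enum instance numbers the constructors in the order they are declared.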

Incidentally, the definition of the Bool datatype is:

data Bool = False | True
    deriving (Eq, Ord, Enum, Read, Show, Bounded)
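Because of that deriving clause, Bool values can be compared, ordered, enumerated, read and shown without any hand-written instances. The following checks exercise each derived instance. The name boolFacts is only for illustration.

```haskell
-- Each entry exercises one of Bool's derived instances; all should be True.
boolFacts :: [Bool]
boolFacts =
  [ False < True                             -- Ord: constructors ordered as declared
  , [minBound .. maxBound] == [False, True]  -- Bounded and Enum
  , fromEnum True == 1                       -- Enum: position in the declaration
  , read "True" == True                      -- Read
  , show False == "False"                    -- Show
  ]
```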

Named Fields (Record Syntax)

Consider a datatype whose purpose is to hold configuration settings. Usually when you extract members from this type, you really only care about one or possibly two of the many settings. Moreover, if many of the settings have the same type, you might often find yourself wondering "wait, was this the fourth or fifth element?" One thing you could do would be to write accessor functions. Consider the following made-up configuration type for a terminal program:

data Configuration =
    Configuration String          -- user name
                  String          -- local host
                  String          -- remote host
                  Bool            -- is guest?
                  Bool            -- is super user?
                  String          -- current directory
                  String          -- home directory
                  Integer         -- time connected
              deriving (Eq, Show)

You could then write accessor functions like these (only a few are shown):

getUserName (Configuration un _ _ _ _ _ _ _) = un
getLocalHost (Configuration _ lh _ _ _ _ _ _) = lh
getRemoteHost (Configuration _ _ rh _ _ _ _ _) = rh
getIsGuest (Configuration _ _ _ ig _ _ _ _) = ig
...
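Here is a sketch of two such positional functions written out in full. It repeats the positional Configuration so that it compiles on its own. It adds getCurrentDir in the style above and a matching update function, setCurrentDir. Both names are hypothetical. Note that even the update has to match and rebuild all eight fields just to change one of them.

```haskell
data Configuration =
    Configuration String          -- user name
                  String          -- local host
                  String          -- remote host
                  Bool            -- is guest?
                  Bool            -- is super user?
                  String          -- current directory
                  String          -- home directory
                  Integer         -- time connected
              deriving (Eq, Show)

-- Hypothetical accessor for the sixth field (current directory).
getCurrentDir :: Configuration -> String
getCurrentDir (Configuration _ _ _ _ _ cd _ _) = cd

-- Hypothetical update: replace only the current directory,
-- copying every other field through unchanged.
setCurrentDir :: Configuration -> String -> Configuration
setCurrentDir (Configuration un lh rh ig su _ hd tc) newDir =
    Configuration un lh rh ig su newDir hd tc
```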

You could also write update functions that change a single element. But if you then add an element to the configuration, or remove one, every one of these functions has to be rewritten to match the new number of fields. This is highly annoying and an easy place for bugs to slip in. Fortunately, there's a solution: we simply give names to the fields in the datatype declaration, as follows:

data Configuration =
    Configuration { username      :: String,
                    localhost     :: String,
                    remotehost    :: String,
                    isguest       :: Bool,
                    issuperuser   :: Bool,
                    currentdir    :: String,
                    homedir       :: String,
                    timeconnected :: Integer
                  }

This will automatically generate the following accessor functions for us:

username :: Configuration -> String
localhost :: Configuration -> String
...
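The generated accessors are ordinary functions, so they can be applied directly or passed to higher-order functions such as map and filter. This is a minimal sketch using a cut-down, hypothetical version of Configuration that keeps only five of the fields. The sample values and the helper names are invented.

```haskell
-- Cut-down, HYPOTHETICAL Configuration with five of the original fields.
data Configuration = Configuration
  { username    :: String
  , isguest     :: Bool
  , issuperuser :: Bool
  , currentdir  :: String
  , homedir     :: String
  } deriving (Eq, Show)

-- Invented sample values, built positionally.
alice, guest :: Configuration
alice = Configuration "alice" False True  "/srv" "/home/alice"
guest = Configuration "guest" True  False "/tmp" "/tmp"

-- The accessor username passed to map, like any other function.
userNames :: [Configuration] -> [String]
userNames = map username

-- Accessors combined: keep guests, then take their names.
guestNames :: [Configuration] -> [String]
guestNames = map username . filter isguest
```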

Moreover, it gives us a convenient way to update fields. Here is a short example of two functions that work on Configurations, modeled on the shell commands "cd" (change directory) and "pwd" (print working directory):

changeDir :: Configuration -> String -> Configuration
changeDir cfg newDir =
    -- make sure the directory exists
    -- (directoryExists :: String -> Bool is assumed here; it is not defined in this chapter)
    if directoryExists newDir
      then -- change our current directory
           cfg{currentdir = newDir}
      else error "directory does not exist"

postWorkingDir :: Configuration -> String
-- retrieve our current directory
postWorkingDir cfg = currentdir cfg
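The directoryExists used above is not defined anywhere. Checking the real file system is also an I/O action in Haskell (for instance doesDirectoryExist :: FilePath -> IO Bool from System.Directory), so a pure function cannot perform it. One pure alternative is sketched below under that assumption. It takes the list of known directories as a parameter and returns Maybe instead of calling error. The name changeDirIn is hypothetical, and Configuration is cut down to five fields.

```haskell
-- Cut-down, HYPOTHETICAL Configuration with five of the original fields.
data Configuration = Configuration
  { username    :: String
  , isguest     :: Bool
  , issuperuser :: Bool
  , currentdir  :: String
  , homedir     :: String
  } deriving (Eq, Show)

-- Hypothetical variant of changeDir: switch to newDir only if it appears in
-- knownDirs (a list supplied by the caller, e.g. gathered earlier in IO).
-- Nothing signals failure instead of crashing the program with error.
changeDirIn :: [String] -> Configuration -> String -> Maybe Configuration
changeDirIn knownDirs cfg newDir
  | newDir `elem` knownDirs = Just (cfg { currentdir = newDir })
  | otherwise               = Nothing

-- Retrieve the current directory, as before.
postWorkingDir :: Configuration -> String
postWorkingDir = currentdir
```

Returning Maybe leaves the decision about what to do with a missing directory to the caller.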

So, in general, to update the field x in a value y to z, you write y{x=z}. You can change more than one field at once by separating the assignments with commas, for instance y{x=z, a=b, c=d}.
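For example, the following record update changes three fields in a single expression. This is a sketch with a cut-down, hypothetical Configuration. The function name makeGuest and the directory "/tmp" are invented for illustration.

```haskell
-- Cut-down, HYPOTHETICAL Configuration with five of the original fields.
data Configuration = Configuration
  { username    :: String
  , isguest     :: Bool
  , issuperuser :: Bool
  , currentdir  :: String
  , homedir     :: String
  } deriving (Eq, Show)

-- Hypothetical helper: one record update sets three fields at once;
-- username and homedir are carried over from cfg unchanged.
makeGuest :: Configuration -> Configuration
makeGuest cfg = cfg { isguest = True, issuperuser = False, currentdir = "/tmp" }
```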

Note

Those of you familiar with object-oriented languages might be tempted, after all this talk about "accessor functions" and "update methods", to think of the y{x=z} construct as a setter method that modifies the value of x in a pre-existing y. It is not like that: remember that in Haskell variables are immutable. Therefore, if, using the example above, you write conf2 = changeDir conf1 "/opt/foo/bar", then conf2 will be a new Configuration that is just like conf1 except that its currentdir is "/opt/foo/bar", while conf1 remains unchanged.


It's only sugar

You can of course continue to pattern match against Configurations as you did before. The named fields are simply syntactic sugar; you can still write something like:

getUserName (Configuration un _ _ _ _ _ _ _) = un

But there is little reason to. Finally, you can pattern match against named fields as in:

getHostData (Configuration {localhost=lh,remotehost=rh})
  = (lh,rh)

This matches the variable lh against the localhost field of the Configuration and the variable rh against the remotehost field. These matches always succeed, because Configuration has only one constructor and a variable matches any value. You could also constrain the matches by putting values instead of variable names in these positions, as you would for standard datatypes.
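For instance, putting a literal in a field pattern turns the match into a test. This is a sketch with a cut-down, hypothetical Configuration. The names isRoot and guestDir are made up.

```haskell
-- Cut-down, HYPOTHETICAL Configuration with five of the original fields.
data Configuration = Configuration
  { username    :: String
  , isguest     :: Bool
  , issuperuser :: Bool
  , currentdir  :: String
  , homedir     :: String
  } deriving (Eq, Show)

-- Matches only when the username field is exactly "root";
-- fields not mentioned in the pattern are ignored.
isRoot :: Configuration -> Bool
isRoot (Configuration { username = "root" }) = True
isRoot _                                     = False

-- A literal and a variable mixed in one pattern:
-- succeeds only for guests and binds their current directory to dir.
guestDir :: Configuration -> Maybe String
guestDir (Configuration { isguest = True, currentdir = dir }) = Just dir
guestDir _                                                    = Nothing
```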

You can create values of Configuration in the old way, as in the first definition below, or with named-field syntax, as in the second:

initCFG =
    Configuration "nobody" "nowhere" "nowhere"
                  False False "/" "/" 0
initCFG' =
    Configuration
       { username="nobody",
         localhost="nowhere",
         remotehost="nowhere",
         isguest=False,
         issuperuser=False,
         currentdir="/",
         homedir="/",
         timeconnected=0 }

The first way is much shorter, although the second is much clearer.
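Both forms build the same value, which is easy to check once Configuration derives Eq. This is a sketch with a cut-down, hypothetical record. It also shows that named-field construction does not depend on the order in which the fields are listed.

```haskell
-- Cut-down, HYPOTHETICAL Configuration with five of the original fields.
data Configuration = Configuration
  { username    :: String
  , isguest     :: Bool
  , issuperuser :: Bool
  , currentdir  :: String
  , homedir     :: String
  } deriving (Eq, Show)

-- Positional: arguments must follow the declaration order.
initCFG :: Configuration
initCFG = Configuration "nobody" False False "/" "/"

-- Named fields: any order works; here the fields are deliberately shuffled.
initCFG' :: Configuration
initCFG' =
    Configuration { homedir     = "/"
                  , currentdir  = "/"
                  , username    = "nobody"
                  , issuperuser = False
                  , isguest     = False }
```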

Parameterized Types

Parameterized types are similar to "generic" or "template" types in other languages. A parameterized type takes one or more type parameters. For example, the Standard Prelude type Maybe is defined as follows:

data Maybe a = Nothing | Just a

This says that the type Maybe takes a type parameter a. You can use this to declare, for example:

lookupBirthday :: [Anniversary] -> String -> Maybe Anniversary

The lookupBirthday function takes a list of birthday records and a string and returns a Maybe Anniversary. Typically, our interpretation is that if it finds the name then it will return Just the corresponding record, and otherwise, it will return Nothing.
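One possible implementation is sketched below. The Anniversary type here is a hypothetical stand-in for the one defined elsewhere in this book, reduced to a name and a date string. The sketch uses find from Data.List, which already returns a Maybe.

```haskell
import Data.List (find)

-- HYPOTHETICAL, simplified stand-in for the book's Anniversary type.
data Anniversary = Birthday String String  -- name, date
  deriving (Eq, Show)

-- The name stored in a record.
anniversaryName :: Anniversary -> String
anniversaryName (Birthday name _) = name

-- Just the first record whose name matches, or Nothing if there is none.
lookupBirthday :: [Anniversary] -> String -> Maybe Anniversary
lookupBirthday anniversaries name =
    find (\a -> anniversaryName a == name) anniversaries
```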

You can parameterize type and newtype declarations in exactly the same way. Furthermore you can combine parameterized types in arbitrary ways to construct new types.
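The following sketch shows a parameterized type synonym and a parameterized newtype, combined with Maybe and the Prelude's lookup. The names Assoc, Box and lookupBoxed are invented for this example.

```haskell
-- A parameterized type synonym (hypothetical): an association list from k to v.
type Assoc k v = [(k, v)]

-- A parameterized newtype (hypothetical) that wraps a value of any type.
newtype Box a = Box a
  deriving (Show, Eq)

-- Parameterized types combined: the result type is Maybe applied to Box v.
lookupBoxed :: Eq k => k -> Assoc k v -> Maybe (Box v)
lookupBoxed key table = fmap Box (lookup key table)
```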

More than one type parameter

We can also have more than one type parameter. An example of this is the Either type:

data Either a b = Left a | Right b

For example:

pairOff :: Int -> Either String Int
pairOff people
  | people < 0  = Left "Can't pair off a negative number of people."
  | people > 30 = Left "Too many people for this activity."
  | even people = Right (people `div` 2)
  | otherwise   = Left "Can't pair off an odd number of people."

groupPeople :: Int -> String
groupPeople people = case pairOff people of
                       Right groups -> "We have " ++ show groups ++ " group(s)."
                       Left problem -> "Problem! " ++ problem

In this example pairOff indicates how many groups you would have if you paired off a certain number of people for your activity. It can also let you know if you have too many people for your activity or if somebody will be left out. So pairOff will return either an Int representing the number of groups you will have, or a String describing the reason why you can't create your groups.
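Instead of a case expression, you can also take apart a value of type Either with the Prelude function either :: (a -> c) -> (b -> c) -> Either a b -> c. It applies its first argument to a Left value and its second to a Right value. The self-contained sketch below uses the invented name describe and follows the common Haskell convention of putting the error in Left and the successful result in Right.

```haskell
-- Render either a problem description (Left) or a group count (Right).
-- "describe" is a hypothetical name; either comes from the Prelude.
describe :: Either String Int -> String
describe = either ("Problem! " ++)
                  (\groups -> "We have " ++ show groups ++ " group(s).")
```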

Kind Errors

The flexibility of Haskell's parameterized types can lead to errors in type declarations that are somewhat like type errors, except that they occur in the type declarations rather than in the program proper. Errors in these "types of types" are known as "kind" errors. You don't program with kinds: the compiler infers them for itself. But if you get parameterized types wrong, for example by giving a type constructor too few or too many type arguments, the compiler will report a kind error.
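The sketch below shows kinds informally in comments and contrasts a well-kinded signature with an ill-kinded one, which is commented out because it would not compile. The function safeDiv is a made-up example. The exact wording of GHC's error message varies between versions, so it is only paraphrased.

```haskell
-- Kinds of some familiar types, as ghci's :kind command reports them
-- (depending on GHC version and settings, Type may be printed instead of *):
--   Int    :: *             an ordinary type of values
--   Maybe  :: * -> *        needs one type argument
--   Either :: * -> * -> *   needs two type arguments

-- Well-kinded: Maybe is applied to exactly one type, Int.
safeDiv :: Int -> Int -> Maybe Int
safeDiv _ 0 = Nothing
safeDiv x y = Just (x `div` y)

-- Ill-kinded, so it is left commented out: Maybe on its own is not a
-- type of values, because it still expects a type argument.
-- broken :: Maybe -> Int
-- GHC rejects this with a kind error, roughly
-- "Expecting one more argument to 'Maybe'".
```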