Haskell/GADT

From Wikibooks, open books for an open world
Jump to navigation Jump to search

Generalized algebraic datatypes, or simply GADTs, are a generalization of the algebraic data types that you are familiar with. Basically, they allow you to explicitly write down the types of the constructors. In this chapter, you'll learn why this is useful and how to declare your own.

We begin with an example of building a simple embedded domain specific language (EDSL) for simple arithmetical expressions, which is put on a sounder footing with GADTs. This is followed by a review of the syntax for GADTs, with simpler illustrations, and a different application to construct a safe list type for which the equivalent of head [] fails to typecheck and thus does not give the usual runtime error: *** Exception: Prelude.head: empty list.

Understanding GADTs[edit | edit source]

So, what are GADTs and what are they useful for? GADTs are mainly used to implement domain specific languages, and so this section will introduce them with a corresponding example.

Arithmetic expressions[edit | edit source]

Let's consider a small language for arithmetic expressions, given by the data type

data Expr = I Int         -- integer constants
          | Add Expr Expr -- add two expressions
          | Mul Expr Expr -- multiply two expressions

In other words, this data type corresponds to the abstract syntax tree, an arithmetic term like (5+1)*7 would be represented as (I 5 `Add` I 1) `Mul` I 7 :: Expr.

Given the abstract syntax tree, we would like to do something with it; we want to compile it, optimize it and so on. For starters, let's write an evaluation function that takes an expression and calculates the integer value it represents. The definition is straightforward:

eval :: Expr -> Int
eval (I n)       = n
eval (Add e1 e2) = eval e1 + eval e2
eval (Mul e1 e2) = eval e1 * eval e2

Extending the language[edit | edit source]

Now, imagine that we would like to extend our language with other types than just integers. For instance, let's say we want to represent equality tests, so we need booleans as well. We augment the `Expr` type to become

data Expr = I Int
          | B Bool           -- boolean constants
          | Add Expr Expr
          | Mul Expr Expr
          | Eq  Expr Expr    -- equality test

The term 5+1 == 7 would be represented as (I 5 `Add` I 1) `Eq` I 7.

As before, we want to write a function eval to evaluate expressions. But this time, expressions can now represent either integers or booleans and we have to capture that in the return type

eval :: Expr -> Either Int Bool

The first two cases are straightforward

eval (I n) = Left n
eval (B b) = Right b

but now we get in trouble. We would like to write

eval (Add e1 e2) = eval e1 + eval e2  -- ???

but this doesn't type check: the addition function + expects two integer arguments, but eval e1 is of type Either Int Bool and we'd have to extract the Int from that.

Even worse, what happens if e1 actually represents a boolean? The following is a valid expression

B True `Add` I 5 :: Expr

but clearly, it doesn't make any sense; we can't add booleans to integers! In other words, evaluation may return integers or booleans, but it may also fail because the expression makes no sense. We have to incorporate that in the return type:

eval :: Expr -> Maybe (Either Int Bool)

Now, we could write this function just fine, but that would still be unsatisfactory, because what we really want to do is to have Haskell's type system rule out any invalid expressions; we don't want to check types ourselves while deconstructing the abstract syntax tree.

Exercise: Despite our goal, it may still be instructional to implement the eval function; do this.

Starting point:

data Expr = I Int
        | B Bool           -- boolean constants
        | Add Expr Expr
        | Mul Expr Expr
        | Eq  Expr Expr    -- equality test

eval :: Expr -> Maybe (Either Int Bool)
-- Your implementation here.

Phantom types[edit | edit source]

The so-called phantom types are the first step towards our goal. The technique is to augment the Expr with a type variable, so that it becomes

data Expr a = I Int
            | B Bool
            | Add (Expr a) (Expr a)
            | Mul (Expr a) (Expr a)
            | Eq  (Expr a) (Expr a)

Note that an expression Expr a does not contain a value a at all; that's why a is called a phantom type, it's just a dummy variable. Compare that with, say, a list [a] which does contain a bunch of as.

The key idea is that we're going to use a to track the type of the expression for us. Instead of making the constructor

Add :: Expr a -> Expr a -> Expr a

available to users of our small language, we are only going to provide a smart constructor with a more restricted type

add :: Expr Int -> Expr Int -> Expr Int
add = Add

The implementation is the same, but the types are different. Doing this with the other constructors as well,

i :: Int  -> Expr Int
i = I
b :: Bool -> Expr Bool
b = B

the previously problematic expression

b True `add` i 5

no longer type checks! After all, the first argument has the type Expr Bool while add expects an Expr Int. In other words, the phantom type a marks the intended type of the expression. By only exporting the smart constructors, the user cannot create expressions with incorrect types.

As before, we want to implement an evaluation function. With our new marker a, we might hope to give it the type

eval :: Expr a -> a

and implement the first case like this

eval (I n) = n

But alas, this does not work: how would the compiler know that encountering the constructor I means that a = Int? Granted, this will be the case for all expressions that were created by users of our language because they are only allowed to use the smart constructors. But internally, an expression like

I 5 :: Expr String

is still valid. In fact, as you can see, a doesn't even have to be Int or Bool, it could be anything.

What we need is a way to restrict the return types of the constructors themselves, and that's exactly what generalized data types do.

GADTs[edit | edit source]

To enable this language feature, add {-# LANGUAGE GADTs #-} to the beginning of the file.


The obvious notation for restricting the type of a constructor is to write down its type, and that's exactly how GADTs are defined:

data Expr a where
    I   :: Int  -> Expr Int
    B   :: Bool -> Expr Bool
    Add :: Expr Int -> Expr Int -> Expr Int
    Mul :: Expr Int -> Expr Int -> Expr Int
    Eq  :: Eq a => Expr a -> Expr a -> Expr Bool

In other words, we simply list the type signatures of all the constructors. In particular, the marker type a is specialised to Int or Bool according to our needs, just like we would have done with smart constructors.

And the great thing about GADTs is that we now can implement an evaluation function that takes advantage of the type marker:

eval :: Expr a -> a
eval (I n) = n
eval (B b) = b
eval (Add e1 e2) = eval e1 + eval e2
eval (Mul e1 e2) = eval e1 * eval e2
eval (Eq  e1 e2) = eval e1 == eval e2

In particular, in the first case

eval (I n) = n

the compiler is now able to infer that a=Int when we encounter a constructor I and that it is legal to return the n :: Int; similarly for the other cases.

To summarise, GADTs allows us to restrict the return types of constructors and thus enable us to take advantage of Haskell's type system for our domain specific languages. Thus, we can implement more languages and their implementation becomes simpler.

Summary[edit | edit source]

Syntax[edit | edit source]

Here's a quick summary of how the syntax for declaring GADTs works.

First, consider the following ordinary algebraic datatypes: the familiar List and Maybe types, and a simple tree type, RoseTree:

Maybe List Rose Tree
data Maybe a =  
    Nothing |   
    Just a
data List a = 
    Nil |  
    Cons a (List a)
data RoseTree a = 
     RoseTree a [RoseTree a]

Remember that the constructors introduced by these declarations can be used both for pattern matches to deconstruct values and as functions to construct values. (Nothing and Nil are functions with "zero arguments".) We can ask what the types of the latter are:

Maybe List Rose Tree
> :t Nothing
Nothing :: Maybe a
> :t Just
Just :: a -> Maybe a  
> :t Nil
Nil :: List a
> :t Cons
Cons :: a -> List a -> List a    
> :t RoseTree
RoseTree ::
   a -> [RoseTree a] -> RoseTree a    

It is clear that this type information about the constructors for Maybe, List and RoseTree respectively is equivalent to the information we gave to the compiler when declaring the datatype in the first place. In other words, it's also conceivable to declare a datatype by simply listing the types of all of its constructors, and that's exactly what the GADT syntax does:

Maybe List Rose Tree
data Maybe a where
   Nothing  :: Maybe a
   Just :: a -> Maybe a
data List a where
   Nil  :: List a
   Cons :: a -> List a -> List a
data RoseTree a where 
   RoseTree ::
      a ->  [RoseTree a] -> RoseTree a

This syntax is made available by the language option {-#LANGUAGE GADTs #-}. It should be familiar to you in that it closely resembles the syntax of type class declarations. It's also easy to remember if you already like to think of constructors as just being functions. Each constructor is just defined by a type signature.

New possibilities[edit | edit source]

Note that when we asked the GHCi for the types of Nothing and Just it returned Maybe a and a -> Maybe a as the types. In these and the other cases, the type of the final output of the function associated with a constructor is the type we were initially defining - Maybe a, List a or RoseTree a. In general, in standard Haskell, the constructor functions for Foo a have Foo a as their final return type. If the new syntax were to be strictly equivalent to the old, we would have to place this restriction on its use for valid type declarations.

So what do GADTs add for us? The ability to control exactly what kind of Foo you return. With GADTs, a constructor for Foo a is not obliged to return Foo a; it can return any Foo blah that you can think of. In the code sample below, for instance, the MkTrueGadtFoo constructor returns a TrueGadtFoo Int even though it is for the type TrueGadtFoo a.

Example: GADT gives you more control:

data FooInGadtClothing a where
 MkFooInGadtClothing :: a -> FooInGadtClothing a

--which is no different from:  data Haskell98Foo a = MkHaskell98Foo a ,

--by contrast, consider:
 
data TrueGadtFoo a where
  MkTrueGadtFoo :: a -> TrueGadtFoo Int

--This has no Haskell 98 equivalent.

But note that you can only push the generalization so far... if the datatype you are declaring is a Foo, the constructor functions must return some kind of Foo or another. Returning anything else simply wouldn't work.

Example: Try this out. It doesn't work:

data Bar where
  BarNone :: Bar -- This is ok

data Foo where
  MkFoo :: Bar Int-- This will not typecheck

Examples[edit | edit source]

Safe Lists[edit | edit source]

Prerequisite: We assume in this section that you know how a List tends to be represented in functional languages
Note: The examples in this section additionally require the extensions EmptyDataDecls and KindSignatures to be enabled

We've now gotten a glimpse of the extra control given to us by the GADT syntax. The only thing new is that you can control exactly what kind of data structure you return. Now, what can we use it for? Consider the humble Haskell list. What happens when you invoke head []? Haskell blows up. Have you ever wished you could have a magical version of head that only accepts lists with at least one element, lists on which it will never blow up?

To begin with, let's define a new type, SafeList x y. The idea is to have something similar to normal Haskell lists [x], but with a little extra information in the type. This extra information (the type variable y) tells us whether or not the list is empty. Empty lists are represented as SafeList x Empty, whereas non-empty lists are represented as SafeList x NonEmpty.

-- we have to define these types
data Empty
data NonEmpty

-- the idea is that you can have either 
--    SafeList a Empty
-- or SafeList a NonEmpty
data SafeList a b where
-- to be implemented

Since we have this extra information, we can now define a function safeHead on only the non-empty lists! Calling safeHead on an empty list would simply refuse to type-check.

safeHead :: SafeList a NonEmpty -> a

So now that we know what we want, safeHead, how do we actually go about getting it? The answer is GADT. The key is that we take advantage of the GADT feature to return two different list-of-a types, SafeList a Empty for the Nil constructor, and SafeList a NonEmpty for the Cons constructor:

data SafeList a b where
  Nil  :: SafeList a Empty
  Cons :: a -> SafeList a b -> SafeList a NonEmpty

This wouldn't have been possible without GADT, because all of our constructors would have been required to return the same type of list; whereas with GADT we can now return different types of lists with different constructors. Anyway, let's put this all together, along with the actual definition of SafeHead:

Example: safe lists via GADT

{-#LANGUAGE GADTs, EmptyDataDecls #-}
-- (the EmptyDataDecls pragma must also appear at the very top of the module,
-- in order to allow the Empty and NonEmpty datatype declarations.)

data Empty
data NonEmpty

data SafeList a b where
     Nil :: SafeList a Empty
     Cons:: a -> SafeList a b -> SafeList a NonEmpty

safeHead :: SafeList a NonEmpty -> a
safeHead (Cons x _) = x

Copy this listing into a file and load in ghci -fglasgow-exts. You should notice the following difference, calling safeHead on a non-empty and an empty-list respectively:

Example: safeHead is... safe

Prelude Main> safeHead (Cons "hi" Nil)
"hi"
Prelude Main> safeHead Nil

<interactive>:1:9:
    Couldn't match `NonEmpty' against `Empty'
      Expected type: SafeList a NonEmpty
      Inferred type: SafeList a Empty
    In the first argument of `safeHead', namely `Nil'
    In the definition of `it': it = safeHead Nil

The complaint is a good thing: it means that we can now ensure during compile-time if we're calling safeHead on an appropriate list. However, that also sets up a pitfall in potential. Consider the following function. What do you think its type is?

Example: Trouble with GADTs

silly False = Nil
silly True  = Cons () Nil

Now try loading the example up in GHCi. You'll notice the following complaint:

Example: Trouble with GADTs - the complaint

Couldn't match `Empty' against `NonEmpty'
     Expected type: SafeList () Empty
     Inferred type: SafeList () NonEmpty
   In the application `Cons () Nil'
   In the definition of `silly': silly True = Cons () Nil

The cases in the definition of silly evaluate to marked lists of different types, leading to a type error. The extra constraints imposed through the GADT make it impossible for a function to produce both empty and non-empty lists.

If we are really keen on defining silly, we can do so by liberalizing the type of Cons, so that it can construct both safe and unsafe lists.

Example: A different approach

{-#LANGUAGE GADTs, EmptyDataDecls, KindSignatures #-}
-- here we add the KindSignatures pragma,
-- which makes the GADT declaration a bit more elegant.

-- Note the subtle yet revealing change in the phantom type names.
data NotSafe
data Safe


data MarkedList             ::  * -> * -> * where
  Nil                       ::  MarkedList t NotSafe
  Cons                      ::  a -> MarkedList a b -> MarkedList a c


safeHead                    ::  MarkedList a Safe -> a
safeHead (Cons x _)          =  x

-- This function will never produce anything that can be consumed by safeHead,
-- no matter that the resulting list is not necessarily empty.
silly                       :: Bool -> MarkedList () NotSafe
silly False                  =  Nil
silly True                   =  Cons () Nil

There is a cost to the fix above: by relaxing the constraint on Cons we throw away the knowledge that it cannot produce an empty list. Based on our first version of the safe list we could, for instance, define a function which took a SafeList a Empty argument and be sure anything produced by Cons would not be accepted by it. That does not hold for the analogous MarkedList a NotSafe; arguably, the type is less useful exactly because it is less restrictive. While in this example the issue may seem minor, given that not much can be done with an empty list, in general it is worth considering.


Exercises
  1. Could you implement a safeTail function? Both versions introduced here would count as valid starting material, as well as any other variants in similar spirit.

A simple expression evaluator[edit | edit source]

Insert the example used in Wobbly Types paper... I thought that was quite pedagogical
This is already covered in the first part of the tutorial.

Discussion[edit | edit source]

More examples, thoughts
From FOSDEM 2006, I vaguely recall that there is some relationship between GADT and the below... what?

Phantom types[edit | edit source]

See Phantom types.

Existential types[edit | edit source]

If you like Existentially quantified types, you'd probably want to notice that they are now subsumed by GADTs. As the GHC manual says, the following two type declarations give you the same thing.

data TE a = forall b. MkTE b (b->a)
data TG a where { MkTG :: b -> (b->a) -> TG a }

Witness types[edit | edit source]