Haskell/GADT
Introduction[edit]
Generalized Algebraic Datatypes are, as the name suggests, a generalization of Algebraic Data Types that you are already familiar with. Basically, they allow you to explicitly write down the types of the constructors. In this chapter, you'll learn why this is useful and how to declare your own.
We begin with an example of building a simple embedded domain specific language (EDSL) for simple arithmetical expressions, which is put on a sounder footing with GADTs. This is followed by a review of the syntax for GADTs, with simpler illustrations, and a different application to construct a safe list type for which the equivalent of head []
fails to typecheck and thus does not give the usual runtime error: *** Exception: Prelude.head: empty list
.
Understanding GADTs[edit]
So, what are GADTs and what are they useful for? GADTs are mainly used to implement domain specific languages and this section will introduce them with a corresponding example.
Arithmetic expressions[edit]
Let's consider a small language for arithmetic expressions, given by the data type
data Expr = I Int  integer constants  Add Expr Expr  add two expressions  Mul Expr Expr  multiply two expressions
In other words, this data type corresponds to the abstract syntax tree, an arithmetic term like (5+1)*7
would be represented as (I 5 `Add` I 1) `Mul` I 7 :: Expr
.
Given the abstract syntax tree, we would like to do something with it; we want to compile it, optimize it and so on. For starters, let's write an evaluation function that takes an expression and calculates the integer value it represents. The definition is straightforward:
eval :: Expr > Int eval (I n) = n eval (Add e1 e2) = eval e1 + eval e2 eval (Mul e1 e2) = eval e1 * eval e2
Extending the language[edit]
Now, imagine that we would like to extend our language with other types than just integers. For instance, let's say we want to represent equality tests, so we need booleans as well. We augment the `Expr` type to become
data Expr = I Int  B Bool  boolean constants  Add Expr Expr  Mul Expr Expr  Eq Expr Expr  equality test
The term 5+1 == 7
would be represented as (I 5 `Add` I 1) `Eq` I 7
.
As before, we want to write a function eval
to evaluate expressions. But this time, expressions can now represent either integers or booleans and we have to capture that in the return type
eval :: Expr > Either Int Bool
The first two cases are straightforward
eval (I n) = Left n eval (B b) = Right b
but now we get in trouble. We would like to write
eval (Add e1 e2) = eval e1 + eval e2  ???
but this doesn't type check: the addition function +
expects two integer arguments, but eval e1
is of type Either Int Bool
and we'd have to extract the Int
from that.
Even worse, what happens if e1
actually represents a boolean? The following is a valid expression
B True `Add` I 5 :: Expr
but clearly, it doesn't make any sense; we can't add booleans to integers! In other words, evaluation may return integers or booleans, but it may also fail because the expression makes no sense. We have to incorporate that in the return type:
eval :: Expr > Maybe (Either Int Bool)
Now, we could write this function just fine, but that would still be unsatisfactory, because what we really want to do is to have Haskell's type system rule out any invalid expressions; we don't want to check types ourselves while deconstructing the abstract syntax tree.
Exercise: Despite our goal, it may still be instructional to implement the eval
function; do this.
Starting point:
data Expr = I Int  B Bool  boolean constants  Add Expr Expr  Mul Expr Expr  Eq Expr Expr  equality test eval :: Expr > Maybe (Either Int Bool)  Your implementation here.
Phantom types[edit]
The socalled phantom types are the first step towards our goal. The technique is to augment the Expr
with a type variable, so that it becomes
data Expr a = I Int  B Bool  Add (Expr a) (Expr a)  Mul (Expr a) (Expr a)  Eq (Expr a) (Expr a)
Note that an expression Expr a
does not contain a value a
at all; that's why a
is called a phantom type, it's just a dummy variable. Compare that with, say, a list [a]
which does contain a bunch of a
's.
The key idea is that we're going to use a
to track the type of the expression for us. Instead of making the constructor
Add :: Expr a > Expr a > Expr a
available to users of our small language, we are only going to provide a smart constructor with a more restricted type
add :: Expr Int > Expr Int > Expr Int add = Add
The implementation is the same, but the types are different. Doing this with the other constructors as well,
i :: Int > Expr Int i = I b :: Bool > Expr Bool b = B
the previously problematic expression
b True `add` i 5
no longer type checks! After all, the first arguments has the type Expr Bool
while add
expects an Expr Int
. In other words, the phantom type a
marks the intended type of the expression. By only exporting the smart constructors, the user cannot create expressions with incorrect types.
As before, we want to implement an evaluation function. With our new marker a
, we might hope to give it the type
eval :: Expr a > a
and implement the first case like this
eval (I n) = n
But alas, this does not work: how would the compiler know that encountering the constructor I
means that a = Int
? Granted, this will be case for all the expression that were created by users of our language because they are only allowed to use the smart constructors. But internally, an expression like
I 5 :: Expr String
is still valid. In fact, as you can see, a
doesn't even have to be Int
or Bool
, it could be anything.
What we need is a way to restrict the return types of the constructors themselves, and that's exactly what generalized data types do.
GADTs[edit]
The obvious notation for restricting the type of a constructor is to write down its type, and that's exactly how GADTs are defined:
data Expr a where I :: Int > Expr Int B :: Bool > Expr Bool Add :: Expr Int > Expr Int > Expr Int Mul :: Expr Int > Expr Int > Expr Int Eq :: Expr Int > Expr Int > Expr Bool
In other words, we simply list the type signatures of all the constructors. In particular, the marker type a
is specialised to Int
or Bool
according to our needs, just like we would have done with smart constructors.
And the great thing about GADTs is that we now can implement an evaluation function that takes advantage of the type marker:
eval :: Expr a > a eval (I n) = n eval (B b) = b eval (Add e1 e2) = eval e1 + eval e2 eval (Mul e1 e2) = eval e1 * eval e2 eval (Eq e1 e2) = eval e1 == eval e2
In particular, in the first case
eval (I n) = n
the compiler is now able infer that a=Int
when we encounter a constructor I
and that it is legal to return the n :: Int
; similarly for the other cases.
To summarise, GADTs allows us to restrict the return types of constructors and thus enable us to take advantage of Haskell's type system for our domain specific languages. Thus, we can implement more languages and their implementation becomes simpler.
Summary[edit]
Syntax[edit]
Here a quick summary of how the syntax for declaring GADTs works.
First, consider the following ordinary algebraic datatypes: the familiar List
and Maybe
types, and a simple tree type, RoseTree
:
Maybe  List  Rose Tree 

data Maybe a = Nothing  Just a 
data List a = Nil  Cons a (List a) 
data RoseTree a = RoseTree a [RoseTree a] 
Remember that the constructors introduced by these declarations can be used both for pattern matches to deconstruct values and as functions to construct values. (Nothing
and Nil
are functions with "zero arguments".) We can ask what the types of the latter are:
Maybe  List  Rose Tree 

> :t Nothing Nothing :: Maybe a > :t Just Just :: a > Maybe a 
> :t Nil Nil :: List a > :t Cons Cons :: a > List a > List a 
> :t RoseTree RoseTree :: a > [RoseTree a] > RoseTree a 
It is clear that this type information about the constructors for Maybe
, List
and RoseTree
respectively is equivalent to the information we gave to the compiler when declaring the datatype in the first place. In other words, it's also conceivable to declare a datatype by simply listing the types of all of its constructors, and that's exactly what the GADT syntax does:
Maybe  List  Rose Tree 

data Maybe a where Nothing :: Maybe a Just :: a > Maybe a 
data List a where Nil :: List a Cons :: a > List a > List a 
data RoseTree a where RoseTree :: a > [RoseTree a] > RoseTree a 
This syntax is made available by the language option {#LANGUAGE GADTs #}
. It should be familiar to you in that it closely resembles the syntax of type class declarations. It's also easy to remember if you already like to think of constructors as just being functions. Each constructor is just defined by a type signature.
New possibilities[edit]
Note that when we asked the GHCi
for the types of Nothing
and Just
it returned Maybe a
and a > Maybe a
as the types. In these and the other cases, the type of the final output of the function associated with a constructor is the type we were initially defining  Maybe a
, List a
or RoseTree a
. In general, in standard Haskell, the constructor functions for Foo a
have Foo a
as their final return type. If the new syntax were to be strictly equivalent to the old, we would have to place this restriction on its use for valid type declarations.
So what do GADTs add for us? The ability to control exactly what kind of Foo
you return. With GADTs, a constructor for Foo a
is not obliged to return Foo a
; it can return any Foo blah
that you can think of. In the code sample below, for instance, the GadtedFoo
constructor returns a GadtedFoo Int
even though it is for the type GadtedFoo x
.
Example: GADT gives you more control
data FooInGadtClothing a where MkFooInGadtClothing :: a > FooInGadtClothing a which is no different from: data Haskell98Foo a = MkHaskell98Foo a , by contrast, consider: data TrueGadtFoo a where MkTrueGadtFoo :: a > TrueGadtFoo Int This has no Haskell 98 equivalent.
But note that you can only push the generalization so far... if the datatype you are declaring is a Foo
, the constructor functions must return some kind of Foo
or another. Returning anything else simply wouldn't work
Example: Try this out. It doesn't work
data Bar where BarNone :: Bar  This is ok data Foo where MkFoo :: Bar Int This will not typecheck
Examples[edit]
Safe Lists[edit]
 Prerequisite: We assume in this section that you know how a List tends to be represented in functional languages
 Note: The examples in this section additionally require the extensions EmptyDataDecls and KindSignatures to be enabled
We've now gotten a glimpse of the extra control given to us by the GADT syntax. The only thing new is that you can control exactly what kind of data structure you return. Now, what can we use it for? Consider the humble Haskell list. What happens when you invoke head []
? Haskell blows up. Have you ever wished you could have a magical version of head
that only accepts lists with at least one element, lists on which it will never blow up?
To begin with, let's define a new type, SafeList x y
. The idea is to have something similar to normal Haskell lists [x]
, but with a little extra information in the type. This extra information (the type variable y
) tells us whether or not the list is empty. Empty lists are represented as SafeList x Empty
, whereas nonempty lists are represented as SafeList x NonEmpty
.
 we have to define these types data Empty data NonEmpty  the idea is that you can have either  SafeList a Empty  or SafeList a NonEmpty data SafeList a b where  to be implemented
Since we have this extra information, we can now define a function safeHead
on only the nonempty lists! Calling safeHead
on an empty list would simply refuse to typecheck.
safeHead :: SafeList a NonEmpty > a
So now that we know what we want, safeHead
, how do we actually go about getting it? The answer is GADT. The key is that we take advantage of the GADT feature to return two different listofa types, SafeList a Empty
for the Nil
constructor, and SafeList a NonEmpty
for the Cons
constructor:
data SafeList a b where Nil :: SafeList a Empty Cons :: a > SafeList a b > SafeList a NonEmpty
This wouldn't have been possible without GADT, because all of our constructors would have been required to return the same type of list; whereas with GADT we can now return different types of lists with different constructors. Anyway, let's put this all together, along with the actual definition of SafeHead
:
Example: safe lists via GADT
{#LANGUAGE GADTs, EmptyDataDecls #}  (the EmptyDataDecls pragma must also appear at the very top of the module,  in order to allow the Empty and NonEmpty datatype declarations.) data Empty data NonEmpty data SafeList a b where Nil :: SafeList a Empty Cons:: a > SafeList a b > SafeList a NonEmpty safeHead :: SafeList a NonEmpty > a safeHead (Cons x _) = x
Copy this listing into a file and load in ghci fglasgowexts
. You should notice the following difference, calling safeHead
on a nonempty and an emptylist respectively:
Example: safeHead
is... safe
Prelude Main> safeHead (Cons "hi" Nil) "hi" Prelude Main> safeHead Nil <interactive>:1:9: Couldn't match `NonEmpty' against `Empty' Expected type: SafeList a NonEmpty Inferred type: SafeList a Empty In the first argument of `safeHead', namely `Nil' In the definition of `it': it = safeHead Nil
The complaint is a good thing: it means that we can now ensure during compiletime if we're calling safeHead
on an appropriate list. However, that also sets up a pitfall in potential. Consider the following function. What do you think its type is?
Example: Trouble with GADTs
silly False = Nil silly True = Cons () Nil
Now try loading the example up in GHCi. You'll notice the following complaint:
Example: Trouble with GADTs  the complaint
Couldn't match `Empty' against `NonEmpty' Expected type: SafeList () Empty Inferred type: SafeList () NonEmpty In the application `Cons () Nil' In the definition of `silly': silly True = Cons () Nil
The cases in the definition of silly
evaluate to marked lists of different types, leading to a type error. The extra constraints imposed through the GADT make it impossible for a function to produce both empty and nonempty lists.
If we are really keen on defining silly
, we can do so by liberalizing the type of Cons
, so that it can construct both safe and unsafe lists.
Example: A different approach
{#LANGUAGE GADTs, EmptyDataDecls, KindSignatures #}  here we add the KindSignatures pragma,  which makes the GADT declaration a bit more elegant.  Note the subtle yet revealing change in the phantom type names. data NotSafe data Safe data MarkedList :: * > * > * where Nil :: MarkedList t NotSafe Cons :: a > MarkedList a b > MarkedList a c safeHead :: MarkedList a Safe > a safeHead (Cons x _) = x  This function will never produce anything that can be consumed by safeHead,  no matter that the resulting list is not necessarily empty. silly :: Bool > MarkedList () NotSafe silly False = Nil silly True = Cons () Nil
There is a cost to the fix above: by relaxing the constraint on Cons
we throw away the knowledge that it cannot produce an empty list. Based on our first version of the safe list we could, for instance, define a function which took a SafeList a Empty
argument and be sure anything produced by Cons
would not be accepted by it. That does not hold for the analogous MarkedList a NotSafe
; arguably, the type is less useful exactly because it is less restrictive. While in this example the issue may seem minor, given that not much can be done with an empty list, in general it is worth considering.
Exercises 


A simple expression evaluator[edit]
 Insert the example used in Wobbly Types paper... I thought that was quite pedagogical
 This is already covered in the first part of the tutorial.
Discussion[edit]
 More examples, thoughts
 From FOSDEM 2006, I vaguely recall that there is some relationship between GADT and the below... what?
Phantom types[edit]
See Phantom types.
Existential types[edit]
If you like Existentially quantified types, you'd probably want to notice that they are now subsumed by GADTs. As the GHC manual says, the following two type declarations give you the same thing.
data TE a = forall b. MkTE b (b>a) data TG a where { MkTG :: b > (b>a) > TG a }
Heterogeneous lists are accomplished with GADTs like this:
data TE2 = forall b. Show b => MkTE2 [b] data TG2 where MkTG2 :: Show b => [b] > TG2
Witness types[edit]
This page is a stub. You can help Haskell by expanding it. 
At least part of this page was imported from the Haskell wiki article Generalised algebraic datatype, in accordance to its Simple Permissive License. If you wish to modify this page and if your changes will also be useful on that wiki, you might consider modifying that source page instead of this one, as changes from that page may propagate here, but not the other way around. Alternately, you can explicitly dual license your contributions under the Simple Permissive License. 