Generalized algebraic datatypes, or simply GADTs, are a generalization of the algebraic data types that you are familiar with. Basically, they allow you to explicitly write down the types of the constructors. In this chapter, you'll learn why this is useful and how to declare your own.

We begin with an example of building a simple embedded domain specific language (EDSL) for simple arithmetical expressions, which is put on a sounder footing with GADTs. This is followed by a review of the syntax for GADTs, with simpler illustrations, and a different application to construct a safe list type for which the equivalent of `head []` fails to typecheck and thus does not give the usual runtime error: `*** Exception: Prelude.head: empty list`.

So, what are GADTs and what are they useful for? GADTs are mainly used to implement domain specific languages, and so this section will introduce them with a corresponding example.

### Arithmetic expressions

Let's consider a small language for arithmetic expressions, given by the data type

```data Expr = I Int         -- integer constants
| Mul Expr Expr -- multiply two expressions
```

In other words, this data type corresponds to the abstract syntax tree, an arithmetic term like `(5+1)*7` would be represented as `(I 5 `Add` I 1) `Mul` I 7 :: Expr`.

Given the abstract syntax tree, we would like to do something with it; we want to compile it, optimize it and so on. For starters, let's write an evaluation function that takes an expression and calculates the integer value it represents. The definition is straightforward:

```eval :: Expr -> Int
eval (I n)       = n
eval (Add e1 e2) = eval e1 + eval e2
eval (Mul e1 e2) = eval e1 * eval e2
```

### Extending the language

Now, imagine that we would like to extend our language with other types than just integers. For instance, let's say we want to represent equality tests, so we need booleans as well. We augment the `Expr` type to become

```data Expr = I Int
| B Bool           -- boolean constants
| Mul Expr Expr
| Eq  Expr Expr    -- equality test
```

The term `5+1 == 7` would be represented as `(I 5 `Add` I 1) `Eq` I 7`.

As before, we want to write a function `eval` to evaluate expressions. But this time, expressions can now represent either integers or booleans and we have to capture that in the return type

```eval :: Expr -> Either Int Bool
```

The first two cases are straightforward

```eval (I n) = Left n
eval (B b) = Right b
```

but now we get in trouble. We would like to write

```eval (Add e1 e2) = eval e1 + eval e2  -- ???
```

but this doesn't type check: the addition function `+` expects two integer arguments, but `eval e1` is of type `Either Int Bool` and we'd have to extract the `Int` from that.

Even worse, what happens if `e1` actually represents a boolean? The following is a valid expression

```B True `Add` I 5 :: Expr
```

but clearly, it doesn't make any sense; we can't add booleans to integers! In other words, evaluation may return integers or booleans, but it may also fail because the expression makes no sense. We have to incorporate that in the return type:

```eval :: Expr -> Maybe (Either Int Bool)
```

Now, we could write this function just fine, but that would still be unsatisfactory, because what we really want to do is to have Haskell's type system rule out any invalid expressions; we don't want to check types ourselves while deconstructing the abstract syntax tree.

Exercise: Despite our goal, it may still be instructional to implement the `eval` function; do this.

Starting point:

```data Expr = I Int
| B Bool           -- boolean constants
| Mul Expr Expr
| Eq  Expr Expr    -- equality test

eval :: Expr -> Maybe (Either Int Bool)
```

### Phantom types

The so-called phantom types are the first step towards our goal. The technique is to augment the `Expr` with a type variable, so that it becomes

```data Expr a = I Int
| B Bool
| Add (Expr a) (Expr a)
| Mul (Expr a) (Expr a)
| Eq  (Expr a) (Expr a)
```

Note that an expression `Expr a` does not contain a value `a` at all; that's why `a` is called a phantom type, it's just a dummy variable. Compare that with, say, a list `[a]` which does contain a bunch of `a`s.

The key idea is that we're going to use `a` to track the type of the expression for us. Instead of making the constructor

```Add :: Expr a -> Expr a -> Expr a
```

available to users of our small language, we are only going to provide a smart constructor with a more restricted type

```add :: Expr Int -> Expr Int -> Expr Int
```

The implementation is the same, but the types are different. Doing this with the other constructors as well,

```i :: Int  -> Expr Int
i = I
b :: Bool -> Expr Bool
b = B
```

the previously problematic expression

```b True `add` i 5
```

no longer type checks! After all, the first argument has the type `Expr Bool` while `add` expects an `Expr Int`. In other words, the phantom type `a` marks the intended type of the expression. By only exporting the smart constructors, the user cannot create expressions with incorrect types.

As before, we want to implement an evaluation function. With our new marker `a`, we might hope to give it the type

```eval :: Expr a -> a
```

and implement the first case like this

```eval (I n) = n
```

But alas, this does not work: how would the compiler know that encountering the constructor `I` means that `a = Int`? Granted, this will be the case for all expressions that were created by users of our language because they are only allowed to use the smart constructors. But internally, an expression like

```I 5 :: Expr String
```

is still valid. In fact, as you can see, `a` doesn't even have to be `Int` or `Bool`, it could be anything.

What we need is a way to restrict the return types of the constructors themselves, and that's exactly what generalized data types do.

To enable this language feature, add `{-# LANGUAGE GADTs #-}` to the beginning of the file.

The obvious notation for restricting the type of a constructor is to write down its type, and that's exactly how GADTs are defined:

```data Expr a where
I   :: Int  -> Expr Int
B   :: Bool -> Expr Bool
Add :: Expr Int -> Expr Int -> Expr Int
Mul :: Expr Int -> Expr Int -> Expr Int
Eq  :: Eq a => Expr a -> Expr a -> Expr Bool
```

In other words, we simply list the type signatures of all the constructors. In particular, the marker type `a` is specialised to `Int` or `Bool` according to our needs, just like we would have done with smart constructors.

And the great thing about GADTs is that we now can implement an evaluation function that takes advantage of the type marker:

```eval :: Expr a -> a
eval (I n) = n
eval (B b) = b
eval (Add e1 e2) = eval e1 + eval e2
eval (Mul e1 e2) = eval e1 * eval e2
eval (Eq  e1 e2) = eval e1 == eval e2
```

In particular, in the first case

```eval (I n) = n
```

the compiler is now able to infer that `a=Int` when we encounter a constructor `I` and that it is legal to return the `n :: Int`; similarly for the other cases.

To summarise, GADTs allows us to restrict the return types of constructors and thus enable us to take advantage of Haskell's type system for our domain specific languages. Thus, we can implement more languages and their implementation becomes simpler.

## Summary

### Syntax

Here's a quick summary of how the syntax for declaring GADTs works.

First, consider the following ordinary algebraic datatypes: the familiar `List` and `Maybe` types, and a simple tree type, `RoseTree`:

Maybe List Rose Tree
```data Maybe a =
Nothing |
Just a
```
```data List a =
Nil |
Cons a (List a)
```
```data RoseTree a =
RoseTree a [RoseTree a]
```

Remember that the constructors introduced by these declarations can be used both for pattern matches to deconstruct values and as functions to construct values. (`Nothing` and `Nil` are functions with "zero arguments".) We can ask what the types of the latter are:

Maybe List Rose Tree
```> :t Nothing
Nothing :: Maybe a
> :t Just
Just :: a -> Maybe a
```
```> :t Nil
Nil :: List a
> :t Cons
Cons :: a -> List a -> List a
```
```> :t RoseTree
RoseTree ::
a -> [RoseTree a] -> RoseTree a
```

It is clear that this type information about the constructors for `Maybe`, `List` and `RoseTree` respectively is equivalent to the information we gave to the compiler when declaring the datatype in the first place. In other words, it's also conceivable to declare a datatype by simply listing the types of all of its constructors, and that's exactly what the GADT syntax does:

Maybe List Rose Tree
```data Maybe a where
Nothing  :: Maybe a
Just :: a -> Maybe a
```
```data List a where
Nil  :: List a
Cons :: a -> List a -> List a
```
```data RoseTree a where
RoseTree ::
a ->  [RoseTree a] -> RoseTree a
```

This syntax is made available by the language option `{-#LANGUAGE GADTs #-}`. It should be familiar to you in that it closely resembles the syntax of type class declarations. It's also easy to remember if you already like to think of constructors as just being functions. Each constructor is just defined by a type signature.

### New possibilities

Note that when we asked the `GHCi` for the types of `Nothing` and `Just` it returned `Maybe a` and `a -> Maybe a` as the types. In these and the other cases, the type of the final output of the function associated with a constructor is the type we were initially defining - `Maybe a`, `List a` or `RoseTree a`. In general, in standard Haskell, the constructor functions for `Foo a` have `Foo a` as their final return type. If the new syntax were to be strictly equivalent to the old, we would have to place this restriction on its use for valid type declarations.

So what do GADTs add for us? The ability to control exactly what kind of `Foo` you return. With GADTs, a constructor for `Foo a` is not obliged to return `Foo a`; it can return any `Foo blah` that you can think of. In the code sample below, for instance, the `MkTrueGadtFoo` constructor returns a `TrueGadtFoo Int` even though it is for the type `TrueGadtFoo a`.

Example: GADT gives you more control:

```data FooInGadtClothing a where

--which is no different from:  data Haskell98Foo a = MkHaskell98Foo a ,

--by contrast, consider:

--This has no Haskell 98 equivalent.
```

But note that you can only push the generalization so far... if the datatype you are declaring is a `Foo`, the constructor functions must return some kind of `Foo` or another. Returning anything else simply wouldn't work.

Example: Try this out. It doesn't work:

```data Bar where
BarNone :: Bar -- This is ok

data Foo where
MkFoo :: Bar Int-- This will not typecheck
```

## Examples

### Safe Lists

Prerequisite: We assume in this section that you know how a List tends to be represented in functional languages
Note: The examples in this section additionally require the extensions EmptyDataDecls and KindSignatures to be enabled

We've now gotten a glimpse of the extra control given to us by the GADT syntax. The only thing new is that you can control exactly what kind of data structure you return. Now, what can we use it for? Consider the humble Haskell list. What happens when you invoke `head []`? Haskell blows up. Have you ever wished you could have a magical version of `head` that only accepts lists with at least one element, lists on which it will never blow up?

To begin with, let's define a new type, `SafeList x y`. The idea is to have something similar to normal Haskell lists `[x]`, but with a little extra information in the type. This extra information (the type variable `y`) tells us whether or not the list is empty. Empty lists are represented as `SafeList x Empty`, whereas non-empty lists are represented as `SafeList x NonEmpty`.

```-- we have to define these types
data Empty
data NonEmpty

-- the idea is that you can have either
--    SafeList a Empty
-- or SafeList a NonEmpty
data SafeList a b where
-- to be implemented
```

Since we have this extra information, we can now define a function `safeHead` on only the non-empty lists! Calling `safeHead` on an empty list would simply refuse to type-check.

```safeHead :: SafeList a NonEmpty -> a
```

So now that we know what we want, `safeHead`, how do we actually go about getting it? The answer is GADT. The key is that we take advantage of the GADT feature to return two different list-of-a types, `SafeList a Empty` for the `Nil` constructor, and `SafeList a NonEmpty` for the `Cons` constructor:

```data SafeList a b where
Nil  :: SafeList a Empty
Cons :: a -> SafeList a b -> SafeList a NonEmpty
```

This wouldn't have been possible without GADT, because all of our constructors would have been required to return the same type of list; whereas with GADT we can now return different types of lists with different constructors. Anyway, let's put this all together, along with the actual definition of `SafeHead`:

```{-#LANGUAGE GADTs, EmptyDataDecls #-}
-- (the EmptyDataDecls pragma must also appear at the very top of the module,
-- in order to allow the Empty and NonEmpty datatype declarations.)

data Empty
data NonEmpty

data SafeList a b where
Nil :: SafeList a Empty
Cons:: a -> SafeList a b -> SafeList a NonEmpty

safeHead :: SafeList a NonEmpty -> a
safeHead (Cons x _) = x
```

Copy this listing into a file and load in `ghci -fglasgow-exts`. You should notice the following difference, calling `safeHead` on a non-empty and an empty-list respectively:

Example: `safeHead` is... safe

```Prelude Main> safeHead (Cons "hi" Nil)
"hi"

<interactive>:1:9:
Couldn't match `NonEmpty' against `Empty'
Expected type: SafeList a NonEmpty
Inferred type: SafeList a Empty
In the first argument of `safeHead', namely `Nil'
In the definition of `it': it = safeHead Nil
```

The complaint is a good thing: it means that we can now ensure during compile-time if we're calling `safeHead` on an appropriate list. However, that also sets up a pitfall in potential. Consider the following function. What do you think its type is?

```silly False = Nil
silly True  = Cons () Nil
```

Now try loading the example up in GHCi. You'll notice the following complaint:

Example: Trouble with GADTs - the complaint

```Couldn't match `Empty' against `NonEmpty'
Expected type: SafeList () Empty
Inferred type: SafeList () NonEmpty
In the application `Cons () Nil'
In the definition of `silly': silly True = Cons () Nil
```

The cases in the definition of `silly` evaluate to marked lists of different types, leading to a type error. The extra constraints imposed through the GADT make it impossible for a function to produce both empty and non-empty lists.

If we are really keen on defining `silly`, we can do so by liberalizing the type of `Cons`, so that it can construct both safe and unsafe lists.

Example: A different approach

```{-#LANGUAGE GADTs, EmptyDataDecls, KindSignatures #-}
-- here we add the KindSignatures pragma,
-- which makes the GADT declaration a bit more elegant.

-- Note the subtle yet revealing change in the phantom type names.
data NotSafe
data Safe

data MarkedList             ::  * -> * -> * where
Nil                       ::  MarkedList t NotSafe
Cons                      ::  a -> MarkedList a b -> MarkedList a c

safeHead                    ::  MarkedList a Safe -> a
safeHead (Cons x _)          =  x

-- This function will never produce anything that can be consumed by safeHead,
-- no matter that the resulting list is not necessarily empty.
silly                       :: Bool -> MarkedList () NotSafe
silly False                  =  Nil
silly True                   =  Cons () Nil
```

There is a cost to the fix above: by relaxing the constraint on `Cons` we throw away the knowledge that it cannot produce an empty list. Based on our first version of the safe list we could, for instance, define a function which took a `SafeList a Empty` argument and be sure anything produced by `Cons` would not be accepted by it. That does not hold for the analogous `MarkedList a NotSafe`; arguably, the type is less useful exactly because it is less restrictive. While in this example the issue may seem minor, given that not much can be done with an empty list, in general it is worth considering.

Exercises
1. Could you implement a `safeTail` function? Both versions introduced here would count as valid starting material, as well as any other variants in similar spirit.

### A simple expression evaluator

Insert the example used in Wobbly Types paper... I thought that was quite pedagogical
This is already covered in the first part of the tutorial.

## Discussion

More examples, thoughts
From FOSDEM 2006, I vaguely recall that there is some relationship between GADT and the below... what?

### Phantom types

See Phantom types.

### Existential types

If you like Existentially quantified types, you'd probably want to notice that they are now subsumed by GADTs. As the GHC manual says, the following two type declarations give you the same thing.

```data TE a = forall b. MkTE b (b->a)
data TG a where { MkTG :: b -> (b->a) -> TG a }
```