F Sharp Programming/Advanced Data Structures

F# : Advanced Data Structures

F# comes with its own set of data structures, however its very important to know how to implement data structures from scratch.

Incidentally, hundreds of authors have written thousands of lengthy volumes on this single topic alone, so its unreasonable to provide a comprehensive picture of data structures in the short amount of space available for this book. Instead, this chapter is intended as a cursory introduction to the development of immutable data structures using F#. Readers are encouraged to use the resources listed at the bottom of this page for a more comprehensive treatment of algorithms and data structures.

Stacks

F#'s built-in list data structure is essentially an immutable stack. While its certainly usable, for the purposes of writing exploratory code, we're going to implement a stack from scratch. We can represent each node in a stack using a simple union:

type 'a stack =
    | EmptyStack
    | StackNode of 'a * 'a stack

It's easy enough to create an instance of a stack using:

let stack = StackNode(1, StackNode(2, StackNode(3, StackNode(4, StackNode(5, EmptyStack)))))

Each StackNode contains a value and a pointer to the next stack in the list. The resulting data structure can be diagrammed as follows:

 ___    ___    ___    ___    ___ 
|_1_|->|_2_|->|_3_|->|_4_|->|_5_|->Empty

We can create a boilerplate stack module as follows:

module Stack =
    type 'a stack =
        | EmptyStack
        | StackNode of 'a * 'a stack
        
    let hd = function
        | EmptyStack -> failwith "Empty stack"
        | StackNode(hd, tl) -> hd
        
    let tl = function
        | EmptyStack -> failwith "Empty stack"
        | StackNode(hd, tl) -> tl
        
    let cons hd tl = StackNode(hd, tl)
    
    let empty = EmptyStack

Let's say we wanted to add a few methods to our stack, such as a method which updates an item at a certain index. Since our nodes are immutable, we can't update our list in place; we need to copy all of the nodes up to the node we want to update.

Setting item at index 2 to the value 9.

         0      1      2      3      4
          ___    ___    ___    ___    ___ 
let x =  |_1_|->|_2_|->|_3_|->|_4_|->|_5_|->Empty
                              ^
                             /
          ___    ___    ___ /
let y =  |_1_|->|_2_|->|_9_|

So, we copy all of the nodes up to index 2 and reuse the remaining nodes. A function like this is very easy to write:

let rec update index value s =
    match index, s with
    | index, EmptyStack -> failwith "Index out of range"
    | 0, StackNode(hd, tl) -> StackNode(value, tl)
    | n, StackNode(hd, tl) -> StackNode(hd, update (index - 1) value tl)

Appending items from one stack to the rear of another uses a similar technique. Since we can't modify stacks in place, we append two stacks by copying all of the nodes from the "front" stack and pointing the last copied node to the our "rear" stack, resulting in the following:

Append x and y

          ___    ___    ___    ___    ___ 
let x =  |_1_|->|_2_|->|_3_|->|_4_|->|_5_|->Empty
                                             ___    ___    ___ 
let y =                                     |_6_|->|_7_|->|_8_|->Empty
                                            ^
                                           /
          ___    ___    ___    ___    ___ /
let z =  |_1_|->|_2_|->|_3_|->|_4_|->|_5_|

We can implement this function with minimal effort using the following:

let rec append x y =
    match x with
    | EmptyStack -> y
    | StackNode(hd, tl) -> StackNode(hd, append tl y)

Stacks are very easy to work with and implement. The principles behind copying nodes to "modify" stacks is fundamentally the same for all persistent data structures.

Complete Stack Module

module Stack =
    type 'a stack =
        | EmptyStack
        | StackNode of 'a * 'a stack
        
    let hd = function
        | EmptyStack -> failwith "Empty stack"
        | StackNode(hd, tl) -> hd
        
    let tl = function
        | EmptyStack -> failwith "Emtpy stack"
        | StackNode(hd, tl) -> tl
        
    let cons hd tl = StackNode(hd, tl)
    
    let empty = EmptyStack
    
    let rec update index value s =
        match index, s with
        | index, EmptyStack -> failwith "Index out of range"
        | 0, StackNode(hd, tl) -> StackNode(value, tl)
        | n, StackNode(hd, tl) -> StackNode(hd, update (index - 1) value tl)
        
    let rec append x y =
        match x with
        | EmptyStack -> y
        | StackNode(hd, tl) -> StackNode(hd, append tl y)

    let rec map f = function
        | EmptyStack -> EmptyStack
        | StackNode(hd, tl) -> StackNode(f hd, map f tl)
        
    let rec rev s =
        let rec loop acc = function
            | EmptyStack -> acc
            | StackNode(hd, tl) -> loop (StackNode(hd, acc)) tl
        loop EmptyStack s

    let rec contains x = function
        | EmptyStack -> false
        | StackNode(hd, tl) -> hd = x || contains x tl
        
    let rec fold f seed = function
        | EmptyStack -> seed
        | StackNode(hd, tl) -> fold f (f seed hd) tl

Queues

Naive Queue

Queues aren't quite as straightforward as stacks. A naive queue can be implemented using a stack, with the caveat that:

Items are always appended to the end of the list, and dequeued from the head of the stack.
-OR- Items are prepended to the front of the stack, and dequeued by reversing the stack and getting its head.

(* AwesomeCollections.fsi *)

type 'a stack =
  | EmptyStack
  | StackNode of 'a * 'a stack
  
module Stack = begin
  val hd : 'a stack -> 'a
  val tl : 'a stack -> 'a stack
  val cons : 'a -> 'a stack -> 'a stack
  val empty : 'a stack
  val rev : 'a stack -> 'a stack
end

[<Class>]
type 'a Queue =
    member hd : 'a
    member tl : 'a Queue
    member enqueue : 'a -> 'a Queue
    static member empty : 'a Queue

(* AwesomeCollections.fs *)

type 'a stack =
    | EmptyStack
    | StackNode of 'a * 'a stack
        
module Stack =        
    let hd = function
        | EmptyStack -> failwith "Empty stack"
        | StackNode(hd, tl) -> hd
        
    let tl = function
        | EmptyStack -> failwith "Emtpy stack"
        | StackNode(hd, tl) -> tl
        
    let cons hd tl = StackNode(hd, tl)
    
    let empty = EmptyStack
        
    let rec rev s =
        let rec loop acc = function
            | EmptyStack -> acc
            | StackNode(hd, tl) -> loop (StackNode(hd, acc)) tl
        loop EmptyStack s
    
type Queue<'a>(item : stack<'a>) =
    member this.hd
        with get() = Stack.hd (Stack.rev item)
        
    member this.tl
        with get() = Queue(item |> Stack.rev |> Stack.tl |> Stack.rev)
        
    member this.enqueue(x) = Queue(StackNode(x, item))
    
    override this.ToString() = sprintf "%A" item
    
    static member empty = Queue<'a>(Stack.empty)

We use an interface file to hide the Queue class's constructor. Although this technically satisfies the function of a queue, every dequeue is an O(n) operation where n is the number of items in the queue. There are lots of variations on the same approach, but these are often not very practical in practice. We can certainly improve on the implementation of immutable queues.

Queue From Two Stacks

The implementation above isn't very efficient because it requires reversing our underlying data representation several times. Why not keep those reversed stacks around for future use? Rather than using one stack, we can have two stacks: a front stack f and a rear stack r.

Stack f holds items in the correct order, while stack r holds items in reverse order; this allows the first element in f to be the head of the queue, and the first element in r to be the last item in queue. So, a queue of the numbers 1 .. 6 might be represented with f = [1;2;3] and r = [6;5;4].

To enqueue a new item, prepend it to the front of r; to dequeue an item, pop it off f. Both enqueues and dequeues are O(1) operations. Of course, at some point, f will be empty and there will be no more items to dequeue; in this case, simply move all items from r to f and reverse the list. While the queue certainly has O(n) worst-case behavior, it has acceptable O(1) amortized (average case) bounds.

The code for this implementation is straight forward:

type Queue<'a>(f : stack<'a>, r : stack<'a>) =
    let check = function
        | EmptyStack, r -> Queue(Stack.rev r, EmptyStack)
        | f, r -> Queue(f, r)

    member this.hd =
        match f with
        | EmptyStack -> failwith "empty"
        | StackNode(hd, tl) -> hd
        
    member this.tl =
        match f, r with
        | EmptyStack, _ -> failwith "empty"
        | StackNode(x, f), r -> check(f, r)
        
    member this.enqueue(x) = check(f, StackNode(x, r))
    
    static member empty = Queue<'a>(Stack.empty, Stack.empty)

This is a simple, common, and useful implementation of an immutable queue. The magic is in the check function which maintains that f always contains items if they are available.

Note: The queue's periodic O(n) worst case behavior can give it unpredictable response times, especially in applications which rely heavily on persistence since its possible to hit the pathological case each time the queue is accessed. However, this particular implementation of queues is perfectly adequate for the vast majority of applications which do not require persistence or uniform response times.

As shown above, we often want to wrap our underlying data structure in class for two reasons:

To simplify the interface to the data structure. For example, clients neither know nor care that our queue uses two stacks; they only know that items in the queue obey the principle of first-in, first-out.
To prevent clients from putting the underlying data in the data structure in an invalid state.

Beyond stacks, virtually all data structures are complex enough to require wrapping up class to hide away complex details from clients.

Binary Search Trees

Binary search trees are similar to stacks, but each node points to two other nodes called the left and right child nodes:

type 'a tree =
    | EmptyTree
    | TreeNode of 'a * 'a tree * 'a tree

Additionally, nodes in the tree are ordered in a particular way: each item in a tree is greater than all items in its left child node and less than all items in its right child node.

Since our tree is immutable, we "insert" into the tree by returning a brand new tree with the node inserted. This process is more efficient than it sounds: we copy nodes as we traverse down the tree, so we only copy nodes which are in the path of our node being inserted. Writing a binary search tree is relatively straightforward:

(* AwesomeCollections.fsi *)

[<Class>]
type 'a BinaryTree =
    member hd : 'a
    member exists : 'a -> bool
    member insert : 'a -> 'a BinaryTree

(* AwesomeCollections.fs *)

type 'a tree =
    | EmptyTree
    | TreeNode of 'a * 'a tree * 'a tree
    
module Tree =
    let hd = function
        | EmptyTree -> failwith "empty"
        | TreeNode(hd, l, r) -> hd
        
    let rec exists item = function
        | EmptyTree -> false
        | TreeNode(hd, l, r) ->
            if hd = item then true
            elif item < hd then exists item l
            else exists item r
            
    let rec insert item = function
        | EmptyTree -> TreeNode(item, EmptyTree, EmptyTree)
        | TreeNode(hd, l, r) as node ->
            if hd = item then node
            elif item < hd then TreeNode(hd, insert item l, r)
            else TreeNode(hd, l, insert item r)
    
type 'a BinaryTree(inner : 'a tree) =
    member this.hd = Tree.hd inner
        
    member this.exists item = Tree.exists item inner
            
    member this.insert item = BinaryTree(Tree.insert item inner)
    
    member this.empty = BinaryTree<'a>(EmptyTree)

We're using an interface and a wrapper class to hide the implementation details of the tree from the user, otherwise the user could construct a tree which invalidates the specific ordering rules used in the binary tree.

This implementation is simple and it allows us to add and lookup any item in the tree in O(log n) best case time. However, it suffers from a pathological case: if we add items in sorted order, or mostly sorted order, then the tree can become heavily unbalanced. For example, the following code:

[1 .. 7] |> Seq.fold (fun (t : BinaryTree<_>) x -> t.insert(x)) BinaryTree.empty

Results in this tree:

A tree like this isn't much better than our inefficient queue implementation above! Trees are most efficient when they have a minimum height and are as full as possible. Ideally, we'd like to represent the tree above as follows:

         
        _ 4 _
       /     \
      2       6
     / \     / \
    1   3   5   7
   / \ / \ / \ / \
   E E E E E E E E

The minimum height of the tree is ceiling(log n + 1), where n is the number of items in the list. When we insert items into the tree, we want the tree to balance itself to maintain the minimum height. There are a variety of self-balancing tree implementations, many of which are easy to implement as immutable data structures.

Red Black Trees

Red-black trees are self-balancing trees which attach a "color" attribute to each node in the tree. In addition to the rules defining a binary search tree, red-black trees must maintain the following set of rules:

A node is either red or black.
The root node is always black.
No red node has a red child.
Every simple path from a given node to any of its descendant leaves contains the same number of black nodes.

We can augment our binary tree with a color field as follows:

type color = R | B    
type 'a tree =
    | E
    | T of color * 'a * 'a tree * 'a tree

When we insert into the tree, we need to rebalance the tree to restore the rules. In particular, we need to remove nodes with a red child. There are four cases where a red node may have a red child. They are depicted in the diagram below by the top, right, bottom, and left trees. The center tree is the balanced version.

                        B(z)
                        /  \
                      R(x)  d
                      /  \
                     a   R(y)
                         /  \
                        b    c
  
                         ||
                         \/

      B(z)              B(y)                 B(x)
      /  \              /  \                 /  \
    R(y)  d    =>     R(x) R(z)    <=       a   R(y)
    /  \              / \  / \                  /  \
  R(x)  c             a b  c d                 b   R(z)
  /  \                                             /  \
 a    b                                           c    d

                         /\
                         ||

                       B(x)
                       /  \
                      a   R(z)
                          /  \
                        R(y)  d
                        /  \
                       b    c

We can modify our binary tree class as follows:

(* AwesomeCollections.fsi *)

[<Class>]
type 'a BinaryTree =
    member hd : 'a
    member left : 'a BinaryTree
    member right : 'a BinaryTree
    member exists : 'a -> bool
    member insert : 'a -> 'a BinaryTree
    member print : unit -> unit
    static member empty : 'a BinaryTree

(* AwesomeCollections.fs *)

type color = R | B    
type 'a tree =
    | E
    | T of color * 'a tree * 'a * 'a tree
    
module Tree =
    let hd = function
        | E -> failwith "empty"
        | T(c, l, x, r) -> x
        
    let left = function
        | E -> failwith "empty"
        | T(c, l, x, r) -> l
    
    let right = function
        | E -> failwith "empty"
        | T(c, l, x, r) -> r
        
    let rec exists item = function
        | E -> false
        | T(c, l, x, r) ->
            if item = x then true
            elif item < x then exists item l
            else exists item r
            
    let balance = function                              (* Red nodes in relation to black root *)
        | B, T(R, T(R, a, x, b), y, c), z, d            (* Left, left *)
        | B, T(R, a, x, T(R, b, y, c)), z, d            (* Left, right *)
        | B, a, x, T(R, T(R, b, y, c), z, d)            (* Right, left *)
        | B, a, x, T(R, b, y, T(R, c, z, d))            (* Right, right *)
            -> T(R, T(B, a, x, b), y, T(B, c, z, d))
        | c, l, x, r -> T(c, l, x, r)
    
    let insert item tree =
        let rec ins = function
            | E -> T(R, E, item, E)
            | T(c, a, y, b) as node ->
                if item = y then node
                elif item < y then balance(c, ins a, y, b)
                else balance(c, a, y, ins b)

        (* Forcing root node to be black *)                
        match ins tree with
            | E -> failwith "Should never return empty from an insert"
            | T(_, l, x, r) -> T(B, l, x, r)
            
    let rec print (spaces : int) = function
        | E -> ()
        | T(c, l, x, r) ->
            print (spaces + 4) r
            printfn "%s %A%A" (new System.String(' ', spaces)) c x
            print (spaces + 4) l
    
type 'a BinaryTree(inner : 'a tree) =
    member this.hd = Tree.hd inner
    member this.left = BinaryTree(Tree.left inner)
    member this.right = BinaryTree(Tree.right inner)
    member this.exists item = Tree.exists item inner
    member this.insert item = BinaryTree(Tree.insert item inner)
    member this.print() = Tree.print 0 inner
    static member empty = BinaryTree<'a>(E)

All of the magic that makes this tree work happens in the balance function. We're not performing any terribly complicated transformations to the tree, yet it comes out relatively balanced (in fact, the maximum depth of this tree is 2 * ceiling(log n + 1) ).

AVL Trees

AVL trees are named after its two inventors, G.M. Adelson-Velskii and E.M. Landis. These trees are self-balancing because the heights of the two child subtrees of any node will only differ 0 or 1; therefore, these trees are said to be height-balanced.

An empty node in a tree has a height of 0; non-empty nodes have a height >= 1. We can store the height of each node in our tree definition:

type 'a tree =
    | Node of int * 'a tree * 'a * 'a tree  (* height, left child, value, right child *)
    | Nil

The height of any node is equal to max(left height, right height) + 1. For convenience, we'll use the following constructor to create a tree node and initialize its height:

let height = function
    | Node(h, _, _, _) -> h
    | Nil -> 0
    
let make l x r =
    let h = 1 + max (height l) (height r)
    Node(h, l, x ,r)

Inserting into an AVL tree is very similar to inserting into an unbalanced binary tree with one exception: after we insert a node, we use a series of tree rotations to re-balance the tree. Each node has an implicit property, its balance factor, which refers to the left-child's height minus the right-child's height; a positive balance factor indicates the tree is weighted on the left, negative indicates the tree is weighted on the right, otherwise the tree is balanced.

We only need to rebalance the tree when balance factor for a node is +/-2. There are four scenarios which can cause our tree to become unbalanced:

Left-left case: root balance factor = +2, left-childs balance factor = +1. Balanced by right-rotating the root node:

         5                            3
       /   \    Root                /   \
      3     D   Right rotation     2     5
    /  \            ----->        / \   / \
   2    C                        A   B C   D
  / \
 A   B

Left-right case: root balance factor = +2, right-child's balance factor = -1. Balanced by left-rotating the left child, then right-rotating the root (this operation is called a double right rotation):

       5                              5                                 4
     /   \      Left child          /   \        Root                 /   \
   3      D     Left rotation      4     D       Right rotation      3     5
  / \                ----->       / \               ----->          / \   / \
 A   4                           3   C                             A   B C   D
    / \                         / \
   B   C                       A   B

Right-right case: root balance factor = -2, right-child's balance factor = -1. Balanced by left-rotating the root node:

    3                                 5
  /   \         Root                /   \
 A     5        Left rotation      3     7
      / \           ----->        / \   / \
     B   7                       A   B C   D
        / \
       C   D

Right-left case: root balance factor = -2, right-child's balance factor = +1. Balanced by right-rotating the right child, then left-rotating the root (this operation is called a double-left rotation):

     3                               3                                 4
   /   \        Right child        /   \        Root                 /   \
  A      5      Right rotation    A     4       Left rotation       3     5
        / \         ----->             / \         ----->          / \   / \
       4   D                          B   5                       A   B C   D
      / \                                / \
     B   C                              C   D

With this in mind, its very easy to put together the rest of our AVL tree:

(* AwesomeCollections.fsi *)
[<Class>]
type 'a AvlTree =
    member Height : int
    member Left : 'a AvlTree
    member Right : 'a AvlTree
    member Value : 'a
    member Insert : 'a -> 'a AvlTree
    member Contains : 'a -> bool

module AvlTree =
    [<GeneralizableValue>]
    val empty<'a> : AvlTree<'a>

(* AwesomeCollections.fs *)

type 'a tree =
    | Node of int * 'a tree * 'a * 'a tree
    | Nil
    
(*
    Notation:
        h = height
        x = value
        l = left child
        r = right child
        
        lh = left child's height
        lx = left child's value
        ll = left child's left child
        lr = left child's right child
        
        rh = right child's height
        rx = right child's value
        rl = right child's left child
        rr = right child's right child
*)
    
let height = function
    | Node(h, _, _, _) -> h
    | Nil -> 0
    
let make l x r =
    let h = 1 + max (height l) (height r)
    Node(h, l, x ,r)

let rotRight = function
    | Node(_, Node(_, ll, lx, lr), x, r) ->
        let r' = make lr x r
        make ll lx r'
    | node -> node
    
let rotLeft = function
    | Node(_, l, x, Node(_, rl, rx, rr)) ->
        let l' = make l x rl
        make l' rx rr
    | node -> node
    
let doubleRotLeft = function
    | Node(h, l, x, r) ->
        let r' = rotRight r
        let node' = make l x r'
        rotLeft node'
    | node -> node
    
let doubleRotRight = function
    | Node(h, l, x, r) ->
        let l' = rotLeft l
        let node' = make l' x r
        rotRight node'
    | node -> node
    
let balanceFactor = function
    | Nil -> 0
    | Node(_, l, _, r) -> (height l) - (height r)
    
let balance = function
    (* left unbalanced *)
    | Node(h, l, x, r) as node when balanceFactor node >= 2 ->
        if balanceFactor l >= 1 then rotRight node      (* left left case *)
        else doubleRotRight node                        (* left right case *)
    (* right unbalanced *)
    | Node(h, l, x, r) as node when balanceFactor node <= -2 ->
        if balanceFactor r <= -1 then rotLeft node      (* right right case *)
        else doubleRotLeft node                         (* right left case *)
    | node -> node
    
let rec insert v = function
    | Nil -> Node(1, Nil, v, Nil)
    | Node(_, l, x, r) as node ->
        if v = x then node
        else
            let l', r' = if v < x then insert v l, r else l, insert v r
            let node' = make l' x r'
            balance <| node'
            
let rec contains v = function
    | Nil -> false
    | Node(_, l, x, r) ->
        if v = x then true
        else
            if v < x then contains v l
            else contains v r

type 'a AvlTree(tree : 'a tree) =
    member this.Height = height tree
    
    member this.Left =
        match tree with
        | Node(_, l, _, _) -> new AvlTree<'a>(l)
        | Nil -> failwith "Empty tree"
    
    member this.Right =
        match tree with
        | Node(_, _, _, r) -> new AvlTree<'a>(r)
        | Nil -> failwith "Empty tree"
        
    member this.Value =
        match tree with
        | Node(_, _, x, _) -> x
        | Nil -> failwith "Empty tree"
        
    member this.Insert(x) = new AvlTree<'a>(insert x tree)
    
    member this.Contains(v) = contains v tree
    
module AvlTree =
    [<GeneralizableValue>]
    let empty<'a> : AvlTree<'a> = new AvlTree<'a>(Nil)

Note: The [<GeneralizableValue>] attribute indicates to F# that the construct can give rise to generic code through type inference . Without the attribute, F# will infer the type of AvlTree.empty as the undefined type AvlTree<'_a>, resulting in a "value restriction" error at compilation.

Optimization tip: The tree supports inserts and lookups in log(n) time, where n is the number of nodes in the tree. This is already pretty good, but we can make it faster by eliminating unnecessary comparisons. Notice when we insert a node into the left side of the tree, we can only add weight to the left child; however, the balance function checks both sides of the tree for each insert. By re-writing balance into a balance_left and balance_right function to handle, we can handle left- and right-child inserts separately. Similar optimizations are possible on the red-black tree implementation as well.

An AVL trees height is limited to 1.44 * log(n), whereas a red-black tree's height is limited to 2 * log(n). The AVL trees smaller height and more rigid balancing leads to slower insert/removal but faster retrieval than red-black trees. In practice, the difference will be hardly noticeable: a lookup on a 10,000,000 node AVL tree lookup requires at most 34 comparisons, compared to 47 comparisons on a red-black tree.

Heaps

Binary search trees can efficiently find arbitrary elements in a set, however it can be occasionally useful to access the minimum element in set. Heaps are special data structure which satisfy the heap property: the value of every node is greater than the value of any of its child nodes. Additionally, we can keep the tree approximately balanced using the leftist property, meaning that the height of any left child heap is at least as large as its right sibling. We can hold the height of each tree in each heap node.

Finally, since heaps can be implemented as min- or max-heaps, where the root element will either be the largest or smallest element in the set, we support both types of heaps by passing in an ordering function into heap's constructor as such:

type 'a heap =
    | EmptyHeap
    | HeapNode of int * 'a * 'a heap * 'a heap
    
type 'a BinaryHeap(comparer : 'a -> 'a -> int, inner : 'a heap) =
    static member make(comparer) = BinaryHeap<_>(comparer, EmptyHeap)

Note: the functionality we gain by passing the comparer function into the BinaryHeap constructor approximates OCaml functors, although its not quite as elegant.

An interesting consequence of the leftist property is that elements along any path in a heap are stored in sorted order. This means we can merge any two heaps by merging their right spines and swapping children as necessary to restore the leftist property. Since each right spine contains at least as many nodes as the left spine, the height of each right spine is proportional to the logarithm of the number of elements in the heap, so merging two heaps can be performed in O(log n) time. We can implement all of the properties of our heap as follows:

(* AwesomeCollections.fsi *)

[<Class>]
type 'a BinaryHeap =
    member hd : 'a
    member tl : 'a BinaryHeap
    member insert : 'a -> 'a BinaryHeap
    member merge : 'a BinaryHeap -> 'a BinaryHeap
    interface System.Collections.IEnumerable
    interface System.Collections.Generic.IEnumerable<'a>
    static member make : ('b -> 'b -> int) -> 'b BinaryHeap

(* AwesomeCollections.fs *)

type 'a heap =
    | EmptyHeap
    | HeapNode of int * 'a * 'a heap * 'a heap
 
module Heap =
    let height = function
        | EmptyHeap -> 0
        | HeapNode(h, _, _, _) -> h
 
    (* Helper function to restore the leftist property *)        
    let makeT (x, a, b) =
        if height a >= height b then HeapNode(height b + 1, x, a, b)
        else HeapNode(height a + 1, x, b, a)
 
    let rec merge comparer = function
        | x, EmptyHeap -> x
        | EmptyHeap, x -> x
        | (HeapNode(_, x, l1, r1) as h1), (HeapNode(_, y, l2, r2) as h2) ->
            if comparer x y <= 0 then makeT(x, l1, merge comparer (r1, h2))
            else makeT (y, l2, merge comparer (h1, r2))
 
    let hd = function
        | EmptyHeap -> failwith "empty"
        | HeapNode(h, x, l, r) -> x
 
    let tl comparer = function
        | EmptyHeap -> failwith "empty"
        | HeapNode(h, x, l, r) -> merge comparer (l, r)
        
    let rec to_seq comparer = function
        | EmptyHeap -> Seq.empty
        | HeapNode(h, x, l, r) as node -> seq { yield x; yield! to_seq comparer (tl comparer node) }
 
type 'a BinaryHeap(comparer : 'a -> 'a -> int, inner : 'a heap) =
    (* private *)
    member this.inner = inner
 
    (* public *)
    member this.hd = Heap.hd inner
    member this.tl = BinaryHeap(comparer, Heap.tl comparer inner)
    member this.merge (other : BinaryHeap<_>) = BinaryHeap(comparer, Heap.merge comparer (inner, other.inner))
    member this.insert x = BinaryHeap(comparer, Heap.merge comparer (inner,(HeapNode(1, x, EmptyHeap, EmptyHeap))))
    
    interface System.Collections.Generic.IEnumerable<'a> with
        member this.GetEnumerator() = (Heap.to_seq comparer inner).GetEnumerator()
            
    interface System.Collections.IEnumerable with
        member this.GetEnumerator() = (Heap.to_seq comparer inner :> System.Collections.IEnumerable).GetEnumerator()
 
    static member make(comparer) = BinaryHeap<_>(comparer, EmptyHeap)

This heap implements the IEnumerable<'a> interface, allowing us to iterate through it like a seq. In addition to the leftist heap shown above, its very easy to implement immutable versions of splay heaps, binomial heaps, Fibonacci heaps, pairing heaps, and a variety other tree-like data structures in F#.

Lazy Data Structures

Its worth noting that some purely functional data structures above are not as efficient as their imperative implementations. For example, appending two immutable stacks x and y together takes O(n) time, where n is the number of elements in stack x. However, we can exploit laziness in ways which make purely functional data structures just as efficient as their imperative counterparts.

For example, its easy to create a stack-like data structure which delays all computation until its really needed:

type 'a lazyStack =
    | Node of Lazy<'a * 'a lazyStack>
    | EmptyStack
 
module LazyStack =
    let (|Cons|Nil|) = function
        | Node(item) ->
            let hd, tl = item.Force()
            Cons(hd, tl)
        | EmptyStack -> Nil
 
    let hd = function
        | Cons(hd, tl) -> hd
        | Nil -> failwith "empty"
 
    let tl = function
        | Cons(hd, tl) -> tl
        | Nil -> failwith "empty"
 
    let cons(hd, tl) = Node(lazy(hd, tl))
    
    let empty = EmptyStack
 
    let rec append x y =
        match x with
        | Cons(hd, tl) -> Node(lazy(printfn "appending... got %A" hd; hd, append tl y))
        | Nil -> y
 
    let rec iter f = function
        | Cons(hd, tl) -> f(hd); iter f tl
        | Nil -> ()

In the example above, the append operation returns one node delays the rest of the computation, so appending two lists will occur in constant time. A printfn statement above has been added to demonstrate that we really don't compute appended values until the first time they're accessed:

> open LazyStack;;
> let x = cons(1, cons(2, cons(3, cons(4, EmptyStack))));;

val x : int lazyStack = Node <unevaluated>

> let y = cons(5, cons(6, cons(7, EmptyStack)));;

val y : int lazyStack = Node <unevaluated>

> let z = append x y;;

val z : int lazyStack = Node <unevaluated>

> hd z;;
appending... got 1
val it : int = 1

> hd (tl (tl z) );;
appending... got 2
appending... got 3
val it : int = 3

> iter (fun x -> printfn "%i" x) z;;
1
2
3
appending... got 4
4
5
6
7
val it : unit = ()

Interestingly, the append method clearly runs in O(1) time because the actual appending operation is delayed until a user grabs the head of the list. At the same time, grabbing the head of the list may have the side effect of triggering, at most, one call to the append method without causing a monolithic rebuilding the rest of the data structure, so grabbing the head is itself an O(1) operation. This stack implementation supports supports constant-time consing and appending, and linear time lookups.

Similarly, implementations of lazy queues exists which support O(1) worst-case behavior for all operations.

Additional Resources

Purely Functional Data Structures by Chris Okasaki ISBN 978-0521663502. Highly recommended. Provides techniques and analysis of immutable data structures using SML.
The Algorithm Design Manual by Steven S. Skiena ISBN 978-0387948607. Highly recommended. Provides language-agnostic description of a variety of algorithms, data structures, and techniques for solving hard problems in computer science.
Tutorial on immutable data structures using C#:

Previous: Active Patterns

Index

Next: Reflection