F Sharp Programming/Caching

F# : Caching

Caching is often useful to re-use data which has already been computed. F# provides a number of built-in techniques to cache data for future use.

Partial Functions

F# automatically caches the value of any let binding which takes no parameters: the body of the binding is evaluated just once, and the resulting value is reused every time the binding is accessed. Compare the following:

let isNebraskaCity_bad city =
    let cities =
        printfn "Creating cities Set"
        ["Bellevue"; "Omaha"; "Lincoln"; "Papillion"]
        |> Set.ofList
        
    cities.Contains(city)
    
let isNebraskaCity_good =
    let cities =
        printfn "Creating cities Set"
        ["Bellevue"; "Omaha"; "Lincoln"; "Papillion"]
        |> Set.ofList
        
    fun city -> cities.Contains(city)

Both functions accept and return the same values, but they have very different behavior. Here's a comparison of the output in fsi:

Output for isNebraskaCity_bad:

> let isNebraskaCity_bad city =
    let cities =
        printfn "Creating cities Set"
        ["Bellevue"; "Omaha"; "Lincoln"; "Papillion"]
        |> Set.ofList
        
    cities.Contains(city);;

val isNebraskaCity_bad : string -> bool

> isNebraskaCity_bad "Lincoln";;
Creating cities Set
val it : bool = true

> isNebraskaCity_bad "Washington";;
Creating cities Set
val it : bool = false

> isNebraskaCity_bad "Bellevue";;
Creating cities Set
val it : bool = true

> isNebraskaCity_bad "St. Paul";;
Creating cities Set
val it : bool = false

Output for isNebraskaCity_good:

> let isNebraskaCity_good =
    let cities =
        printfn "Creating cities Set"
        ["Bellevue"; "Omaha"; "Lincoln"; "Papillion"]
        |> Set.ofList
        
    fun city -> cities.Contains(city);;
Creating cities Set

val isNebraskaCity_good : (string -> bool)

> isNebraskaCity_good "Lincoln";;
val it : bool = true

> isNebraskaCity_good "Washington";;
val it : bool = false

> isNebraskaCity_good "Bellevue";;
val it : bool = true

> isNebraskaCity_good "St. Paul";;
val it : bool = false

The implementation of isNebraskaCity_bad forces F# to re-create the internal set on each call. On the other hand, isNebraskaCity_good is a value initialized to the function fun city -> cities.Contains(city), so it creates its internal set once and reuses it for all successive calls.

Note: Internally, isNebraskaCity_bad is compiled as a static function which constructs a set on every call. isNebraskaCity_good is compiled as a static readonly property, where the value is initialized in a static constructor.

This distinction is often subtle, but it can have a huge impact on an application's performance.
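
The same pattern pays off whenever a function depends on data that is expensive to build. Here is a small sketch of the idea (isZipCode and the regex pattern are introduced purely for illustration): a precompiled System.Text.RegularExpressions.Regex is captured in a closure so it is constructed only once.

open System.Text.RegularExpressions

// The regex is compiled once, when the binding is initialized, and the
// returned closure reuses it on every call.
let isZipCode =
    let pattern = Regex(@"^\d{5}(-\d{4})?$", RegexOptions.Compiled)
    fun (s : string) -> pattern.IsMatch(s)

// isZipCode "68046"      evaluates to true
// isZipCode "not a zip"  evaluates to false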

Memoization

"Memoization" is a fancy word meaning that computed values are stored in a lookup table rather than recomputed on every successive call. Long-running pure functions (i.e. functions which have no side-effects) are good candidates for memoization. Consider the recursive definition for computing Fibonacci numbers:

> #time;;

--> Timing now on

> let rec fib n =
    if n = 0I then 0I
    elif n = 1I then 1I
    else fib (n - 1I) + fib(n - 2I);;
Real: 00:00:00.000, CPU: 00:00:00.000, GC gen0: 0, gen1: 0, gen2: 0

val fib : Math.bigint -> Math.bigint

> fib 35I;;
Real: 00:00:23.557, CPU: 00:00:23.515, GC gen0: 2877, gen1: 3, gen2: 0
val it : Math.bigint = 9227465I

Beyond fib 35I, the runtime of the function becomes unbearable. Each recursive call to the fib function throws away all of its intermediate calculations to fib(n - 1I) and fib(n - 2I), giving it a runtime complexity of about O(2^n). What if we kept all of those intermediate calculations around in a lookup table? Here's the memoized version of the fib function:

> #time;;

--> Timing now on

> let rec fib =
    let dict = new System.Collections.Generic.Dictionary<_,_>()
    fun n ->
        match dict.TryGetValue(n) with
        | true, v -> v
        | false, _ -> 
            let temp =
                if n = 0I then 0I
                elif n = 1I then 1I
                else fib (n - 1I) + fib(n - 2I)
            dict.Add(n, temp)
            temp;;

val fib : (Math.bigint -> Math.bigint)

> fib 35I;;
Real: 00:00:00.000, CPU: 00:00:00.000, GC gen0: 0, gen1: 0, gen2: 0
val it : Math.bigint = 9227465I

Much better! This version of the fib function runs almost instantaneously. In fact, since we calculate the value of any fib(n) precisely once, and dictionary lookups are (on average) an O(1) operation, this fib function runs in O(n) time.

Notice all of the memoization logic is contained in the fib function. We can write a more general function to memoize any function:

let memoize f =
    let dict = new System.Collections.Generic.Dictionary<_,_>()
    fun n ->
        match dict.TryGetValue(n) with
        | (true, v) -> v
        | _ ->
            let temp = f(n)
            dict.Add(n, temp)
            temp
            
let rec fib = memoize(fun n ->
    if n = 0I then 0I
    elif n = 1I then 1I
    else fib (n - 1I) + fib (n - 2I) )
Note: It's very important to remember that the implementation above is not thread-safe: if the cache will be accessed from multiple threads, access to the dictionary must be synchronized, for example by locking it before adding or retrieving items.
Additionally, although dictionary lookups occur in constant time, the hash function used by the dictionary can take arbitrarily long to execute (this is especially true of strings, where the time it takes to hash a string is proportional to its length). For this reason, it is entirely possible for a memoized function to perform worse than its unmemoized counterpart. Always profile code to determine whether optimization is necessary and whether memoization genuinely improves performance.
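
One way to address the thread-safety concern (a sketch, not the only approach; memoizeConcurrent and fibSafe are names introduced here for illustration) is to back the cache with .NET's System.Collections.Concurrent.ConcurrentDictionary, whose GetOrAdd method handles the synchronization internally:

open System.Collections.Concurrent

// A thread-safe sketch of memoize, backed by ConcurrentDictionary.
// GetOrAdd synchronizes access to the cache internally; under heavy contention
// the factory may run more than once for a given key, but only one result is kept.
let memoizeConcurrent (f : 'a -> 'b) =
    let dict = ConcurrentDictionary<'a, 'b>()
    fun n -> dict.GetOrAdd(n, (fun key -> f key))

// Used exactly like memoize above:
let rec fibSafe = memoizeConcurrent (fun n ->
    if n = 0I then 0I
    elif n = 1I then 1I
    else fibSafe (n - 1I) + fibSafe (n - 2I))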

Lazy Values

The F# lazy data type is an interesting primitive which delays evaluation of a value until the value is actually needed. Once computed, lazy values are cached for reuse later:

> let x = lazy(printfn "I'm lazy"; 5 + 5);;

val x : Lazy<int> = <unevaluated>

> x.Force();; (* Should print "I'm lazy" *)
I'm lazy
val it : int = 10

> x.Force();; (* Value already computed, should not print "I'm lazy" again *)
val it : int = 10

F# uses some compiler magic to avoid evaluating the expression (printfn "I'm lazy"; 5 + 5) on declaration. Lazy values are probably the simplest form of caching, however they can be used to create some interesting and sophisticated data structures. For example, two F# data structures are implemented on top of lazy values, namely the F# Lazy List and Seq.cache method.
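
To give a feel for how such a structure can be built, here is a small sketch of a lazy list (SimpleLazyList, numbersFrom, and take are names introduced here for illustration; this is not the library's implementation):

// An illustrative lazy list built directly on Lazy values: the tail of each
// cell is a lazy value, so it is computed only when the list is walked that
// far and never recomputed afterwards.
type SimpleLazyList<'a> =
    | Nil
    | Cons of 'a * Lazy<SimpleLazyList<'a>>

// A conceptually infinite list of increasing integers.
let rec numbersFrom n =
    Cons(n, lazy (numbersFrom (n + 1)))

// Walk the first 'count' elements, forcing each lazy tail as needed.
let rec take count list =
    match count, list with
    | 0, _ | _, Nil -> []
    | n, Cons(head, tail) -> head :: take (n - 1) (tail.Force())

// take 5 (numbersFrom 1) evaluates to [1; 2; 3; 4; 5]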

Lazy lists and cached sequences represent arbitrary sequences with a potentially infinite number of elements. The elements are computed and cached the first time they are accessed, but are not recomputed when the sequence is enumerated again. Here's a demonstration in fsi:

> let x = seq { for a in 1 .. 10 -> printfn "Got %i" a; a } |> Seq.cache;;

val x : seq<int>

> let y = Seq.take 5 x;;

val y : seq<int>

> Seq.reduce (+) y;;
Got 1
Got 2
Got 3
Got 4
Got 5
val it : int = 15

> Seq.reduce (+) y;; (* Should not recompute values *)
val it : int = 15

> Seq.reduce (+) x;; (* Values 1 through 5 already computed, should only compute 6 through 10 *)
Got 6
Got 7
Got 8
Got 9
Got 10
val it : int = 55
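
Since cached sequences compute elements only on demand, the same technique also works on a conceptually infinite sequence. A small sketch (the squares sequence is introduced here purely for illustration):

// Because elements are only computed as far as they are enumerated (and then
// cached), Seq.cache also works on conceptually infinite sequences.
let squares =
    Seq.initInfinite (fun i -> printfn "Computing square of %i" i; i * i)
    |> Seq.cache

// The first enumeration of a prefix computes and caches those elements...
squares |> Seq.take 5 |> Seq.toList |> printfn "%A"
// ...and enumerating the same prefix again reuses the cached values.
squares |> Seq.take 5 |> Seq.toList |> printfn "%A"
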
Previous: Units of Measure Index Next: Active Patterns