Haskell/Hierarchical libraries/Randoms

From Wikibooks, open books for an open world
< Haskell‎ | Hierarchical libraries
Jump to: navigation, search

Random examples[edit]

Here are a handful of uses of random numbers by example

Example: Ten random integer numbers

import System.Random
main = do
   gen <- newStdGen
   let ns = randoms gen :: [Int]
   print $ take 10 ns

The IO action newStdGen, as its name implies, creates a new StdGen pseudorandom number generator state. This StdGen can be passed to functions which need to generate pseudorandom numbers.

(There also exists a global random number generator which is initialized automatically in a system dependent fashion. This generator is maintained in the IO monad and can be accessed with getStdGen. This is perhaps a library wart, however, as all that you really ever need is newStdGen.)

Alternatively one can get a generator by initializing it with an integer, using mkStdGen:

Example: Ten random floats using mkStdGen

import System.Random

randomList :: (Random a) => Int -> [a]
randomList seed = randoms (mkStdGen seed)

main :: IO ()
main = do print $ take 10 (randomList 42 :: [Float])

Running this script results in output like this:


Example: Unsorting a list (imperfectly)

import Data.List ( sortBy )
import Data.Ord ( comparing )
import System.Random ( Random, RandomGen, randoms, newStdGen )

main :: IO ()
main =
 do gen <- newStdGen
    interact $ unlines . unsort gen . lines

unsort :: (RandomGen g) => g -> [x] -> [x]
unsort g es = map snd . sortBy (comparing fst) $ zip rs es
  where rs = randoms g :: [Integer]

There's more to random number generation than randoms. You can, for example, use random (sans 's') to generate a single random number along with a new StdGen to be used for the next one, as well as randomR and randomRs which take a parameter for specifying the range. See below for more ideas.

The Standard Random Number Generator[edit]

The Haskell standard random number functions and types are defined in the Random module (or System.Random if you use hierarchical modules). The definition is at http://www.haskell.org/onlinereport/random.html, but its a bit tricky to follow because it uses classes to make itself more general.

From the standard:

---------------- The RandomGen class ------------------------

class RandomGen g where
  genRange :: g -> (Int, Int)
  next     :: g -> (Int, g)
  split    :: g -> (g, g)

---------------- A standard instance of RandomGen -----------
data StdGen = ... -- Abstract

This basically introduces StdGen, the standard random number generator "object". It's an instance of the RandomGen class which specifies the operations other pseudorandom number generator libraries need to implement to be used with the System.Random library.

If you have r :: StdGen then you can say:

  (x, r2) = next r

This gives you a random Int x and a new StdGen r2. The 'next' function is defined in the RandomGen class, and you can apply it to something of type StdGen because StdGen is an instance of the RandomGen class, as below.

From the Standard:

instance RandomGen StdGen where ...
instance Read      StdGen where ...
instance Show      StdGen where ...

This also says that you can convert a StdGen to and from a string, which is there as a handy way to save the state of the generator. (The dots are not Haskell syntax. They simply say that the Standard does not define an implementation of these instances.)

From the Standard:

mkStdGen :: Int -> StdGen

This is the factory function for StdGen objects. Put in a seed, get out a generator.

The reason that the 'next' function also returns a new random number generator is that Haskell is a functional language, so no side effects are allowed. In most languages the random number generator routine has the hidden side effect of updating the state of the generator ready for the next call. Haskell can't do that. So if you want to generate three random numbers you need to say something like

     (x1, r2) = next r
     (x2, r3) = next r2
     (x3, r4) = next r3

The other thing is that the random values (x1, x2, x3) are random integers. To get something in the range, say, (0,999) you would have to take the modulus yourself, which is silly. There ought to be a library routine built on this, and indeed there is.

From the Standard:

---------------- The Random class ---------------------------
class Random a where
   randomR :: RandomGen g => (a, a) -> g -> (a, g)
   random  :: RandomGen g => g -> (a, g)

   randomRs :: RandomGen g => (a, a) -> g -> [a]
   randoms  :: RandomGen g => g -> [a]

   randomRIO :: (a,a) -> IO a
   randomIO  :: IO a

Remember that StdGen is the only instance of type RandomGen (unless you roll your own random number generator). So you can substitute StdGen for 'g' in the types above and get this:

  randomR :: (a, a) -> StdGen -> (a, StdGen)
  random  :: StdGen -> (a, StdGen)
  randomRs :: (a, a) -> StdGen -> [a]
  randoms  :: StdGen -> [a]

But remember that this is all inside *another* class declaration "Random". So what this says is that any instance of Random can use these functions. The instances of Random in the Standard are:

instance Random Integer where ...
instance Random Float   where ...
instance Random Double  where ...
instance Random Bool    where ...
instance Random Char    where ...

So for any of these types you can get a random range. You can get a random integer with:

  (x1, r2) = randomR (0,999) r

And you can get a random upper case character with

  (c2, r3) = randomR ('A', 'Z') r2

You can even get a random bit with

  (b3, r4) = randomR (False, True) r3

So far so good, but threading the random number state through your entire program like this is painful, error prone, and generally destroys the nice clean functional properties of your program.

One partial solution is the "split" function in the RandomGen class. It takes one generator and gives you two generators back. This lets you say something like this:

  (r1, r2) = split r
  x = foo r1

In this case we are passing r1 down into function foo, which does something random with it and returns a result "x", and we can then take "r2" as the random number generator for whatever comes next. Without "split" we would have to write

  (x, r2) = foo r1

But even this is often too clumsy, so you can do it the quick and dirty way by putting the whole thing in the IO monad. This gives you a standard global random number generator just like any other language. But because its just like any other language it has to do it in the IO monad.

From the Standard:

---------------- The global random generator ---------------- 
newStdGen    :: IO StdGen 
setStdGen    :: StdGen -> IO ()
getStdGen    :: IO StdGen
getStdRandom :: (StdGen -> (a, StdGen)) -> IO a

So you could write:

  foo :: IO Int
  foo = do
     r1 <- getStdGen
     let (x, r2) = randomR (0,999) r1
     setStdGen r2
     return x

This gets the global generator, uses it, and then updates it (otherwise every random number will be the same). But having to get and update the global generator every time you use it is a pain, so its more common to use getStdRandom. The argument to this is a function. Compare the type of that function to that of 'random' and 'randomR'. They both fit in rather well. To get a random integer in the IO monad you can say:

  x <- getStdRandom $ randomR (1,999)

The 'randomR (1,999)' has type "StdGen -> (Int, StdGen)", so it fits straight into the argument required by getStdRandom.

Using QuickCheck to Generate Random Data[edit]

Only being able to do random numbers in a nice straightforward way inside the IO monad is a bit of a pain. You find that some function deep inside your code needs a random number, and suddenly you have to rewrite half your program as IO actions instead of nice pure functions, or else have an StdGen parameter tramp its way down there through all the higher level functions. Something a bit purer is needed.

If you have read anything about Monads then you might have recognized the pattern I gave above:

     (x1, r2) = next r
     (x2, r3) = next r2
     (x3, r4) = next r3

The job of a monad is to abstract out this pattern, leaving the programmer to write something like:

  do -- Not real Haskell
     x1 <- random
     x2 <- random
     x3 <- random

Of course you can do this in the IO monad, but it would be better if random numbers had their own little monad that specialised in random computations. And it just so happens that such a monad exists. It lives in the Test.QuickCheck library, and it's called "Gen". And it does lots of very useful things with random numbers.

The reason that "Gen" lives in Test.QuickCheck is historical: that is where it was invented. The purpose of QuickCheck is to generate random unit tests to verify properties of your code. (Incidentally its very good at this, and most Haskell developers use it for testing). See the QuickCheck on HaskellWiki for more details. This tutorial will concentrate on using the "Gen" monad for generating random data.

Most Haskell compilers (including GHC) bundle QuickCheck in with their standard libraries, so you probably won't need to install it separately. Just say

import Test.QuickCheck

in your source file.

The "Gen" monad can be thought of as a monad of random computations. As well as generating random numbers it provides a library of functions that build up complicated values out of simple ones.

So lets start with a routine to return three random integers between 0 and 999:

randomTriple :: Gen (Integer, Integer, Integer)
randomTriple = do
   x1 <- choose (0,999)
   x2 <- choose (0,999)
   x3 <- choose (0,999)
   return (x1, x2, x3)

"choose" is one of the functions from QuickCheck. Its the equivalent to randomR. The type of "choose" is

choose :: Random a => (a, a) -> Gen a

In other words, for any type "a" which is an instance of "Random" (see above), "choose" will map a range into a generator.

Once you have a "Gen" action you have to execute it. The "unGen" function executes an action and returns the random result. The type is:

unGen :: Gen a -> StdGen -> Int -> a

The three arguments are:

  1. The generator action.
  2. A random number generator.
  3. The "size" of the result. This isn't used in the example above, but if you were generating a data structure with a variable number of elements (like a list) then this parameter lets you pass some notion of the expected size into the generator. We'll see an example later.

So for example:

   triple = unGen randomTriple (mkStdGen 1) 1

will generate three arbitrary numbers. But note that because the same seed value is used the numbers will always be the same (which is why I said "arbitrary", not "random"). If you want different numbers then you have to use a different StdGen argument.

A common pattern in most programming languages is to use a random number generator to choose between two courses of action:

-- Not Haskell code
r := random (0,1)
if r == 1 then foo else bar

QuickCheck provides a more declarative way of doing the same thing. If "foo" and "bar" are both generators returning the same type then you can say:

oneof [foo, bar]

This has an equal chance of returning either "foo" or "bar". If you wanted different odds, say that there was a 30% chance of "foo" and a 70% chance of "bar" then you could say

frequency [ (30, foo), (70, bar) ]

"oneof" takes a simple list of Gen actions and selects one of them at random. "frequency" does something similar, but the probability of each item is given by the associated weighting.

oneof :: [Gen a] -> Gen a
frequency :: [(Int, Gen a)] -> Gen a