R Programming/Random Number Generation

From Wikibooks, open books for an open world
Jump to navigation Jump to search

Random Number Generators[edit | edit source]

To a very high degree computers are deterministic and therefore are not a reliable source of significant amounts of random values. In general pseudo random number generators are used. The default algorithm in R is Mersenne-Twister but a long list of methods is available. See the help of RNGkind() to learn about random number generators.

?RNGkind

It is possible to use true random numbers. Some of them are collected on random.org (link). The random (link) package gives an access to them.

Randu[edit | edit source]

Randu is an old linear congruential pseudorandom number generator. There is a dataset generated with Randu in the datasets package. The function which is used to generate the dataset is in the help of this page.

library("datasets")
?randu

Seed[edit | edit source]

A pseudo random number generator is an algorithm based on a starting point called "seed". If you want to perform an exact replication of your program, you have to specify the seed using the function set.seed(). The argument of set.seed has to be an integer.

> set.seed(1)
> runif(1)
[1] 0.2655087
> set.seed(1)
> runif(1)
[1] 0.2655087

Sampling in a vector[edit | edit source]

Toss 10 coins

> sample(0:1,10,replace=T)
 [1] 1 0 0 0 1 0 0 1 1 1

Roll 10 dice

> sample(1:6,10,replace=T)
 [1] 4 1 5 3 2 5 5 6 3 2

play lottery (6 random numbers out of 49 without replacement)

> sample(1:49,6,replace=F)
[1] 18 35 29  1 33 11


You can sample in a multinomial distribution :

>mydat <- sample(1:4,1000,rep=TRUE,prob=c(.2,.3,.2,.3))
>table(mydat)

Sampling in a standard univariate distribution[edit | edit source]

You can use rnorm, rt, etc.

Misspecified argument[edit | edit source]

Note that if you put as argument of rnorm a vector instead of a number, R takes by default the length of the vector instead of returning an error. Here is an example :

x <- rnorm(10) # Sample a normal random vector
set.seed(1) # use the seed
z <- rnorm(x) # put a vector instead of a number as an argument of rnorm
set.seed(1) # initialize the seed again
z2 <- rnorm(length(x)) # sample in a vector with the same length as x
plot(z2,z) # check that z and z2 are the same

Inverse CDF method[edit | edit source]

  • If you know the inverse CDF (quantile function), you can generate the random variable by sampling in the standard uniform distribution and transforming using the CDF.

For instance, if you want to simulate from a standard normal distribution, you can simulate from a standard uniform and transform it using the quantile function of the normal distribution.

N <- 100
qnorm(runif(N))

This gives the same results as the rnorm() function but the computing time is higher :

> N <- 10^7
> system.time(qnorm(runif(N)))
   user  system elapsed 
   1.67    0.00    1.70 
> system.time(rnorm(N)) 
   user  system elapsed 
   1.50    0.00    1.51 

Importance sampling[edit | edit source]

Metropolis algorithm[edit | edit source]

Gibbs algorithm[edit | edit source]

Quasi random numbers[edit | edit source]

Sometimes you need to generate quasi random sequences. The randtoolbox library provides several quasi random number generators.

See also sHalton() and QUnif() (sfsmisc).

> library(randtoolbox)
> halton(10, dim = 2, init = TRUE, normal = FALSE, usetime = FALSE)
        [,1]       [,2]
 [1,] 0.5000 0.33333333
 [2,] 0.2500 0.66666667
 [3,] 0.7500 0.11111111
 [4,] 0.1250 0.44444444
 [5,] 0.6250 0.77777778
 [6,] 0.3750 0.22222222
 [7,] 0.8750 0.55555556
 [8,] 0.0625 0.88888889
 [9,] 0.5625 0.03703704
[10,] 0.3125 0.37037037

You can compare Halton draws with the standard R (pseudo) random number generator. Halton draws are much more systematic.

>random <- cbind(runif(1000),runif(1000))
>halton <- halton(1000, dim = 2, init = TRUE, normal = FALSE, usetime = FALSE)
>par(mfrow=c(2,2))
>plot(halton[,1],halton[,2])
>plot(random[,1],random[,2])

Examples[edit | edit source]

Resources[edit | edit source]

References[edit | edit source]


Previous: Probability Distributions Index Next: Control Structures