R Programming/Maximum Likelihood

From Wikibooks, open books for an open world

Introduction

Maximum likelihood estimation is just an optimization problem: you write down the log likelihood function and maximize it with a numerical optimization technique. Some optimizers also need the score (the first derivative of the log likelihood) and/or the Hessian (the second derivative of the log likelihood).
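To make the roles of the score and the Hessian concrete, here is a minimal sketch on a case with a known answer. We assume an exponential sample, whose log likelihood is n log(lambda) - lambda * sum(x) and whose MLE is 1/mean(x). Because optim() minimizes, we pass the negative log likelihood and its gradient (minus the score). With hessian = TRUE, optim() also returns a numerical Hessian, which we use here as the observed information for a standard error. The names nll and nll.grad are only illustrative.

```r
set.seed(1)
x <- rexp(500, rate = 2)

# Negative log likelihood of an exponential sample: -(n * log(lambda) - lambda * sum(x))
nll <- function(lambda, x) -(length(x) * log(lambda) - lambda * sum(x))
# Gradient of the negative log likelihood, i.e. minus the score
nll.grad <- function(lambda, x) -(length(x) / lambda - sum(x))

# A lower bound keeps lambda strictly positive
fit <- optim(par = 1, fn = nll, gr = nll.grad, x = x,
             method = "L-BFGS-B", lower = 1e-8, hessian = TRUE)
fit$par                  # numerical MLE of the rate
1 / mean(x)              # closed-form MLE, for comparison
sqrt(1 / fit$hessian)    # standard error from the observed information
```

The two estimates should agree closely. For this model the observed information is n / lambda^2, so the standard error is roughly lambda / sqrt(n).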

One dimension

If there is only one parameter, we can optimize the log likelihood using optimize().

Example with a type 1 Pareto distribution

We provide an example with a type 1 Pareto distribution. In this example we do not estimate the minimum: it is fixed at the sample minimum min(x) and treated as known. Therefore this is a one-dimensional problem in the shape parameter.

We use the rpareto1() function from the actuar package to generate a random vector from a type 1 Pareto distribution with shape 1 and minimum 500. We use dpareto1() (also from actuar) with log = TRUE to write the log likelihood. Then we call optimize() with maximum = TRUE, giving lower and upper bounds for the parameter through the interval argument.

> library(actuar)
> set.seed(123)
> y <- rpareto1(1000, shape = 1, min = 500)
> # log likelihood as a function of the shape parameter alpha
> ll <- function(alpha, x) {
+   sum(dpareto1(x, shape = alpha, min = min(x), log = TRUE))
+ }
> optimize(f = ll, x = y, interval = c(0, 10), maximum = TRUE)
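With the minimum m known, the shape of a type 1 Pareto has the closed-form MLE n / sum(log(x / m)), which gives a useful check on optimize(). The sketch below assumes actuar is not available, so it draws the sample by the inverse cdf method (m * U^(-1/alpha) for uniform U) and writes the log density by hand; it uses the true minimum 500 rather than min(x).

```r
set.seed(42)
n <- 1000
m <- 500        # minimum, treated as known
alpha <- 1      # true shape
# Inverse cdf draw: if U ~ Uniform(0, 1), m * U^(-1 / alpha) is type 1 Pareto
y <- m * runif(n)^(-1 / alpha)

# Log likelihood of a type 1 Pareto written by hand (valid for x >= m)
ll <- function(a, x, m) sum(log(a) + a * log(m) - (a + 1) * log(x))

fit <- optimize(f = ll, interval = c(0.01, 10), x = y, m = m, maximum = TRUE)
alpha.closed <- n / sum(log(y / m))   # closed-form MLE of the shape
c(numerical = fit$maximum, closed.form = alpha.closed)
```

The log likelihood is concave in the shape, so the golden section search used by optimize() finds the single maximum inside the interval.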

Multiple dimensions

  • fitdistr() (MASS package) fits univariate distributions by maximum likelihood. It uses closed-form estimators for a few distributions (normal, log-normal, geometric, exponential and Poisson) and is otherwise a wrapper for optim().
  • If you need to program your own maximum likelihood estimator (MLE), you can use a general-purpose optimizer such as nlm() or optim(). Interfaces designed specifically for maximum likelihood include:
      • mle() in the stats4 package
      • the maxLik package
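As a sketch of the mle() interface from stats4, we fit a normal sample, assuming a made-up sample z. mle() takes a function returning the negative log likelihood whose arguments are the parameters, plus a named list of starting values; by default it minimizes with optim() using BFGS. Here the standard deviation is estimated on the log scale (logsigma, a name chosen for this example) so the optimizer cannot step to a negative value.

```r
library(stats4)

set.seed(1)
z <- rnorm(200, mean = 3, sd = 2)

# Negative log likelihood; each argument is one parameter to estimate.
# sigma enters as exp(logsigma) so it stays positive.
nll <- function(mu, logsigma) {
  -sum(dnorm(z, mean = mu, sd = exp(logsigma), log = TRUE))
}
fit <- mle(nll, start = list(mu = 0, logsigma = 0))
coef(fit)   # estimates of mu and log(sigma)
vcov(fit)   # approximate covariance matrix from the inverted Hessian
```

For the normal distribution the MLEs are the sample mean and the standard deviation with divisor n, so coef(fit) can be checked against mean(z) and log(sqrt(mean((z - mean(z))^2))).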


Example with a logistic distribution

For instance, we draw from a logistic distribution and estimate its parameters using fitdistr().

> library(MASS)
> # draw from a Gumbel distribution using the inverse cdf simulation method
> e.1 <- -log(-log(runif(10000, 0, 1)))
> e.2 <- -log(-log(runif(10000, 0, 1)))
> u <- e.2 - e.1  # u follows a logistic distribution (difference between two Gumbels)
> fitdistr(u, densfun = dlogis, start = list(location = 0, scale = 1))
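Since fitdistr() relies on optim() here, the same fit can be written by hand. The sketch below assumes we draw directly with rlogis() instead of differencing Gumbels, and estimates the scale on the log scale so that an unconstrained method such as BFGS can be used. The inverse of the returned Hessian approximates the covariance of (location, log scale).

```r
set.seed(7)
u <- rlogis(10000, location = 0, scale = 1)

# Negative log likelihood; theta = c(location, log(scale))
nll <- function(theta, x) {
  -sum(dlogis(x, location = theta[1], scale = exp(theta[2]), log = TRUE))
}
fit <- optim(par = c(0.5, 0.5), fn = nll, x = u, method = "BFGS", hessian = TRUE)
est <- c(location = fit$par[1], scale = exp(fit$par[2]))
se <- sqrt(diag(solve(fit$hessian)))   # standard errors of (location, log scale)
est
se
```

Working on the log scale is a design choice rather than a requirement; a bounded method such as L-BFGS-B with a positive lower bound on the scale would also work.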

Example with a Cauchy distribution

For instance, we can write a simple maximum likelihood estimator for the location of a Cauchy distribution using the nlm() optimizer. We first draw a vector x from a Cauchy distribution, then define the negative log likelihood and minimize it with nlm(). Note that nlm() is a minimizer, not a maximizer, which is why the function returns minus the log likelihood.

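> n <- 100
> x <- rcauchy(n)
> # negative log likelihood of the location parameter
> mlog.1 <- function(mu, x) {
+   -sum(dcauchy(x, location = mu, log = TRUE))
+ }
> mu.start <- median(x)  # the median is a robust starting value
> out <- nlm(mlog.1, mu.start, x = x)
> out$estimate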
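nlm() can also use an analytical gradient: if the value returned by the objective has a "gradient" attribute, nlm() uses it instead of finite differences. For the Cauchy location, the derivative of the log density is 2(x - mu) / (1 + (x - mu)^2), so the gradient of the negative log likelihood is minus its sum. The sketch below (mlog.grad is an illustrative name, and the true location 2 is an arbitrary choice) also requests the Hessian to get a standard error.

```r
set.seed(3)
x <- rcauchy(100, location = 2)

# Negative log likelihood with its analytical gradient attached as an attribute
mlog.grad <- function(mu, x) {
  r <- x - mu
  value <- -sum(dcauchy(x, location = mu, log = TRUE))
  attr(value, "gradient") <- -sum(2 * r / (1 + r^2))
  value
}
out <- nlm(mlog.grad, p = median(x), x = x, hessian = TRUE)
out$estimate            # MLE of the location
sqrt(1 / out$hessian)   # standard error from the observed information
```

The Cauchy log likelihood can have several local maxima, so starting from the median, as above, matters more than supplying the gradient.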


Example with a beta distribution

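Here is another example with the beta distribution and the optim() function. Like nlm(), optim() minimizes by default, so the function returns the negative log likelihood, and the L-BFGS-B method uses lower bounds to keep both shape parameters positive.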

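> y <- rbeta(1000, 2, 2)
> # negative log likelihood, because optim() minimizes by default
> nll <- function(mu, x) {
+   -sum(dbeta(x, shape1 = mu[1], shape2 = mu[2], log = TRUE))
+ }
> # both shape parameters must be strictly positive (a bound of 0 gives an infinite value)
> out <- optim(par = c(1, 1), fn = nll, x = y, method = "L-BFGS-B",
+              lower = c(1e-6, 1e-6))
> out$par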
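Instead of negating the log likelihood, optim() can maximize directly: setting control = list(fnscale = -1) makes it maximize fn. The sketch below fits the same beta model this way; the returned value is then the maximized log likelihood itself, which is convenient for likelihood ratio tests.

```r
set.seed(10)
y <- rbeta(1000, 2, 2)

# Log likelihood (not its negative) of a beta sample
ll <- function(theta, x) sum(dbeta(x, shape1 = theta[1], shape2 = theta[2], log = TRUE))

# fnscale = -1 makes optim() maximize fn instead of minimizing it
fit <- optim(par = c(1, 1), fn = ll, x = y, method = "L-BFGS-B",
             lower = c(1e-6, 1e-6), control = list(fnscale = -1))
fit$par     # estimated shape parameters
fit$value   # maximized log likelihood
```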

Tests

Likelihood Ratio Test

  • lrtest() in the lmtest package[1] carries out likelihood ratio tests for nested fitted models.
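For estimators written by hand, the likelihood ratio statistic can also be computed directly: twice the difference between the maximized log likelihoods of the unrestricted and restricted models, compared with a chi-squared distribution whose degrees of freedom equal the number of restrictions. The sketch below tests the hypothetical restriction shape1 == shape2 (a symmetric beta distribution), assuming the usual regularity conditions for the chi-squared approximation hold; the restricted model has one free parameter and is fitted with optimize().

```r
set.seed(11)
y <- rbeta(500, 2, 2)
ll <- function(theta, x) sum(dbeta(x, theta[1], theta[2], log = TRUE))

# Unrestricted model: two free shape parameters (maximized via fnscale = -1)
full <- optim(c(1, 1), ll, x = y, method = "L-BFGS-B",
              lower = c(1e-6, 1e-6), control = list(fnscale = -1))
# Restricted model under H0 (shape1 == shape2): one free parameter
restr <- optimize(function(a, x) ll(c(a, a), x), interval = c(1e-6, 50),
                  x = y, maximum = TRUE)

lr.stat <- 2 * (full$value - restr$objective)
p.value <- pchisq(lr.stat, df = 1, lower.tail = FALSE)
c(statistic = lr.stat, p.value = p.value)
```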


Some specific cases

  • gum.fit() (ismev package) provides maximum likelihood estimation for a Gumbel distribution.
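For comparison, a Gumbel fit can also be written by hand with optim(). This is a sketch, not a reproduction of gum.fit() or its output. With z = (x - location) / scale, the Gumbel log density is -log(scale) - z - exp(-z). We simulate with the inverse cdf method used in the logistic example (here with an arbitrary location 10 and scale 3) and estimate the scale on the log scale.

```r
set.seed(5)
mu <- 10; sigma <- 3
x <- mu - sigma * log(-log(runif(2000)))   # inverse cdf Gumbel draws

# Negative log likelihood; theta = c(location, log(scale))
gum.nll <- function(theta, x) {
  s <- exp(theta[2])
  z <- (x - theta[1]) / s
  sum(log(s) + z + exp(-z))
}
start <- c(mean(x), log(sd(x)))   # rough moment-based starting values
fit <- optim(start, gum.nll, x = x, method = "BFGS")
c(location = fit$par[1], scale = exp(fit$par[2]))
```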


Resources

References

  1. Achim Zeileis, Torsten Hothorn (2002). Diagnostic Checking in Regression Relationships. R News 2(3), 7-10. URL http://CRAN.R-project.org/doc/Rnews/

