R Programming/Linear Models

From Wikibooks, the open-content textbooks collection

Jump to: navigation, search

Contents

[edit] Basics

First generate a fake dataset:

N <- 10
u <- rnorm(N)
x1 <- rnorm(N)
x2 <- rnorm(N)
y <- 1+ x1 + x2 + u

Estimate the model and store the result in the object called "fit":

fit<-lm(y~x1+x2)

Summarize the result using the summary() function:

> summary(fit)

Call:
lm(formula = y ~ x1 + x2)

Residuals:
    Min      1Q  Median      3Q     Max 
-1.5692 -0.2459  0.1852  0.3939  0.7577 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)   1.2027     0.2998   4.012 0.005110 ** 
x1            0.8427     0.2785   3.026 0.019215 *  
x2            1.5147     0.2008   7.543 0.000132 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

Residual standard error: 0.7377 on 7 degrees of freedom
Multiple R-Squared: 0.9103,	Adjusted R-squared: 0.8847 
F-statistic: 35.52 on 2 and 7 DF,  p-value: 0.0002161 

Get the estimated coefficients:

> fit$coeff
(Intercept)          x1          x2 
  1.2026522   0.8427403   1.5146775
> output <- summary(fit)
> coef(output) 
             Estimate Std. Error  t value    Pr(>|t|)
(Intercept) 1.1945847  0.2298888 5.196359 0.001258035
x1          0.6458170  0.3423214 1.886581 0.101182585
x2          0.6175165  0.2083628 2.963660 0.020995713

The vector of fitted values:

> fit$fitted
         1          2          3          4          5          6 
 2.7683005  5.6184918  1.5745784  3.4995633  1.4830613  3.5007273 
         7          8          9         10 
 0.8894371 -2.1445262  0.9895569  2.8405984 

The vector of residuals:

> fit$resid
          1           2           3           4           5 
 0.42612425  0.40416901  0.19942084 -0.09466376  0.17099960 
          6           7           8           9          10 
-0.29631716  0.75771491 -0.36131985  0.36311148 -1.56923931 

The number of degrees of freedom:

> fit$df
[1] 7

Information criteria:

> # Akaike Information Criteria
> AIC(fit)
[1] 26.72857
> # Bayesian Information Criteria
> AIC(fit,k=log(N))
[1] 27.93891

The stats4 package includes AIC and BIC function:

> library(stats4)
> ?BIC
> lm1 <- lm(Fertility ~ . , data = swiss)
> AIC(lm1)
[1] 326.0716
> BIC(lm1)
[1] 339.0226

[edit] Heteroskedasticity

[edit] Robustness

See Cook's distance

>library(car)
> cookd(fit)
           1            2            3            4            5 
0.0006205008 0.0643213760 0.2574810866 1.2128206779 0.2295047699 
           6            7            8            9           10 
0.3130578329 0.0003365221 0.0671830241 0.0048474954 0.0714255871

Influence plot:

>influence.plot(fit)

Leverage plots:

leverage.plot(fit,term.name=x1)
leverage.plot(fit,term.name=x2)

Bonferroni's outlier test:

> outlier.test(fit)

max|rstudent| = 2.907674, degrees of freedom = 6,
unadjusted p = 0.02706231, Bonferroni p = 0.2706231

Observation: 3 

[edit] Instrumental Variables

  • ivreg() in the AER package[1]

[edit] System of linear equations

[edit] References

  1. Christian Kleiber and Achim Zeileis (2008). Applied Econometrics with R. New York: Springer-Verlag. ISBN 978-0-387-77316-2. URL http://CRAN.R-project.org/package=AER


Previous: Descriptive Statistics Index Next: Maximum Likelihood