Statistical Analysis: an Introduction using R/R/A simple R session

From Wikibooks, open books for an open world
Even though R has not been fully introduced yet, it is instructive to see how simple a useful R session can be. As an example, we will fit a statistical model to the cars data from the previous topic, and see how to produce a plot similar to Figure 1.2b, with a straight best-fit line added. This is a common task in many simple analyses.

Some of the commands in this example will be unfamiliar. Don't worry: the main point is not to understand each command, but to get an overall sense of how R works. Nevertheless, if you do want to understand the commands fully, you will need to know about data frames (essentially, tables of data with named columns) and model formulae (essentially, a notation of the form a ~ b + c, meaning that a is predicted by b and c).
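
As a minimal sketch of these two ideas, the commands below inspect the built-in cars data frame and create a model formula without fitting anything. The variable name f is only an illustrative choice and is not used elsewhere in this topic.

```r
# cars is a data frame supplied with R: 50 rows and two named numeric columns
str(cars)            # compact summary of the structure
head(cars, 3)        # first three rows of the table
names(cars)          # the column names: "speed" and "dist"

# A model formula is an ordinary R object; creating one does not fit a model
f <- dist ~ speed    # read as "dist is predicted by speed"
class(f)             # "formula"
all.vars(f)          # the variable names mentioned in the formula
```

A formula only names variables; the data= argument used in the session below tells R which data frame to look them up in.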

Input:
plot(dist ~ speed, data=cars)                     #A common way of creating a specific plot is via a model formula
straight.line.model <- lm(dist~speed, data=cars)  #This creates and stores a model ("lm" means "Linear Model").
abline(straight.line.model, col="red")            #"abline" will also plot a straight line from a model
straight.line.model                               #Show the fitted model (estimated intercept & slope of the line)
Result:
> plot(dist ~ speed, data=cars)                     #A common way of creating a specific plot is via a model formula
> straight.line.model <- lm(dist~speed, data=cars)  #This creates and stores a model ("lm" means "Linear Model").
> abline(straight.line.model, col="red")            #"abline" will also plot a straight line from a model
> straight.line.model                               #Show the fitted model (estimated intercept & slope of the line)

Call:
lm(formula = dist ~ speed, data = cars)

Coefficients:
(Intercept)        speed  
    -17.579        3.932  
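
The two coefficients describe the fitted line: dist = -17.579 + 3.932 × speed. As a rough sketch of how they might be used, the commands below extract them with coef() and compare a hand-computed fitted value with the one returned by predict(). The speed of 10 and the names b, by.hand and by.predict are arbitrary choices made for illustration.

```r
straight.line.model <- lm(dist ~ speed, data = cars)

# Extract the estimated intercept and slope as a named numeric vector
b <- coef(straight.line.model)
round(b, 3)          # matches the printed coefficients: -17.579 and 3.932

# Fitted stopping distance at a speed of 10 (an arbitrary example value)
by.hand    <- b[["(Intercept)"]] + b[["speed"]] * 10
by.predict <- predict(straight.line.model, newdata = data.frame(speed = 10))

# Both routes should give the same value
all.equal(by.hand, unname(by.predict))
```

Note that predict() expects the new data in a data frame whose column name matches the predictor used in the formula.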

Note that unlike the examples in the graphics topic, we have plotted the data by specifying a model formula, rather than just giving the name of the dataset. Although in this case the resulting plot is the same as seen with plot(cars), the formula interface makes it clearer what is being plotted.
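
The comparison can be sketched as follows. All three calls draw the same points; they differ only in how explicitly the axes are named. The final call is an added illustration, not part of the session above, showing that reversing the formula swaps the axes.

```r
plot(cars)                        # data frame: first column on x, second on y
plot(dist ~ speed, data = cars)   # formula: left-hand side on y, right-hand side on x
plot(cars$speed, cars$dist)       # explicit x and y vectors; same points, different axis labels

# Reversing the formula puts speed on the y axis instead
plot(speed ~ dist, data = cars)
```

With the formula version, the choice of response and predictor is visible in the command itself rather than depending on the column order of the data frame.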