Statistical Analysis: an Introduction using R/R/Vectors

One of the most fundamental objects in R is the vector, used to store multiple measurements of the same type (e.g. data variables). There are several different sorts of data that can be stored in a vector. Most common is the numeric vector, in which each element of the vector is simply a number. Other commonly used types of vector are character vectors (where each element is a piece of text) and logical vectors (where each element is either TRUE or FALSE[1]). In this topic we will use some example vectors provided by the "datasets" package, containing data on States of the USA (see ?state).

R is an inherently vector-based program; in fact, the numbers we have been using in previous calculations are just treated as vectors with a single element. This means that most basic functions in R will behave sensibly when given a vector as an argument, as shown below.

Input:
state.area                #a NUMERIC vector giving the area of US states, in square miles
state.name                #a CHARACTER vector (note the quote marks) of state names
sq.km <- state.area*2.59  #Arithmetic works on numeric vectors, e.g. convert sq miles to sq km
sq.km                     #... the new vector has the calculation applied to each element in turn
sqrt(sq.km)               #Many mathematical functions also apply to each element in turn
range(state.area)         #But some functions return different length vectors (here, just the max & min).
length(state.area)        #and some, like this useful one, just return a single value.
Result:
> state.area                #a NUMERIC vector giving the area of US states, in square miles
 [1]  51609 589757 113909  53104 158693 104247   5009   2057  58560  58876   6450  83557  56400
[14]  36291  56290  82264  40395  48523  33215  10577   8257  58216  84068  47716  69686 147138
[27]  77227 110540   9304   7836 121666  49576  52586  70665  41222  69919  96981  45333   1214
[40]  31055  77047  42244 267339  84916   9609  40815  68192  24181  56154  97914
> state.name                #a CHARACTER vector (note the quote marks) of state names 
 [1] "Alabama"            "Alaska"             "Arizona"            "Arkansas"          
 [5] "California"         "Colorado"           "Connecticut"        "Delaware"          
 [9] "Florida"            "Georgia"            "Hawaii"             "Idaho"             
[13] "Illinois"           "Indiana"            "Iowa"               "Kansas"            
[17] "Kentucky"           "Louisiana"          "Maine"              "Maryland"          
[21] "Massachusetts"      "Michigan"           "Minnesota"          "Mississippi"       
[25] "Missouri"           "Montana"            "Nebraska"           "Nevada"            
[29] "New Hampshire"      "New Jersey"         "New Mexico"         "New York"          
[33] "North Carolina"     "North Dakota"       "Ohio"               "Oklahoma"          
[37] "Oregon"             "Pennsylvania"       "Rhode Island"       "South Carolina"    
[41] "South Dakota"       "Tennessee"          "Texas"              "Utah"              
[45] "Vermont"            "Virginia"           "Washington"         "West Virginia"     
[49] "Wisconsin"          "Wyoming"           
> sq.km <- state.area*2.59  #Arithmetic works on numeric vectors, e.g. convert sq miles to sq km
> sq.km                     #... the new vector has the calculation applied to each element in turn
 [1]  133667.31 1527470.63  295024.31  137539.36  411014.87  269999.73   12973.31    5327.63
 [9]  151670.40  152488.84   16705.50  216412.63  146076.00   93993.69  145791.10  213063.76
[17]  104623.05  125674.57   86026.85   27394.43   21385.63  150779.44  217736.12  123584.44
[25]  180486.74  381087.42  200017.93  286298.60   24097.36   20295.24  315114.94  128401.84
[33]  136197.74  183022.35  106764.98  181090.21  251180.79  117412.47    3144.26   80432.45
[41]  199551.73  109411.96  692408.01  219932.44   24887.31  105710.85  176617.28   62628.79
[49]  145438.86  253597.26
> sqrt(sq.km)               #Many mathematical functions also apply to each element in turn 
 [1]  365.60540 1235.90883  543.16140  370.86299  641.10441  519.61498  113.90044   72.99062
 [9]  389.44884  390.49819  129.24976  465.20171  382.19890  306.58390  381.82601  461.58830
[17]  323.45487  354.50609  293.30334  165.51263  146.23826  388.30328  466.62203  351.54579
[25]  424.83731  617.32278  447.23364  535.06878  155.23324  142.46136  561.35100  358.33202
[33]  369.04978  427.81111  326.74911  425.54695  501.17940  342.65503   56.07370  283.60615
[41]  446.71213  330.77479  832.11058  468.96955  157.75712  325.13205  420.25859  250.25745
[49]  381.36447  503.58441
> range(state.area)         #But some functions return different length vectors (here, just the max & min).
[1]   1214 589757
> length(state.area)        #and some, like this useful one, just return a single value.
[1] 50
Note that the first part of your output may look slightly different to that above. Depending on the width of your screen, the number of elements printed on each line of output may differ. This is the reason for the numbers in square brackets, which are produced when vectors are printed to the screen. These bracketed numbers give the position of the first element on that line, which is a useful visual aid. For instance, looking at the printout of state.name, and counting across from the second line, we can tell that the eighth state is Delaware.
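Both points made so far (that a single number is itself a one-element vector, and that the bracketed numbers are positions) can be checked directly. The following is only a minimal sketch: the names `x` and `pos` are chosen for illustration, and `which()` is a function not otherwise covered at this stage.

```r
x <- 5                 # a single number ...
is.vector(x)           # ... is still a vector
length(x)              # ... containing exactly one element
x * 1:3                # so it combines naturally with longer vectors

# The bracketed numbers in printed output are positions within the vector.
# which() reports the position(s) at which a condition holds:
pos <- which(state.name == "Delaware")
pos                    # should agree with counting across the printout
```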
You may occasionally need to create your own vectors from scratch (although most vectors are obtained from processing data in already-existing files). The most commonly used function for constructing vectors is c(), so named because it concatenates objects together. However, if you wish to create vectors consisting of regular sequences of numbers (e.g. 2,4,6,8,10,12, or 1,1,2,2,1,1,2,2) there are several alternative functions you can use, including seq(), rep(), and the : operator.
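As a brief sketch of how the two regular sequences mentioned above might be produced (the object names `evens`, `blocks` and `blocks2` are illustrative only, and the `by`, `times` and `each` arguments are just one of several ways to get the same result):

```r
evens <- seq(2, 12, by = 2)                    # from 2 to 12 in steps of 2
evens

blocks <- rep(c(1, 1, 2, 2), times = 2)        # repeat the pattern 1 1 2 2 twice
blocks

blocks2 <- rep(rep(1:2, each = 2), times = 2)  # same values, built up from 1:2
blocks2
```

The nested `rep()` call first repeats each element of `1:2` twice, then repeats the resulting pattern as a whole.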
Input:
c("one", "two", "three", "pi")  #Make a character vector
c(1,2,3,pi)                     #Make a numeric vector
seq(1,3)                        #Create a sequence of numbers
1:3                             #A shortcut for the same thing (but less flexible)
i <- 1:3                        #You can store a vector
i
i <- c(i,pi)                    #To add more elements, you must assign again, e.g. using c()
i
i <- c(i, "text")               #A vector cannot contain different data types, so ...
i                               #... R converts all elements to the same type
i+1                             #The numbers are now strings of text: arithmetic is impossible
rep(1, 10)                      #The "rep" function repeats its first argument
rep(3:1,10)                     #The first argument can also be a vector
huge.vector <- 0:(10^7)         #R can easily cope with very big vectors
#huge.vector #VERY BAD IDEA TO UNCOMMENT THIS, unless you want to print out 10 million numbers
rm(huge.vector)                 #"rm" removes objects. Deleting huge unused objects is sensible
Result:
> c("one", "two", "three", "pi")  #Make a character vector
[1] "one"   "two"   "three" "pi"   
> c(1,2,3,pi)                     #Make a numeric vector
[1] 1.000000 2.000000 3.000000 3.141593
> seq(1,3)                        #Create a sequence of numbers
[1] 1 2 3
> 1:3                             #A shortcut for the same thing (but less flexible)
[1] 1 2 3
> i <- 1:3                        #You can store a vector
> i
[1] 1 2 3
> i <- c(i,pi)                    #To add more elements, you must assign again, e.g. using c() 
> i                             
[1] 1.000000 2.000000 3.000000 3.141593
> i <- c(i, "text")               #A vector cannot contain different data types, so ... 
> i                               #... R converts all elements to the same type
[1] "1"                "2"                "3"                "3.14159265358979" "text"            
> i+1                             #The numbers are now strings of text: arithmetic is impossible 
Error in i + 1 : non-numeric argument to binary operator
> rep(1, 10)                      #The "rep" function repeats its first argument
 [1] 1 1 1 1 1 1 1 1 1 1
> rep(3:1,10)                     #The first argument can also be a vector
 [1] 3 2 1 3 2 1 3 2 1 3 2 1 3 2 1 3 2 1 3 2 1 3 2 1 3 2 1 3 2 1
> huge.vector <- 0:(10^7)         #R can easily cope with very big vectors
> #huge.vector #VERY BAD IDEA TO UNCOMMENT THIS, unless you want to print out 10 million numbers
> rm(huge.vector)                 #"rm" removes objects. Deleting huge unused objects is sensible
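The type conversion seen above can be followed step by step with `class()`, and undone with `as.numeric()` when the text really does represent numbers. This is only a sketch: `class()` and `as.numeric()` are not otherwise covered at this point, and `mixed` and `digits` are illustrative names.

```r
class(1:3)                    # whole numbers made with ":" are stored as "integer"
class(c(1:3, pi))             # adding pi gives a "numeric" vector
mixed <- c(1:3, pi, "text")   # adding a piece of text ...
class(mixed)                  # ... turns the whole vector into "character"

digits <- c("1", "2", "3")    # numbers stored as text
as.numeric(digits) + 1        # convert back to numbers before doing arithmetic
```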


Notes

  1. TRUE and FALSE are reserved words in R and cannot be used as names for objects. T and F are predefined shortcuts for TRUE and FALSE, but use them with care: they are ordinary object names, so their meaning changes if you assign something else to them.
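
The risk described in this note can be seen in a short sketch (the name `shortcut.now` is illustrative, and `isTRUE()` is a function not otherwise covered at this point):

```r
T == TRUE                   # initially the shortcut T means TRUE
T <- 0                      # nothing prevents T from being overwritten ...
shortcut.now <- isTRUE(T)   # ... after which T no longer means TRUE
shortcut.now
rm(T)                       # removing our own T makes the original visible again
isTRUE(T)
# TRUE <- 0                 # by contrast, this line would be an error: TRUE is reserved
```

Writing TRUE and FALSE in full avoids the problem altogether.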