Statistical Analysis: an Introduction using R/R/Vectors
One of the most fundamental objects in R is the vector, used to store multiple measurements of the same type (e.g. data variables). Several different sorts of data can be stored in a vector. The most common is the numeric vector, in which each element is simply a number. Other commonly used types are character vectors (where each element is a piece of text) and logical vectors (where each element is either TRUE or FALSE[1]). In this topic we will use some example vectors provided by the "datasets" package, containing data on States of the USA (see ?state).
R is an inherently vector-based program; in fact, the numbers we have been using in previous calculations are just treated as vectors with a single element. This means that most basic functions in R will behave sensibly when given a vector as an argument, as the example below shows.
When you run these commands, the first part of your output may look slightly different from the result printed below. Depending on the width of your screen, the number of elements printed on each line of output may differ. This is why R prints numbers in square brackets whenever a vector is displayed: each bracketed number gives the position of the first element on that line, which is a useful visual aid. For instance, looking at the printout of state.name and counting across the second line, we can tell that the eighth state is Delaware.
Input:

state.area #a NUMERIC vector giving the area of US states, in square miles

state.name #a CHARACTER vector (note the quote marks) of state names

sq.km <- state.area*2.59 #Arithmetic works on numeric vectors, e.g. convert sq miles to sq km

sq.km #... the new vector has the calculation applied to each element in turn

sqrt(sq.km) #Many mathematical functions also apply to each element in turn

range(state.area) #But some functions return different length vectors (here, just the max & min).

length(state.area) #and some, like this useful one, just return a single value.
Result:
> state.area #a NUMERIC vector giving the area of US states, in square miles
 [1]  51609 589757 113909  53104 158693 104247   5009   2057  58560  58876   6450  83557  56400
[14]  36291  56290  82264  40395  48523  33215  10577   8257  58216  84068  47716  69686 147138
[27]  77227 110540   9304   7836 121666  49576  52586  70665  41222  69919  96981  45333   1214
[40]  31055  77047  42244 267339  84916   9609  40815  68192  24181  56154  97914
> state.name #a CHARACTER vector (note the quote marks) of state names
 [1] "Alabama"        "Alaska"         "Arizona"        "Arkansas"
 [5] "California"     "Colorado"       "Connecticut"    "Delaware"
 [9] "Florida"        "Georgia"        "Hawaii"         "Idaho"
[13] "Illinois"       "Indiana"        "Iowa"           "Kansas"
[17] "Kentucky"       "Louisiana"      "Maine"          "Maryland"
[21] "Massachusetts"  "Michigan"       "Minnesota"      "Mississippi"
[25] "Missouri"       "Montana"        "Nebraska"       "Nevada"
[29] "New Hampshire"  "New Jersey"     "New Mexico"     "New York"
[33] "North Carolina" "North Dakota"   "Ohio"           "Oklahoma"
[37] "Oregon"         "Pennsylvania"   "Rhode Island"   "South Carolina"
[41] "South Dakota"   "Tennessee"      "Texas"          "Utah"
[45] "Vermont"        "Virginia"       "Washington"     "West Virginia"
[49] "Wisconsin"      "Wyoming"
> sq.km <- state.area*2.59 #Arithmetic works on numeric vectors, e.g. convert sq miles to sq km
> sq.km #... the new vector has the calculation applied to each element in turn
 [1]  133667.31 1527470.63  295024.31  137539.36  411014.87  269999.73   12973.31    5327.63
 [9]  151670.40  152488.84   16705.50  216412.63  146076.00   93993.69  145791.10  213063.76
[17]  104623.05  125674.57   86026.85   27394.43   21385.63  150779.44  217736.12  123584.44
[25]  180486.74  381087.42  200017.93  286298.60   24097.36   20295.24  315114.94  128401.84
[33]  136197.74  183022.35  106764.98  181090.21  251180.79  117412.47    3144.26   80432.45
[41]  199551.73  109411.96  692408.01  219932.44   24887.31  105710.85  176617.28   62628.79
[49]  145438.86  253597.26
> sqrt(sq.km) #Many mathematical functions also apply to each element in turn
 [1]  365.60540 1235.90883  543.16140  370.86299  641.10441  519.61498  113.90044   72.99062
 [9]  389.44884  390.49819  129.24976  465.20171  382.19890  306.58390  381.82601  461.58830
[17]  323.45487  354.50609  293.30334  165.51263  146.23826  388.30328  466.62203  351.54579
[25]  424.83731  617.32278  447.23364  535.06878  155.23324  142.46136  561.35100  358.33202
[33]  369.04978  427.81111  326.74911  425.54695  501.17940  342.65503   56.07370  283.60615
[41]  446.71213  330.77479  832.11058  468.96955  157.75712  325.13205  420.25859  250.25745
[49]  381.36447  503.58441
> range(state.area) #But some functions return different length vectors (here, just the max & min).
[1]   1214 589757
> length(state.area) #and some, like this useful one, just return a single value.
[1] 50
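The introduction mentions logical vectors, but the example above shows only numeric and character ones. As a minimal sketch (the object name big and the cutoff of 100000 square miles are arbitrary choices made here for illustration, not part of the original text), a comparison applied to a numeric vector is carried out on each element in turn and yields a logical vector of the same length:

```r
# Illustration only: 'big' and the 100000 cutoff are hypothetical choices
big <- state.area > 100000   # compare each element in turn, giving TRUE or FALSE
big                          # a LOGICAL vector (no quote marks around TRUE/FALSE)
class(big)                   # "logical"
length(big)                  # same length as state.area
sum(big)                     # TRUE counts as 1, so this counts the large states
```

Because sum() treats TRUE as 1 and FALSE as 0, summing a logical vector is a common way to count how many elements satisfy a condition.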
You may occasionally need to create your own vectors from scratch (although most vectors are obtained by processing data in already-existing files). The most commonly used function for constructing vectors is c(), so named because it concatenates objects together. However, if you wish to create vectors consisting of regular sequences of numbers (e.g. 2,4,6,8,10,12, or 1,1,2,2,1,1,2,2), there are several alternative functions you can use, including seq(), rep(), and the : operator.
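As a brief sketch of how the two regular sequences just mentioned might be built, the following uses the by argument of seq() and the each and times arguments of rep(). The object names evens and blocks are made up for this illustration.

```r
# Illustration only: 'evens' and 'blocks' are hypothetical object names
evens <- seq(2, 12, by = 2)                   # from 2 to 12 in steps of 2
evens                                         # 2 4 6 8 10 12

blocks <- rep(rep(1:2, each = 2), times = 2)  # repeat each element twice, then the whole result twice
blocks                                        # 1 1 2 2 1 1 2 2

same <- rep(1:2, each = 2, times = 2)         # the same sequence in a single call
```

Note that rep() applies each before times, which is why the single call gives the same result as the nested one.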
Input:

c("one", "two", "three", "pi") #Make a character vector

c(1,2,3,pi) #Make a numeric vector

seq(1,3) #Create a sequence of numbers

1:3 #A shortcut for the same thing (but less flexible)

i <- 1:3 #You can store a vector

i

i <- c(i,pi) #To add more elements, you must assign again, e.g. using c()

i

i <- c(i, "text") #A vector cannot contain different data types, so ...

i #... R converts all elements to the same type

i+1 #The numbers are now strings of text: arithmetic is impossible

rep(1, 10) #The "rep" function repeats its first argument

rep(3:1,10) #The first argument can also be a vector

huge.vector <- 0:(10^7) #R can easily cope with very big vectors

#huge.vector #VERY BAD IDEA TO UNCOMMENT THIS, unless you want to print out 10 million numbers

rm(huge.vector) #"rm" removes objects. Deleting huge unused objects is sensible
Result:
> c("one", "two", "three", "pi") #Make a character vector
[1] "one"   "two"   "three" "pi"
> c(1,2,3,pi) #Make a numeric vector
[1] 1.000000 2.000000 3.000000 3.141593
> seq(1,3) #Create a sequence of numbers
[1] 1 2 3
> 1:3 #A shortcut for the same thing (but less flexible)
[1] 1 2 3
> i <- 1:3 #You can store a vector
> i
[1] 1 2 3
> i <- c(i,pi) #To add more elements, you must assign again, e.g. using c()
> i
[1] 1.000000 2.000000 3.000000 3.141593
> i <- c(i, "text") #A vector cannot contain different data types, so ...
> i #... R converts all elements to the same type
[1] "1"                "2"                "3"                "3.14159265358979" "text"
> i+1 #The numbers are now strings of text: arithmetic is impossible
Error in i + 1 : non-numeric argument to binary operator
> rep(1, 10) #The "rep" function repeats its first argument
 [1] 1 1 1 1 1 1 1 1 1 1
> rep(3:1,10) #The first argument can also be a vector
 [1] 3 2 1 3 2 1 3 2 1 3 2 1 3 2 1 3 2 1 3 2 1 3 2 1 3 2 1 3 2 1
> huge.vector <- 0:(10^7) #R can easily cope with very big vectors
> #huge.vector #VERY BAD IDEA TO UNCOMMENT THIS, unless you want to print out 10 million numbers
> rm(huge.vector) #"rm" removes objects. Deleting huge unused objects is sensible
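The type conversion seen above can be checked with class(). The sketch below goes slightly beyond the original example: it also uses as.numeric(), which converts text back to numbers where it can and gives NA (with a warning, suppressed here) where it cannot. The object names mixed and back are hypothetical.

```r
# Illustration only: 'mixed' and 'back' are hypothetical object names
mixed <- c(1:3, "text")     # numbers and text combined: every element becomes text
class(mixed)                # "character"

# "text" has no numeric meaning, so it becomes NA (R would normally warn about this)
back <- suppressWarnings(as.numeric(mixed))
back                        # 1 2 3 NA
class(back)                 # "numeric"
back + 1                    # arithmetic works again; NA stays NA
```

This is only one way of recovering from an accidental conversion; it is usually simpler to keep text and numbers in separate vectors in the first place.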
Notes
[1] TRUE and FALSE are special words in R, and cannot be used as names for objects. The objects T and F are temporary shortcuts for TRUE and FALSE, but if you use them, watch out: since T and F are just normal object names, you can change their meaning by overwriting them.
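The difference described in this note can be seen in a short sketch. It is an illustration only, and deliberately does what the note warns against; the object name res is hypothetical.

```r
# Illustration only: overwriting T is exactly what the note advises against
T <- FALSE                   # allowed, because T is an ordinary object name
isTRUE(T)                    # FALSE: T no longer means TRUE
rm(T)                        # remove our copy, so the built-in shortcut is found again
isTRUE(T)                    # TRUE

# TRUE itself is a reserved word, so R refuses to assign to it
res <- try(TRUE <- FALSE, silent = TRUE)
inherits(res, "try-error")   # TRUE: the attempted assignment failed
```

For this reason it is generally safer to write TRUE and FALSE in full.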