Statistical Analysis: an Introduction using R/R/Accessing elements of vectors

From Wikibooks, open books for an open world
It is common to want to access only certain elements of a vector: for example, we might want to use only the 10th element, or the first 4 elements, or to select elements depending on their value. The way to do this is to follow the vector with the indexing operator, [] (i.e. square brackets). If these square brackets contain
  • A positive number or numbers, then this has the effect of picking those particular elements of the vector
  • A negative number or numbers, then this has the effect of picking the whole vector except those elements
  • A logical vector, then each element of the logical vector indicates whether to pick (if TRUE) or not (if FALSE) the equivalent element of the original vector[1].
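The three kinds of index can be tried out on a small vector. The following is a minimal sketch using a made-up vector `x`, which is not part of the original example:

```r
# A small illustrative vector (hypothetical data)
x <- c(10, 20, 30, 40, 50)

# Positive numbers: pick the 2nd and 4th elements
x[c(2, 4)]                             # [1] 20 40

# Negative numbers: everything except the 2nd and 4th elements
x[-c(2, 4)]                            # [1] 10 30 50

# Logical vector: keep the elements whose position is TRUE
x[c(TRUE, FALSE, TRUE, FALSE, TRUE)]   # [1] 10 30 50
```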
The use of logical vectors may seem a little complicated. However, they are extremely useful, because they are what comparison operators produce: comparing a vector with a value gives a logical vector containing one TRUE or FALSE for each element. This can be used, for example, to identify which US states are small, with an area less than (<) 10 000 square miles (as demonstrated below).
Input:
min(state.area)    #This gives the area of the smallest US state...
which.min(state.area) #... this shows which element it is (the 39th as it happens) 
state.name[39]     #You can obtain individual elements by using square brackets
state.name[39] <- "THE SMALLEST STATE" #You can replace elements using [] too
state.name       #The 39th name ("Rhode Island") should now have been changed
state.name[1:10]    #This returns a new vector consisting of only the first 10 states
state.name[-(1:10)]  #Using negative numbers gives everything but the first 10 states
state.name[c(1,2,2,1)] #You can also obtain the same element multiple times
###Logical vectors are a little more complicated to get your head round
state.area < 10000       #A LOGICAL vector, identifying which states are small
state.name[state.area < 10000] #So this can be used to select the names of the small states
Result:
> min(state.area)    #This gives the area of the smallest US state...
[1] 1214
> which.min(state.area) #... this shows which element it is (the 39th as it happens)
[1] 39
> state.name[39]     #You can obtain individual elements by using square brackets
[1] "Rhode Island"
> state.name[39] <- "THE SMALLEST STATE" #You can replace elements using [] too
> state.name       #The 39th name ("Rhode Island") should now have been changed
 [1] "Alabama"        "Alaska"       "Arizona"            "Arkansas"
 [5] "California"     "Colorado"     "Connecticut"        "Delaware"
 [9] "Florida"        "Georgia"      "Hawaii"             "Idaho"
[13] "Illinois"       "Indiana"      "Iowa"               "Kansas"
[17] "Kentucky"       "Louisiana"    "Maine"              "Maryland"
[21] "Massachusetts"  "Michigan"     "Minnesota"          "Mississippi"
[25] "Missouri"       "Montana"      "Nebraska"           "Nevada"
[29] "New Hampshire"  "New Jersey"   "New Mexico"         "New York"
[33] "North Carolina" "North Dakota" "Ohio"               "Oklahoma"
[37] "Oregon"         "Pennsylvania" "THE SMALLEST STATE" "South Carolina"
[41] "South Dakota"   "Tennessee"    "Texas"              "Utah"
[45] "Vermont"        "Virginia"     "Washington"         "West Virginia"
[49] "Wisconsin"      "Wyoming"
> state.name[1:10]    #This returns a new vector consisting of only the first 10 states
 [1] "Alabama"     "Alaska"      "Arizona"     "Arkansas"    "California"  "Colorado"
 [7] "Connecticut" "Delaware"    "Florida"     "Georgia"
> state.name[-(1:10)]  #Using negative numbers gives everything but the first 10 states
 [1] "Hawaii"             "Idaho"          "Illinois"       "Indiana"
 [5] "Iowa"               "Kansas"         "Kentucky"       "Louisiana"
 [9] "Maine"              "Maryland"       "Massachusetts"  "Michigan"
[13] "Minnesota"          "Mississippi"    "Missouri"       "Montana"
[17] "Nebraska"           "Nevada"         "New Hampshire"  "New Jersey"
[21] "New Mexico"         "New York"       "North Carolina" "North Dakota"
[25] "Ohio"               "Oklahoma"       "Oregon"         "Pennsylvania"
[29] "THE SMALLEST STATE" "South Carolina" "South Dakota"   "Tennessee"
[33] "Texas"              "Utah"           "Vermont"        "Virginia"
[37] "Washington"         "West Virginia"  "Wisconsin"      "Wyoming"
> state.name[c(1,2,2,1)] #You can also obtain the same element multiple times
[1] "Alabama" "Alaska"  "Alaska"  "Alabama"
> ###Logical vectors are a little more complicated to get your head round
> state.area < 10000       #A LOGICAL vector, identifying which states are small
 [1] FALSE FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE
[16] FALSE FALSE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE
[31] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE FALSE  TRUE
[46] FALSE FALSE FALSE FALSE FALSE
> state.name[state.area < 10000] #So this can be used to select the names of the small states
[1] "Connecticut"        "Delaware"           "Hawaii"             "Massachusetts"
[5] "New Hampshire"      "New Jersey"         "THE SMALLEST STATE" "Vermont"
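Selecting by a logical vector and selecting by positive numbers are two routes to the same result. As a sketch, the base R function `which()` converts a logical vector into the positions of its TRUE elements, so those positions can be used as an ordinary numeric index (re-using the built-in `state.area` and `state.name` data):

```r
# Positions of the states with an area under 10 000 square miles
small <- which(state.area < 10000)
small    # [1]  7  8 11 21 29 30 39 45

# Indexing by these positions picks the same elements as the logical vector
identical(state.name[small], state.name[state.area < 10000])   # [1] TRUE
```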

Although the [] operator can be used to access just a single element of a vector, it is particularly useful for accessing several elements at once. Another operator, the double square bracket ([[ ]]), exists specifically for accessing a single element. While not particularly useful for vectors, it comes into its own for lists and data frames.
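A minimal sketch of the difference, using a hypothetical named vector `v` and list `lst` (illustrative data, not from the original text). On a vector, [[ ]] returns the single value without its name; on a list, [ ] returns a shorter list, whereas [[ ]] returns the element itself:

```r
v <- c(a = 1, b = 2, c = 3)
v[2]       # a vector of length 1 that keeps its name "b"
v[[2]]     # just the value 2, with the name dropped

lst <- list(name = "Rhode Island", area = 1214)
lst[2]     # still a list (of length 1) containing the element "area"
lst[[2]]   # the element itself: the number 1214
```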


Notes

  1. If the logical vector is shorter than the original vector, it is repeated ("recycled") from the start until it is the right length.
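A short sketch of this recycling, using a hypothetical six-element vector `x`: the two-element index c(TRUE, FALSE) is repeated to cover all six positions, so it picks every other element.

```r
x <- c(10, 20, 30, 40, 50, 60)
# c(TRUE, FALSE) is recycled to c(TRUE, FALSE, TRUE, FALSE, TRUE, FALSE)
x[c(TRUE, FALSE)]   # [1] 10 30 50
```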