R Programming/Data types

From Wikibooks, open books for an open world
Jump to: navigation, search


Contents

[edit] Data types

Vectors are the simplest R objects. Factors are similar to vectors but with a predetermined set of levels. A matrix is like a vector but with a specific instruction for the layout such that it looks like a matrix. Arrays are similar to matrix but can have more than 2 dimensions. A list is a vector of R objects. A dataframe is like a matrix but does not assume that all columns have the same type. A dataframe is a list of variables/vectors of the same length. Classes define how objects of a certain type look like. Classes are attached to object as an attribute. All R objects have a class, a type and a dimension.

> class(object)
> typeof(object)
> dim(object)


[edit] Vectors

You can create a vector using the c() function which concatenates some elements. You can create a sequence using the : symbol or the seq() function. For instance 1:5 gives all the number between 1 and 5. The seq() let you specify the interval between each number. You can also repeat a pattern using the rep() function. You can also create a numeric vector of missing values using numeric(), a character vector of missing values using character() and a logical vector of missing values (ie FALSE) using logical()

> c(1,2,3,4,5)
[1] 1 2 3 4 5
> c("a","b","c","d","e")
[1] "a" "b" "c" "d" "e"
> c(T,F,T,F)
[1]  TRUE FALSE  TRUE FALSE
 
> 1:5
[1] 1 2 3 4 5
> 5:1
[1] 5 4 3 2 1
> seq(1,5)
[1] 1 2 3 4 5
> seq(1,5,step=.5)
[1] 1 2 3 4 5
> seq(1,5,by=.5)
[1] 1.0 1.5 2.0 2.5 3.0 3.5 4.0 4.5 5.0
> rep(1,5)
[1] 1 1 1 1 1
> rep(1:2,5)
 [1] 1 2 1 2 1 2 1 2 1 2
> numeric(5)
[1] 0 0 0 0 0
> logical(5)
[1] FALSE FALSE FALSE FALSE FALSE
> character(5)
[1] "" "" "" "" ""

The length() computes the length of a vector. last() (sfsmisc) returns the last element of a vector but this can also be achieved simply without the need for an extra package.

x <- seq(1,5,by=.5)    # Create a sequence of number
x                      # Display this object
 [1] 1.0 1.5 2.0 2.5 3.0 3.5 4.0 4.5 5.0
> length(x)            # Get length of object x
 [1] 9
> library(sfsmisc)
> last(x)              # Select the last element of x  
 [1] 5.0
> x[length(x)]         # Select the last element wihout an extra package.
 [1] 5.0


[edit] Factors

factor() transforms a vector into a factor. A factor can also be ordered with the option ordered=T or the function ordered(). levels() returns the levels of a factor. gl() generates factors. n is the number of levels, k the number of repetition of each factor and length the total length of the factor. labels is optional and gives labels to each level.

See also is.factor(), as.factor(), is.ordered() and as.ordered().

 
> factor(c("yes","no","yes","maybe","maybe","no","maybe","no","no"))
[1] yes   no    yes   maybe maybe no    maybe no    no   
Levels: maybe no yes
> 
> factor(c("yes","no","yes","maybe","maybe","no","maybe","no","no"), ordered = T)
[1] yes   no    yes   maybe maybe no    maybe no    no   
Levels: maybe < no < yes
> 
> ordered(c("yes","no","yes","maybe","maybe","no","maybe","no","no"))
[1] yes   no    yes   maybe maybe no    maybe no    no   
Levels: maybe < no < yes
>
>  gl(n=2, k=2, length=10, labels = c("Male", "Female")) # generate factor levels
 [1] Male   Male   Female Female Male   Male   Female Female Male   Male  
Levels: Male Female

[edit] Matrix

  • If you want to create a new matrix, one way is to use the matrix() function. You have to enter a vector of data, the number of rows and/or columns and finally you can specify if you want R to read your vector by row or by column (the default option). Here are two examples.
> matrix(data = NA, nrow = 5, ncol = 5, byrow = T)
     [,1] [,2] [,3] [,4] [,5]
[1,]   NA   NA   NA   NA   NA
[2,]   NA   NA   NA   NA   NA
[3,]   NA   NA   NA   NA   NA
[4,]   NA   NA   NA   NA   NA
[5,]   NA   NA   NA   NA   NA
> matrix(data = 1:15, nrow = 5, ncol = 5, byrow = T)
     [,1] [,2] [,3] [,4] [,5]
[1,]    1    2    3    4    5
[2,]    6    7    8    9   10
[3,]   11   12   13   14   15
[4,]    1    2    3    4    5
[5,]    6    7    8    9   10
  • Functions cbind() and rbind() combine vectors into matrices in a column by column or row by row mode:
> v1 <- 1:5
> v2 <- 5:1
> v2
[1] 5 4 3 2 1
> cbind(v1,v2)
     v1 v2
[1,]  1  5
[2,]  2  4
[3,]  3  3
[4,]  4  2
[5,]  5  1
 
> rbind(v1,v2)
   [,1] [,2] [,3] [,4] [,5]
v1    1    2    3    4    5
v2    5    4    3    2    1
  • The dimension of a matrix can be obtained using the dim() function. Alternatively nrow() and ncol() returns the number of rows and columns in a matrix:
> dim(X)
[1] 5 5
> nrow(X)
[1] 5
> ncol(X)
[1] 5
  • Function t() transposes a matrix:
> X<-matrix(data = 1:15, nrow = 5, ncol = 5, byrow = T)
> t(X)
     [,1] [,2] [,3] [,4] [,5]
[1,]    1    6   11    1    6
[2,]    2    7   12    2    7
[3,]    3    8   13    3    8
[4,]    4    9   14    4    9
[5,]    5   10   15    5   10
  • Unlike data frames matrices must either be numeric or character in type:
> a=matrix(2,2,2)
> a
     [,1] [,2]
[1,]    2    2
[2,]    2    2
> a = rbind(a,c("A","A"))
> a
     [,1] [,2]
[1,] "2"  "2" 
[2,] "2"  "2" 
[3,] "A"  "A"


[edit] Arrays

An array is composed of n dimensions where each dimension is a vector of R objects of the same type. An array of one dimension of one element may be constructed as follows.

> x = array(c(T,F),dim=c(1))
> print(x)
[1] TRUE

The array x was created with a single dimension (dim=c(1)) drawn from the vector of possible values c(T,F). A similar array, y, can be created with a single dimension and two values.

> y = array(c(T,F),dim=c(2))
> print(y)
[1]  TRUE FALSE

A three dimensional array - 3 by 3 by 3 - may be created as follows.

> z = array(1:27,dim=c(3,3,3))
> dim(z)
[1] 3 3 3
> print(z)
, , 1
 
     [,1] [,2] [,3]
[1,]    1    4    7
[2,]    2    5    8
[3,]    3    6    9
 
, , 2
 
     [,1] [,2] [,3]
[1,]   10   13   16
[2,]   11   14   17
[3,]   12   15   18
 
, , 3
 
     [,1] [,2] [,3]
[1,]   19   22   25
[2,]   20   23   26
[3,]   21   24   27

R arrays are accessed in a manner similar to arrays in other languages: by integer index, starting at 1 (not 0). The following code shows how the third dimension of the 3 by 3 by 3 array can be accessed. The third dimension is a 3 by 3 array.

> z[,,3]
     [,1] [,2] [,3]
[1,]   19   22   25
[2,]   20   23   26
[3,]   21   24   27

Specifying two of the three dimensions returns an array on one dimension.

> z[,3,3]
[1] 25 26 27

Specifying three of three dimension returns an element of the 3 by 3 by 3 array.

> z[3,3,3]
[1] 27

More complex partitioning of array may be had.

> z[,c(2,3),c(2,3)]
, , 1
 
     [,1] [,2]
[1,]   13   16
[2,]   14   17
[3,]   15   18
 
, , 2
 
     [,1] [,2]
[1,]   22   25
[2,]   23   26
[3,]   24   27

Arrays need not be symmetric across all dimensions. The following code creates a pair of 3 by 3 arrays.

> w = array(1:18,dim=c(3,3,2))
> print(w)
, , 1
 
     [,1] [,2] [,3]
[1,]    1    4    7
[2,]    2    5    8
[3,]    3    6    9
 
, , 2
 
     [,1] [,2] [,3]
[1,]   10   13   16
[2,]   11   14   17
[3,]   12   15   18

Objects of the vectors composing the array must be of the same type, but they need not be numbers.

> u = array(c(T,F),dim=c(3,3,2))
> print(u)
, , 1
 
      [,1]  [,2]  [,3]
[1,]  TRUE FALSE  TRUE
[2,] FALSE  TRUE FALSE
[3,]  TRUE FALSE  TRUE
 
, , 2
 
      [,1]  [,2]  [,3]
[1,] FALSE  TRUE FALSE
[2,]  TRUE FALSE  TRUE
[3,] FALSE  TRUE FALSE


[edit] Lists

A list is a collection of R objects. list() creates a list. unlist() transform a list into a vector. The objects in a list do not have to be of the same type or length.

> x <- c(1:4)
> y <- FALSE
> z <- matrix(c(1:4),nrow=2,ncol=2)
> myList <- list(x,y,z)
> myList
 [[1]]
[1] 1 2 3 4
 
 [[2]]
[1] FALSE
 
 [[3]]
     [,1] [,2]
[1,]    1    2
[2,]    3    4

lists have very flexible methods for reference

  • by index number:
> a = list()
> a
list()
> a[[1]] = "A"
> a
[[1]]
[1] "A"
 
> a[[2]]="B"
> a
[[1]]
[1] "A"
 
[[2]]
[1] "B"
  • By name:
> a
list()
> a$fruit = "Apple"
> a
$fruit
[1] "Apple"
 
> a$color = "green"
> a
$fruit
[1] "Apple"
 
$color
[1] "green"
  • This can also be recursive and in combination
> a = list()
> a[[1]] = "house"
> a$park = "green's park"
> a
[[1]]
[1] "house"
 
$park
[1] "green's park"
 
 
> a$park = "green's park"
> a[[1]]$address = "1 main st."
 
> a
[[1]]
[[1]][[1]]
[1] "house"
 
[[1]]$address
[1] "1 main st."
 
 
$park
[1] "green's park"

Using the scoping rules in R one can also dynamically name and create list elements

>  a = list()
>  n = 1:10
>  fruit = paste("number of coconuts in bin",n)
> my.number = paste("I have",10:1,"coconuts")
> for (i in 1:10)a[fruit[i]] = my.number[i]
>  a$'number of coconuts in bin 7'
[1] "I have 4 coconuts"

[edit] Data Frames

A dataframe has been referred to as "a list of variables/vectors of the same length". In the following example, a dataframe of two vectors is created, each of five elements. The first vector, v1, is compose of a sequence of the integers 1 through 5. A second vector, v2, is composed of five logical values drawn of type T and F. The dataframe is then created, composed of the vectors. The columns of the data frame can be accessed using integer subscripts or the column name and the $ symbol.

> v1 = 1:5
> v2 = c(T,T,F,F,T)
> df = data.frame(v1,v2)
> print(df)
  v1    v2
1  1  TRUE
2  2  TRUE
3  3 FALSE
4  4 FALSE
5  5  TRUE
> df[,1]
 [1] 1 2 3 4 5
> df$v2
 [1] TRUE TRUE FALSE FALSE TRUE

The dataframe may be created directly. In the following code, the dataframe is created - naming each vector composing the dataframe as part of the argument list.

> df = data.frame(foo=1:5,bar=c(T,T,F,F,T))
> print(df)
  foo   bar
1   1  TRUE
2   2  TRUE
3   3 FALSE
4   4 FALSE
5   5  TRUE

[edit] External links

[edit] References

Previous: Sample Session Index Next: Settings
Personal tools
Namespaces
Variants
Actions
Navigation
Community
Toolbox
Sister projects
Print/export