R Programming/Data

From Wikibooks, the open-content textbooks collection

Example Datasets

  • Most packages include example datasets.
  • Called without arguments, the data() function lists the example datasets available in all the currently loaded packages.
  • To load a dataset into memory, call data() with the name of the dataset as an argument.
> data() # lists the datasets in all the currently loaded packages
> data(package="datasets") # lists all the datasets in the "datasets" package
> data(Orange) # loads the Orange dataset into memory
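
Once a dataset is loaded, it behaves like any other R object. The following is a minimal sketch of a first look at the Orange dataset; the choice of inspection functions is only a suggestion, not part of the original example.

```r
data(Orange)              # load the Orange dataset from the "datasets" package
class(Orange)             # the object is a (grouped) data frame
dim(Orange)               # number of rows and columns
head(Orange, n = 3)       # first three observations
```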

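Data Input

  • scan() reads data from the console or a file into a vector or a list.
  • readLines() reads the lines of a text file (or any other connection) into a character vector.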
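
The difference between the two functions can be sketched as follows. The temporary file and its contents are made up for illustration, as are the field names year, a and b.

```r
# Write a small text file to a temporary location (hypothetical example data)
tmp <- tempfile(fileext = ".txt")
writeLines(c("1970 45 63", "1980 52 59"), tmp)

# readLines() returns one character string per line of the file
txt <- readLines(tmp)

# scan() splits on whitespace and returns a single numeric vector
values <- scan(tmp, quiet = TRUE)

# what = list(...) makes scan() read the fields record by record into named components
cols <- scan(tmp, what = list(year = 0, a = 0, b = 0), quiet = TRUE)

unlink(tmp)  # remove the temporary file
```

Here txt holds two strings, values holds six numbers, and cols is a list with one numeric vector per field.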

R has a spreadsheet-style data editor:

mydata <- edit(mydata) # edit() returns the modified copy, so assign the result back

Read a table from the clipboard:

> read.table("clipboard")
               Holidays HalfDays FullDays
Norway            0.333    0.056    0.611
Canada            0.067    0.200    0.733
Greece            0.138    0.862    0.000
France/Germany    0.083    0.083    0.833

Importing/Exporting Data

  • Hmisc csv.get()
  • Hmisc sasxport.get()
  • Hmisc sas.get()
  • Hmisc spss.get()
  • Hmisc stata.get()

Importing a text file:

mydata <- read.table("http://perso.univ-rennes1.fr/arthur.charpentier/data.txt",
                     header = TRUE) # read from a URL; the first line holds the column names
mydata <- read.table("tmp.txt", header = TRUE, sep = ",") # read a local comma-separated file
  • Given the data file data.txt located at <path>:
1970    45    63
1980    52    59
1990    59    52
2000    63    45
  • This data can easily be loaded into R using the following commands:
setwd("<path>")                # change working directory
data <- read.table("data.txt")  # load data
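
The same steps can be tried without an existing file. The sketch below recreates the four-line data file in a temporary directory and reads it back; the column names year, x and y are hypothetical, since the original file has no header line.

```r
# Recreate the four-line data file from above in a temporary directory
path <- tempdir()
writeLines(c("1970 45 63", "1980 52 59", "1990 59 52", "2000 63 45"),
           file.path(path, "data.txt"))

# The file has no header line, so read.table() would name the columns V1, V2, V3;
# col.names supplies more descriptive (hypothetical) names instead
mydata <- read.table(file.path(path, "data.txt"),
                     col.names = c("year", "x", "y"))

str(mydata)     # structure of the resulting data frame
mydata$year     # access a single column by name
```

Using file.path() instead of setwd() leaves the working directory of the session unchanged.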


Data from SPSS, SAS, Stata and other statistical packages can be imported with the foreign package:


Stata

library(foreign)
mydata <- read.dta("STATAData.dta") # read a Stata .dta file into a data frame
names(mydata)

SAS

library(foreign)
mydata <- read.xport("SASData.xpt") # read a SAS XPORT (transport) file
names(mydata)


SPSS

library(foreign)
mydata <- read.spss("SPSSData.sav", to.data.frame = TRUE) # without to.data.frame = TRUE a list is returned
names(mydata)
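
The readers above need an existing file. To try one without such a file, a round trip through the foreign package can serve as a minimal sketch: it assumes foreign is installed, and the data frame and temporary file are made up for illustration.

```r
library(foreign)

# A small hypothetical data frame to export
df <- data.frame(year = c(1970, 1980), x = c(45, 52))

# Write it as a Stata file in a temporary location, then read it back
tmp <- tempfile(fileext = ".dta")
write.dta(df, tmp)
back <- read.dta(tmp)

names(back)   # the variable names survive the round trip
unlink(tmp)   # remove the temporary file
```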


Excel

  • The xlsReadWrite package is no longer available; the example below is kept for reference.
  • The gdata package includes a function read.xls().

Import from an Excel spreadsheet with xlsReadWrite:

> library(xlsReadWrite)
> mydata <- read.xls("myfile.xls", colNames = TRUE, sheet = "mysheet",
+           type = "data.frame", from = 1, checkNames = TRUE)
  • "sheet" specifies the name or the number of the sheet you want to import.
  • "from" specifies the first row of the spreadsheet to import.

You can also use the "RODBC" package (odbcConnectExcel() is only available on Windows):

library(RODBC)
channel <- odbcConnectExcel("Graphiques pourcent croissance.xls") # creates a connection
sqlTables(channel) # lists all the tables
effec <- sqlFetch(channel, "effec") # reads one spreadsheet as an R data frame
odbcClose(channel) # closes the connection

Google Doc Spreadsheet

Working with data

Load/Attach/Detach data

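attach() adds a data frame to the search path, so that its variables can be referred to by name without prefixing them with the name of the data frame each time. detach() removes it from the search path again.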

attach(mydata)
…
detach(mydata)
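
A concrete run with the built-in Orange dataset might look like the sketch below. The comparison with with() is an addition for illustration: it evaluates an expression inside a data frame without changing the search path.

```r
data(Orange)               # example dataset shipped with R

attach(Orange)             # put the columns of Orange on the search path
mean_attached <- mean(circumference)   # columns are now visible by name
detach(Orange)             # remove it from the search path again

# with() evaluates an expression inside the data frame without attaching it
mean_with <- with(Orange, mean(circumference))
```

Both approaches give the same value; with() avoids forgetting the matching detach() call.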

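Describe data

names(mydata)   # variable (column) names
str(mydata)     # compact display of the structure: type and first values of each variable
summary(mydata) # summary statistics for each variable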

Dealing with missing values

Creating/removing variables

mydata$newvar <- mydata$oldvar # create newvar as a copy of oldvar
mydata$newvar <- NULL          # remove newvar

Exporting data

Merging two dataframes

Merging data can be very confusing, especially in the case of multiple merges. Here is a simple example:

We have one table describing authors:

> authors <- data.frame(
+     surname = I(c("Tukey", "Venables", "Tierney", "Ripley", "McNeil")),
+     nationality = c("US", "Australia", "US", "UK", "Australia"),
+     deceased = c("yes", rep("no", 4)))
> authors
   surname nationality deceased
1    Tukey          US      yes
2 Venables   Australia       no
3  Tierney          US       no
4   Ripley          UK       no
5   McNeil   Australia       no

and one table describing books:

> books <- data.frame(
+     name = I(c("Tukey", "Venables", "Tierney",
+              "Ripley", "Ripley", "McNeil", "R Core")),
+     title = c("Exploratory Data Analysis",
+               "Modern Applied Statistics ...",
+               "LISP-STAT",
+               "Spatial Statistics", "Stochastic Simulation",
+               "Interactive Data Analysis",
+               "An Introduction to R"),
+     other.author = c(NA, "Ripley", NA, NA, NA, NA,
+                      "Venables & Smith"))
> books
      name                         title     other.author
1    Tukey     Exploratory Data Analysis             <NA>
2 Venables Modern Applied Statistics ...           Ripley
3  Tierney                     LISP-STAT             <NA>
4   Ripley            Spatial Statistics             <NA>
5   Ripley         Stochastic Simulation             <NA>
6   McNeil     Interactive Data Analysis             <NA>
7   R Core          An Introduction to R Venables & Smith

We want to merge the tables books and authors by author's name ("name" in the books dataset and "surname" in the authors dataset). We use the merge() command. We give the names of the first and the second datasets, then by.x and by.y specify the identifier in each of them. all.x and all.y specify whether we want to keep all the observations of the first and the second dataset. In this case we want to keep all the observations from the books dataset, but only those observations from the authors dataset that match an observation in the books dataset.

> final <- merge(books, authors, by.x = "name", by.y = "surname", sort = FALSE, all.x = TRUE, all.y = FALSE)
> final
      name                         title     other.author nationality deceased
1    Tukey     Exploratory Data Analysis             <NA>          US      yes
2 Venables Modern Applied Statistics ...           Ripley   Australia       no
3  Tierney                     LISP-STAT             <NA>          US       no
4   Ripley            Spatial Statistics             <NA>          UK       no
5   Ripley         Stochastic Simulation             <NA>          UK       no
6   McNeil     Interactive Data Analysis             <NA>   Australia       no
7   R Core          An Introduction to R Venables & Smith        <NA>     <NA>
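
The effect of all.x and all.y is easier to see side by side. The sketch below uses two smaller, hypothetical tables (bk and auth, not the ones above) and compares the three most common settings.

```r
# Two small hypothetical tables sharing a key under different column names
auth <- data.frame(surname = c("Tukey", "Ripley", "McNeil"),
                   nationality = c("US", "UK", "Australia"),
                   stringsAsFactors = FALSE)
bk <- data.frame(name = c("Tukey", "Ripley", "Ripley", "R Core"),
                 title = c("Exploratory Data Analysis", "Spatial Statistics",
                           "Stochastic Simulation", "An Introduction to R"),
                 stringsAsFactors = FALSE)

# Default (all = FALSE): only rows whose key appears in both tables
inner <- merge(bk, auth, by.x = "name", by.y = "surname")

# all.x = TRUE: every book is kept; unmatched books get NA in the author columns
left <- merge(bk, auth, by.x = "name", by.y = "surname", all.x = TRUE)

# all = TRUE: unmatched rows from both tables are kept
full <- merge(bk, auth, by.x = "name", by.y = "surname", all = TRUE)

nrow(inner)  # 3: Tukey, Ripley, Ripley
nrow(left)   # 4: adds "R Core" with NA nationality
nrow(full)   # 5: also adds McNeil with NA title
```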

