R Programming/Importing and exporting data

From Wikibooks, open books for an open world
Jump to navigation Jump to search

Data can be stored in a large variety of formats. Each statistical package has its own format for data (xls for Microsoft Excel, dta for Stata, sas7bdat for SAS, ...). R can read almost all file formats. We present a method for each kind of file. If none of the following methods work, you can use a specific software for data conversion such as the free software OpenRefine or the commercial software Stat Transfer.[1] In any case, most statistical software can export data in a CSV (comma separated values) format and all of them can read CSV data. This is often the best solution to make data available to everyone.

Graphical user interfaces[edit | edit source]

Some IDE or GUI provides some press button solution to import data.

You may also have a look at speedR, a graphical user interface which helps at importing data from Excel, OpenOfficeCalc, CSV and other text files.[2]

library(speedR)
speedR()

CSV (csv,txt,dat)[edit | edit source]

You can import data from a text file (often CSV) using read.table(), read.csv() or read.csv2(). The option header = TRUE indicates that the first line of the CSV file should be interpreted as variables names and the option sep = gives the separator (generally "," or ";").

csv.get() (Hmisc) is another possibility.

mydata <- read.table("data.txt",header=TRUE)
mydata <- read.table("data.csv", header = TRUE, sep=",")  # import from a CSV
mydata <- read.csv("data.csv", header=T)
mydata <- read.table("data.csv", header = TRUE, sep=";") 
mydata <- read.csv2("data.csv", header=T)

Note that there is no problem if your data are stored on the internet.

df <- read.table("http://www.mywebsite.com/.../data.csv", header = TRUE, sep = ",")

By default, strings are converted to factors. If you want to avoid this conversion, you can specify the option stringsAsFactors = FALSE.

You can export data to a text file using write.table().

write.table(mydat,file="mydat.csv",quote=T,append=F,sep=",",eol = "\n", na = "NA", dec = ".", row.names = T,col.names = T)

For large CSV files, it is possible to use the ff package.[3]

library("ff")
df <- read.csv.ffdf(file="large_csv_file.csv", header=TRUE, VERBOSE=TRUE, first.rows=10000, next.rows=50000)

Fixed width text files[edit | edit source]

read.fwf() and write.fwf().

Some fixed width text files are provided with a SAS script to import them. Anthony Damico has created SAScii package to easily import those data.[4]

Unstructured text files[edit | edit source]

Stata (dta)[edit | edit source]

  • We can read Stata data using read.dta() in the foreign package and export to Stata data format using write.dta().
  • Note that string variables in Stata are limited to 244 characters. This can be an issue during the exportation process.
  • See also Stata.file() in the memisc package and stata.get in the Hmisc package.
> library("foreign")
> mydata <- read.dta("mydata.dta",convert.dates = TRUE, convert.factors = TRUE, convert.underscore = TRUE)
> names(mydata)
> write.dta(mydata, file = "mydata.dta")

SAS (sas7bdat)[edit | edit source]

Experimental support for SAS databases having the sas7bdat extension is provided by the sas7bdat[5] package. However, sas7bdat files generated by 64 bit versions of SAS, and SAS running on non-Microsoft Windows platforms are not yet supported.

SAS (xpt)[edit | edit source]

  • See also sasexport.get() and sas.get() in the Hmisc
  • See also the SASxport package.
library("foreign")
mydata<-read.xport("SASData.xpt")
names(mydata)

SPSS (sav)[edit | edit source]

  • read.spss() (foreign) and spss.get() (Hmisc)
> library("foreign")
> mydata<-read.spss("SPSSData.sav")
> names(mydata)

EViews[edit | edit source]

readEViews() in the hexView package for EViews files.

Excel (xls,xlsx)[edit | edit source]

Importing data from Excel is not easy. The solution depends on your operating system. If none of the methods below works, you can always export each Excel spreadsheets to CSV format and read the CSV in R. This is often the simplest and quickest solution.

XLConnect supports reading and writing both xls and xlsx file formats. Since it is based on Apache POI it only requires a Java installation and as such works on many platforms including Windows, UNIX/Linux and Mac. Besides reading & writing data it provides a number of additional features such as adding plots, cell styling & style actions and many more.

require("XLConnect")
wb <- loadWorkbook("myfile.xls", create = FALSE)
# Show a summary of the workbook (shows worksheets,
# defined names, hidden sheets, active sheet name, ...)
summary(wb)
# Read data from a worksheet interpreting the first row as column names
df1 <- readWorksheet(wb, sheet = "mysheet")
# Read data from a named region/range interpreting the first row as column
# names
df2 <- readNamedRegion(wb, name = "myname", header = TRUE)

The RODBC solution:

library("RODBC")
32-bit Windows: channel <- odbcConnectExcel("Graphiques pourcent croissance.xls") # creates a connection
64-bit Windows: channel <- odbcConnectExcel2007("Graphiques pourcent croissance.xls")
sqlTables(channel) # List all the tables
effec <- sqlFetch(channel, "effec") # Read one spreadsheet as an R table
odbcClose(channel) # close the connection (don't forget)

The xlsReadWrite package (actually, this package does not exist on CRAN repos, but you can download old versions from CRAN archive).

> library(xlsReadWrite)
mydat <- read.xls("myfile.xls", colNames = T, sheet = "mysheet", type = "data.frame", from = 1, checkNames = TRUE)
  • "sheet" specifies the name or the number of the sheet you want to import.
  • "from" specifies the first row of the spreadsheet.

The gnumeric package.[6] This package use an external software called ssconvert which is usually installed with gnumeric, the Gnome office spreadsheet. The read.gnumeric.sheet() function reads xls and xlsx files.

library("gnumeric")
df1 <- read.gnumeric.sheet(file = "df.xls", head = TRUE, sheet.name = "Feuille1")
df2 <- read.gnumeric.sheet(file = "df.xlsx", head = TRUE, sheet.name = "Feuille1")

See also xlsx for Excel 2007 documents and read.xls() (gdata).

Google Spread Sheets[edit | edit source]

You should make the spreadsheet public, publish it as a CSV file. Then you can read it in R using read.csv(). See more on the Revolution's computing blog (link). See also RGoogleDocs (link).

# Read from a Google SpreadSheet.
require(RCurl)
myCsv <- getURL("https://docs.google.com/spreadsheet/pub?hl=en_US&hl=en_US&key=0AkuuKBh0jM2TdGppUFFxcEdoUklCQlJhM2kweGpoUUE&single=true&gid=0&output=csv")
read.csv(textConnection(myCsv))

gnumeric spreadsheets[edit | edit source]

The gnumeric package[6]. read.gnumeric.sheet() reads one sheet and read.gnumeric.sheets() reads all sheets and store them in a list.

library("gnumeric")
df <- read.gnumeric.sheet(file = "df.gnumeric", head = TRUE, sheet.name = "df.csv")
View(df)
df <- read.gnumeric.sheets(file = "df.gnumeric", head = TRUE)
View(df$df.csv)

OpenOffice and LibreOffice (ods)[edit | edit source]

readODS does not require external dependencies, making it crossplatform.

library("readODS")
df=read.ods("df.ods")

speedR is another alternative.

library("speedR")
df <- speedR.importany(file = "df.ods")

Note that you can also use the speedR graphical user interface (speedR()) which will return the command line for replication.

library("speedR")
speedR()

JSON[edit | edit source]

JSON (JavaScript Object Notation) is a very common format on the internet. The rjson library makes it easy to import data from a json format[7].

# json.txt : a text file including data in the JSON format
library("rjson")
df <- fromJSON(paste(readLines("json.txt"), collapse=""))

Is is easy to export a list or a dataframe to a JSON format using the toJSON() function :

# df : a data frame
library("rjson")
json <- toJSON(df)

Sometimes the JSON data can be more complex with structures such as nested arrays. In this case you may find it more useful to use an online converter like json-csv.com to convert the file to CSV. Then import the resulting data as per the CSV instructions above.

dBase (dbf)[edit | edit source]

read.dbf() in the foreign package.

library("foreign")
df  <- read.dbf("file.dbf")
str(df)

Hierarchical Data Format (hdf5)[edit | edit source]

hdf5 data can be read using the hdf5 package[8].

SQL[edit | edit source]

DICOM and NIfTI[edit | edit source]

  • See "Working with the {DICOM} and {NIfTI} Data Standards in R" in the Journal of Statistical Software[9]

Resources[edit | edit source]

  • R Data Manual[10].
  • Paul Murrell's Introduction to Data Technologies[11].

References[edit | edit source]

  1. Stat Transfer
  2. speedR
  3. "Opening Large CSV Files in R". Retrieved March 7, 2013. {{cite web}}: Unknown parameter |site= ignored (help)
  4. David Smith. "Importing public data with SAS instructions into R". Revolution Analytics. Retrieved February 1, 2013.
  5. sas7bdat
  6. a b This command has been tested using Ubuntu 10.10 and R 2.11.1
  7. http://cran.r-project.org/web/packages/rjson/index.html
  8. http://cran.r-project.org/web/packages/hdf5/index.html
  9. Brandon Whitcher, Volker J. Schmid, Andrew Thorton "Working with the {DICOM} and {NIfTI} Data Standards in R", Journal of Statistical Software Vol. 44, Issue 6, Oct 2011, link
  10. R Data Manual
  11. Paul Murrell introduction to Data Technologies
Previous: Data Management Index Next: Graphics