Biostatistics with R/Import
Why R for biostatistics?
R is superior to common statistical packages such as SPSS, SAS and MINITAB because it is
- available for many platforms (Mac OS X, Windows, Linux etc.)
- extensively documented
For details, you may refer to the R FAQ.
The data sets on Wiley's website are available in CSV, Excel, MINITAB, SAS and SPSS formats. Although data saved in SAS and SPSS formats can be imported into R using the foreign package (Excel files need a separate add-on package), you should download the data in CSV format, because CSV is the easiest format to process in R.
For example, suppose you would like to import the "Large Data set" data file. Assuming the downloaded data file (LDS_C02_NCBIRTH800.csv) is stored in the directory "/Desktop", it can be imported into R as a data frame called "largedataset" using the following syntax:
> largedataset <- read.csv("/Desktop/LDS_C02_NCBIRTH800.csv", header=TRUE,na.strings="NA")
If you prefer to choose the data file the standard "point-and-click" GUI way, you may use the function file.choose(), i.e.
> largedataset <- read.csv(file.choose(), header=TRUE,na.strings="NA")
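To try the import step without the Wiley file, the following sketch writes a tiny made-up CSV to a temporary file and reads it back with the same read.csv() arguments. The object name "demo", the column names and the values are invented for illustration and are not taken from the real data set.

```r
# Hypothetical stand-in for the downloaded file: three made-up rows.
tmp <- tempfile(fileext = ".csv")
writeLines(c("sex,weeks,smoke",
             "1,39,0",
             "2,NA,1",
             "1,41,0"), tmp)

# Same arguments as the import call; "NA" in the file becomes a missing value.
demo <- read.csv(tmp, header = TRUE, na.strings = "NA")

str(demo)                 # structure: 3 observations of 3 variables
sum(is.na(demo$weeks))    # number of missing values in 'weeks'
```

If a data set codes missing values differently (for example as an empty field or "."), the na.strings argument can be changed to match.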
You should now have imported the data from the CSV file into a data frame called "largedataset". You may look inside the data frame by calling its name:
> largedataset
You can access the variable (in computer lingo, column) "sex" inside the largedataset data frame with the $ operator:
> largedataset$sex
For example, to count the frequency of each value of sex:
> table(largedataset$sex)
You can attach the data frame so that you can call the variable directly
> attach(largedataset)
> table(sex)
> detach() #cancel attaching
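Attaching a data frame can cause confusion when an object with the same name as one of its variables already exists in the workspace. One possible alternative is the base function with(), sketched below on a small made-up data frame. The name "toy" and its values are assumptions for illustration, not part of the Wiley data.

```r
# Made-up data frame standing in for largedataset (values are invented).
toy <- data.frame(sex   = c(1, 2, 2, 1, 2),
                  weeks = c(39, 40, 38, 41, 37))

toy$sex                         # access one column with $
sex_counts <- table(toy$sex)    # frequency of each value of sex
sex_counts

# with() evaluates an expression inside the data frame; no attach() needed
with(toy, table(sex))
```

With this approach nothing has to be detached afterwards, because the data frame is only used for the duration of the single call.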
Basic data management
R is designed to be an analysis system rather than an integrated environment such as SPSS. Unlike SPSS, R doesn't have a spreadsheet-like environment for data input. Usually, data are entered using other software (e.g. a database or spreadsheet software such as OpenOffice.org Calc) and then imported into R as described above. For quick one-off calculations, you can do the data entry in R itself. For example, if you want to calculate the mean age of ten patients (30,31,32,34,35,36,37,30,40,45), you can enter the data into R using the c() function.
> pt_age <- c(30,31,32,34,35,36,37,30,40,45)
You may call the newly created object pt_age by its name...
> pt_age
...and then calculate the mean age of the ten patients.
> mean(pt_age)
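As a brief illustration, a few other base R functions can be applied to the same vector in the same way. This is only a sketch of some commonly used summaries, not a complete list.

```r
pt_age <- c(30, 31, 32, 34, 35, 36, 37, 30, 40, 45)

length(pt_age)   # number of patients: 10
mean(pt_age)     # arithmetic mean: 35
median(pt_age)   # middle value: 34.5
sd(pt_age)       # sample standard deviation
summary(pt_age)  # minimum, quartiles, mean and maximum
```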