R Programming/Descriptive Statistics
From Wikibooks, the open-content textbooks collection
Generic Functions
- summary() in base R
- describe() in the Hmisc package
> summary(mydat) # Descriptive Statistics
      obs            Height          Weight           BMI
 Min.   : 1.00   Min.   :165.0   Min.   :51.00   Min.   :16.10
 1st Qu.: 3.25   1st Qu.:170.2   1st Qu.:61.00   1st Qu.:21.25
 Median : 5.50   Median :174.5   Median :70.00   Median :22.70
 Mean   : 5.50   Mean   :173.3   Mean   :68.50   Mean   :22.89
 3rd Qu.: 7.75   3rd Qu.:177.0   3rd Qu.:74.25   3rd Qu.:25.29
 Max.   :10.00   Max.   :178.0   Max.   :88.00   Max.   :31.18
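For a numeric column, summary() reports the minimum, first quartile, median, mean, third quartile and maximum. The sketch below recomputes the Height column with quantile() and mean(). The ten heights come from the Height frequency table in the describe() output. The row order of mydat is not shown, so the order used here is an assumption; it does not affect these statistics.

```r
# Heights (cm) read off the describe() frequency table for mydat;
# the row order is assumed, which does not change order-free statistics
height <- c(165, 168, 170, 171, 172, 177, 177, 177, 178, 178)

# quantile() uses type 7 interpolation by default, which is also what
# summary() uses for numeric vectors
q <- quantile(height, probs = c(0, 0.25, 0.5, 0.75, 1))

manual <- c(q[1:3], mean(height), q[4:5])
names(manual) <- c("Min.", "1st Qu.", "Median", "Mean", "3rd Qu.", "Max.")
print(manual)   # values: 165, 170.25, 174.5, 173.3, 177, 178

# summary() holds the same values; its print method shows 4 significant
# digits, which is why the 1st quartile appears as 170.2 above
summary(height)
```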
> library("Hmisc")
> describe(mydat)
mydat
 4  Variables      10  Observations
--------------------------------------------------------------------------------
obs
       n missing  unique    Mean     .05     .10     .25     .50     .75     .90
      10       0      10     5.5    1.45    1.90    3.25    5.50    7.75    9.10
     .95
    9.55
           1  2  3  4  5  6  7  8  9 10
Frequency  1  1  1  1  1  1  1  1  1  1
%         10 10 10 10 10 10 10 10 10 10
--------------------------------------------------------------------------------
Height
       n missing  unique    Mean
      10       0       7   173.3
          165 168 170 171 172 177 178
Frequency   1   1   1   1   1   3   2
%          10  10  10  10  10  30  20
--------------------------------------------------------------------------------
Weight
       n missing  unique    Mean
      10       0       9    68.5
          51 52 61 69 71 72 75 85 88
Frequency  1  1  2  1  1  1  1  1  1
%         10 10 20 10 10 10 10 10 10
--------------------------------------------------------------------------------
BMI
       n missing  unique    Mean     .05     .10     .25     .50     .75     .90
      10       0      10   22.89   16.32   16.55   21.25   22.70   25.29   27.54
     .95
   29.36
16.0964524681227 (1, 10%), 16.5980401544894 (1, 10%), 20.8611196607503 (1, 10%)
22.4058769513315 (1, 10%), 22.4087867693473 (1, 10%), 22.98190175237 (1, 10%)
23.3234180638183 (1, 10%), 25.9515570934256 (1, 10%), 27.1314117909924 (1, 10%)
31.1791383219955 (1, 10%)
--------------------------------------------------------------------------------
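Most of what describe() prints can be checked with base R. In the obs block, the percentile columns coincide with quantile() using its default type 7 interpolation. The Frequency and % rows for Height are a plain table() and prop.table(). This is a sketch: obs = 1:10 is read off the obs frequency table, and the Height vector is reconstructed from its frequency table in an assumed row order.

```r
# obs takes each value 1..10 exactly once in mydat
obs <- 1:10
probs <- c(0.05, 0.10, 0.25, 0.50, 0.75, 0.90, 0.95)

# Default (type 7) quantiles reproduce the percentile columns shown for obs
round(quantile(obs, probs), 2)   # 1.45 1.90 3.25 5.50 7.75 9.10 9.55

# Heights read off the describe() frequency table (row order assumed)
height <- c(165, 168, 170, 171, 172, 177, 177, 177, 178, 178)

freq <- table(height)            # the "Frequency" row
pct  <- 100 * prop.table(freq)   # the "%" row
rbind(Frequency = freq, "%" = pct)
length(unique(height))           # the "unique" count: 7
```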
Univariate analysis
Continuous variable
- boxplot()
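boxplot() builds its box from Tukey's hinges (as returned by fivenum()) rather than from quantile(). It flags points lying more than 1.5 box lengths beyond the box as outliers. boxplot.stats() returns the numbers behind the plot. The small vector below, with one deliberately extreme value, is made up for illustration.

```r
# Made-up data: 1..9 plus one extreme value
x <- c(1:9, 30)

# Tukey's five-number summary: min, lower hinge, median, upper hinge, max
fivenum(x)                 # 1.0 3.0 5.5 8.0 30.0

# boxplot.stats() uses the hinges for the box and, with the default
# coef = 1.5, marks points more than 1.5 * (upper hinge - lower hinge)
# beyond the box as outliers; the whiskers stop at the most extreme
# non-outlying points
b <- boxplot.stats(x)
b$stats                    # 1.0 3.0 5.5 8.0 9.0
b$out                      # 30

boxplot(x)                 # draws the box, whiskers and the outlier
```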
Plotting density
x <- rnorm(10^3)
hist(x)
plot(density(x))
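density() returns the estimated curve as x/y coordinates, together with the bandwidth it chose. By default it uses a Gaussian kernel with the bw.nrd0 rule-of-thumb bandwidth. A common idiom is to draw the histogram on the density scale (freq = FALSE) and overlay the estimate. The final lines are a rough numerical check, under these defaults, that the estimated curve encloses an area close to 1.

```r
set.seed(42)                 # reproducible sample
x <- rnorm(10^3)

d <- density(x)              # Gaussian kernel, bandwidth from bw.nrd0(x)
d$bw                         # the bandwidth actually used

# Histogram on the density scale so that the curve can be overlaid
hist(x, freq = FALSE, main = "Histogram with kernel density estimate")
lines(d, lwd = 2)

# d$x is an evenly spaced grid, so a Riemann sum approximates the area
area <- sum(d$y) * (d$x[2] - d$x[1])
area                         # close to 1
```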
Storing the result of hist() in an object gives access to everything computed for the plot: the breaks, the counts, the densities and the bin midpoints.
> x <- rnorm(100)
> hist(x)
> h <- hist(x)
> h
$breaks
[1] -2.5 -2.0 -1.5 -1.0 -0.5 0.0 0.5 1.0 1.5 2.0 2.5
$counts
[1] 1 1 11 22 20 13 18 9 4 1
$intensities
[1] 0.02000000 0.02000000 0.22000000 0.44000000 0.40000000
[6] 0.26000000 0.36000000 0.18000000 0.08000000 0.02000000
$density
[1] 0.02000000 0.02000000 0.22000000 0.44000000 0.40000000
[6] 0.26000000 0.36000000 0.18000000 0.08000000 0.02000000
$mids
[1] -2.25 -1.75 -1.25 -0.75 -0.25 0.25 0.75 1.25 1.75 2.25
$xname
[1] "x"
$equidist
[1] TRUE
attr(,"class")
[1] "histogram"
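The fields of a histogram object are related to each other. density is each count divided by (total count × bin width), so the bars enclose a total area of 1. For example, 22 / (100 × 0.5) = 0.44 in the output above. mids are the midpoints of consecutive breaks. The sketch checks these relations on a simulated sample; plot = FALSE computes the object without drawing it.

```r
set.seed(1)                       # reproducible sample
x <- rnorm(100)
h <- hist(x, plot = FALSE)        # compute the histogram without drawing it

# density = count / (total count * bin width), so the bars have total area 1
manual_density <- h$counts / (sum(h$counts) * diff(h$breaks))
all.equal(h$density, manual_density)             # TRUE

# mids are the midpoints of consecutive breaks
manual_mids <- (head(h$breaks, -1) + tail(h$breaks, -1)) / 2
all.equal(h$mids, manual_mids)                   # TRUE

sum(h$density * diff(h$breaks))                  # total area: 1
```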
Testing normality
- Shapiro-Wilk normality test
> N <- 100
> x <- rnorm(N)
> shapiro.test(x)
Shapiro-Wilk normality test
data: x
W = 0.9916, p-value = 0.7902
- Anderson-Darling normality test
> library("nortest")
> ad.test(x)
Anderson-Darling normality test
data: x
A = 0.2541, p-value = 0.7247
See also the ADGofTest package[1] for another implementation of this test.
- Shapiro-Francia normality test
> sf.test(x)
Shapiro-Francia normality test
data: x
W = 0.9866, p-value = 0.9953
- Pearson chi-square normality test
> library("nortest")
> pearson.test(x)
Pearson chi-square normality test
data: x
P = 0.8, p-value = 0.8495
- Cramér-von Mises normality test
> cvm.test(x)
Cramer-von Mises normality test
data: x
W = 0.0182, p-value = 0.9756
- Lilliefors (Kolmogorov-Smirnov) normality test
> lillie.test(x)
Lilliefors (Kolmogorov-Smirnov) normality test
data: x
D = 0.0955, p-value = 0.9982
- Jarque-Bera test (jarque.bera.test() in the tseries package)
> library("tseries")
> jarque.bera.test(x)
Jarque Bera Test
data: x
X-squared = 0.6245, df = 2, p-value = 0.7318
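The Jarque-Bera statistic combines the sample skewness S and kurtosis K as JB = n/6 * (S^2 + (K - 3)^2 / 4). It is compared with a chi-square distribution with 2 degrees of freedom, which matches the df = 2 in the output above. The function below is a hand-written sketch of that textbook formula, using central moments with divisor n. It is not the tseries source, and the name jb_test is hypothetical.

```r
# Hypothetical helper: Jarque-Bera statistic from sample moments (divisor n)
jb_test <- function(x) {
  x  <- x[is.finite(x)]
  n  <- length(x)
  m  <- mean(x)
  m2 <- mean((x - m)^2)          # second central moment
  m3 <- mean((x - m)^3)          # third central moment
  m4 <- mean((x - m)^4)          # fourth central moment
  S  <- m3 / m2^(3/2)            # skewness
  K  <- m4 / m2^2                # kurtosis (3 for a normal distribution)
  stat <- n / 6 * (S^2 + (K - 3)^2 / 4)
  list(statistic = stat,
       p.value = pchisq(stat, df = 2, lower.tail = FALSE))
}

# Toy check: for c(-1, 1, -1, 1), S = 0 and K = 1,
# so JB = 4/6 * (0 + 4/4) = 2/3
jb_test(c(-1, 1, -1, 1))

set.seed(1)
jb_test(rnorm(100))$p.value     # typically large for normal data
jb_test(rexp(100))$p.value      # skewed data: typically very small
```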
Discrete variable
We generate a discrete variable:
> x <- sample(c("A","B","C"), 100, replace = TRUE)
> tab <- table(x)
> tab
x
 A  B  C
33 29 38
We can plot this variable:
> pie(tab)
> barplot(tab)
> dotchart(tab)
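table() counts each level, prop.table() turns the counts into proportions, and which.max() picks the most frequent level (the mode). The short vector below is made up so that the results are easy to check by hand.

```r
# Made-up categorical data
x <- c("A", "B", "A", "C", "A", "B")

tab <- table(x)              # counts: A = 3, B = 2, C = 1
prop.table(tab)              # proportions: 0.5, 0.333..., 0.166...
round(100 * prop.table(tab)) # percentages: 50 33 17

names(which.max(tab))        # most frequent level: "A"

barplot(prop.table(tab), ylab = "Proportion")
```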
Bivariate analysis
Continuous variables
Discrete variables
- table() and prop.table() for contingency tables.
- assocplot() for a graphical display of a contingency table.
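For two categorical variables, table() builds the contingency table. prop.table() with margin = 1 or 2 gives row or column proportions, and addmargins() appends totals. The variable names sex and smoker are hypothetical, and the data are simulated, so only the structural properties are meaningful.

```r
set.seed(2)                   # reproducible simulated data
sex    <- sample(c("F", "M"), 200, replace = TRUE)
smoker <- sample(c("no", "yes"), 200, replace = TRUE)

tab2 <- table(sex, smoker)    # 2 x 2 contingency table of counts
addmargins(tab2)              # adds row and column totals ("Sum")

prop.table(tab2)              # each cell as a share of the grand total
prop.table(tab2, margin = 1)  # row proportions: each row sums to 1
prop.table(tab2, margin = 2)  # column proportions: each column sums to 1

assocplot(tab2)               # bars show deviations from independence
```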
Discrete and Continuous variables
- bystats() in the Hmisc package: statistics by categories
- Equality of two sample means
- Equality of variances
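Base R also covers these three items. tapply() computes a statistic by category, similar in spirit to bystats. t.test() compares two sample means, and var.test() compares two variances with an F test. The data below are simulated and the names group and y are hypothetical. Note that t.test() performs Welch's test by default (var.equal = FALSE).

```r
set.seed(3)
group <- factor(rep(c("control", "treated"), each = 20))
# Simulated outcome: the treated group is shifted up by 1 on average
y <- rnorm(40, mean = ifelse(group == "treated", 1, 0))

tapply(y, group, mean)        # mean by category
tapply(y, group, sd)          # standard deviation by category

tt <- t.test(y ~ group)       # Welch two-sample t test
tt$estimate                   # the two group means
tt$p.value

vt <- var.test(y ~ group)     # F test: ratio of the two variances
vt$statistic
vt$p.value
```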