R Programming/Descriptive Statistics

From Wikibooks, the open-content textbooks collection




Generic Functions

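  • summary() in base R: minimum, quartiles, mean and maximum of each variable
  • describe() in the Hmisc package: number of observations, missing and unique values, mean, quantiles and frequencies of each variable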
> summary(mydat)  # Descriptive Statistics
      obs            Height          Weight           BMI       
 Min.   : 1.00   Min.   :165.0   Min.   :51.00   Min.   :16.10  
 1st Qu.: 3.25   1st Qu.:170.2   1st Qu.:61.00   1st Qu.:21.25  
 Median : 5.50   Median :174.5   Median :70.00   Median :22.70  
 Mean   : 5.50   Mean   :173.3   Mean   :68.50   Mean   :22.89  
 3rd Qu.: 7.75   3rd Qu.:177.0   3rd Qu.:74.25   3rd Qu.:25.29  
 Max.   :10.00   Max.   :178.0   Max.   :88.00   Max.   :31.18 

> library("Hmisc")
> describe(mydat)
mydat 

 4  Variables      10  Observations
--------------------------------------------------------------------------------
obs 
      n missing  unique    Mean     .05     .10     .25     .50     .75     .90 
     10       0      10     5.5    1.45    1.90    3.25    5.50    7.75    9.10 
    .95 
   9.55 

           1  2  3  4  5  6  7  8  9 10
Frequency  1  1  1  1  1  1  1  1  1  1
%         10 10 10 10 10 10 10 10 10 10
--------------------------------------------------------------------------------
Height 
      n missing  unique    Mean 
     10       0       7   173.3 

          165 168 170 171 172 177 178
Frequency   1   1   1   1   1   3   2
%          10  10  10  10  10  30  20
--------------------------------------------------------------------------------
Weight 
      n missing  unique    Mean 
     10       0       9    68.5 

          51 52 61 69 71 72 75 85 88
Frequency  1  1  2  1  1  1  1  1  1
%         10 10 20 10 10 10 10 10 10
--------------------------------------------------------------------------------
BMI 
      n missing  unique    Mean     .05     .10     .25     .50     .75     .90 
     10       0      10   22.89   16.32   16.55   21.25   22.70   25.29   27.54 
    .95 
  29.36 

16.0964524681227 (1, 10%), 16.5980401544894 (1, 10%), 20.8611196607503 (1, 10%) 
22.4058769513315 (1, 10%), 22.4087867693473 (1, 10%), 22.98190175237 (1, 10%) 
23.3234180638183 (1, 10%), 25.9515570934256 (1, 10%), 27.1314117909924 (1, 10%) 
31.1791383219955 (1, 10%) 
--------------------------------------------------------------------------------
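As a small sketch of how such a summary can be produced from scratch, the block below builds a hypothetical data frame (called mydat2 here; its values are made up for illustration and are not the mydat shown above) and summarises it. The BMI column is computed as weight in kg divided by squared height in m, which is an assumption about how the BMI column above was obtained. The describe() call is left commented out because it needs the Hmisc package to be installed.

```r
# Hypothetical data: heights in cm and weights in kg (illustrative values only)
mydat2 <- data.frame(
  Height = c(165, 170, 172, 177, 178),
  Weight = c(51, 61, 69, 75, 88)
)

# Body mass index: weight (kg) divided by squared height (m)
mydat2$BMI <- mydat2$Weight / (mydat2$Height / 100)^2

summary(mydat2)        # minimum, quartiles, mean and maximum of every column
sapply(mydat2, mean)   # a single statistic computed column by column

# library("Hmisc"); describe(mydat2)   # requires the Hmisc package
```

sapply() is handy when only one statistic is needed, since it returns a plain named vector rather than a printed table.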

Univariate analysis

Continuous variable

  • boxplot()
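A minimal sketch of boxplot() on simulated data follows. The seed and the sample are arbitrary choices made only for this illustration. Besides drawing the plot, boxplot() invisibly returns the statistics it used, which can be compared with fivenum().

```r
set.seed(1)                              # arbitrary seed, for reproducibility only
x <- rnorm(100)                          # simulated continuous variable

b <- boxplot(x, main = "Boxplot of x")   # draws the plot, returns its statistics
b$stats     # lower whisker, lower hinge, median, upper hinge, upper whisker
b$out       # points drawn individually as outliers, if any
fivenum(x)  # Tukey's five-number summary, for comparison
```

The hinges and the median in b$stats agree with fivenum(x); the whiskers differ from the minimum and maximum only when outliers are present.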


Plotting density

x <- rnorm(10^3)
hist(x)
plot(density(x))
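The two plots can also be combined. The sketch below, using an arbitrary seed, draws the histogram on the density scale and overlays the kernel density estimate; the title and line width are illustrative choices.

```r
set.seed(1)                        # arbitrary seed, for reproducibility only
x <- rnorm(10^3)

d <- density(x)                    # Gaussian kernel with the default bandwidth
hist(x, freq = FALSE,              # freq = FALSE puts the y axis on the density scale
     main = "Histogram and kernel density of x")
lines(d, lwd = 2)                  # overlay the kernel density estimate
d$bw                               # bandwidth chosen by density()
```

Using freq = FALSE matters here: with the default count scale the density curve would be almost flat at the bottom of the plot.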

Assigning the result of hist() to an object gives access to everything that was computed to draw the histogram: the break points, the counts and densities in each bin, and the bin midpoints.

> x <- rnorm(100)
> hist(x)
> h <- hist(x)
> h
$breaks
 [1] -2.5 -2.0 -1.5 -1.0 -0.5  0.0  0.5  1.0  1.5  2.0  2.5

$counts
 [1]  1  1 11 22 20 13 18  9  4  1

$intensities
 [1] 0.02000000 0.02000000 0.22000000 0.44000000 0.40000000
 [6] 0.26000000 0.36000000 0.18000000 0.08000000 0.02000000

$density
 [1] 0.02000000 0.02000000 0.22000000 0.44000000 0.40000000
 [6] 0.26000000 0.36000000 0.18000000 0.08000000 0.02000000

$mids
 [1] -2.25 -1.75 -1.25 -0.75 -0.25  0.25  0.75  1.25  1.75  2.25

$xname
[1] "x"

$equidist
[1] TRUE

attr(,"class")
[1] "histogram"
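The components of the stored histogram can be used directly. The following sketch, again on simulated data with an arbitrary seed, computes the histogram without drawing it and checks two properties of the result.

```r
set.seed(1)                        # arbitrary seed, for reproducibility only
x <- rnorm(100)

h <- hist(x, plot = FALSE)         # compute the histogram without drawing it
h$counts                           # number of observations in each bin
sum(h$counts)                      # equals length(x)
sum(h$density * diff(h$breaks))    # bar areas on the density scale sum to 1
plot(h)                            # draw the stored histogram later
```

Calling hist() with plot = FALSE is useful when only the binned counts are needed, for example to build a custom plot.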


Testing normality

> N <- 100
> x <- rnorm(N)
> shapiro.test(x)

	Shapiro-Wilk normality test

data:  x 
W = 0.9916, p-value = 0.7902
> library("nortest")
> ad.test(x)

	Anderson-Darling normality test

data:  x 
A = 0.2541, p-value = 0.7247

See also the ADGofTest package[1] for another implementation of the Anderson-Darling test.

  • Shapiro-Francia normality test
> sf.test(x)

	Shapiro-Francia normality test

data:  x 
W = 0.9866, p-value = 0.9953
  • Pearson chi-square normality test
> library("nortest")
> pearson.test(x)

	Pearson chi-square normality test

data:  x 
P = 0.8, p-value = 0.8495
  • Cramer-von Mises normality test
> cvm.test(x)

	Cramer-von Mises normality test

data:  x 
W = 0.0182, p-value = 0.9756
  • Lilliefors (Kolmogorov-Smirnov) normality test
> lillie.test(x)

	Lilliefors (Kolmogorov-Smirnov) normality test

data:  x 
D = 0.0955, p-value = 0.9982
  • Jarque-Bera test (in the tseries package)
> library("tseries")
> jarque.bera.test(x)

	Jarque Bera Test

data:  x 
X-squared = 0.6245, df = 2, p-value = 0.7318
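All the tests above return an object of class "htest", so the statistic and the p-value can be extracted instead of read off the printed output. A minimal sketch with shapiro.test(), which is available in base R, on simulated data with an arbitrary seed:

```r
set.seed(1)                  # arbitrary seed, for reproducibility only
x <- rnorm(100)

res <- shapiro.test(x)       # the result is a list of class "htest"
class(res)
res$statistic                # the W statistic
res$p.value                  # the p-value as a plain number
res$p.value < 0.05           # TRUE would mean normality is rejected at the 5% level
```

The same $statistic and $p.value components are available for the nortest and tseries tests shown above, which makes it easy to collect several results in one table.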

Discrete variable

We generate a discrete variable:

> x <- sample(c("A","B","C"),100,replace=TRUE)
> tab <- table(x)
> tab
x
 A  B  C 
33 29 38 

We can plot this variable:

> pie(tab)
> barplot(tab)
> dotchart(tab)
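Beyond the raw counts, relative frequencies are often more informative. The sketch below regenerates a variable like the one above (with an arbitrary seed, so the counts will differ from those shown) and converts the table to proportions.

```r
set.seed(1)                          # arbitrary seed, for reproducibility only
x <- sample(c("A", "B", "C"), 100, replace = TRUE)

tab <- table(x)                      # counts per category
prop.table(tab)                      # relative frequencies, summing to 1
round(100 * prop.table(tab), 1)      # the same as percentages
names(tab)[which.max(tab)]           # most frequent category (the mode)
```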

Bivariate analysis

Continuous variables

Discrete variables

  • table() and prop.table() for contingency tables.
  • assocplot() for graphical display of contingency table.
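A short sketch of these functions on two simulated discrete variables follows. The variables x and y, their categories and the seed are hypothetical and chosen only for illustration.

```r
set.seed(1)                          # arbitrary seed, for reproducibility only
x <- sample(c("A", "B", "C"), 100, replace = TRUE)
y <- sample(c("yes", "no"), 100, replace = TRUE)

tab2 <- table(x, y)                  # two-way contingency table
tab2
prop.table(tab2)                     # cell proportions (the whole table sums to 1)
prop.table(tab2, margin = 1)         # row proportions (each row sums to 1)
prop.table(tab2, margin = 2)         # column proportions (each column sums to 1)
assocplot(tab2)                      # deviations from independence, cell by cell
```

The margin argument decides which total each cell is divided by, so it is worth checking that the resulting rows or columns sum to 1 as intended.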

Discrete and Continuous variables

  • bystats() in the Hmisc package: statistics by categories
  • Equality of two sample means
  • Equality of variances
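One possible way to approach these points with base R only is sketched below. The grouping factor g, the variable v and the seed are hypothetical. tapply() stands in for a by-category summary, t.test() is one common test for equality of two means and var.test() one common test for equality of two variances; other tests may be more appropriate depending on the data.

```r
set.seed(1)                                # arbitrary seed, for reproducibility only
g <- factor(rep(c("a", "b"), each = 50))   # hypothetical two-level grouping variable
v <- rnorm(100, mean = ifelse(g == "a", 0, 0.5))

tapply(v, g, mean)     # mean of v within each category
tapply(v, g, sd)       # standard deviation within each category
boxplot(v ~ g)         # one box per category

t.test(v ~ g)          # Welch two-sample t-test (equal variances not assumed)
var.test(v ~ g)        # F test for equality of the two variances
```

By default t.test() does not assume equal variances; setting var.equal = TRUE gives the classical Student test instead.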

