Stata/Descriptive Statistics


This section shows how to compute summary statistics in Stata. It has three subsections: the first covers commands that describe a whole dataset, the second covers commands that describe a single variable, and the third covers commands that describe a set of variables.

Describe a dataset

  • 'des' (describe) : gives the size of the file, the number of observations, the number of variables, and the name, type and label of each variable.
  • 'des, s' (describe, short) : gives only the size of the file, the number of observations and the number of variables.
  • 'des' also stores its results in r(): the number of observations 'r(N)', the number of variables 'r(k)' and a flag 'r(changed)' indicating whether the data have changed since the last save. 'ret list' (return list) displays them.
. sysuse cancer, clear
(Patient Survival in Drug Trial)
. describe
. des, s
. ret list
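  • codebook : reports, for each variable, its type, range, number of unique values and number of missing values.
  • inspect : gives a quick summary of a numeric variable: counts of negative, zero, positive and missing values, integer versus noninteger values, and a small histogram.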
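
The results stored by 'describe' can be reused in later commands. The following is a minimal sketch, assuming the cancer dataset shipped with Stata; the local macro names nobs and nvars are chosen here for illustration only.

```stata
* Load the example dataset used above
sysuse cancer, clear

* Run describe quietly and keep its stored results in local macros
quietly describe
local nobs  = r(N)
local nvars = r(k)
display "observations: `nobs'"
display "variables:    `nvars'"

* Look at a single variable in more detail
codebook age
inspect age
```

Copying r() values into local macros right away is a cautious habit, since most later commands overwrite r().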

Univariate statistics

Continuous variables

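  • 'su' (summarize) : gives the number of observations, mean, standard deviation, minimum and maximum.
  • 'su, d' (summarize, detail) : adds percentiles, the variance, skewness and kurtosis, and the four smallest and largest values.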
  • robmean : robust mean
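
As an illustration, here is a short sketch of 'summarize' on the age variable of the cancer dataset shipped with Stata. The choice of variable is an assumption made for the example; any numeric variable would do.

```stata
sysuse cancer, clear

* Basic summary: observations, mean, standard deviation, minimum, maximum
summarize age

* Detailed summary: percentiles, variance, skewness, kurtosis
summarize age, detail

* Stored results can be reused, for example the median and the mean
display "median age = " r(p50)
display "mean age   = " r(mean)
```

The median r(p50) is only available after the detail option has been used.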

Discrete variables

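  • 'ta' (tabulate) : gives the one-way table of frequencies, with percentages and cumulative percentages.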
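
A minimal sketch of a one-way table, assuming the cancer dataset shipped with Stata and its categorical variable drug:

```stata
sysuse cancer, clear

* One-way frequency table of the treatment variable
tabulate drug

* Treat missing values as their own category, if there are any
tabulate drug, missing

* tabulate stores the number of observations and the number of categories
display "observations: " r(N)
display "categories:   " r(r)
```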

Multivariate statistics

Continuous variables

  • corr (correlate) returns the matrix of linear (Pearson) correlations between a set of variables.
    • corr, cov returns the covariance matrix.

Here is an example. We first simulate two variables, x and y, that are positively correlated. We then plot y against x with a fitted line and compute the correlation.

. clear
. set obs 1000 
. gen x =  invnorm(uniform())
. gen u =  invnorm(uniform())
. gen y = x + u
. tw sc y x || lfit y x
. corr y x
(obs=1000)

             |        y        x
-------------+------------------
           y |   1.0000
           x |   0.7197   1.0000
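
In this simulation x and u are independent with variance 1, so Cov(y, x) = 1 and Var(y) = 2, and the theoretical correlation is 1/sqrt(2), about 0.707, which is close to the estimate above. The sketch below repeats the simulation in a reproducible way and also shows the covariance matrix. It assumes a version of Stata that provides rnormal(); the seed value is arbitrary.

```stata
clear
* Arbitrary seed, chosen only so that the draws are reproducible
set seed 12345
set obs 1000
generate x = rnormal()
generate u = rnormal()
generate y = x + u

* Correlation matrix; r(rho) holds the correlation of the first two variables
correlate y x
local rho = r(rho)

* Covariance matrix: variances on the diagonal, covariance off the diagonal
correlate y x, covariance

display "sample correlation      = `rho'"
display "theoretical correlation = " 1/sqrt(2)
```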

  • wincorr returns the winsorized correlation: values in the tails are replaced by a limit value. This is useful when a few extreme values have a large influence on the correlation coefficient.
  • spearman and spearman2 give Spearman's rank correlation between two variables. This statistic is less sensitive to outliers than Pearson's linear correlation, which often makes it useful as a robustness check.
. spearman y x

 Number of obs =    1000
Spearman's rho =       0.7090

Test of Ho: y and x are independent
    Prob > |t| =       0.0000
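
To see why the rank correlation works as a robustness check, the following sketch contaminates one observation of a simulated sample with an extreme value and compares the two coefficients. The seed, the outlier values and the macro names pearson and spearman are assumptions made for the illustration.

```stata
clear
set seed 12345
set obs 1000
generate x = rnormal()
generate y = x + rnormal()

* Contaminate one observation with an extreme, off-trend value
replace x = 100  in 1
replace y = -100 in 1

* Pearson's correlation is dominated by the single outlier
correlate y x
local pearson = r(rho)

* Spearman's rank correlation only sees the rank of the outlier
spearman y x
local spearman = r(rho)

display "Pearson  = `pearson'"
display "Spearman = `spearman'"
```

With a single contaminated point out of 1000, Pearson's coefficient should move far from its uncontaminated value while Spearman's rho should barely change.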

Discrete variables

  • 'ta' (tabulate) followed by two variables : gives the two-way table of frequencies.
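
A minimal sketch of a two-way table, assuming the cancer dataset shipped with Stata and its variables drug and died. The row and chi2 options are added here to show row percentages and a test of independence.

```stata
sysuse cancer, clear

* Two-way frequency table with row percentages and a chi-squared test
tabulate drug died, row chi2

* The test statistic and its p-value are stored in r()
display "chi2    = " r(chi2)
display "p-value = " r(p)
```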

Continuous and discrete variables

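  • catgraph : plots the means of a continuous variable by category.
  • table : displays a table of summary statistics (for example, the means of a continuous variable) by category.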
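
Means of a continuous variable by category can also be obtained with built-in commands. The sketch below uses 'tabstat' and 'bysort' rather than 'table', because the syntax of 'table' differs across Stata versions. It assumes the cancer dataset shipped with Stata.

```stata
sysuse cancer, clear

* Mean, standard deviation and count of age within each drug category
tabstat age, by(drug) statistics(mean sd n)

* The same summary, one category at a time
bysort drug: summarize age
```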