Stata/Descriptive Statistics

From Wikibooks, open books for an open world

This section shows how to compute summary statistics in Stata. It has three subsections: the first covers commands that describe a whole dataset, the second covers commands that describe a single variable, and the third covers commands that describe a set of variables.

Describe a dataset

  • 'des' (describe) : gives the size of the dataset, the number of observations, the number of variables and, for each variable, its name, storage type, display format and label.
  • 'des, s' (describe, short) : gives only the size of the dataset, the number of observations and the number of variables.
  • 'des' also stores its results in r(), which 'ret list' (return list) displays: the number of observations 'r(N)', the number of variables 'r(k)', and 'r(changed)', a flag indicating whether the data have changed since they were last saved.
. sysuse cancer, clear
(Patient Survival in Drug Trial)
. describe
. des, s
. ret list
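  • codebook : describes the contents of each variable: type, range, number of unique values and number of missing values.
  • inspect : gives a quick overview of a numeric variable: a small histogram and the counts of negative, zero, positive, integer, noninteger and missing values.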
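
The commands above can be tried on the same example dataset. This is a minimal sketch written as do-file lines (no leading dot); it assumes the variables drug and age are present in the cancer dataset shipped with Stata, and that describe stores r(N), r(k) and r(changed) as described in the list.

```stata
* Load the example dataset used above
sysuse cancer, clear

* Run describe silently, then read its stored results
quietly describe
display "observations: " r(N)
display "variables:    " r(k)
display "changed flag: " r(changed)

* Detailed description of one variable (drug is assumed to exist)
codebook drug

* Quick overview of a numeric variable (age is assumed to exist)
inspect age
```

Using quietly suppresses the printed description but still fills r(), which is convenient when only the counts are needed.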

Univariate statistics

Continuous variables

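  • su (summarize) : gives the number of observations, the mean, the standard deviation, the minimum and the maximum.
  • su, d (summarize, detail) : adds percentiles (including the median), the variance, the skewness, the kurtosis, and the smallest and largest values.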
  • robmean : robust mean
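
As an illustration, the two forms of summarize can be run on one variable and their stored results reused. This is a sketch, assuming the variable age from the cancer example dataset; r(p50) is only available after the detail option.

```stata
* Load the example dataset
sysuse cancer, clear

* Basic summary: N, mean, sd, min, max
summarize age

* Detailed summary: percentiles, variance, skewness, kurtosis
summarize age, detail

* Reuse the stored results from the last summarize
display "mean = " r(mean) ", median = " r(p50) ", sd = " r(sd)
```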

Discrete variables

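  • ta (tabulate) : gives the frequency table of a variable (counts, percentages and cumulative percentages).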
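
A short sketch of a one-way table, assuming the discrete variables drug and died from the cancer example dataset:

```stata
* Load the example dataset
sysuse cancer, clear

* One-way frequency table
tabulate drug

* Same, but missing values get their own row
tabulate died, missing
```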

Multivariate statistics

Continuous variables

  • corr (correlate) returns the matrix of linear (Pearson) correlations between a set of variables.
    • corr, cov returns the covariance matrix.

Here is an example. We first simulate y and x such that they are positively correlated. We then plot the two variables with a fitted line and look at their correlation.

. clear
. set obs 1000 
. gen x =  invnorm(uniform())
. gen u =  invnorm(uniform())
. gen y = x + u
. tw sc y x || lfit y x
. corr y x
(obs=1000)

             |        y        x
-------------+------------------
           y |   1.0000
           x |   0.7197   1.0000
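
The value reported above is close to what the simulation design implies. A short derivation, assuming x and u are independent with unit variance (which the two separate standard normal draws are meant to provide):

```latex
\[
\begin{aligned}
\operatorname{Cov}(y, x) &= \operatorname{Cov}(x + u,\, x) = \operatorname{Var}(x) = 1, \\
\operatorname{Var}(y) &= \operatorname{Var}(x) + \operatorname{Var}(u) = 2, \\
\operatorname{Corr}(y, x) &= \frac{\operatorname{Cov}(y, x)}{\sqrt{\operatorname{Var}(y)\,\operatorname{Var}(x)}}
  = \frac{1}{\sqrt{2}} \approx 0.707 .
\end{aligned}
\]
```

The sample correlation of 0.7197 differs from this population value only through sampling variation.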

  • wincorr returns the winsorized correlation: values in the tails are replaced by a limit value. This is useful if some extreme values have a big influence on the correlation coefficient.
  • spearman and spearman2 give Spearman's rank correlation between two variables. This statistic is less sensitive to outliers than Pearson's linear correlation, which makes it useful as a robustness check.
. spearman y x

 Number of obs =    1000
Spearman's rho =       0.7090

Test of Ho: y and x are independent
    Prob > |t| =       0.0000
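
The sensitivity to outliers can be checked with a small experiment. This is a sketch under stated assumptions: the seed value and the artificial outlier of 1000 are arbitrary choices for illustration, and rnormal() is used in place of invnorm(uniform()), which requires a reasonably recent Stata version.

```stata
* Simulate the same design as above, with a fixed (arbitrary) seed
clear
set seed 12345
set obs 1000
generate x = rnormal()
generate y = x + rnormal()

* Replace one value of y with an artificial extreme value
replace y = 1000 in 1

* Pearson correlation: strongly affected by the single outlier
corr y x

* Spearman rank correlation: only the rank of one observation changes
spearman y x
```

Comparing the two outputs with and without the replace line shows how much each statistic moves.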

Discrete variables

  • ta (tabulate) with two variables : gives the two-way table of frequencies (cross-tabulation).
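
A sketch of a two-way table, assuming the variables drug and died from the cancer example dataset. The row and chi2 options are added here for illustration.

```stata
* Load the example dataset
sysuse cancer, clear

* Two-way table of frequencies
tabulate drug died

* Add row percentages and a Pearson chi-squared test of independence
tabulate drug died, row chi2
```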

Continuous and discrete variables

  • catgraph : plots the means of a continuous variable by category
  • table : displays summary statistics (for example, means) of continuous variables by category
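
A sketch of table computing the mean of a continuous variable within each category, assuming the variables drug and age from the cancer example dataset. The syntax of table changed in Stata 17, so both forms are shown and the older one is left commented out.

```stata
* Load the example dataset
sysuse cancer, clear

* Stata 17 and later: mean of age within each level of drug
table drug, statistic(mean age)

* Stata 16 and earlier used the contents() option instead:
* table drug, contents(mean age)
```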