In this section we show how to compute summary statistics in Stata. The section has three subsections: the first covers commands that describe a whole dataset, the second commands that describe a single variable, and the third commands that describe a set of variables.
Describe a dataset
- 'des' (describe) : gives the size of the file, the number of observations, the number of variables, and the name, label and storage type of each variable.
- 'des, s' (describe, short) : gives only the size of the file, the number of observations and the number of variables.
- 'des' stores its results in r(): a flag 'r(changed)' indicating whether the data have changed since the last save, the number of variables 'r(k)' and the number of observations 'r(N)'. Use 'ret list' (return list) to display them.
. sysuse cancer, clear
(Patient Survival in Drug Trial)
. describe
. des, s
. ret list
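The stored results can be reused in later commands. The following is a minimal sketch on the same cancer dataset; the display labels are only illustrative.

```stata
* Load the example dataset shipped with Stata
sysuse cancer, clear

* describe leaves its results in r()
describe, short

* Reuse the stored results in later commands
display "Observations: " r(N)
display "Variables:    " r(k)
```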
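Describe a single variable
- 'su, d' (summarize, detail) : gives detailed summary statistics for a variable: mean, standard deviation, variance, skewness, kurtosis, selected percentiles and the smallest and largest values.
- 'robmean' : robust mean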
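As an illustration of 'su, d', here is a short sketch using the auto dataset shipped with Stata. The choice of dataset and of the variable price is an assumption made for the example.

```stata
* Load an example dataset shipped with Stata
sysuse auto, clear

* Detailed summary: percentiles, variance, skewness, kurtosis
summarize price, detail

* List everything summarize stored in r()
return list

* Reuse stored results, e.g. to compare the mean and the median
display "mean = " r(mean) "   median = " r(p50)
```

A mean well above the median is a quick sign of a right-skewed variable.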
Describe a set of variables
- 'corr' (correlate) : returns the matrix of linear (Pearson) correlations for a set of variables.
- 'corr, cov' : returns the covariance matrix instead.
Here is an example. We first simulate two variables, y and x, that are positively correlated. We then plot them and compute their correlation.
. clear
. set obs 1000
. gen x = invnorm(uniform())
. gen u = invnorm(uniform())
. gen y = x + u
. tw sc y x || lfit y x
. corr y x
(obs=1000)

             |        y        x
-------------+------------------
           y |   1.0000
           x |   0.7197   1.0000
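The same simulation can be used to look at the covariance matrix. Since y = x + u with x and u independent standard normal draws, the population values are Cov(y, x) = 1, Var(y) = 2 and Var(x) = 1, so the correlation is 1/sqrt(2), roughly 0.71, in line with the value above. The sketch below is only an illustration: the seed is arbitrary and the sample values will differ slightly from the population ones.

```stata
clear
set seed 12345               // arbitrary seed, only for reproducibility
set obs 1000
gen x = invnorm(uniform())
gen u = invnorm(uniform())
gen y = x + u

* Covariance matrix instead of the correlation matrix
corr y x, cov

* Results stored by correlate when the covariance option is used
display "Cov(y,x) = " r(cov_12)
display "Var(y)   = " r(Var_1)
display "Var(x)   = " r(Var_2)

* Correlation rebuilt from its definition
display "rho      = " r(cov_12)/sqrt(r(Var_1)*r(Var_2))
```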
- 'wincorr' : returns the winsorized correlation, in which the tails of each variable are replaced by a limit value. This is useful when a few extreme values have a large influence on the correlation coefficient.
- 'spearman' and 'spearman2' : give Spearman's rank correlation between two variables. This statistic is less sensitive to outliers than Pearson's linear correlation, which makes it a useful robustness check.
. spearman y x

 Number of obs =    1000
Spearman's rho =       0.7090

Test of Ho: y and x are independent
    Prob > |t| =       0.0000
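To see why these robust measures matter, one can contaminate the simulated data with a single extreme point and compare the three approaches. This is a sketch under stated assumptions: the seed, the outlier values and the 1st/99th percentile cutoffs are arbitrary choices, and the winsorizing is done by hand with built-in commands, which is not necessarily how 'wincorr' proceeds internally.

```stata
clear
set seed 12345               // arbitrary seed, only for reproducibility
set obs 1000
gen x = invnorm(uniform())
gen y = x + invnorm(uniform())

* Contaminate the data with one extreme observation
replace x = 100 in 1
replace y = -100 in 1

* Pearson correlation is pulled toward the outlier
corr y x

* Spearman's rank correlation is much less affected
spearman y x

* Manual winsorizing: clamp each variable to its 1st and 99th percentiles
foreach v in x y {
    quietly summarize `v', detail
    local lo = r(p1)
    local hi = r(p99)
    gen `v'_w = min(max(`v', `lo'), `hi')
}

* Pearson correlation of the winsorized variables
corr y_w x_w
```

Comparing the three outputs shows how much a single observation can move the Pearson coefficient relative to the rank-based and winsorized versions.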
Continuous and discrete variables
- 'catgraph' : plots the means of a continuous variable by category.
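A similar summary can be obtained with built-in commands. The sketch below is an alternative to 'catgraph', not a description of it; it uses the auto dataset, with price as the continuous variable and foreign as the categorical one, both chosen only for illustration.

```stata
* Load an example dataset shipped with Stata
sysuse auto, clear

* Table of means (and group sizes) of price by category of foreign
tabstat price, by(foreign) statistics(mean n)

* Bar chart of the same group means
graph bar (mean) price, over(foreign)
```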