# Stata/Descriptive Statistics

In this section we show how to compute summary statistics using Stata. This section includes three subsections. The first one deals with commands to describe a whole dataset, the second one with commands to describe a single variable, and the third one with commands to describe a set of variables.

## Describe a dataset

- 'des' (describe) : gives the size of the file, the number of observations, the number of variables, and the name, type and label of each variable.
- 'des, s' (describe, short) : gives only the size of the file, the number of observations and the number of variables.
- 'des' also stores results in r(): the number of observations 'r(N)', the number of variables 'r(k)', and 'r(changed)', a flag indicating whether the data have changed since they were last saved. Type 'ret list' (return list) to display them.

    . sysuse cancer, clear
    (Patient Survival in Drug Trial)
    . describe
    . des, s
    . ret list
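The stored results can be reused in later commands, for instance inside a do-file. The following is a minimal sketch; it uses the `auto` dataset shipped with Stata rather than `cancer`, which is a choice made here for illustration only.

```stata
* Load an example dataset shipped with Stata
sysuse auto, clear

* Run describe without printing its output
quietly describe

* describe leaves its results in r(); display two of them
display "observations: " r(N)
display "variables:    " r(k)
```

Results in r() are overwritten by the next command that stores r() results, so copy them into a scalar or a local macro if they are needed later.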

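**codebook** describes the contents of each variable: its type, range, number of unique values and number of missing values.

**inspect** gives a quick summary of a numeric variable: the number of negative, zero, positive, integer and missing values, together with a small histogram.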
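As a short illustration of these two commands, the sketch below applies them to one variable of the `auto` dataset shipped with Stata. The choice of dataset and variable is an assumption made for the example.

```stata
* Load an example dataset shipped with Stata
sysuse auto, clear

* Type, range, unique values and missing values of rep78
codebook rep78

* Counts of negative, zero, positive and missing values, plus a small histogram
inspect rep78
```

Both commands also accept a list of variables, or no variable at all, in which case they go through the whole dataset.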

## Univariate statistics

### Continuous variables

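- 'su' (summarize) : gives the number of observations, the mean, the standard deviation, the minimum and the maximum.
- 'su, d' (summarize, detail) : adds percentiles, the variance, the skewness, the kurtosis, and the smallest and largest values.
- 'robmean' : robust mean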
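The sketch below runs `summarize` with and without the `detail` option on the `auto` dataset shipped with Stata (an assumed example dataset) and reads a few of the results that the command stores in r().

```stata
* Load an example dataset shipped with Stata
sysuse auto, clear

* Basic summary: observations, mean, standard deviation, min, max
summarize price
display "mean = " r(mean) "   sd = " r(sd)

* Detailed summary: percentiles, variance, skewness, kurtosis
summarize price, detail
display "median = " r(p50) "   skewness = " r(skewness)
```

The percentiles, skewness and kurtosis are only stored in r() when `detail` is specified.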

### Discrete variables

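- 'ta' (tabulate) : gives the frequency table of a variable, with frequencies, percentages and cumulative percentages.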
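A one-way frequency table can be sketched as follows, again on the `auto` dataset shipped with Stata (an assumed example). By default `tabulate` leaves out missing values; the `missing` option adds them as a category of their own.

```stata
* Load an example dataset shipped with Stata
sysuse auto, clear

* Frequency table of the repair record, missing values excluded
tabulate rep78

* Same table, with missing values counted as a separate category
tabulate rep78, missing
```

After the command, 'r(N)' holds the number of observations in the table and 'r(r)' the number of rows.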

## Multivariate statistics

### Continuous variables

**corr** returns the matrix of linear correlations between a set of variables. **corr, cov** returns the covariance matrix.

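Here is an example. We first simulate two variables, y and x, that are positively correlated: y is the sum of x and an independent noise term u, both standard normal, so the theoretical correlation is 1/√2 ≈ 0.71. We then plot the two variables and look at their correlation.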

    . clear
    . set obs 1000
    . gen x = invnorm(uniform())
    . gen u = invnorm(uniform())
    . gen y = x + u
    . tw sc y x || lfit y x
    . corr y x
    (obs=1000)

                 |        y        x
    -------------+------------------
               y |   1.0000
               x |   0.7197   1.0000
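The simulation above draws new random numbers at each run, so the estimated correlation changes slightly every time. A reproducible variant could look like the sketch below. The seed value is arbitrary, and `rnormal()` is used here in place of `invnorm(uniform())` as an assumption about the Stata version available (it is the newer way to draw standard normal values).

```stata
clear
set seed 12345          // arbitrary seed, fixed so that results can be reproduced
set obs 1000

* Two independent standard normal draws and their sum
generate x = rnormal()
generate u = rnormal()
generate y = x + u

* Covariance matrix, then correlation matrix
correlate y x, covariance
correlate y x

* With two variables, the correlation is also stored in r(rho)
display "correlation = " r(rho)
```

The estimate should be close to the theoretical value 1/√2 ≈ 0.71, since the variance of y is 2 and its covariance with x is 1.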

**wincorr** returns the winsorized correlation: values in the tails are replaced by a limit value. This is useful when a few extreme values have a big influence on the correlation coefficient.
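The idea of winsorizing can be illustrated by hand with built-in commands only. The sketch below is not the implementation of wincorr. It assumes the 5th and 95th percentiles as limit values, adds one artificial outlier, and uses the hypothetical variable names `y_w` and `x_w` for the winsorized copies.

```stata
clear
set seed 12345          // arbitrary seed
set obs 1000
generate x = rnormal()
generate y = x + rnormal()

* Add one artificial extreme value
replace y = 50 in 1

* Replace values beyond the 5th and 95th percentiles by those percentiles
foreach v of varlist y x {
    quietly summarize `v', detail
    generate `v'_w = min(max(`v', r(p5)), r(p95))
}

* Correlation before and after winsorizing
correlate y x
correlate y_w x_w
```

Comparing the two correlation tables shows how much a single extreme observation moves the coefficient.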

**spearman** and **spearman2** give Spearman's rank correlation between two variables. This statistic is less sensitive to outliers than Pearson's linear correlation. It is often useful as a robustness check.

    . spearman y x

     Number of obs =    1000
    Spearman's rho =       0.7090

    Test of Ho: y and x are independent
        Prob > |t| =       0.0000
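Spearman's rho is Pearson's correlation computed on the ranks of the variables, which is why it reacts less to extreme values. The sketch below checks this on simulated data. The seed and the variable names `rank_y` and `rank_x` are assumptions made for the example.

```stata
clear
set seed 12345          // arbitrary seed
set obs 1000
generate x = rnormal()
generate y = x + rnormal()

* Spearman's rank correlation, saved before r() is overwritten
spearman y x
scalar rho_s = r(rho)

* Pearson's correlation computed on the ranks
egen rank_y = rank(y)
egen rank_x = rank(x)
correlate rank_y rank_x

display "spearman: " rho_s "   pearson on ranks: " r(rho)
```

The two numbers should agree up to rounding.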

### Discrete variables

**ta** (tabulate) followed by two variables gives the two-way table of frequencies.
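A two-way table can be sketched as follows on the `auto` dataset shipped with Stata (an assumed example). The `row` option adds row percentages and `chi2` adds Pearson's chi-squared test of independence.

```stata
* Load an example dataset shipped with Stata
sysuse auto, clear

* Cross-tabulation of repair record by origin of the car
tabulate rep78 foreign

* Same table with row percentages and a chi-squared test
tabulate rep78 foreign, row chi2
```

After the command, 'r(r)' and 'r(c)' hold the number of rows and columns, and 'r(chi2)' and 'r(p)' the test statistic and its p-value.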

### Continuous and discrete variables

**catgraph** plots the means of a continuous variable by categories.

**table** produces tables of summary statistics of a continuous variable by categories.
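Means of a continuous variable by categories can also be obtained with `tabstat` and with the `summarize()` option of `tabulate`. The sketch below uses these two commands instead of `table`, because the syntax of `table` changed in Stata 17 and the right call depends on the version installed. The dataset is the `auto` dataset shipped with Stata, an assumed example.

```stata
* Load an example dataset shipped with Stata
sysuse auto, clear

* Number of observations, mean and standard deviation of price by origin
tabstat price, by(foreign) statistics(n mean sd)

* Same information through tabulate
tabulate foreign, summarize(price)
```

Both commands print one row per category and a final row for the whole sample.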