# Stata/Descriptive Statistics

From Wikibooks, open books for an open world

In this section we show how to compute summary statistics using Stata. The section has three subsections: the first covers commands that describe a whole dataset, the second covers commands that describe a single variable, and the third covers commands that describe a set of variables.

## Describe a dataset

- 'des' (describe): gives the size of the file, the number of observations, the number of variables, and the name, label, and storage type of each variable.
- 'des, s' (describe, short): gives only the size of the file, the number of observations, and the number of variables.
- 'des' also stores its results in r(): a flag indicating whether the data have changed since the last save 'r(changed)', the number of variables 'r(k)', and the number of observations 'r(N)'. Use 'ret list' (return list) to display them.

    . sysuse cancer, clear
    (Patient Survival in Drug Trial)
    . describe
    . des, s
    . ret list
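The stored results shown by 'ret list' can be reused in later commands. The following is a minimal sketch, assuming the cancer dataset that ships with Stata; the local names nobs and nvars and the display text are illustrative choices only.

```stata
* Load the example dataset shipped with Stata
sysuse cancer, clear

* describe leaves its results in r(); the short form is enough here
describe, short

* Copy the results into locals before another r-class command overwrites them
local nobs  = r(N)
local nvars = r(k)
display "The dataset has `nobs' observations and `nvars' variables"

* r(changed) is 0 when the data in memory match the file on disk
display "Changed since last save: " r(changed)
```

Copying r() values into locals right away is a defensive habit, since the next r-class command replaces them.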

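**codebook** reports, for each variable, its type, range, number of unique values, and number of missing values.

**inspect** gives a quick overview of a numeric variable: the counts of negative, zero, positive, integer, and missing values, along with a small histogram.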

## Univariate statistics

### Continuous variables

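- 'su' (summarize): number of observations, mean, standard deviation, minimum, and maximum
- 'su, d' (summarize, detail): adds percentiles, variance, skewness, and kurtosis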
- robmean : robust mean
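As an illustration of the two forms of 'su', here is a short sketch that assumes the cancer dataset shipped with Stata and its variable age. The comparison of mean and median at the end is only an informal check added for this example, not part of the command.

```stata
sysuse cancer, clear

* Basic summary: observations, mean, standard deviation, min, max
summarize age

* The mean and standard deviation are stored in r()
display "mean = " r(mean) ", sd = " r(sd)

* detail adds percentiles, variance, skewness and kurtosis
summarize age, detail
display "median = " r(p50)

* Informal check for influential tails: compare the mean with the median
local gap = r(mean) - r(p50)
display "mean - median = `gap'"
```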

### Discrete variables

- 'ta' (tabulate): one-way frequency table with counts, percentages, and cumulative percentages
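A brief sketch of a one-way table, assuming the cancer dataset shipped with Stata and its categorical variable drug:

```stata
sysuse cancer, clear

* One-way frequency table: counts, percentages, cumulative percentages
tabulate drug

* r(N) is the number of observations, r(r) the number of distinct values
display "observations = " r(N) ", categories = " r(r)

* The missing option treats missing values as a category of their own
tabulate drug, missing
```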

## Multivariate statistics

### Continuous variables

**corr** returns the matrix of linear correlations between a set of variables. **corr, cov** returns the covariance matrix.

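Here is an example. We first simulate y and x such that there is a positive correlation between them: x and u are independent standard normal draws and y = x + u, so the population correlation between y and x is 1/sqrt(2), about 0.71. We then plot the two variables and look at the sample correlation.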

    . clear
    . set obs 1000
    . gen x = invnorm(uniform())
    . gen u = invnorm(uniform())
    . gen y = x + u
    . tw sc y x || lfit y x
    . corr y x
    (obs=1000)

                 |        y        x
    -------------+------------------
               y |   1.0000
               x |   0.7197   1.0000
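The covariance matrix and the stored results can be inspected in the same way. The sketch below repeats the simulation under two assumptions that are not in the original: rnormal() is used as the newer equivalent of invnorm(uniform()), and the seed is an arbitrary value chosen so the run is reproducible. Under this setup the variance of y should be close to 2 and the covariance of y and x close to 1.

```stata
* Simulate y = x + u with independent standard normal x and u
clear
set seed 12345
set obs 1000
gen x = rnormal()
gen u = rnormal()
gen y = x + u

* Correlation matrix; r(rho) holds the correlation of the first two variables
corr y x
display "rho = " r(rho)

* Covariance matrix; r(C) holds the matrix that was displayed
corr y x, cov
matrix list r(C)
```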

**wincorr** returns the winsorized correlation: values in the tails are replaced by a limit value. This is useful when a few extreme values have a large influence on the correlation coefficient.
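To see what winsorizing does, the idea can be sketched by hand with built-in commands. This is not the wincorr implementation: the 5th and 95th percentile cutoffs, the seed, the planted outlier, and the variable names y_w and x_w are all assumptions made for this illustration.

```stata
clear
set seed 12345
set obs 1000
gen x = rnormal()
gen y = x + rnormal()

* Contaminate one observation with an extreme value
replace y = 1000 in 1
corr y x

* Winsorize each variable at its 5th and 95th percentiles
foreach v of varlist y x {
    _pctile `v', percentiles(5 95)
    gen `v'_w = min(max(`v', r(r1)), r(r2))
}

* Correlation of the winsorized variables
corr y_w x_w
```

Comparing the two corr outputs shows how much a single extreme observation can move the plain correlation, while the winsorized version stays close to the value implied by the bulk of the data.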

**spearman** and **spearman2** give Spearman's rank correlation between two variables. This statistic is less sensitive to outliers than Pearson's linear correlation, so it is often useful as a robustness check.

    . spearman y x

     Number of obs =    1000
    Spearman's rho =       0.7090

    Test of Ho: y and x are independent
        Prob > |t| =       0.0000
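Spearman's rho is the Pearson correlation computed on ranks, which is why it is less affected by extreme values. The sketch below checks this by hand; the seed and the variable names rank_y and rank_x are assumptions for this example, and rnormal() stands in for invnorm(uniform()).

```stata
clear
set seed 12345
set obs 1000
gen x = rnormal()
gen y = x + rnormal()

* Spearman's rho as reported by the built-in command
spearman y x
local rho_s = r(rho)

* The same number by hand: rank each variable, then correlate the ranks
egen rank_y = rank(y)
egen rank_x = rank(x)
corr rank_y rank_x
display "spearman = `rho_s', corr of ranks = " r(rho)
```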

### Discrete variables

**ta** followed by two variables returns a two-way frequency table (cross-tabulation).
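A short sketch of a two-way table, assuming the cancer dataset shipped with Stata and its variables drug and died; the choice of the row and chi2 options is illustrative.

```stata
sysuse cancer, clear

* Cross-tabulation of drug by died, with row percentages
tabulate drug died, row

* chi2 adds Pearson's chi-squared test of independence
tabulate drug died, chi2
display "chi2 = " r(chi2) ", p-value = " r(p)
```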

### Continuous and discrete variables

**catgraph**: plots the means of a continuous variable by categories.

**table**: tabulates summary statistics of a continuous variable by categories.
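Group means can also be obtained with other built-in commands. The sketch below uses tabstat and graph bar rather than catgraph or table, partly because the syntax of table differs across Stata versions; the cancer dataset and the new variable name mean_age are assumptions for this example.

```stata
sysuse cancer, clear

* Mean, standard deviation and count of age within each drug group
tabstat age, by(drug) statistics(mean sd n)

* The same group means stored as a new variable
egen mean_age = mean(age), by(drug)

* Bar chart of the mean of age by drug group
graph bar (mean) age, over(drug)
```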