Read and import data
Usually, data are loaded into memory using the use command. The clear option ensures that the dataset currently in memory is dropped, discarding any unsaved changes.
use "W:\Data\…\table.dta" , clear
The cd command sets the working directory, which makes it easier to load tables into memory.
cd "W:\Data\"
use table, clear
Stata 9 users can import Stata 10 datasets using the use10 command.
use10 table, clear
Some example datasets are stored in the Stata directory. They can be loaded into memory using the sysuse command.
sysuse cancer, clear
sysuse smoking, clear
sysuse auto, clear
sysuse jspmix, clear
You can import a comma-separated values (CSV) file using insheet. The delim() option sets the field separator, here a semicolon.
insheet using "W:\Data\…\table.csv", delim(";")
Other commands for reading data:
- 'webuse' for internet data
- 'xmluse' for xml files
- 'infile' for text files
- 'input' for entering data from keyboard
- 'fdause' for SAS xport data
- If none of these commands works, you can use Stat/Transfer
- FTRANS: module to batch convert file formats
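As a minimal sketch of two of the commands listed above, the following enters a tiny dataset from the keyboard with input and then loads an example dataset over the internet with webuse. The variable names and values are made up for illustration, and the webuse line assumes internet access and that the lifeexp example dataset is available from the Stata website.

```stata
* Type a small dataset directly (hypothetical values)
clear
input id str10 name score
1 "Ann" 12.5
2 "Bob" 9
3 "Carla" 14
end
list

* Load an example dataset from the Stata website (needs internet access)
webuse lifeexp, clear
describe, short
```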
Save and export data
save table, replace
If you use Stata10 you can export to Stata9 format using saveold
saveold table, replace
- outsheet : export to tab delimited or csv format.
outsheet using "W:\Data\…\table.csv", replace comma
Append and merge
The standard Stata command is merge. However, the user-written command mmerge is safer and gives better output. It can be installed using the ssc install mmerge command.
- joinby: forms all pairwise combinations of observations within groups across the two datasets.
- append: if you have two datasets with the same variables but different observations, you can stack them into one dataset using the append command.
use data_1, clear
append using data_2
br
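To illustrate the merge command mentioned at the start of this section, here is a small sketch that builds two hypothetical datasets sharing a key variable id and merges them one-to-one. It assumes Stata 11 or later, where the merge 1:1 syntax is available (older versions use a different syntax and require both datasets to be sorted), and it should be run as a do-file so that the temporary file name stays defined.

```stata
* First hypothetical dataset, saved to a temporary file
clear
input id x
1 10
2 20
3 30
end
tempfile left
save `left'

* Second hypothetical dataset, kept in memory as the master
clear
input id y
2 200
3 300
4 400
end

* One-to-one merge on id (Stata 11+ syntax)
merge 1:1 id using `left'

* _merge records where each observation came from:
* 1 = master only, 2 = using only, 3 = matched in both
tabulate _merge
list
```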
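Describe a dataset
- des, s (short for describe, short): reports the number of observations and variables without listing each variable.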
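A short sketch of how a dataset might be inspected, using the auto example dataset shipped with Stata. The choice of variables is only illustrative.

```stata
sysuse auto, clear

* Overall size of the dataset, without the variable list
describe, short

* Full description: names, storage types, formats and labels
describe

* Detailed report on a single variable, including missing values
codebook rep78

* Summary statistics for selected variables
summarize price mpg
```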
Detect missing values
You can convert missing values to a numeric code using the mvencode command.
mvencode exg ga dvg verts eco dr dvd fn reg mnr div, mv(0) override
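Before recoding, it usually helps to find out where the missing values are. The following is a sketch on the auto example dataset, in which rep78 contains some missing values. The misstable command is assumed to be available (it was added in Stata 11), and recoding to 0 is only for illustration.

```stata
sysuse auto, clear

* Count observations where rep78 is missing
count if missing(rep78)

* Report missing values for every variable that has some (Stata 11+)
misstable summarize

* Inspect the affected observations
list make rep78 if missing(rep78)

* Recode missing values of rep78 to 0, then check the result
mvencode rep78, mv(0)
count if rep78 == 0
```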
Very often you have to convert a variable from string to numeric format. There are several ways to do it. If the string variable already contains numeric values, you should use destring. Otherwise you should use the encode command. Encode automatically creates a numeric variable and uses the string values of the original variable as its value labels.
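The difference between the two commands can be sketched on a small made-up dataset. The variable names price_str and colour are hypothetical.

```stata
* Two string variables: one holds numbers, the other holds categories
clear
input str4 price_str str6 colour
"12" "red"
"7" "blue"
"30" "red"
end

* price_str contains numbers stored as text: use destring
destring price_str, generate(price)

* colour contains categories: encode assigns integer codes
* and attaches the original strings as value labels
encode colour, generate(colour_id)

describe
label list colour_id
list
```

By default encode numbers the categories in alphabetical order, so the codes carry no meaning beyond identifying the category.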
'vallist' gives the list of all categories of a categorical variable in Stata.
Dealing with labels
- lab var
- lab list
- lab define
- lab value
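A brief sketch of how these four commands fit together, using the auto example dataset. The indicator variable expensive, the price threshold and the label name yesno are invented for the example.

```stata
sysuse auto, clear

* Attach a descriptive label to a variable
label variable mpg "Mileage (miles per gallon)"

* Create a hypothetical 0/1 indicator to label
gen byte expensive = price > 10000

* Define a value label, attach it, and display its contents
label define yesno 0 "No" 1 "Yes"
label values expensive yesno
label list yesno

tabulate expensive
```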
You can expand a dataset (i.e. multiply the number of observations by a given factor) using the expand command.
This is useful for generating panel data. In the first example, we draw 10 observations from a standard normal distribution and duplicate each of them, so that every observation appears twice.
clear
set obs 10
gen u = invnorm(uniform())
expand 2
sort u
br
It is also possible to pass an integer variable as an argument to expand.
clear
set obs 10
gen u = uniform()
gen var = 1 + int(10 * uniform())
expand var
sort u
br
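The expandcl command duplicates whole clusters of observations and generates a new cluster identifier. It requires the cluster() option to identify the original clusters.
clear
set obs 10
gen id = _n
gen u = invnorm(uniform())
expandcl 2, cluster(id) gen(cl)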
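As a sketch of how expand can be used to build a panel structure, the following creates 5 hypothetical individuals and 3 time periods for each. The variable names id and t are assumptions made for the example.

```stata
* 5 individuals, one row each
clear
set obs 5
gen id = _n

* Triplicate every row, then number the copies within each individual
expand 3
bysort id: gen t = _n

sort id t
list, sepby(id)
```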
Data Storage types
All numeric types in Stata are normal "signed" quantities, except that the highest 27 values are reserved for the "missing" types (., .a, .b, ..., .z). The storage size of each variable type is as follows:
|Variable type|Size (in bytes)|
|byte|1|
|int|2|
|long|4|
|float|4|
|double|8|
|string|1 per character (therefore only ASCII characters, not full Unicode/UTF-8)|
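Storage types can be checked with describe and reduced with compress, which recasts each variable to the smallest type that holds its values without loss. A small sketch with made-up variables:

```stata
clear
set obs 100

* Deliberately oversized storage types
gen double x = mod(_n, 10)
gen str20 s = "abc"
describe

* compress shrinks x to byte and s to str3 without changing the data
compress
describe
```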