SAS/Data Management

From Wikibooks, open books for an open world
< SAS
Jump to navigation Jump to search

Library[edit | edit source]

Data are stored in a directory. You can give a name to that directory using the 'libname' statement.

LIBNAME name_of_the_libraray "W:/Desktop" ;

Read/Describe/list data[edit | edit source]

  • Proc dataset
  • Proc content
  • Proc print

The Proc datasets allow to list the content of all the datasets in a library

libname lib "W:\…\data" ; 
proc datasets library = lib;
   contents data=_all_;
run;

Generate Random Variables[edit | edit source]

The following program draws 5 observations from a normal distribution with expectancy 1.75 and standard deviation 0.1. The rannor() function draws from a standard normal distribution. The argument specifies the seed. This allows reproducibility.

data taille ; 
do i = 1 to 5 ; 
x = 1.75 + 0.1 * rannor(1) ; 
output ; end ; run;

If you know the quantile function (inverse CDF), you can draw in the distribution using the inverse CDF method. You simple have to draw in a uniform distribution and transform the draw using the inverse CDF function. Here is an example with a Gumbel distribution :

gumbel = - log(-log(ranuni(0))) ;

Input Data[edit | edit source]

data base ; 
input x u ; 
cards ; 
1 -1
2 1
3 -1
4 1
5 -1 ;
run ; 

proc print data = base ; run ;

Sorting[edit | edit source]

 proc sort data=lib.data out=lib.data2 tagsort;
 by x1 x2;
 run;

The default is ascending sort. If you want to put the highest values first, you can use by descending.

 proc sort data=lib.data out=lib.data2 tagsort;
 by x1 descending x2;
 run;

Merge[edit | edit source]

It is better to sort the data before merging them.

 data lib_name.data_name3;
 merge lib_name.data_name3 (in=a) lib_name.data_name2 (in=b);
 by x1;
 if a and b;
 run;


DATA a; INPUT famid name $ inc98;
DATALINES;
2 Art  22000
1 Bill 30000
3 Paul 25000;RUN;
DATA b;
 INPUT famid inc96 inc97 inc99;
DATALINES;
3 75000 76000 77000
1 40000 40500 41000
2 45000 45400 45800
4 200 300 100
;RUN;
PROC SORT DATA=a; BY famid;
RUN;
PROC SORT DATA=b; BY famid; RUN;
DATA merge121;
 MERGE a(in = a) b(in = b) ;
 BY famid;
 froma = a;
 fromb = b;
RUN;

Import from other format[edit | edit source]

You can import from a CSV:

proc import datafile="E:/Data/recidiv.csv"
out=lib.recidiv replace;
run;


You can also specify the delimiter.

proc import datafile='W:/…/australia2.csv' out=work.australia replace;
	delimiter = ";";
run;

You can import from an xls file.

proc import datafile="C:/Documents and Settings/.../Bureau/base.xls"
out=work.base replace;
sheet = "Table 1" ; 
run;

Export to other format[edit | edit source]

First one can export to Excel :

PROC EXPORT DATA= lib.database OUTFILE= "W:/Desktop/export.xls" 
            DBMS=EXCEL2000 REPLACE;
RUN;

Creating aggregate tables[edit | edit source]

One can create aggregate table using the output delivery system (ODS). The following program creates a table of the cross tabulation and store it in a new dataset.

ods trace on  ; 
ods output CrossTabFreqs = lib.aggregate ; 
ods listing close ; 
proc freq data = lib.ficus (where = (effec < 100)); 
table effec * eff_moy ; 
run ; 
ods output clear  ; 
ods output close ; 
ods trace off ; 
ods listing ;