### Introduction to SAS

```
Professional Seminar
Northwestern Polytechnic University
By
Dr. Michael M Cheng
Quiz
Choose one of the following answers.
What is SAS?
a. SAS is a highly contagious disease found in Asia in the winter.
b. SAS is sardines and salmon.
c. SAS is software that computes statistics only.
d. SAS is a 4th generation computer language capable of
   full-featured computer programming.
e. None of the above.
SAS (SAS System)
A computer software system that consists of
several products that provide data retrieval,
management, and analysis capabilities in addition
to programming (SAS Institute, Inc.)
SAS is a problem solving tool.
Heuristic Problem Solving
[Diagram: Image Mode 1, Linguistic Mode 1, Image Mode 2, Linguistic Mode 2]
The interaction between image mode and linguistic mode is called
Heuristic Problem Solving.
Psychology of Communication
By George Miller
Coding
Decoding
Channel Capacity
Magic number 7 plus or minus 2
For example:
2121568931
??????????
212-156-8931
A SAS program is composed of SAS statements: some are used in the
PROC step, some in the DATA step, and some in either step.
Most SAS statements begin with an identifying keyword, and every
statement ends with a semicolon.
SAS statements are free-format: a statement can start in any column
and continue across several lines.
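/* A minimal sketch, not from the original slides, of what free-format  */
/* means in practice. The data set name DEMO is hypothetical.           */
/* Several statements may share one line:                               */
DATA DEMO; X = 1 + 2; Y = X * 10;
RUN;
/* The same step with one statement per line; the first assignment      */
/* continues onto a second line and ends only at the semicolon:         */
DATA DEMO;
   X = 1
       + 2;
   Y = X * 10;
RUN;
/* Both DATA steps create the same one-observation data set.            */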
A SAS data set is a collection of data values
arranged in a rectangular table.
The columns in the table are called variables.
The rows in the table are called observations (or
records). There are two kinds of variables:
- character variables
- numeric variables
                  VARIABLES
                  NAME     SEX  AGE  HEIGHT  WEIGHT
                  ---------------------------------
observation 1     JOHN     M    12   59.0     99.5
observation 2     JAMES    M    12   57.3     83.0
observation 3     ALFRED   M    14   69.0    112.5
      .
      .
      .
observation 19    ALICE    F    13   56.5     84.0
DATA CLASS;
   INPUT NAME   $ 1-8
         SEX    $ 11
         AGE    13-14
         HEIGHT 16-19
         WEIGHT 21-25;
   CARDS;
data lines
;
PROC PRINT DATA=CLASS;
PROC MEANS DATA=CLASS;
   VAR HEIGHT WEIGHT;
RUN;
Creating SAS data sets
Raw data  ->  DATA step  ->  SAS data set CLASS
DATA CLASS;
   INPUT NAME   $ 1-8
         SEX    $ 11
         AGE    13-14
         HEIGHT 16-19
         WEIGHT 21-25;
   CARDS;
A listing of the raw data
NAME     SEX  AGE  HEIGHT  WEIGHT
JOHN     M    12   59.0     99.5
JAMES    M    12   57.3     83.0
ALFRED   M    14   69.0    112.5
WILLIAM  M    15   66.5    112.0
JEFFREY  M    13   62.5     84.0
RONALD   M    15   67.0    133.0
THOMAS   M    11   57.5     85.0
PHILIP   M    16   72.0    150.0
ROBERT   M    12   64.8    128.0
HENRY    M    14   63.5    102.5
JANET    F    15   62.5    112.5
JOYCE    F    15   67.0    133.0
JUDY     F    14   64.3     90.0
CAROL    F    14   62.8    102.5
JANE     F    12   59.8     84.5
LOUISE   F    12   56.3     77.0
BARBARA  F    13   65.3     98.0
MARY     F    15   66.5    112.0
ALICE    F    13   56.5     84.0
/* data lines */
CARDS;
JOHN      M 12 59.0  99.5
JAMES     M 12 57.3  83.0
ALFRED    M 14 69.0 112.5
WILLIAM   M 15 66.5 112.0
JEFFREY   M 13 62.5  84.0
RONALD    M 15 67.0 133.0
THOMAS    M 11 57.5  85.0
PHILIP    M 16 72.0 150.0
ALFRED    M 14 69.0 112.5
ROBERT    M 12 64.8 128.0
HENRY     M 14 63.5 102.5
JANET     F 15 62.5 112.5
JOYCE     F 15 67.0 133.0
JUDY      F 14 64.3  90.0
CAROL     F 14 62.8 102.5
JANE      F 12 59.8  84.5
LOUISE    F 12 56.3  77.0
BARBARA   F 13 65.3  98.0
MARY      F 15 66.5 112.0
ALICE     F 13 56.5  84.0
;
PROC PRINT DATA=CLASS;

                   SAS
OBS  NAME     SEX  AGE  HEIGHT  WEIGHT
  1  JOHN     M    12   59.0     99.5
  2  JAMES    M    12   57.3     83.0
  3  ALFRED   M    14   69.0    112.5
  4  WILLIAM  M    15   66.5    112.0
  5  JEFFREY  M    13   62.5     84.0
  6  RONALD   M    15   67.0    133.0
  7  THOMAS   M    11   57.5     85.0
  8  PHILIP   M    16   72.0    150.0
  9  ALFRED   M    14   69.0    112.5
 10  HENRY    M    14   63.5    102.5
 11  JANET    F    15   62.5    112.5
 12  JOYCE    F    15   67.0    133.0
 13  JUDY     F    14   64.3     90.0
 14  CAROL    F    14   62.8    102.5
 15  JANE     F    12   59.8     84.5
 16  LOUISE   F    12   56.3     77.0
 17  BARBARA  F    13   65.3     98.0
 18  MARY     F    15   66.5    112.0
 19  ALICE    F    13   56.5     84.0
PROC MEANS DATA=CLASS;
   VAR HEIGHT WEIGHT;

                                   SAS
                            STANDARD     MINIMUM      MAXIMUM     STD ERROR
VARIABLE   N   MEAN         DEVIATION    VALUE        VALUE       OF MEAN
WEIGHT    19   100.026316   22.7739335   50.5000000   150.000000  5.22469867
HEIGHT    19    62.336842    5.1270752   51.3000000    72.000000  1.17623173
THE PROC STEP


The PROC (or PROCEDURE) statement calls a SAS procedure.
SAS procedures are computer programs that read SAS data sets,
compute statistics, print results, and create SAS data sets.
For example:
PROC MEANS SUM MAXDEC=2 DATA=CLASS;
PROC CONTENTS DATA=CLASS;
PROC SORT DATA=CLASS;
   BY SEX DESCENDING WEIGHT;
Data Transformations
Assignment statement
Assignment statements are used to create new variables and
to modify the values of existing variables. SAS evaluates an
expression and assigns the result to a variable.
variable = expression;
e.g. X = 1 + 2;
Example:
1. Read three variables (YEAR, REVENUE, and EXPENSE)
into a SAS data set.
2. Add a variable named INCOME, which is the difference
between REVENUE and EXPENSE.
3. Change the values of YEAR from 2 digits to 4 digits.
DATA PROFITS;
   INPUT YEAR REVENUE EXPENSE;
   INCOME = REVENUE - EXPENSE;   /* difference between revenue and expense */
   YEAR = YEAR + 2000;           /* 2-digit year to 4-digit year */
   CARDS;
00 5650 1050
01 6280 1140
;
PROC PRINT;
              SAS
OBS  YEAR  REVENUE  EXPENSE  INCOME
 1   2000     5650     1050    4600
 2   2001     6280     1140    5140
SAS functions
Selected functions that compute simple statistics.
SUM    sum
MEAN   arithmetic mean
VAR    variance
MIN    minimum value
MAX    maximum value
STD    standard deviation
Example:
Given: Temperature data at a specific location are recorded
every hour on the hour for several days. Each
record in a file represents one day and contains the
date and the 24 recorded temperatures for that date.
Objective: Create a SAS data set that contains the date, the
24 hourly temperatures, the average temperature,
the minimum temperature and the maximum
temperature for each day.
DATA TEMP;
   INPUT DATE $ 1-7 @11 (T1-T24) (2.);
   AVGTEMP = MEAN(OF T1-T24);
   MINTEMP = MIN(OF T1-T24);
   MAXTEMP = MAX(OF T1-T24);
   CARDS;
data lines
program data vector
DATE T1 . . . AVGTEMP MINTEMP MAXTEMP
The RETAIN statement
SAS normally resets all variables in the program data vector to
missing before each execution of the DATA step. A RETAIN
statement can be used to:
- Retain variable values from the last execution of the DATA step
- Give initial values to the variables.
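Example: Accumulate totals and count observations.
DATA SCORES;              /* DATA statement assumed; the slide omits it */
   RETAIN COUNT 0 TOTAL 0;
   INPUT SCORE @@;        /* @@ reads every value on the data line */
   COUNT = COUNT + 1;
   TOTAL = TOTAL + SCORE; /* a missing SCORE makes TOTAL missing from then on */
   CARDS;
10 5 3 7 . 6 4
;
PROC PRINT;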
program data vector
COUNT TOTAL SCORE
The SUM statement
The SUM statement is a special assignment statement
that accumulates values from one observation to the
next. It retains the values of the created variable and
treats a missing value as zero.
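Example: Accumulate totals and count observations.
DATA SCORES;              /* DATA statement assumed; the slide omits it */
   INPUT SCORE @@;        /* @@ reads every value on the data line */
   COUNT + 1;
   TOTAL + SCORE;         /* the missing SCORE is treated as zero */
   CARDS;
10 5 3 7 . 6 4
;
PROC PRINT;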
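/* A minimal sketch, not from the original slides: the sum statement     */
/* TOTAL + SCORE; behaves like a RETAIN with initial value 0 combined    */
/* with the SUM function, which ignores missing arguments.               */
/* The data set name SCORES2 is hypothetical.                            */
DATA SCORES2;
   RETAIN TOTAL 0;
   INPUT SCORE @@;
   TOTAL = SUM(TOTAL, SCORE);
   CARDS;
10 5 3 7 . 6 4
;
PROC PRINT DATA=SCORES2;
RUN;
/* TOTAL takes the values 10, 15, 18, 25, 25, 31, 35.                    */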
CONDITIONAL EXECUTION OF SAS STATEMENT
IF-THEN/ELSE Statements
Use the IF-THEN statement when you want to execute a SAS
statement conditional on some expression.
Numeric comparison
IF CODE=1 THEN RESPONSE='GOOD';
IF CODE=2 THEN RESPONSE='FAIR';
IF CODE=3 THEN RESPONSE='POOR';
For efficiency, use ELSE statements, so that the remaining
conditions are skipped once one of them is true.
IF CODE=1 THEN RESPONSE='GOOD';
ELSE
IF CODE=2 THEN RESPONSE='FAIR';
ELSE
IF CODE=3 THEN RESPONSE='POOR';
Character comparison
DATA CLASS;
   INPUT NAME $ SEX $ AGE HEIGHT WEIGHT;
   IF SEX='M' THEN SEX='MALE';
   ELSE SEX='FEMALE';
   CARDS;
Comparison operators
Mnemonic  Symbol  Meaning
LT        <       less than
GT        >       greater than
EQ        =       equal to
LE        <=      less than or equal to
GE        >=      greater than or equal to
NE        ^=      not equal to
NL                not less than
NG                not greater than
Logical operators
Mnemonic  Symbol  Meaning
OR        |       or, either
AND       &       and
NOT       ^       not, negation
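/* A minimal sketch, not from the original slides, combining comparison  */
/* and logical operators in one condition. It reads the CLASS data set   */
/* created earlier; the data set name GROUPED, the variable GROUP, and   */
/* the cut-off values are hypothetical.                                  */
DATA GROUPED;
   SET CLASS;
   IF SEX EQ 'M' AND (AGE GE 14 OR HEIGHT > 65) THEN GROUP = 'A';
   ELSE GROUP = 'B';
RUN;
/* Mnemonics and symbols can be mixed: AGE GE 14 is the same test as     */
/* AGE >= 14.                                                            */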
DO and END statements
Execution of a DO statement specifies that all statements
between the DO and its matching END statement are to
be executed.
For example:
DATA EMPLOY;
   INPUT NAME $ 1-8 DEPTNO 10-12
         COM 14-17 SALARY 19-23;
   IF DEPTNO=201 THEN
      DO;
         DEPT='SALES';
         GROSSPAY = COM+SALARY;
      END;
   ELSE
      DO;
         GROSSPAY = SALARY;
      END;
   CARDS;
JOHNSON  201 1500 18000
MOSSER   101      21000
LARKIN   101      24000
GARRETT  201 4800 18000
;
PROC PRINT output
                        SAS
OBS  NAME     DEPTNO   COM  SALARY  DEPT   GROSSPAY
 1   JOHNSON     201  1500   18000  SALES     19500
 2   MOSSER      101     .   21000            21000
 3   LARKIN      101     .   24000            24000
 4   GARRETT     201  4800   18000  SALES     22800
PROC SORT DATA=RATE_A; BY ZIP;
PROC SORT DATA=RATE_B; BY ZIP;
PROC SORT DATA=RATE_C; BY ZIP;
DATA TMTL;
MERGE RATE_A(IN=A) CTL_TBL(IN=B);
BY ZIP;
IF A & B;
DATA TMMR;
MERGE RATE_B(IN=A) CTL_TBL(IN=B);
BY ZIP;
IF A & B;
DATA TMCR;
MERGE RATE_C(IN=A) CTL_TBL(IN=B);
BY ZIP;
IF A & B;
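/* A minimal sketch, not from the original slides, of what the merges    */
/* above do: IN= creates a temporary flag that is 1 when a data set      */
/* contributes to the current BY group, so IF A & B; keeps only the ZIP  */
/* values present in both data sets. Both inputs must be sorted by ZIP.  */
/* The data sets RATES, CTL and MATCHED and their values are             */
/* hypothetical.                                                         */
DATA RATES;
   INPUT ZIP RATE;
   CARDS;
94538 1.2
94539 1.5
95014 2.0
;
DATA CTL;
   INPUT ZIP REGION $;
   CARDS;
94538 EAST
95014 WEST
95112 SOUTH
;
PROC SORT DATA=RATES; BY ZIP;
PROC SORT DATA=CTL; BY ZIP;
DATA MATCHED;
   MERGE RATES(IN=A) CTL(IN=B);
   BY ZIP;
   IF A & B;
RUN;
PROC PRINT DATA=MATCHED;
RUN;
/* MATCHED holds ZIP 94538 and 95014 only: 94539 appears only in RATES   */
/* and 95112 only in CTL.                                                */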
Conclusion
1. SAS is a 4th generation computer language.
2. SAS is a problem solving tool.
3. It makes your life easier (less stressful).
THE END
```