PROC UNIVARIATE vs. PROC SUMMARY

A Comparison of Performance
Background
• For many of the common things I do, PROCs
UNIVARIATE and SUMMARY can accomplish
similar results
• Many years ago, someone suggested I use PROC
UNIVARIATE because it had more functions
• They claimed that both procedures performed
about the same
– I didn’t bother to check that out
• Unless I needed something that could be done
only with PROC SUMMARY, I got in the habit of
using PROC UNIVARIATE
More Background
• Several months ago, I was becoming
frustrated with how long it was taking to run
some large PROC UNIVARIATEs for simple
functions (like SUM, MEAN, MIN, MAX, etc.)
– It also was using a lot of CPU
• There had to be a better way
My First Experiment
• Wrote DATA steps to do simple functions
• Benchmarked the DATA steps against PROC
UNIVARIATE steps
• Compared output results to ensure integrity
• Ran tests using SAS on both Mainframe and
PC
• The results were surprising
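A minimal sketch of what one such comparison might look like. The data set WORK.BIG and the numeric columns X and Y are hypothetical names chosen for illustration; they are not from the original tests. FULLSTIMER writes elapsed and CPU time for each step to the log, and PROC COMPARE checks that both approaches give the same answers.

```sas
/* Assumption: WORK.BIG, X and Y are illustrative names, not from the talk. */
options fullstimer;   /* log elapsed and CPU time for every step */

/* PROC UNIVARIATE version: SUM, MIN, MAX for two columns */
proc univariate data=work.big noprint;
    var x y;
    output out=work.uni_stats
           sum=sum_x sum_y
           min=min_x min_y
           max=max_x max_y;
run;

/* DATA step version: one pass, running totals kept with RETAIN */
data work.ds_stats (keep=sum_x min_x max_x sum_y min_y max_y);
    set work.big end=last;
    retain sum_x min_x max_x sum_y min_y max_y;
    /* SUM, MIN and MAX functions skip missing arguments */
    sum_x = sum(sum_x, x);
    min_x = min(min_x, x);
    max_x = max(max_x, x);
    sum_y = sum(sum_y, y);
    min_y = min(min_y, y);
    max_y = max(max_y, y);
    if last then output;   /* write a single summary row */
run;

/* Integrity check: sums may differ in the last bits, so allow a tiny tolerance */
proc compare base=work.uni_stats compare=work.ds_stats
             method=relative criterion=1e-10;
run;
```

Comparing the "real time" and "cpu time" lines in the log for the two steps gives the elapsed and CPU figures for a test like this.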
Elapsed Time
PROC UNIVARIATE vs. DATA Step
2 Columns with SUM, MIN, MAX
[Bar chart: elapsed seconds (axis 0 to 5,000) for tests MF-01, MF-02, MF-03, PC-01 and PC-02; series: PROC Univariate Elapsed, Data Step Elapsed]
CPU Time
PROC UNIVARIATE vs. DATA Step
2 Columns with SUM, MIN, MAX
[Bar chart: CPU seconds (axis 0 to 3,500) for tests MF-01, MF-02, MF-03, PC-01 and PC-02; series: PROC Univariate CPU, Data Step CPU]
Results of First Test
• Data step showed:
– 95% reduction in elapsed time
– 99% reduction in CPU time
• Decided to also run tests comparing PROC
SUMMARY
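For reference, a PROC SUMMARY step that computes the same three statistics might look like the sketch below. The names WORK.BIG, X, Y and WORK.SUM_STATS are assumptions made for illustration and do not come from the original tests.

```sas
/* Assumption: WORK.BIG, X, Y and WORK.SUM_STATS are illustrative names. */
options fullstimer;   /* log elapsed and CPU time for the step */

proc summary data=work.big;
    var x y;
    /* No CLASS statement, so the output is a single overall row */
    output out=work.sum_stats (drop=_type_ _freq_)
           sum=sum_x sum_y
           min=min_x min_y
           max=max_x max_y;
run;
```

The OUTPUT statement uses the same keyword=names form as PROC UNIVARIATE, so switching between the two procedures for simple statistics is mostly a matter of changing the PROC line.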
Elapsed Time
PROC UNIVARIATE vs. DATA Step and PROC SUMMARY
2 Columns with SUM, MIN, MAX
[Bar chart: elapsed seconds (axis 0 to 5,000) for tests MF-01, MF-02, MF-03, PC-01 and PC-02; series: PROC Univariate Elapsed, Data Step Elapsed, PROC Summary Elapsed]
CPU Time
PROC UNIVARIATE vs. DATA Step and PROC SUMMARY
2 Columns with SUM, MIN, MAX
[Bar chart: CPU seconds (axis 0 to 3,500) for tests MF-01, MF-02, MF-03, PC-01 and PC-02; series: PROC Univariate CPU, Data Step CPU, PROC Summary CPU]
Results of First Test
• Compared to PROC UNIVARIATE, PROC
SUMMARY showed:
– 94% reduction in elapsed time
– 96% reduction in CPU time
Overall Test Results
• Ran many tests on several types of data
• Data Step vs. PROC UNIVARIATE
– Elapsed time was 71% to 95% lower
– CPU time was 74% to 99% lower
• PROC SUMMARY vs. PROC UNIVARIATE
– Elapsed time was 72% to 94% lower
– CPU time was 76% to 96% lower
• In tests where PROC MEANS was also run, results
were similar to PROC SUMMARY
– Sometimes a little less CPU and elapsed time,
sometimes a little more
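The similarity is not surprising: PROC MEANS and PROC SUMMARY accept the same statements, and the main practical difference is that PROC MEANS prints a report by default while PROC SUMMARY does not. A rough PROC MEANS equivalent, again using hypothetical names (WORK.BIG, X, Y, WORK.MEANS_STATS), could be:

```sas
/* Assumption: WORK.BIG, X, Y and WORK.MEANS_STATS are illustrative names. */
proc means data=work.big noprint;   /* NOPRINT suppresses the default report */
    var x y;
    output out=work.means_stats (drop=_type_ _freq_)
           sum=sum_x sum_y
           min=min_x min_y
           max=max_x max_y;
run;
```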
Other Observations
• Data steps performed slightly better than PROCs
SUMMARY and MEANS for simple functions but not as
well on more complex functions
• Most tests were run on both mainframe and PC
– Elapsed time and CPU improvement percentages (vs.
PROC UNIVARIATE) were usually similar on both platforms
• The tests were run on an older, slower mainframe and
a new Windows 7 PC
– For each test, the same data and parameters were run on
both the mainframe and PC
• The PC generally ran 80-95 percent faster than the same tests on
the mainframe (for tested functions) and used 85-95 percent less CPU
