Lect 16: Benchmarks and Performance Metrics

Measurement Tools
• Benchmarks, Traces, Mixes
• Cost, delay, area, power estimation
• Simulation (many levels)
 ISA, RT, Gate, Circuit
• Queuing Theory
• Rules of Thumb
• Fundamental Laws
Maeng Lect 16-2
Marketing Metrics
MIPS = Instruction Count / (Execution Time × 10^6) = Clock Rate / (CPI × 10^6)
 Machines with different instruction sets?
 Programs with different instruction mixes?
• Dynamic frequency of instructions
 Uncorrelated with performance
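The two equivalent forms of the MIPS formula above can be sketched as follows; all numbers here are hypothetical, chosen only to show that both forms give the same rating.

```python
# A sketch of the two equivalent MIPS formulas (all numbers hypothetical).

def mips(instr_count, time_sec):
    """Native MIPS from a measured run: instructions / (time * 10^6)."""
    return instr_count / (time_sec * 1e6)

def mips_from_cpi(clock_hz, cpi):
    """Same rating expressed as clock rate / (CPI * 10^6)."""
    return clock_hz / (cpi * 1e6)

# A 100 MHz machine averaging 2 cycles per instruction:
print(mips_from_cpi(100e6, 2.0))  # 50.0
# The same machine executing 5e8 instructions in 10 s:
print(mips(5e8, 10.0))            # 50.0
```

Note that nothing in either formula says which instructions were executed, which is why the rating cannot compare different instruction sets.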
MFLOPS = FP Operations / (Execution Time × 10^6)
 Machine dependent
 Often not where time is spent
Normalized MFLOPS weights each operation class:
 add, sub, compare, mult : 1
 divide, sqrt : 4
 exp, sin, ... : 8
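A minimal sketch of normalized MFLOPS using the weight table above; the operation counts and timing are invented for illustration.

```python
# Weights from the slide: add/sub/compare/mult = 1, divide/sqrt = 4,
# exp/sin/... = 8. Counts and time below are hypothetical.
WEIGHTS = {"add": 1, "sub": 1, "compare": 1, "mult": 1,
           "divide": 4, "sqrt": 4, "exp": 8, "sin": 8}

def normalized_mflops(op_counts, time_sec):
    """Weighted FP operations per microsecond of execution time."""
    weighted = sum(WEIGHTS[op] * n for op, n in op_counts.items())
    return weighted / (time_sec * 1e6)

# 1e6 adds + 1e5 divides + 1e4 sines in 0.5 s:
ops = {"add": 1_000_000, "divide": 100_000, "sin": 10_000}
print(normalized_mflops(ops, 0.5))  # (1e6 + 4e5 + 8e4) / 5e5 = 2.96
```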
Fallacies and Pitfalls
MIPS is an accurate measure for comparing performance among
computers
 dependent on the instr. set
 varies between programs on the same computer
 can vary inversely to performance
MFLOPS is a consistent and useful measure of performance
 dependent on the machine and on the program
 not applicable outside the floating-point performance
 the set of floating-point ops is not consistent across the machines
Programs to Evaluate Processor Performance
• (Toy) Benchmarks
 10-100 line program
 e.g.: sieve, puzzle, quicksort
• Synthetic Benchmarks
 Attempt to match average frequencies of real workloads
 e.g., Whetstone, Dhrystone
• Kernels
 Time critical excerpts of real programs
 e.g., Livermore loops
• Real programs
 e.g., gcc, spice
Types of Benchmarks
• Architectural
 Synthetic mixes: WHETSTONE, DHRYSTONE, ...
• Algorithmic
 LINPACK
• Kernels
 Self-contained sub-programs, such as PDE solvers, without input/output
• Production
 Working code for a significant problem
 PERFECT and SPEC
• Workload
Levels of Benchmark Specification
• Problem Statement
 Algorithm + code production
 Reflects the implementer's effort and skill more than the system capability
• Solution Method
 NASA Ames
 Reflects the implementer's effort and skill more than the system capability
• Source Language Code
 Performing the same operation
 necessary baseline from which to measure the effectiveness of ‘smart’ compiler options
Benchmarking Games
• Differing configurations used to run the same workload on the two systems
• Compiler wired to optimize the workload
• Workload arbitrarily picked
• Very small benchmarks used
• Benchmarks manually translated to optimize performance
Common Benchmarking Mistakes
• Only average behavior represented in test workload
• Not ensuring same initial conditions
• “Benchmark engineering”
 particular optimization
 different compilers or preprocessors
 runtime libraries
Benchmarks
• DHRYSTONE
 A synthetic benchmark
 Non-numeric, system-type programming
 Contains fewer loops, simpler calculations, and more 'if' statements
 C code
• LINPACK
 Argonne National Lab
 Solution of linear equations in a FORTRAN environment
 Solution-method and code levels
 Vectorised processors
• SPEC
 Standard Performance Evaluation Corp.
 Non-profit group of computer vendors, systems integrators, universities, research organizations, publishers and consultants throughout the world
 * http://www.specbench.org
SPEC
• Groups
 Open Systems Group (OSG)
• CPU committee
• SFS committee : file server benchmarks
• SDM committee : multi-user Unix commands benchmarks
 High Performance Group (HPG)
• SMP, workstation clusters, DSM, vector processors, ...
 Graphics Performance Characterization Group (GPC)
• What metrics can be measured?
 CINT95 and CFP95
• C : 'component-level' benchmarks
– measure the performance of the processor, the memory architecture and the compiler
– I/O, networking, and graphics are not measured by CINT95 and CFP95
• S : 'system-level' benchmarks
SPEC: System Performance Evaluation Cooperative
First Round 1989
 10 programs yielding a single number
Second Round 1992
 SpecInt92 (6 integer programs) and SpecFP92 (14 floating-point programs)
 Reference machine: VAX-11/780
Third Round 1995
 Single flag setting for all programs; new set of programs “benchmarks useful for 3 years”
 non-baseline, baseline
 Reference machine: SPARCstation 10 Model 40
Fourth Round 1998
 Under development
SPEC First Round
One program: 99% of time in single line of code
New front-end compiler could improve dramatically
[Bar chart omitted: SPEC performance ratio (0-800) for each benchmark: gcc, espresso, spice, doduc, nasa7, li, eqntott, matrix300, fpppp, tomcatv]
CPU95
CINT95
 099.go : An internationally ranked Go-playing program.
 124.m88ksim : A chip simulator for the Motorola 88100 microprocessor.
 126.gcc : Based on the GNU C compiler version 2.5.3.
 129.compress : An in-memory version of the common UNIX utility.
 130.li : Xlisp interpreter.
 132.ijpeg : Image compression/decompression on in-memory images.
 134.perl : An interpreter for the Perl language.
 147.vortex : An object-oriented database.
CFP95
 101.tomcatv : Vectorized mesh generation.
 102.swim : Shallow water equations.
 103.su2cor : Monte Carlo method.
 104.hydro2d : Navier-Stokes equations.
 107.mgrid : 3-D potential field.
 110.applu : Partial differential equations.
 125.turb3d : Turbulence modeling.
 141.apsi : Weather prediction.
 145.fpppp : From the Gaussian series of quantum chemistry benchmarks.
 146.wave5 : Maxwell's equations.
CINT95 (written in C)
• SPECint95
 The geometric mean of eight normalized ratios (one for each integer benchmark) when
compiled with aggressive optimization for each benchmark.
• SPECint_base95
 The geometric mean of eight normalized ratios when compiled with conservative optimization
for each benchmark.
• SPECint_rate95
 The geometric mean of eight normalized throughput ratios when compiled with aggressive
optimization for each benchmark.
• SPECint_rate_base95
 The geometric mean of eight normalized throughput ratios when compiled with conservative
optimization for each benchmark.
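The four metrics above all reduce to the same computation: a geometric mean of normalized ratios. A minimal sketch, with invented reference and measured times (real SPEC reporting follows the published run rules):

```python
from math import prod

# SPECint95-style score: geometric mean of eight per-benchmark ratios
# (reference time / measured time). All times below are hypothetical.

def spec_score(ref_times, measured_times):
    ratios = [r / m for r, m in zip(ref_times, measured_times)]
    return prod(ratios) ** (1 / len(ratios))

ref  = [100, 200, 150, 300, 250, 180, 120, 220]  # reference machine, seconds
meas = [ 25,  40,  50,  60,  50,  45,  30,  55]  # system under test, seconds
print(spec_score(ref, meas))  # geometric mean of [4, 5, 3, 5, 5, 4, 4, 4]
```

The base/non-base and speed/rate variants differ only in how the measured times are obtained (conservative vs. aggressive optimization; one copy vs. many concurrent copies), not in the mean itself.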
CFP95 (written in FORTRAN)
• SPECfp95
 The geometric mean of 10 normalized ratios (one for each floating point benchmark) when
compiled with aggressive optimization for each benchmark.
• SPECfp_base95
 The geometric mean of 10 normalized ratios when compiled with conservative optimization for each benchmark.
• SPECfp_rate95
 The geometric mean of 10 normalized throughput ratios when compiled with aggressive
optimization for each benchmark.
• SPECfp_rate_base95
 The geometric mean of 10 normalized throughput ratios when compiled with conservative
optimization for each benchmark.
The Pros and Cons of Geometric Means
• Independent of the running times of the individual programs
• Independent of the reference machines
• Do not predict execution time
• Focus attention on the benchmarks where performance is easiest to improve
 2 sec --> 1 sec vs. 10000 sec --> 5000 sec: equal ratios, very different time saved
 "crack", benchmark engineering
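The 2 sec vs. 10000 sec point above can be checked directly: halving either benchmark's running time raises the geometric mean by exactly the same factor, regardless of how much real time is saved. A sketch with two hypothetical benchmarks:

```python
from math import prod

def geomean(ratios):
    return prod(ratios) ** (1 / len(ratios))

# Two hypothetical benchmarks, both initially at ratio 1.0.
# Halving either one's time doubles its speedup ratio, so the
# geometric mean rises by the same factor either way -- even though
# 2 s -> 1 s saves one second while 10000 s -> 5000 s saves 5000 s.
fast_fix = geomean([2.0, 1.0])  # sped up the 2 s benchmark
slow_fix = geomean([1.0, 2.0])  # sped up the 10000 s benchmark
print(fast_fix, slow_fix)       # both equal sqrt(2)
```

This symmetry is what invites "benchmark engineering": the cheapest benchmark to speed up pays off as much as the most expensive one.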