High Performance Computing - Center for Computation & Technology

Prof. Thomas Sterling
Department of Computer Science
Louisiana State University
January 18, 2011
HIGH PERFORMANCE COMPUTING:
MODELS, METHODS, & MEANS
AN INTRODUCTION
CSC 7600 Lecture 1 : Introduction
Spring 2011
Aerial & Satellite of Hurricane Katrina
Devastation from Hurricane Katrina
Simulating Katrina
Evolution of HPC
[Figure: timeline of machines plotted against a log scale of peak performance]
Performance scale: 1 (One OPS), 10^3 (KiloOPS), 10^6 (MegaOPS), 10^9 (GigaOPS), 10^12 (TeraOPS), 10^15 (PetaOPS)
• 1823 Babbage Difference Engine
• 1943 Harvard Mark 1
• 1949 Edsac
• 1951 Univac 1
• 1959 IBM 7094
• 1964 CDC 6600
• 1976 Cray 1
• 1982 Cray XMP
• 1988 Cray YMP
• 1991 Intel Delta
• 1996 T3E
• 1997 ASCI Red
• 2001 Earth Simulator
• 2003 Cray X1
• 2006 BlueGene/L
• 2009 Cray XT5
New Fastest Computer in the World
2nd Fastest Computer in the World
Jaguar (Cray XT5-HE)
• Owned by Oak Ridge National Laboratory
• Breaks Petaflops processing barrier (1.759e+15 flops)
• Contains 224,162 cores on six-core AMD x86_64 Opteron chips running at 2600 MHz
Topics
• HPC Applications
• Supercomputing : An Enabler
• Architecture, Technologies, Programming Models
• Performance oriented theme
• Brief History of HPC
• Sources of Performance Degradation
• Supercomputer System Stack
• Course Overview - Goals & Content
• Course Administration
• Summary Materials for Test
Synergy Drives Supercomputing Evolution
• Technology
– Enables digital technology
– Defines balance of capabilities
– Establishes relationship of relative costs
• Architecture
– Creates interface between computation and technology
– Determines structures of technology-based components
– Establishes low-level semantics of operation
– Provides low-cost mechanisms
• Model of Computation
– Paradigm by which computation is manifest
– Provides governing principles of architecture operation
– Implies programming model and languages
Where Does Performance Come From?
• Device Technology
– Logic switching speed and device density
– Memory capacity and access time
– Communications bandwidth and latency
• Computer Architecture
– Instruction issue rate
• Execution pipelining
• Reservation stations
• Branch prediction
• Cache management
– Parallelism
• Parallelism – number of operations per cycle per processor
– Instruction level parallelism (ILP)
– Vector processing
• Parallelism – number of processors per node
• Parallelism – number of nodes in a system
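The levels of parallelism listed on this slide multiply together with the clock rate to give a peak operation rate. A minimal sketch in Python, assuming a hypothetical machine; the function name and every number below are illustrative and not taken from the lecture:

```python
def peak_ops_per_second(clock_hz, ops_per_cycle, cores_per_node, nodes):
    """Peak rate as the product of clock rate and each level of parallelism."""
    return clock_hz * ops_per_cycle * cores_per_node * nodes


# Hypothetical system (all values are assumptions for illustration):
# 2 GHz clock, 4 operations per cycle per core, 8 cores per node, 100 nodes.
peak = peak_ops_per_second(2.0e9, 4, 8, 100)
print(f"peak = {peak / 1e12:.1f} TeraOPS")
```

Doubling any single factor doubles the peak, which is why gains can come from device technology (clock rate) or from architecture (the three parallelism factors).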
Major Technology Generations
(dates approximate)
• Electromechanical
– 19th century through 1st half of 20th century
• Digital electronic with vacuum tubes
– 1940s
• Core memory
– 1950
• Transistors
– 1947
• SSI & MSI RTL/DTL/TTL semiconductor
– 1970
• DRAM
– 1970s
• CMOS VLSI
– 1990
• Multicore
– 2006
The SIA ITRS Roadmap
[Figure: log-scale plot (1 to 100,000) of MB per DRAM chip, logic transistors per chip (M), and µP clock (MHz) versus year of technology availability, 1997 to 2012]
Classical DRAM
• Memory mats: ~ 1 Mbit each
• Row Decoders
• Primary Sense Amps
• Secondary sense amps & “page” multiplexing
• Timing, BIST, Interface
• Kerf
[Figure: Gbits per chip (log scale) versus year, 1970 to 2020; series: Historical, ITRS @ Production, ITRS @ Introduction]
Density/Chip has dropped below 4X/3yrs
[Figure: % chip overhead versus year, 1970 to 2020; series: Historical, SIA Production, SIA Introduction]
And 45% of Die is Non-Memory
Peak Logic Clock Rates
[Figure: Clock (MHz), log scale from 10 to 100,000, versus year (1975 to 2020) and versus feature size; series: Historical, ITRS Max Clock Rate (12 inverters); the 3 GHz level is marked]
2005 projection was for 5.2 GHz – and we didn’t make it in production.
Further, we’re still stuck at 3+GHz in production.
Classes of Architecture for
High Performance Computers
• Parallel Vector Processors (PVP)
– NEC Earth Simulator, SX-6
– Cray-1, 2, XMP, YMP, C90, T90, X1
– Fujitsu 5000 series
• Massively Parallel Processors (MPP)
– Intel Touchstone Delta & Paragon
– TMC CM-5
– IBM SP-2 & 3, Blue Gene/Light
– Cray T3D, T3E, Red Storm/Strider
• Distributed Shared Memory (DSM)
– SGI Origin
– HP Superdome
• Single Instruction stream Multiple Data stream (SIMD)
– Goodyear MPP, MasPar 1 & 2, TMC CM-2
• Commodity Clusters
– Beowulf-class PC/Linux clusters
– Constellations
– HP Compaq SC, Linux NetworX MCR
Top 500 : System Architecture
Driving Issues/Trends
• Multicore
– Now: 8 cores per chip (AMD Opteron, Intel Xeon)
– Possibly 100s of cores per chip
– Systems will have million-way parallelism
• Heterogeneity
– GPGPU
– Clearspeed
– Cell SPE
• Component I/O Pins
– Off chip bandwidth not increasing with demand
• Limited number of pins
• Limited bandwidth per pin (pair)
– Cache size per core may decline
– Shared cache fragmentation
• System Interconnect
– Node bandwidth not increasing proportionally
to core demand
• Power
– Megawatts at the high end = millions of dollars per year
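The power bullet can be checked with simple arithmetic. A rough sketch, assuming a constant power draw and a flat electricity price; the 5 MW draw and the $0.10 per kWh price are illustrative assumptions, not figures from the lecture:

```python
HOURS_PER_YEAR = 24 * 365  # 8760 hours, ignoring leap years


def annual_energy_cost(power_megawatts, dollars_per_kwh):
    """Yearly electricity cost in dollars for a constant power draw."""
    kilowatts = power_megawatts * 1000
    return kilowatts * HOURS_PER_YEAR * dollars_per_kwh


# Assumed values: a 5 MW system billed at $0.10 per kWh.
cost = annual_energy_cost(5, 0.10)
print(f"${cost:,.0f} per year")
```

Under these assumptions each megawatt costs a little under a million dollars per year, so a multi-megawatt system lands in the millions.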
Multi-Core
• Motivation for Multi-Core
– Exploits improved feature-size and density
– Increases functional units per chip (spatial efficiency)
– Limits energy consumption per operation
– Constrains growth in processor complexity
• Challenges resulting from multi-core
– Relies on effective exploitation of multiple-thread parallelism
• Need for parallel computing model and parallel programming model
– Aggravates memory wall
• Memory bandwidth
– Way to get data out of memory banks
– Way to get data into multi-core processor array
• Memory latency
• Fragments L3 cache
– Pins become strangle point
• Rate of pin growth projected to slow and flatten
• Rate of bandwidth per pin (pair) projected to grow slowly
– Requires mechanisms for efficient inter-processor
coordination
• Synchronization
• Mutual exclusion
• Context switching
Heterogeneous Multicore Architecture
• Combines different types of processors
– Each optimized for a different operational modality
• Performance > nX better than other n processor types
– Synthesis favors superior performance
• For complex computation exhibiting distinct modalities
• Conventional co-processors
– Graphical processing units (GPU)
– Network controllers (NIC)
– Efforts underway to apply existing special purpose
components to general applications
• Purpose-designed accelerators
– Integrated to significantly speed up some critical aspect of one or more important classes of computation
– IBM Cell architecture
– ClearSpeed SIMD attached array processor
Definitions: “supercomputer”
Supercomputer: A computing system exhibiting high-end performance
capabilities and resource capacities within practical constraints of technology,
cost, power, and reliability. Thomas Sterling, 2007
Supercomputer: a large very fast mainframe used especially for
scientific computations. Merriam-Webster Online
Supercomputer: any of a class of extremely powerful computers. The term is
commonly applied to the fastest high-performance systems available at any given time.
Such computers are used primarily for scientific and engineering work requiring
exceedingly high-speed computations. Encyclopedia Britannica Online
Moore’s Law
Moore's Law describes a long-term trend in the history of computing hardware, in which the number of transistors that can be placed inexpensively on an integrated circuit has doubled approximately every two years.
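The doubling rule can be written as N(t) = N0 × 2^(t / 2) with t in years. A minimal sketch of that projection; the function name and the starting count are hypothetical, and the result is an idealized trend rather than data for any real chip:

```python
def projected_transistors(initial_count, years, doubling_period_years=2.0):
    """Idealized Moore's Law growth: one doubling per doubling period."""
    return initial_count * 2 ** (years / doubling_period_years)


# Growth factor over 20 years: ten doublings, i.e. 2**10.
growth = projected_transistors(1, 20)
print(f"growth over 20 years: {growth:.0f}x")
```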
Top 500 List
Performance
• Performance:
– A quantifiable measure of rate of doing (computational) work
– Multiple such measures of performance
• Delineated at the level of the basic operation
– ops – operations per second
– ips – instructions per second
– flops – floating-point operations per second
• Rate at which a benchmark program executes
– A carefully crafted and controlled code used to compare systems
– Linpack Rmax (Linpack flops)
– gups (giga-updates per second, i.e. billions of updates per second)
– others
• Two perspectives on performance
– Peak performance
• Maximum theoretical performance possible for a system
– Sustained performance
• Observed performance for a particular workload and run
• Varies across workloads and possibly between runs
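The gap between the two perspectives can be expressed as an efficiency, sustained divided by peak. A sketch using the Jaguar figures quoted earlier in this lecture (224,162, 2600 MHz, 1.759e+15 flops), under two assumptions that are not stated on the slide: that the 224,162 figure counts cores, and that each core can complete 4 floating-point operations per cycle:

```python
cores = 224_162            # from the Jaguar slide (assumed to be a core count)
clock_hz = 2.6e9           # 2600 MHz, from the Jaguar slide
flops_per_cycle = 4        # assumption: per-core floating-point issue width

# Peak: every core completes its maximum number of operations on every cycle.
peak_flops = cores * clock_hz * flops_per_cycle

# Sustained: the Linpack figure quoted on the slide.
sustained_flops = 1.759e15

efficiency = sustained_flops / peak_flops
print(f"peak = {peak_flops / 1e15:.3f} Pflops, efficiency = {efficiency:.1%}")
```

Linpack is a favorable workload; other applications on the same machine would typically sustain a smaller fraction of peak.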
Scalability
• The ability to deliver proportionally greater sustained performance through increased system resources
• Strong Scaling
– Fixed size application problem
– Application size remains constant with increase in system size
• Weak Scaling
– Variable size application problem
– Application size scales proportionally with system size
• Capability computing
– In its purest form: strong scaling
– Marketing claims tend toward this class
• Capacity computing
– Throughput computing
• Includes job-stream workloads
– In its simplest form: weak scaling
• Cooperative computing
– Interacting and coordinating concurrent processes
– Not a widely used term
– Also: coordinated computing
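Strong and weak scaling are usually measured as efficiencies relative to a one-processor run. A minimal sketch using the common textbook definitions; the function names and timings are hypothetical and not taken from the lecture:

```python
def strong_scaling_efficiency(t1, tp, p):
    """Fixed total problem size: the ideal time on p processors is t1 / p."""
    return t1 / (p * tp)


def weak_scaling_efficiency(t1, tp):
    """Problem size grows with p: the ideal time stays equal to t1."""
    return t1 / tp


# Hypothetical timings in seconds.
strong = strong_scaling_efficiency(t1=100.0, tp=8.0, p=16)  # same problem, 16 processors
weak = weak_scaling_efficiency(t1=100.0, tp=125.0)          # 16x problem, 16 processors
print(f"strong: {strong:.2f}, weak: {weak:.2f}")
```

An efficiency of 1.0 means perfect scaling in either sense; values below 1.0 reflect the degradation sources discussed later in the lecture.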
Machine Parameters affecting Performance
• Peak floating point performance
• Main memory capacity
• Bi-section bandwidth
• I/O bandwidth
• Secondary storage capacity
• Organization
– Class of system
– # nodes
– # processors per node
– Accelerators
– Network topology
• Control strategy
– MIMD
– Vector, PVP
– SIMD
– SPMD
A Brief History of Supercomputing
• Mechanical Computing
– Babbage, Hollerith, Aiken
• Electronic Digital Calculating
– Atanasoff, Eckert, Mauchly
• von Neumann Architecture
– Turing, von Neumann, Eckert, Mauchly, Foster, Wilkes
• Semiconductor Technologies
• Birth of the Supercomputer
– Cray, Watanabe
• The Golden Age
– Batcher, Dennis, S. Chen, Hillis, Dally, Blank, B. Smith
• Common Era of Killer Micros
– Scott, Culler, Sterling/Becker, Goodhue, A. Chen, Tomkins
• Petaflops
– Messina, Sterling, Stevens, P. Smith
Practical Constraints and Limitations
• Cost
– Deployment
– Operational support
• Power
– Energy required to run the computer
– Energy for support facilities
– Energy for cooling (remove heat from machine)
• Size
– Floor space
– Access way for power and signal cabling
• Reliability
– One factor of availability
• Generality
– How good is it across a range of problems
• Usability
– How hard is it to program and manage
Historical Machines
• Leibniz Stepped Reckoner
• Babbage Difference Engine
• Hollerith Tabulator
• Harvard Mark 1
• Univ. of Pennsylvania ENIAC
• Cambridge EDSAC
• MIT Whirlwind
• Cray 1
• TMC CM-2
• Intel Touchstone Delta
• Beowulf
• IBM Blue Gene/L
Golden Age of Parallel Architecture
• 1975 – 1992
• Vector
– Cray-1 & 2, NEC SX, Fujitsu VPP
[Image: Cray 1, 1976]
• SIMD
– Maspar, CM-2
• Systolic
– Warp
• Dataflow
– Manchester, Sigma, Monsoon
• Multithreaded
– HEP, MTA
• Actor-based
– J-Machine
Dark Ages of Parallel Computing
Technology drivers
• 1992 to present
• Killer Micro and mass market PCs
• High density DRAM
• High cost of fab lines
• CSP
– Message passing
• Economy of scale S-curve
• MPP
• Weak scaling
– Gustafson et al
• Beowulf, NOW Clusters
• MPI
• Ethernet, Myrinet
• Linux
Supercomputer Points of Transition
• Automated calculating
– 17th century
• Stored program digital electronic
– 1948
• Vector
– 1975
• SIMD
– 1980s
• MPPs
– 1991
• Commodity Clusters
– 1993/4
• Multicore
– 2006
Driving Factors for HPC
• Technology trends
– Multicore components
– Heterogeneous structures and accelerators
• The 4 Horsemen of the Apocalypse (SLOW)
– Starvation (sufficient parallelism and load balancing)
– Latency (idle time due to round trip delays)
– Overhead (critical path support mechanisms)
– Waiting for contention (inadequate bandwidth)
• Reliability
– Single point failure modes cannot be tolerated
– Reduced feature size and increased component count
• Power consumption
– Just too much!
– Dominating practical growth in mission critical domains
• Changing application workload characteristics
– Data (meta-data) intensive for sparse numerics and symbolics
• Programmability & ease of use
– System complexity, scale and dynamics defy optimization by
hand
Sources of Performance Degradation
(SLOW)
• Starvation
– Not enough work to do due to insufficient parallelism or poor load
balancing among distributed resources
• Latency
– Waiting for access to memory or other parts of the system
• Overhead
– Extra work that has to be done to manage program concurrency and parallel resources, beyond the real work you want to perform
• Waiting for Contention
– Delays due to fighting over what task gets to use a shared
resource next. Network bandwidth is a major constraint.
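As one illustration of starvation, poor load balancing can be quantified by comparing the total work with what the processors could have done while waiting for the busiest one to finish. A minimal sketch; the function name and the work distribution are hypothetical:

```python
def load_balance_efficiency(work_per_processor):
    """Fraction of processor time spent busy when all wait for the slowest."""
    busiest = max(work_per_processor)
    return sum(work_per_processor) / (len(work_per_processor) * busiest)


# Hypothetical distribution: three processors get 10 units of work, one gets 30.
eff = load_balance_efficiency([10, 10, 10, 30])
print(f"efficiency = {eff:.2f}")
```

Here three of the four processors sit idle for two thirds of the run, so half of the available processor time is lost even before latency, overhead, or contention are counted.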
The Memory Wall
[Figure: Memory Access Time and CPU Time (Time, ns) plotted with the Memory to CPU Ratio; the widening gap is labeled THE WALL]
Microprocessors no longer realize the
full potential of VLSI technology
[Figure: Perf (ps/Inst), log scale from 1e-4 to 1e+7, versus year, 1980 to 2020, with a linear trend line; gaps of 30:1, 1,000:1, and 30,000:1 are marked]
Amdahl’s Law
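[Figure: two timelines from start to end. The non-accelerated run takes $T_O$ and contains a segment of length $T_F$; the accelerated run takes $T_A$, with that segment shortened to $T_F/g$.]
$T_O \equiv$ time for non-accelerated computation
$T_A \equiv$ time for accelerated computation
$T_F \equiv$ time of portion of computation that can be accelerated
$g \equiv$ peak performance gain for accelerated portion of computation
$f \equiv$ fraction of non-accelerated computation to be accelerated
$S \equiv$ speed up of computation with acceleration applied
$$S = \frac{T_O}{T_A}, \qquad f = \frac{T_F}{T_O}$$
$$T_A = (1 - f) \times T_O + \left(\frac{f}{g}\right) \times T_O$$
$$S = \frac{T_O}{(1 - f) \times T_O + \left(\frac{f}{g}\right) \times T_O}$$
$$S = \frac{1}{1 - f + \left(\frac{f}{g}\right)}$$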
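The final expression, S = 1 / (1 - f + f/g), can be evaluated directly. A minimal sketch with illustrative values for f and g; the function name is hypothetical:

```python
def amdahl_speedup(f, g):
    """Speedup S = 1 / ((1 - f) + f / g).

    f: fraction of the original run time that can be accelerated (0 to 1)
    g: performance gain applied to that fraction (greater than 0)
    """
    if not 0.0 <= f <= 1.0:
        raise ValueError("f must be between 0 and 1")
    if g <= 0:
        raise ValueError("g must be positive")
    return 1.0 / ((1.0 - f) + f / g)


# Illustrative values: 90% of the run time is accelerated by a factor of 10.
print(f"S = {amdahl_speedup(0.9, 10):.2f}")
# Even with a very large gain, S stays below 1 / (1 - f) = 10.
print(f"S = {amdahl_speedup(0.9, 1e9):.2f}")
```

The second call shows the practical message of the law: the non-accelerated fraction bounds the achievable speedup no matter how large g becomes.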
Amdahl’s Law with Overhead
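[Figure: two timelines from start to end. The non-accelerated run takes $T_O$ and contains $n$ segments of length $t_F$; in the accelerated run, which takes $T_A$, each segment is replaced by one of length $v + t_F/g$.]
$$T_F = \sum_{i}^{n} t_{F_i}$$
$v \equiv$ overhead of accelerated work segment
$V \equiv$ total overhead for accelerated work $= \sum_{i}^{n} v_i$
$$T_A = (1 - f) \times T_O + \frac{f}{g} \times T_O + n \times v$$
$$S = \frac{T_O}{T_A} = \frac{T_O}{(1 - f) \times T_O + \frac{f}{g} \times T_O + n \times v}$$
$$S = \frac{1}{(1 - f) + \frac{f}{g} + \frac{n \times v}{T_O}}$$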
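With overhead included, the speedup becomes S = 1 / ((1 - f) + f/g + n × v / T_O). A minimal sketch with illustrative values; the function name and all numbers are hypothetical:

```python
def amdahl_speedup_with_overhead(f, g, n, v, t_o):
    """Speedup S = 1 / ((1 - f) + f / g + n * v / t_o).

    f: fraction of the original run time that can be accelerated
    g: performance gain applied to that fraction
    n: number of accelerated work segments
    v: overhead per accelerated segment (same time unit as t_o)
    t_o: run time of the non-accelerated computation
    """
    return 1.0 / ((1.0 - f) + f / g + (n * v) / t_o)


# Illustrative values: f = 0.9, g = 10, and 100 segments that each cost
# 0.01 s of overhead in a run that originally took 100 s.
s = amdahl_speedup_with_overhead(f=0.9, g=10, n=100, v=0.01, t_o=100.0)
print(f"S = {s:.2f}")
```

Setting v to zero recovers the plain Amdahl result; splitting the same accelerated work into more segments (larger n) lowers S because the total overhead n × v grows.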
Supercomputing System Stack
• Device technologies
– Enabling technologies for logic, memory, & communication
– Circuit design
• Computer architecture
– semantics and structures
• Models of computation
– governing principles
• Operating systems
– Manages resources and provides virtual machine
• Compilers and runtime software
– Maps application program to system resources, mechanisms, and
semantics
• Programming
– languages, tools, & environments
• Algorithms
– Numerical techniques
– Means of exposing parallelism
• Applications
– End user problems, often in sciences and technology
Addressing the Big Questions
• How to integrate technology into computing engines?
• How to push the performance to extremes?
– What are the enabling conditions?
– What are the inhibiting factors?
• How to manage supercomputer resources to deliver useful
computing capabilities?
– What are the hardware mechanisms?
– What are the software policies?
• How do users program such systems?
– What languages and in what environments?
– What are the semantics and strategies?
• What grand challenge applications demand these capabilities?
• What are the computational models and algorithms that can map the
innate application properties to the physical medium of the
machine?
Goals of the Course
• A first overview of the entire field of HPC
• Basic concepts that govern the capability and
effectiveness of supercomputers
• Techniques and methods for applying HPC systems
• Tools and environments that facilitate effective
application of supercomputers
• Hands-on experience with widely used systems and
software
• Performance measurement methods, benchmarks,
and metrics
• Practical real-world knowledge about the HPC
community
• Access by students outside the HPC mainstream
Student Objectives
• Computational Scientists
• HPC Researchers
• System Administrators
• Design Engineers
Course Overview: Multiple Segments
• Introduction
– An Overview
– Parallel Computer Architecture
– Commodity Clusters
– Benchmarking
– Throughput Computing
• Distributed Memory - MPI
– Communicating sequential processes (CSP)
– Enabling Technologies - Networks
– MPI programming
– Performance measurement (2)
• Shared Memory – OpenMP
– Single Node Architecture
– Enabling Technologies – Memory, Core Architectures,..
– Parallel thread computing
– OpenMP programming
– Performance factors and measurement (1)
• System Software
– Operating Systems
– Schedulers and Middleware
– Parallel file I/O
• Advanced Techniques
– Visualization
– Parallel Algorithms
– HPC Libraries
• Conclusions
– What’s beyond the scope of this course
– What form will the future of HPC take
Introduction & Throughput Computing
January
Tu 18  Introduction
Th 20  Parallel Computer Architecture, Quiz1
Tu 25  Commodity Cluster
Th 27  Benchmarking, Quiz2
February
Tu 1   Throughput Computing
*Project walkthroughs will be held during office hours.
Distributed Memory & MPI
Th 3   CSP / Parallelism, Quiz3
Tu 8   MPI 1
Th 10  MPI 2 / Performance Measurement (TAU), Quiz4
Tu 15  Shared Memory / Parallelization, Sample Project Overview
*Project walkthroughs will be held during office hours.
Shared Memory & OpenMP
Th 17  Enabling Technologies (memory, architecture, multicore, cache coherence), Quiz5
Tu 22  Pthreads
Th 24  OpenMP, Quiz6
March
Tu 1   Performance Measurement (PAPI…)
Th 3   Visualization, Quiz7, Project Abstract Due
Tu 8   Mardi Gras Holidays
Th 10  Parallel Algorithms 1, Quiz8
*Project walkthroughs will be held during office hours.
Advanced Techniques
Th 17  Parallel Algorithms 2, Quiz9
Tu 22  Parallel Algorithms 3, Project Walkthroughs*
Th 24  Parallel Algorithms 4, Project Walkthroughs*, Quiz10
Tu 29  Libraries 1
Th 31  Libraries 2, Quiz11
April
Tu 5   Parallel File I/O 1
Th 7   Parallel File I/O 2, Quiz12
Tu 12  Operating Systems 1
Th 14  Operating Systems 2, Quiz13
*Project walkthroughs will be held during office hours.
System Software
Tu 19  Spring Break
Th 21  Spring Break
Tu 26  Scheduling / Workload Management Systems
Th 28  Checkpointing/System Administration, Project Due, Quiz14
May
Tu 3   Beyond and Beyond
Th 5   Class Summary / Final Exam Review
Th 12  FINAL EXAM (7:30 – 9:30 AM)
*Project walkthroughs will be held during office hours.
Topics
• HPC Applications
• Supercomputing : An Enabler
• Architecture, Technologies, Programming Models
• Performance oriented theme
• Demo 1 : Performance Scalability
• Brief History of HPC
• Sources of Performance Degradation
• Supercomputer System Stack
• Course Overview - Goals & Content
• Course Administration
• Summary Materials for Test
Course Website
• HPC Course Website can be accessed at:
http://www.cct.lsu.edu/csc7600
• Course Info:
– Syllabus
– Schedule
• Contact Information in the (People Section):
email, IM, Phone etc.
• All course announcements will be made via email and Website.
• Lecture Slides will be made available on the course website (Course
Material Section)
• Videos of Lectures will be made available on the course website
(Course Material Section) after every lecture.
Contact Information
Prof. Thomas Sterling
[email protected]
(225) 578-8982 (CCT Office)
Johnston Hall 320, (225) 578-3320
Office Hours: Tu(1:00 - 3:00 PM) & Th(9:00 – 10:00 AM)
Teaching Assistant:
Daniel Kogler
[email protected]
Office Hours : Johnston 318
Tuesday 1:40 – 3:00 PM
Thursday 9:00 – 10:00 AM
Course Secretary :
Ms. Terrie Bordelon
[email protected]
302 Johnston Hall
(225) 578-5979
Grading Policy
Grading Policy for Graduate Students :
• Midterm – 20 %
• Final – 30 %
• Problem Sets – 25 %
• Quizzes – 5 %
• Project – 20 %
Grading Policy for Undergraduate Students :
• Midterm – 30 %
• Final – 35 %
• Problem Sets – 30 %
• Quizzes – 5 %
Assignments
• There will be adequately portioned assignments during
this course.
– Assignments should be turned in as PRINTOUTS to the TA the following
TUESDAY BEFORE CLASS.
– Assignments should be prepared in Word or PDF format. NO handwritten assignments will be accepted.
– Assignments involving programming problems should have source code
printed and attached, and all solution relevant materials (e.g. PBS scripts,
commands used for performance measurement etc…) must be well
documented and attached.
– Source code and all relevant files for programming assignments need to be submitted according to the guidelines in each problem set and are due at the same time as the assignment (the late policy for source code submissions is the same as that for assignments).
Assignments
• LATE POLICY:
– All assignments should be turned in on the due date BEFORE the
CLASS.
– Assignments turned in on the same day by 5 PM (Central) will incur a
penalty of 30% of the assignment grade.
– Assignments turned in AFTER 5 PM (Central) on the due date will receive 0 points irrespective of the work quality.
• IMPORTANT :
– Most of the assignments will need to be run on local
supercomputing resources that are shared among several users.
– Jobs that you submit WILL get stuck in a queue.
– “Queue ate my homework” is NOT an acceptable excuse for not
turning homework in.
– You are strongly encouraged to start working on assignments as
and when they are assigned to avoid inevitable queue wait times.
Graduate Student Projects
• Term projects are required for Graduate Students
• Sample Topics
– Parallel Image Processing
– Application performance measurement
– Advanced visualization techniques
– Parallel Programming
• LATE POLICY:
– Abstracts turned in later than the assigned date will incur
an overall project penalty of 5%
– Walkthroughs done later than the assigned date will incur an overall project penalty of 15%
– Projects turned in later than the assigned date will NOT
be considered for grading and will have an automatic
score of 0.
Graduate Student Project Topics
• Application Scaling : detailed analysis & performance profiling of application(s) based on parameters such as number of processors, application performance bottlenecks, etc.
• Application Development : design and develop new
parallel applications with simple performance profiling
analysis.
• Architecture Comparative Studies: alternative networks,
processors, accelerators
Reference Material
• No Required Textbook
• Lecture notes (slides), required reading lists (URLs) provided at the end of lectures, some additional notes (on web site), and assignments will be the primary sources of material for exams.
• Students are strongly encouraged to pursue
additional reading material available on the
internet (and as part of projects).
DEMO: Computing Resources Overview
presented by Adam Yates
Computing Resources
Arete [arete.cct.lsu.edu]
• 64 compute nodes x 8 cores
• Quad-core AMD Opteron Processor @ 2.4 GHz
• 8 GB RAM per Node
• 24 TB of shared storage
• 1 Gb Ethernet network interface
• 10 Gb InfiniBand interconnect
Plagiarism
• The LSU Code of Student Conduct defines plagiarism in Section
5.1.16:
– "Plagiarism is defined as the unacknowledged inclusion of someone else's words, structure, ideas, or
data. When a student submits work as his/her own that includes the words, structure, ideas, or data
of others, the source of this information must be acknowledged through complete, accurate, and
specific references, and, if verbatim statements are included, through quotation marks as well. Failure
to identify any source (including interviews, surveys, etc.), published in any medium (including on the
internet) or unpublished, from which words, structure, ideas, or data have been taken, constitutes
plagiarism;"
• Plagiarism will not be tolerated and will be dealt with in
accordance with and as outlined by the LSU Code of Student
Conduct :
http://appl003.lsu.edu/slas/dos.nsf/$Content/Code+of+Conduct?OpenDocument
Summary Materials for Test
• Defining Supercomputer – slide 21
• Performance Issues in HPC – slide 24
• Scalability – slide 25
• Machine parameters affecting performance – slide 26
• Driving factors for HPC – slide 35
• Sources of performance degradation – slide 36
• Supercomputing system stack – slide 42
ENIAC
(Electronic Numerical Integrator and Computer)
• Eckert and Mauchly, 1946.
• Vacuum tubes.
• Numerical solutions to problems in fields such as atomic energy and ballistic trajectories.
EDSAC
(Electronic Delay Storage Automatic Calculator)
• Maurice Wilkes, 1949.
• Mercury delay lines for
memory and vacuum
tubes for logic.
• Used one of the first
assemblers called Initial
Orders.
• Calculation of prime
numbers, solutions of
algebraic equations,
etc.
MIT Whirlwind
• Jay Forrester, 1949.
• Fastest computer.
• First computer to use
magnetic core memory.
• Displayed real time text
and graphics on a large
oscilloscope screen.
CRAY-1
• Cray Research, 1976.
• Pipelined vector
arithmetic units.
• Unique C-shape to shorten wire lengths and so reduce signal delays from one end to the other.
CM-2
• Thinking Machines
Corporation, 1987.
• Hypercube architecture
with 65,536 processors.
• SIMD.
• Performance in the
range of GFLOPS.
INTEL Touchstone Delta
• INTEL, 1990.
• MIMD, 2D mesh interconnect.
• LINPACK rating of 13.9
GFLOPS .
• Enough computing
power for applications
like real-time
processing of satellite
images and molecular
models for AIDS
research.
Beowulf
• Thomas Sterling and
Donald Becker, 1994.
• Cluster formed of one head node and one or more compute nodes.
• Nodes and network dedicated to the Beowulf.
• Compute nodes are mass produced commodities.
• Uses open source software including Linux.
Earth Simulator
• Japan, 1997.
• Fastest supercomputer
from 2002-2004: 35.86
TFLOPS.
• 640 nodes with eight
vector processors and
16 gigabytes of
computer memory at
each node.
BlueGene/L
• IBM, 2004.
• First supercomputer ever to run over 100 TFLOPS sustained on a real world application, namely a three-dimensional molecular dynamics code (ddcMD).