asplos2013_panel - University of Wisconsin–Madison

Research Directions for 21st Century Computer Systems
ASPLOS 2013 Panel
0. Mark Hill: Introduction
1. Kathryn McKinley on NAS Report
The Future of Computing Performance: Game Over or Next Level?
2. Josep Torrellas on CCC Workshops
Advancing Computer Architecture Research (ACAR)
3. Mark Hill on ISAT Workshop
Advancing Computer Systems without Technology Progress
4. Sarita Adve on CCC White Paper
21st Century Computer Architecture
5. Emmett Witchel unbounded
Impact? $15M NSF XPS (Exploiting Parallelism & Scalability) cites 1 & 4.
Q: What to do to facilitate, transcend, or refute these partially overlapping visions?
The Future of Computing Performance: Game Over or Next Level?
Samuel H. Fuller, Chair
March 22, 2011
Computer Science and Telecommunications Board (CSTB)
National Research Council (NRC)
Thanks to Sam Fuller & Mark Hill
Committee on Sustaining Growth in Computing Performance
Experts Addressed the Problem
• SAMUEL H. FULLER, Analog Devices Inc., Chair
• LUIZ ANDRÉ BARROSO, Google, Inc.
• ROBERT P. COLWELL, Independent Consultant
• WILLIAM J. DALLY, NVIDIA Corporation and Stanford University
• DAN DOBBERPUHL, PA Semi/Apple
• PRADEEP DUBEY, Intel Corporation
• MARK D. HILL, University of Wisconsin–Madison
• MARK HOROWITZ, Stanford University
• DAVID KIRK, NVIDIA Corporation
• MONICA LAM, Stanford University
• KATHRYN S. McKINLEY, University of Texas at Austin
• CHARLES MOORE, Advanced Micro Devices
• KATHERINE YELICK, University of California, Berkeley
Staff
• LYNETTE I. MILLETT, Study Director
• SHENAE BRADLEY, Senior Program Assistant
Executive Summary
1. Computer hardware has transitioned to multicore
2. Dennard scaling of CMOS has broken down
3. Parallelism and locality must be exploited by software
4. Chip power will soon limit multicore scaling
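Points 2 and 4 can be illustrated with the standard dynamic-power relation P ≈ C·V²·f. The sketch below is a minimal, idealized model and is not taken from the report: the helper name `power_density_after`, the per-generation shrink factor `k`, and the assumption that frequency keeps rising once voltage stops scaling are all illustrative.

```python
def power_density_after(generations, k=1.4, voltage_scales=True):
    """Relative power density after a number of process generations.

    Idealized model (an assumption, not from the report): dynamic power
    per transistor is C * V**2 * f, and each generation shrinks linear
    dimensions by a factor k.
    """
    s = k ** generations                  # total linear shrink factor
    capacitance = 1.0 / s                 # C shrinks with dimensions
    frequency = s                         # faster gates, higher clock
    voltage = 1.0 / s if voltage_scales else 1.0
    power_per_transistor = capacitance * voltage ** 2 * frequency
    transistors_per_area = s ** 2         # density grows quadratically
    return power_per_transistor * transistors_per_area


# With voltage scaling (Dennard), power density stays flat.
print(power_density_after(4, voltage_scales=True))
# With voltage stuck, the same shrink multiplies power density by s**2.
print(power_density_after(4, voltage_scales=False))
```

Under these assumptions, a fixed supply voltage turns each density doubling into a power doubling, which is why chip power rather than transistor count becomes the limit.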
Virtuous Cycle
[Figure: cycle of doubling of transistors, Devices, Software, and Innovation (2x more capable, efficient, cheaper, smaller, …); side labels: Software Complexity / Sequential Interface, Hardware Complexity / Sequential Interface]
Breaks in Virtuous Cycle
[Figure: cycle of doubling of transistors, Devices, Software, and Innovation (2x more capable, efficient, cheaper, smaller, …), with "end of Dennard Scaling" marked; side labels: Software Complexity / Sequential Interface, Hardware Complexity / Sequential Interface]
Next Steps
Innovate within and across layers
• Algorithms
• Programming “systems”
• Architecture
• Technology
• Education
Community
No news here? But…
Are we all acting on this knowledge or are we acting
business as usual?
Are we thinking beyond next paper to where to create
future value?
Denial  …  Acceptance  Act?
2. Advancing Computer Architecture Research (ACAR)
• Two workshops sponsored by CCC
o 25 + 19 attendees
• Organizers: J. Torrellas (U Illinois) & M. Oskin (U Wash.)
• Issued a community-wide call for white papers
• Selection committee picked most relevant papers
• Included industry folks
• Also invited DARPA, DOE, NSF program managers
http://www.cra.org/ccc/docs/ACAR_Report_Popular-Parallel-Programming.pdf
http://www.cra.org/ccc/docs/ACAR2-Report.pdf
What We Found
• Data centers and extreme scale computing
o Energy and power consumption are the key limiters
• Architectures for programmability
o Performance scaling, past: no SW changes
o Performance scaling, now: extensive SW+HW changes
• Specialized architectures and heterogeneity
o Ultimate goal: fully automated generation of app-specific HW for programs
What We Found
• End of road for conventional ISA
o Modern systems are skyscrapers built on the ISA of a bungalow
• Secure, reliable and predictable from the HW up
o Foundation of computing is breaking apart; malicious parties are exploiting it
• Exploiting emerging technologies
o Architecture research enables new technologies to enter the market quickly
Discussion Points
• Many directions of research are relevant:
o Computer systems research is broadening
• Focus on increasing funding pie, not re-distributing it
• Need to create coalitions with other communities:
o Big data
o New computing materials and devices
o Healthcare
o …
• Need to move away from incrementalism
Advancing Computer Systems without Technology Progress
[Figure: System Capability (log) by decade, 80s through 50s, with a "Fallow Period" marked]
Seek ~1000x = two decades of Moore's Law via four thrusts
The views expressed are those of the author and do not reflect the official policy or
position of the Department of Defense or the U.S. Government.
Approved for Public Release, Distribution Unlimited
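The "~1000x = two decades of Moore's Law" equivalence above can be checked with a few lines of arithmetic. This is a minimal sketch that assumes a doubling period of two years, which the slide does not state; the function names are illustrative.

```python
import math


def moore_factor(years, doubling_period_years=2.0):
    """Capability multiple from one doubling every doubling_period_years."""
    return 2 ** (years / doubling_period_years)


def years_equivalent(factor, doubling_period_years=2.0):
    """Years of doubling needed to reach a given capability multiple."""
    return math.log2(factor) * doubling_period_years


print(moore_factor(20))        # ten doublings in twenty years
print(years_equivalent(1000))  # just under twenty years
```

With an 18-month doubling period the same twenty years would give a much larger factor, so the ~1000x target corresponds to the two-year reading.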
A. Spectrum of Hardware Specialization
[Table: Ops/mm2, Ops/Watt, Time to Soln, and NRE, normalized to General-Purpose (= 1, programming GPP) and rising to 10+, for: Specialized ISA (domain specific), Progr. Accelerator (domain specific), Fixed Accelerator (app specific), Specialized Mem & Interconnect (monolithic die), and Package level integration (multi die: logic, mem, analog; silicon interposer). Time to Soln entries are annotated "designing & programming" or "SoC design".]
B. Reduce Software Bloat (e.g., matrix multiply)
Version         Time            Relative to BLAS Parallel
PHP             9,298,440 ms    51,090x
Python          6,145,070 ms    33,764x
Java            348,749 ms      1,816x
C               19,564 ms       107x
Tiled C         12,887 ms       71x
Vectorized      6,607 ms        36x
BLAS Parallel   182 ms          1x
• Can we achieve PHP productivity at BLAS efficiency?
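The step from "C" to "Tiled C" in the matrix-multiply comparison is a loop-blocking transformation. The slide does not show the code behind its numbers, so the following is only a sketch of the technique, written in Python for readability; the function names and the tile size are illustrative assumptions. In pure Python, tiling will not reproduce the slide's speedup, because the interpreter overhead dominates. The point is the loop structure, which reuses a small block of each matrix while it is still in cache.

```python
def matmul_naive(a, b):
    """Textbook triple loop: c[i][j] = sum over k of a[i][k] * b[k][j]."""
    n, m, p = len(a), len(b), len(b[0])
    c = [[0.0] * p for _ in range(n)]
    for i in range(n):
        for j in range(p):
            for k in range(m):
                c[i][j] += a[i][k] * b[k][j]
    return c


def matmul_tiled(a, b, tile=2):
    """Same arithmetic, iterated block by block (tile x tile sub-matrices)."""
    n, m, p = len(a), len(b), len(b[0])
    c = [[0.0] * p for _ in range(n)]
    for ii in range(0, n, tile):          # block row of a and c
        for kk in range(0, m, tile):      # block of the shared dimension
            for jj in range(0, p, tile):  # block column of b and c
                for i in range(ii, min(ii + tile, n)):
                    for k in range(kk, min(kk + tile, m)):
                        a_ik = a[i][k]
                        for j in range(jj, min(jj + tile, p)):
                            c[i][j] += a_ik * b[k][j]
    return c


a = [[1, 2, 3], [4, 5, 6]]
b = [[7, 8], [9, 10], [11, 12]]
print(matmul_tiled(a, b) == matmul_naive(a, b))
```

Both functions perform the same multiply-adds; only the visiting order changes, which is why the transformation can be applied without changing results.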
D. Locality-aware Parallelism
• Now: Seek (vast) parallelism
o e.g., simple, energy efficient cores
• But remote communication >100x cost of compute
= 1200 pJ (24x)
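A back-of-the-envelope model shows why the ">100x" communication cost matters even when most operations stay local. This is a minimal sketch: the unit energy per operation, the exact ratio of 100, and the function name are illustrative assumptions rather than figures from the workshop.

```python
def total_energy_pj(num_ops, remote_fraction, op_pj=1.0, remote_ratio=100.0):
    """Energy for num_ops operations when a fraction also needs a remote transfer.

    op_pj and remote_ratio are illustrative assumptions; the slide only
    says that remote communication costs more than 100x a compute operation.
    """
    remote_ops = num_ops * remote_fraction
    compute_energy = num_ops * op_pj
    communication_energy = remote_ops * op_pj * remote_ratio
    return compute_energy + communication_energy


all_local = total_energy_pj(1000, 0.0)
one_percent_remote = total_energy_pj(1000, 0.01)
print(one_percent_remote / all_local)
```

Under these assumptions, sending just 1% of operations to remote data doubles total energy, so adding parallelism without regard to locality can cost more than it saves.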
C. Approximate Computing Example
SECOND ORDER DIFFERENTIAL EQUATION ON ANALOG ACCELERATOR WITH DIGITAL ACCELERATOR.
Workshop Takeaway
• Can Harvest in the “Fallow” Period!
A. HW/SW Specialization/Co-design
B. Reduce SW Bloat
C. Approximate Computing
--------------------------------------------------
~1000x = 2 decades of Moore’s Law!
• D. Systems must exploit LOCALITY-AWARE parallelism
• HILL’s TWO CENTS: Move beyond General-Purpose
o Systems that do new things, e.g., Kinect
o Optimizations that help some, e.g., big memory workloads
21st Century Computer Architecture
A Community White Paper, April-May 2012
Mark D. Hill, U Wisconsin (coordinator)
Sarita Adve, U Illinois
David H. Albonesi, Cornell U
David Brooks, Harvard U
Luis Ceze, U Washington
Sandhya Dwarkadas, U Rochester
Joel Emer, Intel/MIT
Babak Falsafi, EPFL
Antonio Gonzalez, Intel/UPC
Mary Jane Irwin, Penn State U
David Kaeli, Northeastern U
Stephen W. Keckler, NVIDIA/U Texas
Christos Kozyrakis, Stanford U
Alvin Lebeck, Duke U
Milo Martin, U Pennsylvania
José F. Martínez, Cornell U
Margaret Martonosi, Princeton U
Kunle Olukotun, Stanford U
Mark Oskin, U Washington
Li-Shiuan Peh, M.I.T.
Milos Prvulovic, Georgia Tech
Steven K. Reinhardt, AMD
Michael Schulte, AMD/U Wisconsin
Simha Sethumadhavan, Columbia U
Guri Sohi, U Wisconsin
Daniel Sorin, Duke U
Josep Torrellas, U Illinois
Thomas F. Wenisch, U Michigan
David Wood, U Wisconsin
Katherine Yelick, UC Berkeley/LBNL
+ Jim Larus & Jeannette Wing gave feedback
+ CCC, Erwin Gianchandani, Ed Lazowska guided process
Technology’s Challenges
Late 20th Century | The New Reality
Moore’s Law: 2× transistors/chip | Transistor count still 2× BUT…
Dennard Scaling: ~constant power/chip | Gone. Can’t repeatedly double power/chip
Modest (hidden) transistor unreliability | Increasing transistor unreliability can’t be hidden
Focus on computation over communication | Communication (energy) more expensive than computation
1-time costs amortized via mass market | One-time cost much worse & want specialized platforms
How should architects step up as technology falters?
21st Century Computer Architecture
20th Century | 21st Century
Single-chip in stand-alone computer | Architecture as Infrastructure: Spanning sensors to clouds. Performance plus security, privacy, availability, programmability, …
Performance via invisible instruction-level parallelism | Energy First: ● Parallelism ● Specialization ● Cross-layer design
Predictable technologies: CMOS, DRAM, & disks | New technologies (non-volatile memory, near-threshold, 3D, photonics, …). Rethink: memory & storage, reliability, communication
Cross-Cutting: Break current layers with new interfaces
Some Thoughts
[Figure: diagram labeled Architecture, PL, OS, and ASPLOS, with "ASPLOS 2014" and two "???" entries]
Need to step up for agency positions
NSF CCF Division Director Search
5. Emmett Witchel Unbounded
THE 90S SUCKED
JERRY GARCIA DEAD 1995
THE VERVE
THE VERVE PIPE
ARCHITECTURE WAS BORING
MICROARCHITECTURE PROVIDES PERFORMANCE
Architecture  Date   µArch       Clock (MHz)  Int95
Intel         05/96  Pentium     133          4.2
Intel         10/97  Pentium II  266          10.8
Intel         09/98  Pentium II  450          17.3
DEC Alpha     03/96  21064       266          4.3
DEC Alpha     04/97  21164       500          14.4
DEC Alpha     09/98  21164       533          16.8
Microarchitecture or clock rate
1. Buy machine
2. Wait 18 months
3. Buy next one
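The "microarchitecture or clock rate" question can be made concrete by splitting each Int95 gain in the Intel and DEC Alpha figures above into a clock-rate factor and a per-clock factor. This is a rough sketch: it treats Int95 divided by clock as a proxy for microarchitectural work per cycle, which is a simplifying assumption, and the names `ROWS` and `split_gain` are illustrative.

```python
# (date, microarchitecture, clock in MHz, Int95) as listed on the slide.
ROWS = {
    "Intel": [
        ("05/96", "Pentium", 133, 4.2),
        ("10/97", "Pentium II", 266, 10.8),
        ("09/98", "Pentium II", 450, 17.3),
    ],
    "DEC Alpha": [
        ("03/96", "21064", 266, 4.3),
        ("04/97", "21164", 500, 14.4),
        ("09/98", "21164", 533, 16.8),
    ],
}


def split_gain(old, new):
    """Return (clock_gain, per_clock_gain) between two rows of the table."""
    clock_gain = new[2] / old[2]
    total_gain = new[3] / old[3]
    return clock_gain, total_gain / clock_gain


for arch, rows in ROWS.items():
    for old, new in zip(rows, rows[1:]):
        clock_gain, per_clock_gain = split_gain(old, new)
        print(f"{arch} {old[1]} -> {new[1]}: "
              f"clock x{clock_gain:.2f}, per-clock x{per_clock_gain:.2f}")
```

In every step the instruction set stays the same, so all of the measured gain comes from clock rate and microarchitecture, which is the slide's point about the 90s.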
LIFE IS BETTER NOW
ARCHITECTURE CHANGES PROVIDE VALUE
Intel
Date   µArch         Arch
01/10  Westmere      AES-NI
01/11  Sandy Bridge  Instruction for SHA-1
09/11  Ivy Bridge    RdRand
• VT-x (11/05)
• Extended Page Tables (11/08)
• VT-d (11/08)
• VPID (11/08) (tagged TLB!)
1. Consider app
2. Buy machine
3. Goto 1
HARDWARE + SOFTWARE COOPERATION NECESSARY
The ’10s belong to ASPLOS
• Security
• Mobile
• Data centers
• Concurrency
• GPU/Accelerator
Kathryn S. McKinley
Kathryn S. McKinley is a Principal Researcher at
Microsoft and an Endowed Professor of Computer
Science at The University of Texas at Austin. She and
her collaborators have produced widely used tools:
the DaCapo Java Benchmarks, TRIPS Compiler, Hoard
memory manager, MMTk garbage collector toolkit,
and Immix garbage collector. Her awards include:
NSF Career, ASPLOS 2009 Best Paper, 2012 IEEE Top
Picks, CACM Research Highlights (2006, 2012), Most
Influential OOPSLA Paper from 2002 (awarded 2012),
the 2011 ACM SIGPLAN Distinguished Service Award,
and the 2012 ACM SIGPLAN Programming Languages
Software Award. She has graduated 17 PhD students.
She is an IEEE Fellow and ACM Fellow.
Josep Torrellas
Josep Torrellas is a Professor of Computer Science at the
University of Illinois Urbana-Champaign. He is the Director of
the Center for Programmable Extreme Scale Computing,
and the Director of the Illinois-Intel Parallelism Center
(I2PC). He has also been a Willett Faculty Scholar and led
the OpenSPARC Center of Excellence. He is the past Chair
of the IEEE Technical Committee on Computer
Architecture, and currently serves as a Council Member of
CRA's Computing Community Consortium. He is a Fellow of
IEEE and ACM. He has made many technical contributions
in the areas of shared-memory parallel computer
architecture, low-power design, hardware reliability, and
software dependability. He has graduated 30 Ph.D.
students, who are now leaders in academia and industry.
He is currently working on the Bulk Multicore Architecture,
and on the DARPA-funded Runnemede Extreme Scale
Architecture, both in collaboration with Intel.
Mark Hill
Mark D. Hill (www.cs.wisc.edu/~markhill) is a professor in
both the computer sciences department and the
electrical and computer engineering department at
the University of Wisconsin–Madison, where he also
co-leads the Wisconsin Multifacet
(www.cs.wisc.edu/multifacet/) project with David
Wood. His research interests include parallel computer
system design, memory system design, computer
simulation, deterministic replay and transactional
memory. He earned a PhD from University of
California, Berkeley. He is an ACM Fellow and a Fellow
of the IEEE.
Sarita Adve
Sarita Adve is Professor of Computer Science at the
University of Illinois at Urbana-Champaign. Her
research interests are in computer architecture and
systems, parallel computing, and power and reliability-aware systems. Her honors include the Anita Borg
Institute Women of Vision award in innovation, the
ACM SIGARCH Maurice Wilkes award, the University
Scholar recognition by the University of Illinois, and an
Alfred P. Sloan Research Fellowship. She is a fellow of
the ACM and the IEEE. She serves on the boards of the
Computing Research Association and ACM SIGARCH.
She received the Ph.D. in Computer Science from the
University of Wisconsin-Madison in 1993.
Emmett Witchel
Emmett Witchel is an associate professor in computer
science at The University of Texas at Austin. He and his
group are interested in operating systems, security,
and architecture. Most of his current research is
about secure systems, GPU systems, and concurrent
systems. He received his doctorate from MIT in 2004.