
Report
DoD HPC Modernization Program &
Move Toward Emerging Architectures
Tom Dunn
Naval Meteorology & Oceanography Command
20 November 2014
• HPC RECENT TRENDS Per Top500 List
• RECENT 2014 DOD ACQUISITIONS
• EXPECTED PROCESSOR COMPETITION
• ONWARD TOWARD EXASCALE
Navy DoD SUPERCOMPUTING RESOURCE CENTER
Peak Computational Performance (Teraflops)
Estimates Follow Moore’s Law (~2x every 2 yrs)
1997 – 0.3 TFs
2001 – 8.4 TFs
2004 – 32 TFs
2006 – 58 TFs
2008 – 226 TFs
2012 (Dec) – 954 TFs
2014 (Jul) – 2,556 TFs
2015 (Jul) – 5,760 TFs est.
2017 (Jul) – 10,000 TFs est.
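As a rough check on the "~2x every 2 yrs" estimate, the sketch below computes the doubling time implied by the first and the most recent installed milestones on this slide. It ignores the month annotations and uses only calendar years, which is a simplifying assumption; the function name is illustrative only.

```python
import math

# Peak performance milestones from this slide: (year, teraflops).
# Months are ignored here, which is a simplifying assumption.
milestones = [
    (1997, 0.3),
    (2001, 8.4),
    (2004, 32.0),
    (2006, 58.0),
    (2008, 226.0),
    (2012, 954.0),
    (2014, 2556.0),
]


def doubling_time_years(y0, tf0, y1, tf1):
    """Years per doubling implied by two (year, teraflops) points."""
    doublings = math.log2(tf1 / tf0)
    return (y1 - y0) / doublings


first_year, first_tf = milestones[0]
last_year, last_tf = milestones[-1]
overall = doubling_time_years(first_year, first_tf, last_year, last_tf)
print(f"{first_year}-{last_year}: one doubling every {overall:.2f} years")
```

Under these assumptions the implied doubling time comes out somewhat shorter than two years, so the two-year figure reads as a conservative estimate.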
Navy DSRC Capabilities
• One of the most capable HPC centers in the DoD and the nation
• Chartered as a DoD Supercomputing Center in 1994
• Computational performance approximately doubles every two
years; currently 2,556 Teraflops
• Systems reside on the Defense Research and Engineering
Network (DREN) with 10 Gb connectivity – 19 Dec 2013
• 15% of Navy DSRC’s computational and storage capacity
reserved for CNMOC operational use
• R&D and CNMOC Ops are placed in separate system partitions
and queues
Top500® Systems by Architecture, June 2006–June 2014
Number of CPUs in the Top500® Systems by Architecture Type,
June 2006–June 2014
Number of Systems in the Top500® Utilizing Co-Processors or
Accelerators, June 2009–June 2014
Number of Systems in the Top500® by Co-Processors or
Accelerators Type, June 2009–June 2014
Number of Cores in the Top500® by Co-Processors or
Accelerators Type, June 2011–June 2014
Number of Cores in the June 2014 Top500®
by CPU Manufacturer
TOP 500 SUPERCOMPUTER LIST (JUNE 2014) BY OEM SUPPLIER

OEM Supplier       Systems in Top 500
CRAY INC            51
DELL                 8
HEWLETT PACKARD    182
IBM                176
SGI                 19
TOTAL              436
Other Suppliers     64
High Performance Computing Modernization
Program 2014 HPC Awards
Feb. 2014
• Air Force Research Lab (AFRL) DSRC, Dayton, OH
Cray XC-30 System (Lightning)
- 1281 teraFLOPS
- 56,880 Compute Cores (2.7 GHz Intel Ivy Bridge)
- 32 NVIDIA Tesla K40 GPGPUs
• Navy DSRC, Stennis Space Center, MS
Cray XC-30 (Shepard)
- 813 teraFLOPS
- 28,392 Compute Cores (2.7 GHz Intel Ivy Bridge)
- 124 Hybrid nodes, each consisting of 10 Ivy Bridge cores and a
60-core Intel Xeon Phi 5120D
- 32 NVIDIA Tesla K40 GPGPUs
Cray XC-30 (Armstrong)
- 786 teraFLOPS
- 29,160 Compute cores (2.7 GHz Intel Ivy Bridge)
- 124 Hybrid nodes, each consisting of 10 Ivy Bridge cores and a
60-core Intel Xeon Phi 5120D
High Performance Computing Modernization
Program 2014 HPC Awards
September 2014
• Army Research Lab (ARL) DSRC, Aberdeen, MD
Cray XC-40 System
- 3.77 petaFLOPS
- 101,312 compute cores (2.3 GHz Intel Xeon Haswell)
- 32 NVIDIA Tesla K40 GPGPUs
- 411 TB memory
- 4.6 PB storage
• Army Engineer Research and Development Center (ERDC) DSRC, Vicksburg, MS
SGI ICE X System
- 4.66 petaFLOPS
- 125,440 compute cores (2.3 GHz Intel Xeon Haswell)
- 32 NVIDIA Tesla K40 GPGPUs
- 440 TB memory
- 12.4 PB storage
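The petaFLOPS figures on this slide appear consistent with a simple theoretical-peak estimate: cores x clock rate x floating-point operations per cycle, plus the GPGPU contribution. The sketch below is only an illustration of that arithmetic. The per-cycle figure for Haswell (16 double-precision FLOPs per core per cycle) and the per-card K40 peak (1.43 teraFLOPS) are assumptions that do not come from the slides, and the function name is hypothetical.

```python
# Assumed double-precision FLOPs per cycle per core (not stated in the slides).
HASWELL_FLOPS_PER_CYCLE = 16

# Assumed peak double-precision rate of one NVIDIA Tesla K40, in teraFLOPS
# (not stated in the slides).
K40_PEAK_TF = 1.43


def peak_teraflops(cores, ghz, flops_per_cycle, k40_count=0):
    """Theoretical peak in teraFLOPS: CPU cores plus optional K40 cards."""
    cpu_tf = cores * ghz * flops_per_cycle / 1000.0  # GFLOPS -> TFLOPS
    return cpu_tf + k40_count * K40_PEAK_TF


# Core counts and clock rates are taken from this slide.
arl = peak_teraflops(101_312, 2.3, HASWELL_FLOPS_PER_CYCLE, k40_count=32)
erdc = peak_teraflops(125_440, 2.3, HASWELL_FLOPS_PER_CYCLE, k40_count=32)

print(f"ARL  Cray XC-40: {arl / 1000:.2f} petaFLOPS (estimated peak)")
print(f"ERDC SGI ICE X:  {erdc / 1000:.2f} petaFLOPS (estimated peak)")
```

Sustained application performance is typically well below such peak numbers, so the estimate is useful mainly for comparing systems on a common basis.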
High Performance Computing Modernization
Program 2014/2015 HPC Awards
Air Force Research Lab (AFRL) DSRC, Dayton, OH
• FY15 Funded
OEM and Contract Award - TBD
- 100,000+ compute cores
- 3.5 – 5.0 petaFLOPS
Navy DSRC, Stennis Space Center, MS
• FY15 Funded
OEM and Contract Award - TBD
- 100,000+ compute cores
- 3.5 – 5.0 petaFLOPS
ECMWF (Top 500 List Jun 2014)
2 Cray XC30 Systems
each with 81,160 compute cores (2.7 GHz Intel Ivy Bridge)
1,796 teraFLOPS
NOAA NWS/NCEP
Weather & Climate Operational Supercomputing System (WCOSS)
Phase I
2 IBM iDataplex systems
each with 10,048 compute cores (2.6 GHz Intel Sandy Bridge)
213 teraFLOPS
Phase II
(Jan 2015) Addition
2 IBM NeXtScale systems
each with 24,192 compute cores (2.7 GHz Intel Ivy Bridge)
585 teraFLOPS
UK Meteorological Office
IBM Power 7 System
18,432 compute cores (3.836 GHz)
565 teraFLOPS
IBM Power 7 System
15,360 compute cores (3.876 GHz)
471 teraFLOPS
27 Oct 2014 Announcement
128M Contract
2 Cray XC-40 systems (Intel Xeon Haswell initially)
>13 times faster than current system
total of 480,000 compute cores
Phase 1a
replace Power 7s by Sep 2015
Phase 1b
extend both systems to power limit by Mar 2016
Phase 1c
add one new system by Mar 2017
Expected Near Term HPC Processor Options
2016
- Intel and ARM
- Cray has ARM in-house for testing
2017
- Intel, ARM, & IBM Power 9 (with closely coupled NVIDIA GPUs)
DoD Applications & Exascale Computing
• General external impression
– In the 2024 timeframe, DoD will have no requirement for a
balanced exascale supercomputer (untrue)
– DoD should not be a significant participant in exascale
planning for the U.S. (untrue)
• Reality
– DoD has compelling coupled multi-physics problems which
will require more tightly-integrated resources than
technologically possible in the 2024 timeframe
– DoD has many other use cases which will benefit from the
power efficiencies and novel technologies generated by the
advent of exascale computing
HPCMP & 2024 DoD Killer Applications
• HPCMP categorizes its user base into 11 Computational Technology Areas
(CTAs)
• Climate Weather Ocean (CWO) is one of the 11 CTAs
• Dr. Burnett (CNMOC TD) is the DoD HPCMP CWO CTA leader
• Each CTA leader was tasked in FY14 to project Killer Apps in their CTA
• Dr. Burnett’s CWO CTA analysis is led by Lockheed Martin
• Primary focus is on HYCOM, but NAVGEM and ESPC are also included
• Expect follow-on FY15 funding
• Develop appropriate Kiviat diagrams (example to follow)
• NRL Stennis is part of an ONR-sponsored NOPP project, starting in FY14, to look
at attached processors (i.e. GPGPUs and accelerators) for
HYCOM+CICE+WW3
Relevant Technology Issues
• Classical computing advances may stall in the next 10 years
– 22nm (feature size for latest processors)
– 14nm (anticipated feature size in 2015)
– 5-7nm (forecast limit for classical methods)
– 3D approaches are currently in use and denser 3D approaches are
contemplated, but both have limitations
• Mean-time-between-failures (MTBF) will decrease dramatically
– Petascale (hours to days)
– Exascale (minutes)
• Data management exascale hurdles
• Power management exascale hurdles
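One way to see why MTBF shrinks from hours or days at petascale to minutes at exascale is the usual first-order model: if node failures are independent and exponentially distributed, the system MTBF is roughly the per-node MTBF divided by the node count. The sketch below applies that model. The 5-year per-node MTBF and the node counts are hypothetical values chosen only for illustration, not figures from the slides.

```python
HOURS_PER_YEAR = 24 * 365


def system_mtbf_hours(node_mtbf_years, node_count):
    """System MTBF in hours, assuming independent exponential node failures."""
    return node_mtbf_years * HOURS_PER_YEAR / node_count


# Hypothetical per-node MTBF; real values depend on the hardware.
node_mtbf_years = 5.0

for nodes in (1_000, 10_000, 100_000, 1_000_000):
    hours = system_mtbf_hours(node_mtbf_years, nodes)
    print(f"{nodes:>9,} nodes: {hours:8.2f} h ({hours * 60:9.1f} min)")
```

With these assumed inputs, a thousand-node system fails on the order of every couple of days, while a million-node system fails every few minutes, which matches the qualitative trend listed above.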
Relevant Software Issues
• Gap between intuitive coding (i.e. readily relatable to domain
science) and high performance coding will increase
• Underpinnings of architectures will change more rapidly than
codes can be refactored
• Parallelism of underlying mathematics will become asymptotic
(at some point) despite the need to scale to millions [if not
billions] of processing cores
• Current parallel code is based (in general) on synchronous
communications; however, asynchronous methods may be
necessary to overcome technology issues
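The point about parallelism becoming asymptotic is commonly illustrated with Amdahl's law, where the serial fraction of a code bounds the achievable speedup no matter how many cores are added. The sketch below is one such illustration, not an analysis of any specific DoD application. The parallel fraction of 0.999 is a hypothetical value.

```python
def amdahl_speedup(parallel_fraction, cores):
    """Speedup predicted by Amdahl's law for a given parallel fraction."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


# Hypothetical parallel fraction: 99.9% of the work parallelizes perfectly.
p = 0.999
limit = 1.0 / (1.0 - p)  # asymptotic speedup as cores -> infinity

for cores in (1_000, 1_000_000, 1_000_000_000):
    speedup = amdahl_speedup(p, cores)
    print(f"{cores:>13,} cores: speedup {speedup:8.1f} (limit {limit:.0f})")
```

Under this model, going from a million to a billion cores yields almost no additional speedup, which is the asymptotic behavior the bullet describes.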
Path Forward (Deliverables) [cont.]
• Kiviat diagram conveying system architecture requirements for
each impactful advent
[Figure: Kiviat diagram, "Future Computational Requirements for Hypersonic Flight Simulation".
Axes: PetaFLOPs; I/O bandwidth (terabytes/s); Job duration (weeks); Disk capacity (petabytes); Memory capacity (petabytes); 1/(interconnect latency) (1/microseconds); Memory BW (petabytes/s); Interconnect BW (petabits/s).
Radial tick labels: 0, 1, 10, 100, 1,000, 10,000.
Series: Spirit: 1.5PF Reference System; X-51: 1 Minute Flight Sim; SR-72: 1 Minute Flight Sim; Exascale Reference.]
March Toward Exascale Computing
• Dept of Energy target for exascale in 2024
• Japan target for exascale in 2020 (with $1B gov assistance)
• China target for exascale now in 2020 (originally in 2018)
• HPCMP systems expected to reach 100 petaflops in 7 or 8 years
