M. Ashworth - CHPC National Meeting

Report
Computational Science at STFC
and the Exploitation of Novel
HPC Architectures
Mike Ashworth
Scientific Computing Department
and
STFC Hartree Centre
STFC Daresbury Laboratory
[email protected]
• STFC’s Scientific Computing Department
• STFC’s Hartree Centre
• Exploitation of Novel HPC Architectures
Organisation
HM Government (& HM Treasury)
RCUK Executive Group
STFC’s Sites
UK Astronomy Technology Centre, Edinburgh, Scotland
Polaris House, Swindon, Wiltshire
Daresbury Laboratory, Daresbury Science and Innovation Campus, Warrington, Cheshire
Rutherford Appleton Laboratory, Harwell Science and Innovation Campus, Didcot, Oxfordshire
Chilbolton Observatory, Stockbridge, Hampshire
Isaac Newton Group of Telescopes, La Palma
Joint Astronomy Centre, Hawaii
Understanding our Universe
STFC’s Science Programme
Particle Physics
Large Hadron Collider (LHC), CERN - the structure and forces
of nature
Ground based Astronomy
European Southern Observatory (ESO), Chile
Very Large Telescope (VLT), Atacama Large Millimeter Array
(ALMA), European Extremely Large Telescope (E-ELT),
Square Kilometre Array (SKA)
Space based Astronomy
European Space Agency (ESA)
Herschel/Planck/GAIA/James Webb Space Telescope (JWST)
Bi-laterals – NASA, JAXA, etc.
STFC Space Science Technology Department
Nuclear Physics
Facility for Antiproton and Ion Research (FAIR), Germany
Nuclear skills for medicine (isotopes and radiation
applications), energy (nuclear power plants) and
environment (nuclear waste disposal)
STFC’s Facilities
Neutron Sources
ISIS - pulsed neutron and muon source, and
Institut Laue-Langevin (ILL), Grenoble
Providing powerful insights into key areas of
energy, biomedical research, climate,
environment and security.
High Power Lasers
Central Laser Facility - providing applications on bioscience
and nanotechnology
HiPER
Demonstrating laser driven fusion as a future source of
sustainable, clean energy
Light Sources
Diamond Light Source Limited (86%) - providing new
breakthroughs in medicine, environmental and materials
science, engineering, electronics and cultural heritage
European Synchrotron Radiation Facility (ESRF), Grenoble
Scientific Computing Department
Director: Adrian Wander (appointed 24th July 2012)
Major funded activities
• 190 staff supporting over 7500 users
• Applications development and support
• Compute and data facilities and services
• Research: over 100 publications per annum
• Deliver over 3500 training days per annum
• Systems administration, data services, high-performance computing, numerical analysis & software engineering
Major science themes and capabilities
• Expertise across the length and time scales from processes
occurring inside atoms to environmental modelling
The UK National
Supercomputing facilities
The UK National Supercomputing
Services are managed by EPSRC
on behalf of the UK academic
communities
HPCx ran from 2002-2009 using
IBM POWER4 and POWER5
HECToR current service 2007-2014
Located at Edinburgh, operated jointly by STFC and EPCC
HECToR Phase3 90,112 cores Cray XE6 (660 Tflop/s Linpack)
ARCHER is the new service, early access now, service starts
16th Dec‘13, operated by STFC and EPCC (1.37 Pflop/s
Linpack #19 TOP500)
PRACE
• PRACE launched 9th June 2010
• 25 member countries; seat in Belgium
• France, Germany, Italy and Spain have each committed €100M over 5 years
• EC funding of €70M for infrastructure
• Tier-0 infrastructure providing free-of-charge service for European scientific communities based on peer review
• Four projects to date: PP, 1IP, 2IP, 3IP overlapping
• 1IP finished; 2IP extended; 3IP about half way through
• EPSRC represent UK on PRACE Council; STFC and EPCC carry out technical work in the PRACE projects
• STFC focus is on application optimization & benchmarking, technology evaluation, procurement procedures, training
• STFC contributes 2.5% BlueGene/Q into the PRACE DECI calls
Scientific Highlights
Journal of Materials Chemistry 16 no. 20 (May 2006) - issue
devoted to HPC in materials chemistry (esp. use of HPCx);
Phys. Stat. Sol.(b) 243 no. 11 (Sept 2006) - issue featuring
scientific highlights of the Psi-k Network (the European
network on the electronic structure of condensed matter
coordinated by our Band Theory Group);
Molecular Simulation 32 no. 12-13 (Oct, Nov 2006) - special issue
on applications of the DL_POLY MD program written &
developed by Bill Smith (the 2nd special edition of Mol Sim on
DL_POLY - the 1st was about 5 years ago);
Acta Crystallographica Section D 63 part 1 (Jan 2007) - proceedings
of the CCP4 Study Weekend on protein crystallography;
The Aeronautical Journal, Volume 111, Number 1117 (March
2007), UK Applied Aerodynamics Consortium, Special Edition.
Proc Roy Soc A Volume 467, Number 2131 (July 2011), HPC in the
Chemistry and Physics of Materials.
Last 5 years metrics:
– 67 grants of order £13M
– 422 refereed papers and 275 presentations
– Three senior staff have joint appointments with Universities
– Seven staff have visiting professorships
– Six members of staff awarded Senior Fellowships or Fellowships by Research Councils’ individual merit scheme
– Five staff are Fellows of senior learned societies
• STFC’s Scientific Computing Department
• STFC’s Hartree Centre
• Exploitation of Novel HPC Architectures
Opportunities
Political
Opportunity
• Demonstrate growth
through economic and
societal impact from
investments in HPC
Business
Opportunity
• Engage industry in HPC
simulation for competitive
advantage
• Exploit multi-core
Scientific
Opportunity
• Build multi-scale, multiphysics coupled apps
• Tackle complex Grand
Challenge problems
Technical
Opportunity
• Exploit new Petascale and
Exascale architectures
• Adapt to multi-core and
hybrid architectures
Tildesley Report
BIS commissioned a report on the
strategic vision for a UK e-Infrastructure
for Science and Business.
Prof Dominic Tildesley led the team
including representatives from
Universities, Research Councils, industry
and JANET. The scope included
compute, software, data, networks,
training and security.
Mike Ashworth, Richard Blake and John
Bancroft from STFC provided input.
Published in December 2011. Google the title to download from the BIS website
Government Investment
in e-infrastructure - 2011
17th Aug 2011: Prime Minister David Cameron
confirmed £10M investment into STFC's
Daresbury Laboratory. £7.5M for computing
infrastructure
3rd Oct 2011: Chancellor George Osborne
announced £145M for e-infrastructure at the
Conservative Party Conference
4th Oct 2011: Science Minister David Willetts
indicated £30M investment in Hartree Centre
30th Mar 2012: John Womersley CEO STFC
and Simon Pendlebury IBM signed major
collaboration at the Hartree Centre
Clockwise from top left
Intel collaboration
STFC and Intel have signed an MOU to develop and test
technology that will be required to power the
supercomputers of tomorrow.
Karl Solchenbach, Director of
European Exascale Computing
at Intel said "We will use STFC's
leading expertise in scalable
applications to address the
challenges of exascale
computing in a co-design
approach."
Collaboration with
Unilever
1st Feb 2013: Also announced was a key partnership with Unilever in the
development of Computer Aided Formulation (CAF)
Months of laboratory bench work can be completed within minutes by a tool
designed to run as an ‘App’ on a tablet or laptop which is connected remotely
to the Blue Joule supercomputer at Daresbury.
The aggregation of surfactant
molecules into micelles is an
important process in product
formulation
This tool predicts
the behaviour and
structure of
different
concentrations of
liquid compounds,
both in the bottle
and in-use, and
helps researchers
plan fewer and
more focussed
experiments.
John Womersley, CEO STFC, and Jim
Crilly, Senior Vice President,
Strategic Science Group at Unilever
Hartree Centre
IBM BG/Q Blue Joule
TOP500: #23 in the Nov 2013 list (#13 in Jun 2012 list)
#8 in Europe
#2 system in UK
6 racks
• 98,304 cores
• 6144 nodes
• 16 cores & 16 GB
per node
• 1.25 Pflop/s peak
1 rack to be configured as BGAS
(Blue Gene Advanced Storage)
• 16,384 cores
• Up to 1PB Flash memory
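As a rough cross-check of the peak figure, theoretical peak can be estimated as cores x clock x flops per cycle. The sketch below assumes a 1.6 GHz clock and 8 double-precision flops per cycle per core for Blue Gene/Q; neither number appears on the slide, and the function name is illustrative only.

```python
def peak_pflops(cores, clock_ghz, flops_per_cycle):
    """Theoretical peak in Pflop/s: cores x clock (GHz) x flops per cycle."""
    gflops = cores * clock_ghz * flops_per_cycle
    return gflops / 1.0e6  # Gflop/s -> Pflop/s

nodes = 6144
cores_per_node = 16
cores = nodes * cores_per_node  # 98,304 cores, as quoted on the slide

# Assumed, not taken from the slide: 1.6 GHz clock, 8 DP flops/cycle/core
blue_joule = peak_pflops(cores, clock_ghz=1.6, flops_per_cycle=8)
print(f"{cores} cores, {blue_joule:.3f} Pflop/s peak")
```

Under these assumptions the estimate is consistent with the quoted 1.25 Pflop/s peak to within rounding.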
Hartree Centre
IBM iDataPlex Blue Wonder
TOP500
(#114 in Jun 2012 list)
#283 in the Nov 2013 list
8192 cores, 170 Tflop/s peak
node has 16 cores, 2 sockets
Intel Sandy Bridge (AVX etc.)
252 nodes with 32 GB
4 nodes with 256 GB
12 nodes with X3090 GPUs
256 nodes with 128 GB
ScaleMP virtualization software up
to 4TB virtual shared memory
Hartree Centre
Datastore
Storage:
5.76 PB usable disk storage
15 PB tape store
Hartree Centre
Visualization
Four major facilities:
Hartree Vis-1: a large visualization “wall” supporting stereo
Hartree Vis-2: a large surround and immersive visualization system
Hartree ISIC: a large visualization “wall” supporting stereo at ISIC
Hartree Atlas: a large visualization “wall” supporting stereo in the Atlas Building at RAL, part of the Harwell Imaging Partnership (HIP)
Virtalis is the hardware supplier
Home of the
2nd most powerful
supercomputer
in the UK
Douglas Rayner Hartree
Father of Computational Science
• Hartree–Fock method
• Appleton–Hartree equation
• Differential Analyser
• Numerical Analysis
Douglas Rayner Hartree
PhD, FRS (1897 –1958)
“It may well be that the
high-speed digital
computer will have as
great an influence on
civilization as the advent
of nuclear power” 1946
Douglas Hartree with Phyllis Nicolson at the Hartree
Differential Analyser at Manchester University
Hartree Centre
Official Opening
1st Feb 2013: Chancellor George Osborne and Science Minister David
Willetts opened the Hartree Centre and announced a further £185M of
funding for e-Infrastructure
£19M for the Hartree Centre for power-efficient computing technologies
£11M for the UK’s participation in the Square Kilometre Array
This investment forms part of the £600
million investment for science
announced by the Chancellor at the
Autumn Statement 2012.
“By putting our money into science we
are supporting the economy of
tomorrow.”
George Osborne opens the Hartree
Centre, 1st February 2013
Work with us to:
• Sharpen your innovation
• Improve your global competitiveness
• Reduce research and development costs
• Reduce costs for certification
• Speed up your time to market
Power-efficient
Technologies ‘Shopping List’
£19M investment in power-efficient technologies
• System with latest NVIDIA Kepler GPUs
• System based on Intel Xeon Phi
• System based on ARM processors
• Active storage project using IBM BGAS
• Dataflow architecture based on FPGAs
• Instrumented machine room
Systems will be made available
for development and evaluation
projects with Hartree Centre
partners from industry,
government and academia
• STFC’s Scientific Computing Department
• STFC’s Hartree Centre
• Exploitation of Novel HPC Architectures
Accelerators
in the TOP500
Where are the accelerators?
TOP500 1-10 accelerator based systems:
Rank  System     Pflop/s  Accelerator     Cores (accelerator cores)
#1    Tianhe-2   33.8     Intel Xeon Phi  3120k (2736k)
#2    Titan      17.6     Nvidia K20x     560k (262k)
#6    Piz Daint  6.27     Nvidia K20x     116k (74k)
#7    Stampede   5.17     Intel Xeon Phi  462k (366k)
UK National resources:
STFC Hartree Centre IBM iDataPlex has 48 Nvidia Fermi GPUs
UK Regional resources:
e-Infrastructure South: EMERALD consists of 372 Fermi GPUs
Local resources:
University and departmental clusters
Under your desk?
The Power Wall
• Transistor density is still increasing
• Clock frequency is not, due to power density constraints
• Cores per chip is increasing: multi-core CPUs (currently 8-16) and GPUs (~500)
• Little further scope for instruction level parallelism
Source: Intel, Microsoft (Sutter) and Stanford (Olukotun, Hammond)
Processor Comparison
Processor           Parallelism  GB/s  Tflop/s  Gflop/s/W
Nvidia K40          15x64        288   1.43     6.1
AMD FirePro S10000  2x56x32*     480   1.48     3.9
Intel Xeon Phi      60x8         320   1.00     4.5
Intel E5-2667W      16x4         50    0.20     1.3
* single precision cores. double precision is 1/4.
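Two derived quantities help when reading the comparison: the power implied by the Tflop/s and Gflop/s/W columns, and the flop-to-byte ratio, i.e. how many floating-point operations must be performed per byte of memory traffic to keep the processor busy. The sketch below is illustrative arithmetic on the slide's figures only; the function and variable names are hypothetical.

```python
# (processor, memory bandwidth GB/s, peak Tflop/s, Gflop/s per watt), from the slide
PROCESSORS = [
    ("Nvidia K40", 288, 1.43, 6.1),
    ("AMD FirePro S10000", 480, 1.48, 3.9),
    ("Intel Xeon Phi", 320, 1.00, 4.5),
    ("Intel E5-2667W", 50, 0.20, 1.3),
]

def implied_watts(tflops, gflops_per_watt):
    """Power draw implied by peak performance and energy efficiency."""
    return tflops * 1000.0 / gflops_per_watt

def flops_per_byte(tflops, gb_per_s):
    """Peak flops available per byte of memory bandwidth."""
    return tflops * 1000.0 / gb_per_s

for name, bandwidth, tflops, efficiency in PROCESSORS:
    watts = implied_watts(tflops, efficiency)
    balance = flops_per_byte(tflops, bandwidth)
    print(f"{name:20s} {watts:6.0f} W  {balance:4.1f} flop/byte")
```

All four processors need several flops per byte moved to approach peak, which is one reason memory access patterns dominate the porting work described later.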
Software Challenges
The fundamental challenge is extracting
additional parallelism in your application
Fortran/C + MPI
OpenMP for multi-core
CUDA for Nvidia GPUs
OpenCL
OpenACC
Directive-based from Cray (& PGI)
OpenMP 4 for accelerators
• Gung-Ho – a new atmospheric
dynamical core
• NEMO on GPUs
• DL_POLY on GPUs
• LBM on GPUs
Current Unified Model
Met Office Unified Model
‘Unified’ in the sense of using the same code for weather forecasting and for climate research
Combines dynamics on a lat/long grid with physics (radiation, clouds, precipitation, convection etc.)
Also couples to other models (ocean, sea-ice, land surface, chemistry/aerosols etc.) for improved forecasting and earth system modelling
“New Dynamics” Davies et al (2005)
“ENDGame” to be operational in 2013
[Figure: Performance of the UM (dark blue) versus a basket of models measured by 3-day surface pressure errors, 2003-2011. Centres shown: Met Office, ECMWF, USA, France, Germany, Japan, Canada, Australia. From Nigel Wood, Met Office]
Limits to
Scalability of the UM
The current version (New Dynamics)
has limited scalability
The problem lies with the spacing of
the lat/long grid at the poles
The latest ENDGame code improves
this, but a more radical solution is
required for Petascale and beyond
At 25km resolution, grid spacing near poles is 75m
At 10km this reduces to 12m!
[Figure: UM scaling plot; labels: (17km), Perfect scaling, POWER7 Nodes. From Nigel Wood, Met Office]
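The rapid shrinkage of the polar spacing follows from the geometry of a lat/long grid: the east-west spacing is R cos(latitude) times the longitude increment, and the row nearest the pole sits roughly one grid interval away, so the spacing there scales as the square of the nominal resolution. The sketch below is a back-of-envelope estimate, not the Met Office calculation. It assumes a spherical Earth of radius 6371 km, equal angular increments in latitude and longitude, and a first row one interval from the pole, so it reproduces the order of magnitude and the scaling rather than the exact 75m figure.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius

def polar_spacing_m(dx_km, rows_from_pole=1.0):
    """Estimate east-west spacing (metres) of the grid row nearest the pole.

    dx_km is the nominal grid spacing at the equator.
    """
    dlambda = dx_km / EARTH_RADIUS_KM        # longitude increment in radians
    colatitude = rows_from_pole * dlambda    # angular distance from the pole
    return EARTH_RADIUS_KM * math.sin(colatitude) * dlambda * 1000.0

def rescale(spacing_m, dx_old_km, dx_new_km):
    """Quadratic scaling of polar spacing with nominal resolution."""
    return spacing_m * (dx_new_km / dx_old_km) ** 2

print(polar_spacing_m(25.0))        # same order of magnitude as the quoted 75m
print(rescale(75.0, 25.0, 10.0))    # the quoted 75m at 25km, rescaled to 10km
```

Rescaling the quoted 75m by (10/25) squared gives the 12m stated for 10km resolution, which is why the time step restriction near the poles tightens so quickly as resolution increases.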
Challenging Solutions
GUNG-HO targets a brand new dynamical core
Scalability – choose a globally uniform grid which has
no poles (see below)
Speed – maintain performance at high & low
resolution and for high & low core counts
Accuracy – need to maintain standing of the model
Space weather implies a 600km deep model
Five year project 2011-2015
Operational weather forecasts around 2020!
Candidate grids: Triangles, Cubesphere, Yin-Yang (from Nigel Wood, Met Office)
GUNG-HO: Globally Uniform Next Generation Highly Optimized
“Working together harmoniously”
Design considerations
Ford et al, “Gung Ho: A code design for weather and climate
prediction on exascale machines”, EASC 2013, to appear in a special
edition of Advances in Engineering Software
• Fortran 2003, MPI, OpenMP and OpenACC
• Other models e.g. PGAS, CAF, are not excluded
• Indirect addressing in the horizontal to support a wide
range of possible grids
• Direct addressing in the vertical
• Vertical index innermost is optimal for cache re-use in
CPUs, and can also achieve coalesced memory access
in GPUs
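A minimal sketch of these layout choices, written in Python for brevity rather than the Fortran 2003 the project uses. The mesh, the neighbour table and all names are hypothetical. The point is only that horizontal neighbours are reached through a lookup table (indirect addressing), while the levels of a column are contiguous in memory with the vertical index varying fastest (direct addressing).

```python
NLAYERS = 4

# Hypothetical unstructured mesh: cell -> list of horizontal neighbour cells
NEIGHBOURS = {0: [1, 2], 1: [0, 2], 2: [0, 1]}

def flat_index(cell, k, nlayers=NLAYERS):
    """Vertical index innermost: a column's levels are contiguous in memory."""
    return cell * nlayers + k

def neighbour_mean(field, cell, nlayers=NLAYERS):
    """Per-level mean of a field over a cell's horizontal neighbours."""
    nbrs = NEIGHBOURS[cell]          # indirect addressing in the horizontal
    result = []
    for k in range(nlayers):         # direct addressing in the vertical
        total = sum(field[flat_index(n, k, nlayers)] for n in nbrs)
        result.append(total / len(nbrs))
    return result

# Fill the field so that each entry equals its own flat index
field = [float(flat_index(c, k)) for c in range(3) for k in range(NLAYERS)]
print(neighbour_mean(field, 0))
```

The cost of the neighbour lookup is paid once per column and amortised over all levels, which is the usual argument for keeping the vertical index innermost.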
Software architecture
The Gung-Ho software architecture is structured
into layers communicating via a defined API
• the driver layer (control for one or more models)
• the algorithm layer (high-level specification)
• the parallelisation system (PSy) (inter-node and intra-node parallelism, parsing, transformations, hardware-specific code generation)
• the kernel layer (toolkit of algorithm building blocks with directives)
• the infrastructure layer (generic library to support parallelisation, communications, coupling, I/O etc.)
Gung-Ho Single
Model architecture
The arrows represent the APIs connecting the layers
The direction shows the flow control
A code generator will parse the algorithm layer source code.
Structure
[Diagram: boxes labelled Alg Code, Parser, Kernel Codes, Algorithm Generator, Alg Code, PSy Generator, PSy Code, Generator]
...
ast,invokeInfo=parse(filename,invoke_name=invokeName)
alg=algGen(ast,invokeInfo,psyName=psyName,
invokeName=invokeName)
psy=psyGen(invokeInfo,psyName=psyName)
...
Invoking the generator
[email protected]:~/proj/GungHoSVN/LFRIC/src/generator$ python generator.py
usage: generator.py [-h] [-oalg OALG] [-opsy OPSY] filename
generator.py: error: too few arguments
integrate_one_generate:
python ../generator/generator.py -oalg integrate_one_alg.F90 -opsy integrate_one_psy.F90 integrate_one.F90
make integrate_one_generated
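The usage message above is of the kind Python's argparse produces, so the command line could be declared roughly as follows. This is a reconstruction from the usage line only, not the project's generator.py; the help strings describe assumed meanings.

```python
import argparse

def build_parser():
    """Command-line interface matching the usage line shown on the slide."""
    parser = argparse.ArgumentParser(prog="generator.py")
    parser.add_argument("-oalg", help="output file for generated algorithm code (assumed)")
    parser.add_argument("-opsy", help="output file for generated PSy code (assumed)")
    parser.add_argument("filename", help="algorithm-layer source file to parse")
    return parser

# Same arguments as the make rule on the slide
args = build_parser().parse_args(
    ["-oalg", "integrate_one_alg.F90",
     "-opsy", "integrate_one_psy.F90",
     "integrate_one.F90"]
)
print(args.oalg, args.opsy, args.filename)
```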
Example (integrate_one)
program main
...
use integrate_one_module, only : integrate_one_kernel
...
call invoke(integrate_one_kernel(x, integral))
...
end program main
PROGRAM main
...
USE psy, ONLY: invoke_integrate_one_kernel
...
CALL invoke_integrate_one_kernel(x, integral)
...
END PROGRAM main
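To make the before/after pair concrete, the algorithm-layer rewrite can be imitated with a regular expression. This is a toy illustration, not the Gung-Ho generator, which works on a parsed AST; the pattern and function name are hypothetical and handle only a single kernel call written on one line.

```python
import re

# Hypothetical pattern: "call invoke(<kernel>(<args>))" on a single line
INVOKE_RE = re.compile(r"call\s+invoke\(\s*(\w+)\s*\((.*)\)\s*\)", re.IGNORECASE)

def rewrite_invoke(line):
    """Rewrite an algorithm-layer invoke into a call to a PSy-layer routine."""
    match = INVOKE_RE.search(line)
    if match is None:
        return line  # not an invoke call: leave the line untouched
    kernel, kernel_args = match.group(1), match.group(2)
    replacement = f"CALL invoke_{kernel}({kernel_args})"
    return line[:match.start()] + replacement + line[match.end():]

print(rewrite_invoke("call invoke(integrate_one_kernel(x, integral))"))
```

The scientist writes only the first form; the name of the PSy routine and the loop it contains are produced by the tool.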
Example (integrate_one)
module integrate_one_module
use kernel_mod
implicit none
private
public integrate_one_kernel
public integrate_one_code
type, extends(kernel_type) :: integrate_one_kernel
type(arg) :: meta_args(2) = (/&
arg(READ, (CG(1)*CG(1))**3, FE), &
arg(SUM, R, FE)/)
integer :: ITERATES_OVER = CELLS
contains
procedure, nopass :: code => integrate_one_code
end type integrate_one_kernel
contains
subroutine integrate_one_code(layers, p1dofm, X, R)
...
Example (integrate_one)
MODULE psy
USE integrate_one_module, ONLY: integrate_one_code
USE lfric
IMPLICIT NONE
CONTAINS
SUBROUTINE invoke_integrate_one_kernel(x, integral)
...
SELECT TYPE ( x_space=>x%function_space )
TYPE IS ( FunctionSpace_type )
topology => x_space%topology
nlayers = topology%layer_count()
p1dofmap => x_space%dof_map(cells, fe)
END SELECT
DO column=1,topology%entity_counts(cells)
CALL integrate_one_code(nLayers, p1dofmap(:,column), x%data, integral%data(1))
END DO
END SUBROUTINE invoke_integrate_one_kernel
END MODULE psy
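The division of labour in the generated code can be mimicked in a few lines of Python: the kernel sees one column at a time, while the PSy layer owns the loop over columns and the dof-map lookup. Everything here is a hypothetical stand-in. In particular the kernel body is elided in the slides, so the summation below is an assumed placeholder and not the real integrate_one_code.

```python
def integrate_one_code(nlayers, dofmap, x_data, integral):
    """Kernel stand-in: works on a single column (assumed body: a plain sum)."""
    for dof in dofmap:
        for k in range(nlayers):
            integral[0] += x_data[dof * nlayers + k]

def invoke_integrate_one_kernel(x_data, dofmaps, nlayers, integral):
    """PSy-layer stand-in: owns the loop over columns and calls the kernel."""
    for dofmap in dofmaps:
        integrate_one_code(nlayers, dofmap, x_data, integral)

nlayers = 2
dofmaps = [[0], [1], [2]]            # hypothetical: one dof per column
x_data = [1.0] * (3 * nlayers)       # field of ones, so the sum is the entry count
integral = [0.0]
invoke_integrate_one_kernel(x_data, dofmaps, nlayers, integral)
print(integral[0])
```

Because the kernel never sees the column loop, that loop is where threading or accelerator directives could be introduced without touching the science code.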
Timetable
• Further development and testing of horizontal [2013]
• Testing of proposals for code architecture [2013]
• Vertical discretization [2013]
• 3D prototype development [2014-2015]
• Operational around 2020…?
NEMO Acceleration
on GPUs using OpenACC
Maxim Milakov,
Peter Messmer,
Thomas Bradley,
NVIDIA
Flat profile, code
converted using
OpenACC directives
GYRE only – we are
looking at more
realistic test cases,
with ice and land
Tesla M2090 GPUs,
Westmere CPUs
Milakov, Messmer, Bradley, GPU Technology Conference,
18th-22nd March 2013
DL_POLY
Acceleration on GPUs
Christos
Kartsaklis and
Ruairi Nestor,
ICHEC,
Ilian Todorov
and Bill Smith,
STFC
CUDA implementation of key
DL_POLY
features:
Constraints Shake
Link cell pairs
Two-body forces
Ewald SPME forces
DMPC (dimethyl pyrocarbonate) in water, 413896 atoms
(test case 4)
DL_POLY
Acceleration on GPUs
“Benchmarking
and Analysis of
DL_POLY 4 on
GPU Clusters”
Lysaght et al
PRACE report
Significant 8x
speed-up using
cuFFT
MPI code scales
to 10^8 atoms
on >10^5 cores
Pure MPI vs. 2 GPUs on the ICHEC Stokes GPU cluster
Sodium Chloride, 216000 Ions (test case 2)
3D LBM on
Kepler GPUs (1)
Mark Mawson &
Alistair Revell,
Manchester
Lattice
Boltzmann
Method (LBM)
for solving fluid
flow
Focus on
memory
transfer issues
SKA
STFC is a partner
in the Science
Data Processor
(SDP) work
package
Stephen Pickles
leads the software
engineering task in
SKA.TEL.SDP.ARCH.SWE
developing the software
system-level prototype
We are also contributing to the work to
build, run and support the SDP software
prototypes on existing production HPC
facilities (SKA.TEL.SDP.PROT.ISP).
Work started on 1st November 2013
Summary
• New UK Government investment
supports a wide range of e-Infrastructure
projects (incl. data centres, networks, ARCHER, Hartree)
• The Hartree Centre is ‘open for
business’
• We are driving forward application
development on emerging
architectures
For more information see http://www.stfc.ac.uk/scd
If you have been …
… thank you for listening
Mike Ashworth
[email protected]
http://www.stfc.ac.uk/scd
