Advanced Processor Technologies

Advanced Processor
Technologies
group overview
1
APT mission
“To explore novel architectures and
techniques that will enable the
effective exploitation of the billion
transistor chips of the near-future”
2
APT group
• Focus:
– Moore’s Law will soon deliver billion
transistor chips
– how do we make best use of a
billion transistors?
• parallel processing
• systems-on-chip
• novel architectures
• …?
3
Strategy/Vision
• Industry shift to multicore
processors
– directly addressed by our CMP work
• Power/heat is performance-limiting
– asynchronous and low-power design
have growing importance
• Timing closure is a critical problem
– acceptance of mixed timing and GALS
• Design automation is vital
– async automation must be competitive
4
Strategy/Vision
• Can university groups design state-
of-the-art digital silicon?
– probably not in conventional
processors
– few academic groups still fab digital
chips
• Is trying to take designs through to
fabrication still a good idea?
– we believe so, because ‘reality’
matters!
– but the game is very tough indeed
5
Many-core
Architecture and
Software
Mikel Lujan
6
Buying a single-core
processor is difficult!
Multi-cores bring fundamental
changes for Computer Science
[applications, programming languages, compilers, runtime
systems (OS), computer architecture]
7
Active projects
• Managed Runtime Environments
and Low-Power Many-core
Architectures
– DOME: Delaying and Overcoming
Microprocessor Errors
• Teraflux
– On the search for a “good” parallel
computational model
• AXLE
– Accelerating Analytics of Big Data
8
Managed Runtime
Environments
• Java, .Net are examples of managed
runtime environments (JVM, CLR)
• Key elements: JIT compilation and control
of memory allocation
• Research opportunities:
– Scaling MREs for many-core architectures
(GPUs)
– Hardware acceleration of MREs
– Use MREs for low-power computing
– Use MREs for dealing with faults and transistor
wearout -> DOME
9
TeraFlux Project
• Major focus of current ‘General
Purpose’ Many-Core research.
• Three major goals
– To define the hardware architecture of a
highly extensible, general purpose multicore system
– To develop a simple to use parallel
programming approach based on
programming with
• side-effect-free computations + transactions
– How do we simulate/prototype
many-core architectures?
10
Starting Assumptions
• Requiring strongly consistent shared
memory is a major impediment to
extensibility
• The efficient scheduling of control-flow based threads is hard
• The major complexity in parallel
programming is the handling of
shared state (locks etc.)
11
Simulate/Prototype
many-core architectures
• Designing a chip is expensive and time consuming
• Computer architects build software models to
simulate new architectures
• Simulation can be slow (months to run one
application)
• How can we accelerate this process? Research
opportunities
– New modelling techniques
– FPGA prototyping
12
AXLE & Big Data
• Collaboration with Dr. Gavin Brown (MLO group)
• Amount of data generated in scientific
experiments or social web keeps growing!
• Graph-based data -> complex computation
• How can we make sense of this data
deluge?
– New Learning techniques capable of working at scale
– Redesign architectures (clusters/data centres) and
software for low power analytics
– Accelerate software (JIT adaptation) for data processing
– Hardware acceleration for low-power learning algorithms
13
For more background info
• "Future Multi-core Computing"
(COMP6062b)
– Learn by directed reading and group
discussions of research papers
– Practice parallel programming in the labs
• Watch out for the organised ARM &
Intel school seminars in Nov and Dec
14
Communication
Architectures
Javier Navaridas
15
Interconnection
Networks
• On-chip networks
– Tile-based systems
– Heterogeneous systems
• High performance computing
networks
– Massively Parallel Processing systems
– Compute Clusters
– Datacentres
16
Topics
• Topologies
– Routing
– Wiring
– Fault resilience
– Deadlock avoidance
• Router microarchitecture
– Congestion control
– Quality of Service
– Fault tolerance
• Scheduling and resource management
– Task placement
• System and workload modelling
– Analytical modelling
– Simulation
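As an illustration of how routing and deadlock avoidance interact, here is a minimal sketch of dimension-order (XY) routing on a 2-D mesh. It is a textbook technique, not necessarily one the group uses; the function name `xy_route` and the coordinate convention are assumptions made for this example. Because a packet finishes all of its X hops before taking any Y hop, it never turns from Y back to X, which is what rules out cyclic channel dependencies on a mesh.

```python
def xy_route(src, dst):
    """Return the (x, y) hops from src to dst on a 2-D mesh.

    The packet travels along the X dimension until its column matches the
    destination, then along Y. Names and coordinates are illustrative.
    """
    x, y = src
    path = [(x, y)]
    # Phase 1: correct the X coordinate one hop at a time.
    while x != dst[0]:
        x += 1 if dst[0] > x else -1
        path.append((x, y))
    # Phase 2: correct the Y coordinate one hop at a time.
    while y != dst[1]:
        y += 1 if dst[1] > y else -1
        path.append((x, y))
    return path


route = xy_route((0, 0), (2, 1))
print(route)
```

The price of this simplicity is that the route is fixed: XY routing cannot steer around congestion or faulty links, which is where adaptive routing and fault resilience come in.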
17
Virtualization
Alasdair Rawsthorne
18
Unifying System and
Process Virtualization
[Figure: four software stacks compared, layers listed top to bottom]
• Unvirtualized: Application, Operating System, CPU
• System Virtualization (e.g. Xen, VMware, VirtualBox): Application, Operating System, Hypervisor/VMM, CPU
• Process Virtualization (e.g. JVM, Rosetta, DynamoRIO, Valgrind): Application, Dynamic Runtime, Operating System, CPU
• Unified Virtualization: Application, Operating System, Optimizing VMM, CPU
• Potential benefits: performance, power, design time,
security
• Impacts design of future compilers, OS, CPU and runtimes
19
Neural Systems
Engineering
Steve Furber,
Jim Garside,
Dave Lester
20
The SpiNNaker
project
• Multi-core CPU node
– 18 ARM968
processors
– to model large-scale
systems of spiking
neurons
– in biological real
time
• Scalable up to
systems with
10,000s of nodes
– over a million
processors
– >10^8 MIPS total
21
Current status…
• Full 18-core chip: arrived 20 May 2011
• Test card: 4 chips, 72 processors
– Cards can be linked together
• Neuron models: LIF, Izhikevich, MLP
• Synapse models: STDP, NMDA
• Networks: PyNN -> SpiNNaker, various small
tools to build Router tables, etc
• 48-chip 10^3 machine
…and the next steps:
• 500-chip 10^4 machine (Q4 2012), 5,000-chip 10^5
machine (H1 2013), 50,000-chip 10^6 machine (H2 2013).
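The machine sizes above read as order-of-magnitude core counts. A quick check, assuming every chip contributes all 18 of its ARM968 cores (the helper name `total_cores` is made up for this sketch):

```python
import math

CORES_PER_CHIP = 18  # from the slides: 18 ARM968 processors per chip


def total_cores(chips):
    """Number of cores in a machine built from `chips` chips."""
    return chips * CORES_PER_CHIP


machines = [48, 500, 5_000, 50_000]  # chip counts quoted on the slide
for chips in machines:
    cores = total_cores(chips)
    # Rounding log10 gives the nearest power of ten.
    exponent = round(math.log10(cores))
    print(f"{chips:>6} chips -> {cores:>7} cores (~10^{exponent})")
```

The 50,000-chip machine comes out just under a million cores, so the "over a million processors" target needs slightly more chips than that round number.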
22
PhD projects
• Recent:
– SpiNNaker monitoring
– PyNN -> SpiNNaker
– Real-time neural learning algorithms
– Modelling the rat barrel cortex
– Technology scaling on SpiNNaker
– Error correction with CRC
23
Technology Scaling
• 90nm SpiNNaker CPU node
• SP library is faster
• requires 128k DTCM
• LL library better overall?
(work by Eustace Painkras, UoM PhD)
24
PyNN -> SpiNN
• LIF
• Izhikevich
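For readers new to these neuron models, here is a minimal sketch of a leaky integrate-and-fire (LIF) neuron using forward-Euler integration. It is a generic floating-point illustration, not the SpiNNaker or PyNN implementation; the function name `simulate_lif` and all parameter values are assumptions chosen only to give plausible behaviour.

```python
def simulate_lif(input_current, steps=1000, dt=1.0,
                 tau_m=20.0, v_rest=-65.0, v_reset=-65.0,
                 v_thresh=-50.0, r_m=10.0):
    """Simulate one LIF neuron and return its spike times in ms.

    Assumed units: ms for time, mV for voltage, nA for current and
    MOhm for membrane resistance. All values are illustrative.
    """
    v = v_rest
    spikes = []
    for step in range(steps):
        # Membrane equation: tau_m * dv/dt = -(v - v_rest) + r_m * I
        dv = (-(v - v_rest) + r_m * input_current) / tau_m
        v += dv * dt
        if v >= v_thresh:
            # Threshold crossed: record a spike and reset the membrane.
            spikes.append(step * dt)
            v = v_reset
    return spikes


for current in (1.0, 2.0, 3.0):
    print(current, "nA ->", len(simulate_lif(current)), "spikes in 1 s")
```

With these assumed parameters the neuron stays silent until the input is strong enough to push the steady-state voltage past the threshold, and then fires faster as the current grows.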
25
PhD projects
• Future:
– System software
• run-time fault-tolerance, scaling, …
– SpiNNaker2 architecture exploration
– Neural network models
• learning algorithms, rewiring
– Robotics using SpiNNaker
– Non-neural algorithms
• graphics, physics modelling, …
26
Emerging Technologies
for Integrated Circuits
and Systems
Let’s do some hard(ware)
work
Vasilis Pavlidis
www.cs.man.ac.uk/~pavlidiv
27
3-D Integration
Opportunities
[Figure: 2-D global wire of 20 mm vs. 3-D global wire of 12 mm]
• Integrate disparate technologies/components
• The same total area for the two circuits
• R_TSV = 170 mΩ, C_TSV = 2 fF
• RCs* for 65 nm, delay improvement: 54%
* “ASU Predictive Technology Model.” [Online]. Available:
http://www.eas.asu.edu/~ptm/
28
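To see why a shorter global wire helps so much, here is a rough Elmore-delay sketch. Only the wire lengths and the TSV values come from the slide. The per-millimetre resistance and capacitance, and the split of the 12 mm wire into two 6 mm halves joined by one TSV, are assumptions made purely for illustration. The sketch ignores drivers, repeaters and loads, so it does not reproduce the 54% figure quoted above.

```python
def elmore_delay(segments):
    """Elmore delay in seconds of a chain of (R ohms, C farads) segments.

    Each segment is a pi model: its resistance sees half of its own
    capacitance plus all of the capacitance downstream of it.
    """
    delay = 0.0
    for i, (r, c) in enumerate(segments):
        downstream = sum(cj for _, cj in segments[i + 1:])
        delay += r * (c / 2 + downstream)
    return delay


R_PER_MM = 100.0     # ohm/mm, ASSUMED wire resistance (not from the slides)
C_PER_MM = 0.2e-12   # F/mm, ASSUMED wire capacitance (not from the slides)
R_TSV = 0.170        # ohm, from the slide (170 mOhm)
C_TSV = 2e-15        # F, from the slide (2 fF)


def wire(length_mm):
    """Lumped (R, C) of a wire of the given length."""
    return (R_PER_MM * length_mm, C_PER_MM * length_mm)


delay_2d = elmore_delay([wire(20)])
delay_3d = elmore_delay([wire(6), (R_TSV, C_TSV), wire(6)])
improvement = 1 - delay_3d / delay_2d
print(f"2-D: {delay_2d * 1e9:.2f} ns, 3-D: {delay_3d * 1e9:.2f} ns, "
      f"improvement: {improvement:.0%}")
```

For an unrepeated wire the delay grows with the square of the length, so going from 20 mm to 12 mm dominates the result, while the small TSV resistance and capacitance add very little under these assumptions.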
Three-Dimensional (3-D)
Integrated Circuits and Systems
• Develop design methodologies
for 3-D ICs
• New models are required to
consider the third physical
dimension
• Diverse technologies
– SiP, interposer, TSVs
• Many challenges exist down the
road!!!
– Be the first to address them 
• Opportunities to tape-out do
exist!
– CMP/Tezzaron - cmp.imag.fr
– Cadence PDK - 3-D Encounter
[Figure: Xilinx Virtex 7 FPGA]
29
A New Circuit Design
Paradigm (Safe Projects)
• (Re-)Design and assess
SpiNNaker-based 3-D
architectures
– Power, area, performance,
cost/yield
– Interposer and TSVs
technologies
• Research methodology
– Use available resources
– Differentiate only where
required
• Other topics
– Can resonance improve energy
efficiency of GALS based
architectures?
– Design for manufacturability
for GALS systems 2-D/3-D
• Considering process, voltage, and
temperature (PVT) variations
• PVT behavior is substantially
different in 3-D systems
 Develop/extend CAD tools
for the physical design of 3D systems
– Special focus on interposer
technologies
30
3-D Integration as a System
Integration Approach
(High-Return Projects)
• Heterogeneous 3-D integration
– Preached a lot but not
explored (at all)!
• Memory on logic is a single
application
• Develop techniques and
methods for “Mix-and-Match”
systems
– How do you model…?
– How do you evaluate…?
– How do you integrate…?
– How do you manufacture…?
• The physical proximity of
diverse systems may not
come for free!
 Interdisciplinary research is a
prerequisite for such systems
 Rather application driven
31
PhD Guidelines
PhD is NOT an end in itself but a means to an end!
• Persistence, Persistence, Persistence!
• Manage rejection
• Be there early!
• Citations are worth more than publications
• Presentation and writing skills
32
Asynchronous Logic
Design Tools
[Doug Edwards,]
Jim Garside,
Steve Furber,
Alasdair Rawsthorne
33
Previous Projects
• Balsa
– world-leading public asynchronous
synthesis tool
– used for complete microprocessors
• SEDATE
– delay-insensitive datapath synthesis
• GALSA
– framework for heterogeneous GALS
• ...
34
GAELS
• Globally Asynchronous Elastic
Logic Synthesis
– modern SoCs comprise numerous,
semi-autonomous subsystems
shrinking transistors have hard-to-predict variations
• Address using Elastic Logic
– new, delay tolerant paradigm
– new project!
35
Reconfigurable
Processing
Jim Garside
36
Current Computing
• Energy use is a problem
• Software
– offers processing flexibility
– highly inefficient – big overheads
• Hardware
– limited programmability
– greater efficiency
– expensive to develop
37
A Solution?
• Compile an algorithm into a
mixture of hardware and software
– how to partition the 'code'?
– dynamic adaptation
• Existing solutions tend towards
static partitioning
– require wide skills from developers
– sacrifice potential flexibility
– intolerant of differing hardware
38
Dynamic
Reconfiguration
• Keep algorithm in common 'object'
format
• Identify, 'compile' and run
repeating sections in available
hardware
• Adapt to facilities of any given chip
– allow for future portability
39
To date ...
• Can identify critical loops and
recompile them to hardware
– using pre-existing code
• Developing tool flow
• Have reasonable reconfigurable
hardware architecture
Results
• Promising – not 'earth shattering'
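One common way to identify repeating sections at run time is to count backward jumps in an execution trace. The sketch below shows that generic heuristic only; it is not the group's tool flow, and the function name `find_hot_loops`, the trace format and the threshold are all hypothetical.

```python
from collections import Counter


def find_hot_loops(pc_trace, threshold=3):
    """Return {loop_header_address: count} for frequently taken back-edges.

    `pc_trace` is a list of program-counter values in execution order.
    A step to an address at or below the current one is treated as a
    loop back-edge (an assumption: returns from calls look the same).
    """
    back_edges = Counter()
    for prev, cur in zip(pc_trace, pc_trace[1:]):
        if cur <= prev:
            back_edges[cur] += 1
    # Keep only targets jumped back to at least `threshold` times.
    return {target: n for target, n in back_edges.items() if n >= threshold}


# Hypothetical trace: the body at addresses 1..3 runs four times.
trace = [0, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 4, 5]
print(find_hot_loops(trace))
```

Addresses that pass the threshold are candidates for recompiling to hardware; a real system would also need to weigh the cost of reconfiguring against the time spent in each loop.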
40
Future
• Want:
• Means of expressing algorithms
allowing easy compilation into
software or hardware
• Extract/exploit sensible parallelism
– 'fine grain' for hardware
– 'coarse grain' (?) for software
• Get (some of) the available
speed/power efficiency
41
Mobile Systems
Architecture
Nick Filer
with help from
Barry Cheetham
42
Nick Filer
• Interests:
– Wireless networks of all types. Mainly:
• Ad-hoc,
• Voice over IP,
• Sensors (data collection),
• Pocket networks (e.g. mobile phones, PDAs),
• Information dissemination.
– Supported by:
• Simulation, analysis, software generation tools.
– eLearning tools for science.
43
Current Interest - 1
• Pocket Networks
– Based on clusters of mobile users.
– Person to person transport.
– Which applications are useful, and
when and how will they work?
• Voice?
• Video?
• Delay tolerant text messages?
44
Current Interest - 2
• Low power Wireless Sensor Networks
– Algorithms for reduced power usage,
mainly keeping it low by design.
– Intelligent transport/routing protocols
driving low power packet routing.
– Smart dust:
• Current cost $100+, needs to be cheaper.
• Ultra-low power (NEW): processor, memory,
design.
• Nano scale. E.g. for use down oil wells!
45
Current Interest – 3
• Hand-over in mobile wireless networks.
– Pretty much a solved problem (even if not always
ideal) for mobile phones.
– Close to solutions for WiFi, WiMAX, Bluetooth,
Zigbee etc. Still lots to learn though.
– Currently 3 layer hierarchy: infrastructure,
Wide Area, Personal Area.
– What happens with more layers?
• Macro scale to nano scale?
• Fixed infrastructure interacting with mobile
autonomous agents?
• Just how inefficient are these mechanisms currently?
46
Current Interest - 4
• Information dissemination in mobile
ad-hoc networks.
– P2P technologies.
– P2P optimization for task, availability,
handover, low energy, access latency…
– P2P to aid DNS like queries (information
retrieval) in mobile, changing topology
networks.
– Delay tolerant P2P. Opportunistic
communications e.g. send 100,000
sensors down an oil well, get 1 back, what
does it know? Own data, others' data?
47
Joint with Barry Cheetham
Current Interest - 5
• Real time distributed systems (sound and video)
– Internet choir
• Very tight audio constraints (max 50ms)
• Demands of latency & bandwidth
– Singing together
• Less constrained internet choir but synchronization very
difficult.
– Broadcast simulcasts
• Mixed video and sound from various locations.
• Broadcast over multiple media types with different delay
etc. characteristics.
– Major Obstacles:
• Media types and standards, protocols, congestion, error
handling, signal processing, links to hand-over problems
....
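A rough latency budget shows how quickly a 50 ms limit is used up. The sketch below assumes the limit is a one-way budget and counts only fibre propagation plus audio buffering. The propagation speed of about 200,000 km/s is the usual figure for light in optical fibre; the frame size, sample rate, buffer depth and distances are illustrative assumptions, and the function name `one_way_latency_ms` is made up for this example.

```python
FIBRE_KM_PER_MS = 200.0  # light in optical fibre: roughly 200,000 km/s


def one_way_latency_ms(distance_km, frame_samples=256,
                       sample_rate_hz=48_000, buffered_frames=2):
    """Estimate one-way audio latency in milliseconds.

    Only propagation and buffering are counted; routing, queueing,
    codec and sound-card delays are ignored, so this is a lower bound.
    """
    propagation = distance_km / FIBRE_KM_PER_MS
    buffering = buffered_frames * frame_samples / sample_rate_hz * 1000.0
    return propagation + buffering


BUDGET_MS = 50.0  # the "max 50ms" constraint, ASSUMED here to be one-way
for km in (300, 3_000, 9_000):
    latency = one_way_latency_ms(km)
    verdict = "within" if latency <= BUDGET_MS else "over"
    print(f"{km:>5} km: {latency:5.1f} ms ({verdict} budget)")
```

Even in this best case, intercontinental distances exceed the budget on propagation and buffering alone, before congestion or error handling is considered.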
48
Current Interest - 6
• Support for adaptable network stacks
– Writing or changing software is time
consuming, error prone, …
– Models can capture semantics of
software: Purpose, usage,
transformation knowledge ...
– Hence: Use models to generate
implementations.
• Use in teaching/learning, simulation,
network stack implementation.
49
Joint with Barry Cheetham
Current Interest – 7
• eLearning for Complex Systems
– Most eLearning tools you have seen are not
much more than Content Management Systems.
– There is currently little or no evidence they
improve student grades!
– We have on-going work looking at improving
understanding of wireless systems.
– Also, interested in science teaching for
awkward adolescents.
50
Arithmetic and
Control Theory
Dave Lester
51
Arithmetic and Control
Theory
• Exact Arithmetic
– NASA/Boeing
• Correctness of Control Theory
Applications
– Airbus
• Formalisation and Mechanisation
of Probabilistic Reasoning
52
