What is belief propagation? - Intel Labs University Research

Report
ISTC-EC @ Cornell
Accelerating Belief Propagation
in Hardware
Skand Hurkat and José Martínez
Computer Systems Laboratory
Cornell University
http://www.csl.cornell.edu/
The Cornell Team
ISTC-EC @ Cornell
• Prof. José Martínez (PI), Prof. Rajit Manohar
@ Computer Systems Lab
• Prof. Tsuhan Chen
@ Advanced Multimedia Processing Lab
• MS/Ph.D. students
– Yuan Tian, MS ’13
– Skand Hurkat
– Xiaodong Wang
The Cornell Graph
ISTC-EC @ Cornell
Rob Rutenbar
UIUC
Intel
Center
Eriko Nurvithadi
Ithaca
Limor Fix
Ravi Iyer
Oregon
Omesh Tickoo
Boris Ginsburg
Haifa
Cornell's
ISTC-EC
Graph
Carlos Guestrin
Washington
Ronny Ronen
Yoav Talgam
CMU
James Hoe
The Cornell Project
ISTC-EC @ Cornell
• Provide hardware accelerators for belief
propagation algorithms on embedded SoCs
(retail/car/home/mobile)
– High speed
Graph
– Very low power
– Self-optimizing
– Highly programmable
Inference
Algorithm
BP Accelerator within
SoC
Result
What is belief propagation?
ISTC-EC @ Cornell
Belief propagation is a message passing
algorithm for performing inference on graphical
models, such as Bayesian networks or Markov
Random Fields
What is belief propagation?
ISTC-EC @ Cornell
•
•
•
•
Labelling problem
Energy as a measure of convergence
Minimize energy (MAP label estimation)
Exact results for trees
– Converges in exactly two iterations
• Approximate results for graphs with loops
– Yields “good” results in practice
• Minimum over large neighbourhoods
• Close to optimal solution
Not all “that” alien to embedded
ISTC-EC @ Cornell
Remember the Viterbi algorithm?
• Used extensively in digital communications
0
11
21
31
41
12
1
22
2
32
3
42
4
13
23
33
43
5
What does this mean?
ISTC-EC @ Cornell
• Every mobile device uses Viterbi decoders
– Error correction codes (eg: turbo codes)
– Mitigating inter-symbol interference (ISI)
• Increasing number of mobile applications
involve belief propagation
– More general belief propagation accelerators can
greatly improve user experience with mobile
devices
Target markets
ISTC-EC @ Cornell
Retail/Car/Home/Mobile
• Image processing
–
–
–
–
De-noising
Segmentation
Object detection
Gesture recognition
• Handwriting recognition
– Improved recognition
through context
identification
• Speech recognition
– Hidden Markov models are
key to speech recognition
Servers
• Data mining tasks
– Part-of-speech tagging
– Information retrieval
– “Knowledge graph” like
applications
• Machine learning based
tasks
– Constructive machine
learning
– Recommendation systems
• Scientific computing
– Protein structure inference
Hardware accelerator for BP
ISTC-EC @ Cornell
Inference
Algorithm
Graph
BP Accelerator within SoC
Result
Work done so far
ISTC-EC @ Cornell
Software
• General purpose MRF
inference library
– Support for arbitrary graphs
– Floating point math
– Parallel techniques for faster
inference
• Library optimized for grid
graphs
– Optimized data structures
– Template can use any data type
– Multiple inference techniques
optimized for early vision
– Stereo matching in ∼200 ms
Hardware
• High level synthesis of
message update unit
– Vivado HLS (C-to-gates) tool
used to synthesize message
update unit on ZedBoard
– ∼2x improvement in inference
speed on CPU+FPGA compared
to CPU-only inference
– Fixed point math
• GraphGen collaboration
– On-going work
– Stereo matching task mapped
to multiple platforms
– ∼10x speedup on GPU w.r.t.
CPU only implementation
Work done so far
ISTC-EC @ Cornell
Software
• General purpose MRF
inference library
– Support for arbitrary graphs
– Floating point math
– Parallel techniques for faster
inference
• Library optimized for grid
graphs
– Optimized data structures
– Template can use any data type
– Multiple inference techniques
optimized for early vision
– Stereo matching in ∼200 ms
Hardware
• High level synthesis of
message update unit
– Vivado HLS (C-to-gates) tool
used to synthesize message
update unit on ZedBoard
– ∼2x improvement in inference
speed on CPU+FPGA compared
to CPU-only inference
– Fixed point math
• GraphGen collaboration
– On-going work
– Stereo matching task mapped
to multiple platforms
– ∼10x speedup on GPU w.r.t.
CPU only implementation
Work done so far
ISTC-EC @ Cornell
Software
• General purpose MRF
inference library
– Support for arbitrary graphs
– Floating point math
– Parallel techniques for faster
inference
• Library optimized for grid
graphs
– Optimized data structures
– Template can use any data type
– Multiple inference techniques
optimized for early vision
– Stereo matching in ∼200 ms
Hardware
• High level synthesis of
message update unit
– Vivado HLS (C-to-gates) tool
used to synthesize message
update unit on ZedBoard
– ∼2x improvement in inference
speed on CPU+FPGA compared
to CPU-only inference
– Fixed point math
• GraphGen collaboration
– On-going work
– Stereo matching task mapped
to multiple platforms
– ∼10x speedup on GPU w.r.t.
CPU only implementation
Hierarchical belief propagation
ISTC-EC @ Cornell
Results – Stereo Matching
ISTC-EC @ Cornell
Comparing inference algorithms on “Tsukuba”
benchmark
E
n
e
r
g
y
480000
475000
470000
465000
460000
455000
450000
445000
440000
14000000
12000000
10000000
8000000
6000000
4000000
2000000
0
U
p
d
a
t
e
s
Updates
Energy
Work done so far
ISTC-EC @ Cornell
Software
• General purpose MRF
inference library
– Support for arbitrary graphs
– Floating point math
– Parallel techniques for faster
inference
• Library optimized for grid
graphs
– Optimized data structures
– Template can use any data type
– Multiple inference techniques
optimized for early vision
– Stereo matching in ∼200 ms
Hardware
• High level synthesis of
message update unit
– Vivado HLS (C-to-gates) tool
used to synthesize message
update unit on ZedBoard
– ∼2x improvement in inference
speed on CPU+FPGA compared
to CPU-only inference
– Fixed point math
• GraphGen collaboration
– On-going work
– Stereo matching task mapped
to multiple platforms
– ∼10x speedup on GPU w.r.t.
CPU only implementation
Work done so far
ISTC-EC @ Cornell
Software
• General purpose MRF
inference library
– Support for arbitrary graphs
– Floating point math
– Parallel techniques for faster
inference
• Library optimized for grid
graphs
– Optimized data structures
– Template can use any data type
– Multiple inference techniques
optimized for early vision
– Stereo matching in ∼200 ms
Hardware
• High level synthesis of
message update unit
– Vivado HLS (C-to-gates) tool
used to synthesize message
update unit on ZedBoard
– ∼2x improvement in inference
speed on CPU+FPGA compared
to CPU-only inference
– Fixed point math
• GraphGen collaboration
– On-going work
– Stereo matching task mapped
to multiple platforms
– ∼10x speedup on GPU w.r.t.
CPU only implementation
GraphGen synthesis of BP-M
ISTC-EC @ Cornell
• BP-M update (logspace messages)
implemented using GraphGen
(Intel/CMU/UW)
• GPU implementation ∼10x faster than CPU
based implementation
• On-going work on FPGA based
implementation and on implementing
hierarchical update
Cornell Publications (2013 only)
ISTC-EC @ Cornell
•
•
•
•
3x Comp. Vision & Pattern Recognition (CVPR)
3x Asynchronous VLSI (ASYNC)
2x Intl. Symp. Computer Architecture (ISCA)
1x Intl. Conf. Image Processing (ICIP)
• 1x ASPLOS (w/ GraphGen folks, under review)
Year 3 Plans
ISTC-EC @ Cornell
• GraphGen extensions for BP applications
– Multiple inference techniques
• Extraction of “BP ISA”
– Ops on arbitrary graphs
– Efficient representation
• Amplification work on UAV ensembles
– Self-optimizing, collaborative SoCs
• One-day “graph” workshop with
GraphGen+UIUC
ISTC-EC @ Cornell
Accelerating Belief Propagation
in Hardware
Skand Hurkat and José Martínez
Computer Systems Laboratory
Cornell University
http://www.csl.cornell.edu/

similar documents