Boltzmann Machines and their Extensions

S. M. Ali Eslami
Nicolas Heess
John Winn
March 2013
Heriot-Watt University
Goal
Define a probability distribution over images like this:
2
What can one do with an ideal shape model?
Segmentation
3
Weizmann horse dataset
Sample training images
327 images
4
What can one do with an ideal shape model?
Image
5
What can one do with an ideal shape model?
Computer graphics
6
Energy based models
Gibbs distribution
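The formula on this slide did not survive extraction. As a reminder of the standard form, an energy-based model assigns low energy to likely configurations and normalizes with a partition function. The symbols E, θ and Z below are conventional notation assumed here, not taken from the slide.

```latex
p(\mathbf{v};\theta) = \frac{\exp\{-E(\mathbf{v};\theta)\}}{Z(\theta)},
\qquad
Z(\theta) = \sum_{\mathbf{v}'} \exp\{-E(\mathbf{v}';\theta)\}
```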
7
Shallow architectures
Mean
8
Shallow architectures
MRF
9
Existing shape models
Most commonly used architectures
Mean: sample from the model
MRF: sample from the model
10
What is a strong model of shape?
We define a strong model of object shape as one which
meets two requirements:
• Realism: generates samples that look realistic.
• Generalization: can generate samples that differ from training images.
Training images
Real distribution
Learned distribution
11
Shallow architectures
HOP-MRF
12
Shallow architectures
RBM
13
Shallow architectures
Restricted Boltzmann Machines
The effect of the latent variables can be appreciated by
considering the marginal distribution over the visible units:
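The slide's equation is missing from the extracted text. For a binary RBM in the usual parameterization (weights W, visible biases b and hidden biases c are assumed notation, not taken from the slide), the joint energy and the marginal over the visible units can be written as:

```latex
E(\mathbf{v},\mathbf{h}) = -\mathbf{b}^{\top}\mathbf{v} - \mathbf{c}^{\top}\mathbf{h} - \mathbf{v}^{\top} W \mathbf{h},
\qquad
p(\mathbf{v}) = \sum_{\mathbf{h}} p(\mathbf{v},\mathbf{h})
             = \frac{1}{Z} \sum_{\mathbf{h}} \exp\{-E(\mathbf{v},\mathbf{h})\}
```

The sum over hidden configurations is what lets the model express dependencies between visible units that a purely pairwise visible model would miss.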
14
Shallow architectures
Restricted Boltzmann Machines
In fact, the hidden units can be summed out analytically.
The energy of this marginal distribution is given by:
where
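The formulas on this slide were lost in extraction. A minimal sketch of the usual closed form, assuming a binary RBM with weights `W` (visible × hidden), visible biases `b` and hidden biases `c` (all names are illustrative): the energy of the marginal, often called the free energy, is F(v) = -b·v - Σ_j log(1 + exp(c_j + Σ_i v_i W_ij)). The code compares it with a brute-force sum over all hidden states on a toy model.

```python
import itertools
import numpy as np

def free_energy(v, W, b, c):
    """Closed form: F(v) = -b.v - sum_j log(1 + exp(c_j + (v W)_j))."""
    return -v @ b - np.sum(np.logaddexp(0.0, c + v @ W))

def free_energy_brute_force(v, W, b, c):
    """Same quantity by explicitly enumerating all 2^H hidden states."""
    n_hidden = W.shape[1]
    neg_energies = [
        v @ b + np.array(h) @ c + v @ W @ np.array(h)
        for h in itertools.product([0.0, 1.0], repeat=n_hidden)
    ]
    return -np.log(np.sum(np.exp(neg_energies)))

# Toy sizes chosen only so that enumeration is cheap.
rng = np.random.default_rng(0)
W = rng.normal(scale=0.5, size=(4, 3))
b = rng.normal(size=4)
c = rng.normal(size=3)
v = np.array([1.0, 0.0, 1.0, 1.0])

closed_form = free_energy(v, W, b, c)
enumerated = free_energy_brute_force(v, W, b, c)
```

The two values agree up to floating-point error, which is the sense in which the hidden units can be summed out analytically.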
15
Shallow architectures
Restricted Boltzmann Machines
All hidden units are conditionally independent given the
visible units and vice versa.
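Written out for a binary RBM (assumed notation: weights W, visible biases b, hidden biases c, logistic sigmoid σ), this conditional independence means both conditionals factorize into per-unit terms:

```latex
p(\mathbf{h}\mid\mathbf{v}) = \prod_j p(h_j\mid\mathbf{v}),
\qquad
p(h_j = 1\mid\mathbf{v}) = \sigma\Big(c_j + \sum_i W_{ij} v_i\Big)

p(\mathbf{v}\mid\mathbf{h}) = \prod_i p(v_i\mid\mathbf{h}),
\qquad
p(v_i = 1\mid\mathbf{h}) = \sigma\Big(b_i + \sum_j W_{ij} h_j\Big)
```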
16
RBM inference
Block-Gibbs MCMC
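The diagram for this slide is not in the extracted text. A minimal sketch of block-Gibbs sampling for a binary RBM, assuming the standard sigmoid conditionals; the function name, array shapes and toy sizes are illustrative choices, not the authors' code. Because units within a layer are conditionally independent, each layer is resampled in a single vectorized step.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def block_gibbs(v0, W, b, c, n_steps, rng):
    """Alternate h ~ p(h|v) and v ~ p(v|h) for n_steps >= 1 sweeps."""
    v = v0.copy()
    for _ in range(n_steps):
        p_h = sigmoid(c + v @ W)                       # all hidden units at once
        h = (rng.random(p_h.shape) < p_h).astype(float)
        p_v = sigmoid(b + h @ W.T)                     # all visible units at once
        v = (rng.random(p_v.shape) < p_v).astype(float)
    return v, h

rng = np.random.default_rng(0)
n_visible, n_hidden = 6, 4                             # toy sizes
W = rng.normal(scale=0.1, size=(n_visible, n_hidden))
b = np.zeros(n_visible)
c = np.zeros(n_hidden)
v0 = rng.integers(0, 2, size=n_visible).astype(float)
v, h = block_gibbs(v0, W, b, c, n_steps=100, rng=rng)
```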
17
RBM learning
Stochastic gradient descent
Maximize the log probability of the training data
with respect to the model parameters.
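The slide's formulas are missing from the extracted text. For a binary RBM in the usual notation (weights W, assumed here), the gradient of the log probability of a training vector has the well-known two-term form, a data-dependent expectation minus a model expectation:

```latex
\frac{\partial \log p(\mathbf{v})}{\partial W_{ij}}
  = \mathbb{E}_{p(\mathbf{h}\mid\mathbf{v})}\big[v_i h_j\big]
  - \mathbb{E}_{p(\mathbf{v}',\mathbf{h})}\big[v'_i h_j\big]
```

The first term is cheap because the hidden units are conditionally independent given the data. The second term requires samples from the model itself.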
19
RBM learning
Contrastive divergence
Getting an unbiased sample of the second term, however, is
very difficult. It can be done by starting at any random state
of the visible units and performing Gibbs sampling for a
very long time. Instead:
20
RBM inference
Block-Gibbs MCMC
21
RBM learning
Contrastive divergence
• CD learning crudely approximates the gradient of the log
probability of the training data.
• It more closely approximates the gradient of another
objective function called the Contrastive Divergence, but
it ignores one tricky term in this objective function, so it is
not even following that gradient.
• Sutskever and Tieleman have shown that it is not
following the gradient of any function.
• Nevertheless, it works well enough to achieve success in
many significant applications.
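A minimal sketch of one CD-1 update for a binary RBM, to make the approximation discussed above concrete. It assumes the common variant that starts the Gibbs chain at the training data and runs a single block-Gibbs step; the function name, learning rate and toy data are illustrative, not taken from the slides.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_update(V, W, b, c, lr, rng):
    """One CD-1 parameter update on a mini-batch V of shape (batch, n_visible)."""
    # Positive phase: hidden probabilities driven by the data.
    ph_data = sigmoid(c + V @ W)
    h_data = (rng.random(ph_data.shape) < ph_data).astype(float)
    # Negative phase: one Gibbs step started at the data, not at a random state.
    pv_model = sigmoid(b + h_data @ W.T)
    v_model = (rng.random(pv_model.shape) < pv_model).astype(float)
    ph_model = sigmoid(c + v_model @ W)
    n = V.shape[0]
    W = W + lr * (V.T @ ph_data - v_model.T @ ph_model) / n
    b = b + lr * (V - v_model).mean(axis=0)
    c = c + lr * (ph_data - ph_model).mean(axis=0)
    return W, b, c

rng = np.random.default_rng(0)
V = rng.integers(0, 2, size=(8, 6)).astype(float)   # toy binary "images"
W = rng.normal(scale=0.01, size=(6, 4))
b, c = np.zeros(6), np.zeros(4)
for _ in range(10):
    W, b, c = cd1_update(V, W, b, c, lr=0.1, rng=rng)
```

Replacing the long-run model sample with a one-step reconstruction is what makes the update cheap, and also what makes it only a crude approximation to the true gradient.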
23
Deep architectures
DBM
24
Deep architectures
Deep Boltzmann Machines
25
Deep architectures
Deep Boltzmann Machines
Conditional distributions remain factorised due to layering.
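The equations for this slide were lost in extraction. For a DBM with two hidden layers, a standard way to write the energy and the resulting conditionals is shown below; the symbols W¹, W², b, c¹, c² and σ are assumed notation. Each layer depends only on its neighbouring layers, so units within a layer stay conditionally independent.

```latex
E(\mathbf{v},\mathbf{h}^1,\mathbf{h}^2) =
  -\mathbf{b}^{\top}\mathbf{v} - \mathbf{c}^{1\top}\mathbf{h}^1 - \mathbf{c}^{2\top}\mathbf{h}^2
  - \mathbf{v}^{\top} W^1 \mathbf{h}^1 - \mathbf{h}^{1\top} W^2 \mathbf{h}^2

p(v_i = 1 \mid \mathbf{h}^1) = \sigma\Big(b_i + \sum_j W^1_{ij} h^1_j\Big)

p(h^1_j = 1 \mid \mathbf{v},\mathbf{h}^2) = \sigma\Big(c^1_j + \sum_i W^1_{ij} v_i + \sum_k W^2_{jk} h^2_k\Big)

p(h^2_k = 1 \mid \mathbf{h}^1) = \sigma\Big(c^2_k + \sum_j W^2_{jk} h^1_j\Big)
```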
26
Shallow and Deep architectures
Modeling high-order and long-range interactions
MRF
RBM
DBM
27
Deep Boltzmann Machines
DBM
• Probabilistic
• Generative
• Powerful
Typically trained with many examples.
We only have datasets with few training examples.
28
From the DBM to the ShapeBM
Restricted connectivity and sharing of weights
DBM
ShapeBM
Limited training data, therefore reduce the number of parameters:
1. Restrict connectivity,
2. Tie parameters,
3. Restrict capacity.
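A rough illustration of how the first two steps shrink the first-layer weight count. The image size (32×32), the patch size (18×18) and the 2×2 patch layout are assumptions made for this sketch and are not stated on the slides; only the 2000 first-layer hidden units come from the deck.

```python
image_side = 32          # assumption: not stated on the slides
patch_side = 18          # assumption: 2x2 grid of overlapping patches
n_patches = 4            # assumption: one group of hidden units per patch
h1_units = 2000          # from the slides ("2000+100 hidden units")
h1_per_patch = h1_units // n_patches

# Fully connected first layer, as in an unrestricted DBM.
dense = image_side**2 * h1_units
# Restricted connectivity: each group of hidden units sees only its own patch.
restricted = n_patches * patch_side**2 * h1_per_patch
# Tied parameters: all patches share one set of weights.
shared = patch_side**2 * h1_per_patch

overlap = 2 * patch_side - image_side   # pixels shared by neighbouring patches
print(dense, restricted, shared, overlap)
```

Under these assumed sizes the first-layer weights drop from 2,048,000 to 648,000 with restricted connectivity, and to 162,000 once the weights are shared.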
29
Shape Boltzmann Machine
Architecture in 2D
• Top hidden units capture object pose.
• Given the top units, middle hidden units capture local (part) variability.
• Overlap helps prevent discontinuities at patch boundaries.
30
ShapeBM inference
Block-Gibbs MCMC
image
reconstruction
sample 1
sample n
Fast: ~500 samples per second
31
ShapeBM learning
Stochastic gradient descent
Maximize the log probability of the training data
with respect to the model parameters.
1. Pre-training
• Greedy, layer-by-layer, bottom-up,
• ‘Persistent CD’ MCMC approximation to the gradients.
2. Joint training
• Variational + persistent chain approximations to the gradients,
• Separates learning of local and global shape properties.
Training takes ~2-6 hours on the small datasets that we consider.
32
Results
Sampled shapes
Evaluating the Realism criterion
FA: incorrect generalization
RBM: failure to learn variability
ShapeBM:
• Natural shapes
• Variety of poses
• Sharply defined details
• Correct number of legs (!)
Data
Weizmann horses – 327 images – 2000+100 hidden units
34
Sampled shapes
Evaluating the Realism criterion
Weizmann horses – 327 images – 2000+100 hidden units
This is great, but has it just overfit?
35
Sampled shapes
Evaluating the Generalization criterion
Weizmann horses – 327 images – 2000+100 hidden units
Sample from the ShapeBM
Closest image in training dataset
Difference between the two images
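A minimal sketch of the comparison shown in this figure: find the training image closest to a sample and display where the two differ. The slide does not say which distance is used; Hamming distance between binary masks is an assumption here, and the function name and toy data are illustrative.

```python
import numpy as np

def closest_training_image(sample, train):
    """Return (index, image, difference) for the training image nearest to `sample`.

    sample: (H, W) binary mask; train: (N, H, W) binary masks.
    Distance is the number of disagreeing pixels (Hamming distance, assumed).
    """
    dists = np.sum(train != sample, axis=(1, 2))
    idx = int(np.argmin(dists))
    diff = np.logical_xor(train[idx].astype(bool), sample.astype(bool))
    return idx, train[idx], diff

rng = np.random.default_rng(0)
train = rng.integers(0, 2, size=(5, 8, 8))     # toy stand-in for training masks
sample = train[3].copy()
sample[0, 0] ^= 1                              # flip one pixel of a training image
idx, nearest, diff = closest_training_image(sample, train)
```

A difference image that is non-empty for every sample is evidence that the model is not simply replaying training images.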
36
Interactive GUI
Evaluating Realism and Generalization
Weizmann horses – 327 images – 2000+100 hidden units
37
Further results
Sampling and completion
Caltech motorbikes – 798 images – 1200+50 hidden units
Training images
ShapeBM samples
Sample generalization
Shape completion
38
Constrained shape completion
Evaluating Realism and Generalization
ShapeBM
NN
Weizmann horses – 327 images – 2000+100 hidden units
39
Further results
Constrained completion
ShapeBM
NN
Caltech motorbikes – 798 images – 1200+50 hidden units
40
Imputation scores
Quantitative comparison
Weizmann horses – 327 images – 2000+100 hidden units
1. Collect 25 unseen horse silhouettes,
2. Divide each into 9 segments,
3. Estimate the conditional log probability of a segment under the model given the rest of the image,
4. Average over images and segments.
Model      Score
Mean      -50.72
RBM       -47.00
FA        -40.82
ShapeBM   -28.85
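The four-step scoring procedure above can be sketched as follows. This is a sketch under stated assumptions: the 9 segments are taken to be a 3×3 grid, `cond_log_prob` is a hypothetical callable standing in for whatever estimator a given model provides, and the toy "mean model" with independent pixels is used only so the example runs end to end.

```python
import numpy as np

def segment_masks(shape, grid=3):
    """Split an image of the given shape into grid x grid boolean segment masks."""
    rows = np.array_split(np.arange(shape[0]), grid)
    cols = np.array_split(np.arange(shape[1]), grid)
    masks = []
    for r in rows:
        for c in cols:
            m = np.zeros(shape, dtype=bool)
            m[np.ix_(r, c)] = True
            masks.append(m)
    return masks

def imputation_score(images, cond_log_prob, grid=3):
    """Average log p(segment | rest of image) over all images and segments."""
    scores = [
        cond_log_prob(img, mask)
        for img in images
        for mask in segment_masks(img.shape, grid)
    ]
    return float(np.mean(scores))

def make_mean_model(pixel_probs):
    """Toy stand-in: independent Bernoulli pixels, so the rest of the image is ignored."""
    def cond_log_prob(img, mask):
        p = pixel_probs[mask]
        x = img[mask]
        return float(np.sum(x * np.log(p) + (1 - x) * np.log(1 - p)))
    return cond_log_prob

rng = np.random.default_rng(0)
images = rng.integers(0, 2, size=(25, 9, 9)).astype(float)   # 25 toy silhouettes
score = imputation_score(images, make_mean_model(np.full((9, 9), 0.5)))
```

For a model with dependencies between pixels, `cond_log_prob` would have to condition on the pixels outside the mask, which is where the models in the table differ.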
41
Multiple object categories
Simultaneous detection and completion
Caltech-101 objects – 531 images – 2000+400 hidden units
Train jointly on 4 categories without knowledge of class:
Shape completion
Sampled shapes
42
What does h2 do?
Multiple categories: class label information
Weizmann horses: pose information
Plot axes: accuracy vs. number of training images
43
What does h2 do?
44
What does the overlap do?
45
Summary
• Shape models are essential in applications such as
segmentation, detection, in-painting and graphics.
• The ShapeBM characterizes a strong model of shape:
– Samples are realistic,
– Samples generalize from training data.
• The ShapeBM learns distributions that are qualitatively
and quantitatively better than other models for this task.
46
Questions
MATLAB GUI available at
http://arkitus.com/Ali/
