Learning Scalable Discriminative
Dictionaries with Sample Relatedness
a.k.a.
“Infinite Attributes”
Jiashi Feng, Stefanie Jegelka,
Shuicheng Yan, Trevor Darrell
Attribute Learning

[Figure: example attributes — striped, water, white, furry, bright, wheels]

• generalizable vs. discriminative
• which attributes to use?

(Lampert, Nickisch & Harmeling, 2009; Farhadi & Forsyth, 2009; Parikh & Grauman, 2011, …)
Attribute Generative Model Cartoon

[Figure: a three-level hierarchy — objects (face, cup, car, …) are composed of
attributes (eye, nose, mouth, …), which are in turn composed of low-level
features (…edges…)]
Goals I

• Flexibility: automatically determine the attributes
  – as expressive as needed, as compact as possible
  → non-parametric Bayesian

[Figure: the useful attribute vocabulary depends on the domain — e.g.,
striped, water, white, furry for Animals; which ones for Humans?]
Goals II

• Efficiently learnable: few positive training samples
  – reduce sample complexity → related samples

[Figure: knowledge transfer via attributes among related classes —
Samoyed dog, Pug dog, Corgi dog]
Goals III

• Discriminative: object classification task
  → max margin

[Figure: max-margin separation of positive (+) and negative (−) samples]
Outline

• Non-parametric Bayesian for flexible attribute learning
• Sample relatedness for knowledge transfer
• Discriminative generative model
Preliminaries: Non-parametric Bayesian

• Bayes' rule applied in machine learning:
  $p(\theta \mid D, m) = \frac{p(D \mid \theta, m)\, p(\theta \mid m)}{p(D \mid m)}$
  – $p(D \mid \theta, m)$: likelihood of parameters $\theta$ under model $m$
  – $p(\theta \mid m)$: prior probability of $\theta$
  – $p(\theta \mid D, m)$: posterior of $\theta$ given data $D$
• Model comparison for model selection:
  $p(m \mid D) = \frac{p(D \mid m)\, p(m)}{p(D)}$
• Prediction:
  $p(x \mid D, m) = \int p(x \mid \theta, D, m)\, p(\theta \mid D, m)\, d\theta$
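As a worked illustration of these three quantities (ours, not from the talk),
consider a beta-binomial coin model, where posterior, evidence, and
prediction are all available in closed form:

  from math import comb, gamma

  def beta_fn(x, y):
      """Beta function B(x, y) = Gamma(x) Gamma(y) / Gamma(x + y)."""
      return gamma(x) * gamma(y) / gamma(x + y)

  # Hypothetical coin model: theta ~ Beta(a, b); data D = h heads in n flips.
  a, b = 2.0, 2.0
  n, h = 10, 7

  # Posterior p(theta | D, m) is conjugate: Beta(a + h, b + n - h).
  a_post, b_post = a + h, b + n - h

  # Evidence p(D | m): the beta-binomial marginal likelihood.
  evidence = comb(n, h) * beta_fn(a + h, b + n - h) / beta_fn(a, b)

  # Prediction p(heads | D, m): the posterior mean of theta.
  print(evidence, a_post / (a_post + b_post))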
Non-parametric Bayesian Models

• Inflexible models yield unreasonable inferences.
• Non-parametric models can automatically infer an adequate model
  size/complexity from the data, without needing to explicitly do
  Bayesian model comparison.
• Many can be derived by starting with a finite parametric model and
  taking the limit as the number of parameters goes to infinity.
Finite Mixture Model

• Set of observations: $X = \{x_1, \dots, x_N\}$
• A constant number of clusters $K$, with mixing weights $\pi = (\pi_1, \dots, \pi_K)$
• Cluster assignment for $x_i$ is $z_i \in \{1, \dots, K\}$
• The probability of each sample:
  $p(x_i \mid \pi, \theta) = \sum_{k=1}^{K} \pi_k\, p(x_i \mid \theta_k)$
• The likelihood of the $N$ samples:
  $p(X \mid \pi, \theta) = \prod_{i=1}^{N} \sum_{k=1}^{K} \pi_k\, p(x_i \mid \theta_k)$
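A minimal NumPy sketch of this likelihood for an isotropic Gaussian mixture
(function name and setup are ours, for illustration only):

  import numpy as np

  def gmm_log_likelihood(X, pi, mus, sigma2):
      """Log-likelihood under a K-component isotropic Gaussian mixture:
      sum_i log sum_k pi_k N(x_i | mu_k, sigma2 I)."""
      N, d = X.shape
      # Squared distances of each sample to each of the K means, shape (N, K).
      sq = ((X[:, None, :] - mus[None, :, :]) ** 2).sum(-1)
      log_norm = -0.5 * d * np.log(2 * np.pi * sigma2)
      log_comp = np.log(pi)[None, :] + log_norm - sq / (2 * sigma2)
      # Log-sum-exp over components for numerical stability.
      m = log_comp.max(axis=1, keepdims=True)
      return float((m.squeeze(1) + np.log(np.exp(log_comp - m).sum(axis=1))).sum())

  # Example: two well-separated components.
  rng = np.random.default_rng(0)
  X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(5, 1, (50, 2))])
  print(gmm_log_likelihood(X, np.array([0.5, 0.5]),
                           np.array([[0.0, 0.0], [5.0, 5.0]]), 1.0))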
Infinite Mixture Model

• Infinite clusters in the likelihood:
  – it is like saying that we let $K \to \infty$
• Since we always have finitely many samples in practice, only a finite
  number of clusters is actually used; so we define two sets of clusters:
  – $K_+$: the number of classes for which $n_k > 0$ (represented)
  – $K_0$: the number of classes for which $n_k = 0$ (unrepresented)
• Assume a reordering such that the represented classes come first,
  $k = 1, \dots, K_+$
Finite Feature Model

• Generating $Z$: an $N \times K$ binary matrix (rows = customers,
  columns = features)
  – for each column $k$, draw $\pi_k \sim \mathrm{Beta}(\alpha/K, 1)$
  – for each customer $i$, flip a coin: $z_{ik} \sim \mathrm{Bernoulli}(\pi_k)$
• Distribution of $Z$:
  $P(Z \mid \pi) = \prod_{k=1}^{K} \prod_{i=1}^{N} \pi_k^{z_{ik}} (1 - \pi_k)^{1 - z_{ik}}$
• Integrate out $\pi$, leaving:
  $P(Z) = \prod_{k=1}^{K} \frac{\frac{\alpha}{K}\,\Gamma(m_k + \frac{\alpha}{K})\,\Gamma(N - m_k + 1)}{\Gamma(N + 1 + \frac{\alpha}{K})}$,
  where $m_k = \sum_i z_{ik}$
• $Z$ is sparse: even as $K \to \infty$, the matrix is expected to have a
  finite number of non-zero elements
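A sketch of this generative process (ours; `sample_finite_feature_matrix`
is a hypothetical helper), showing that the expected number of ones stays
near $N\alpha$ even for large $K$:

  import numpy as np

  def sample_finite_feature_matrix(N, K, alpha, rng=None):
      """Draw an N x K binary Z from the finite beta-Bernoulli model:
      pi_k ~ Beta(alpha/K, 1) per column, z_ik ~ Bernoulli(pi_k)."""
      rng = rng or np.random.default_rng()
      pi = rng.beta(alpha / K, 1.0, size=K)         # one weight per feature
      return (rng.random((N, K)) < pi).astype(int)  # coin flip per entry

  # Sparsity: even with K = 1000 columns, the expected number of ones is
  # N * alpha / (1 + alpha/K), i.e. about N * alpha = 60 here.
  Z = sample_finite_feature_matrix(N=20, K=1000, alpha=3.0,
                                   rng=np.random.default_rng(1))
  print(Z.sum())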
From Finite to Infinite Binary Matrices

• A technical difficulty: the probability of any particular matrix goes
  to zero as $K \to \infty$
• However, if we consider equivalence classes $[Z]$ of matrices in
  left-ordered form, obtained by reordering the columns:
  $P([Z]) = \frac{\alpha^{K_+}}{\prod_{h=1}^{2^N - 1} K_h!} \exp(-\alpha H_N) \prod_{k=1}^{K_+} \frac{(N - m_k)!\,(m_k - 1)!}{N!}$
  – $K_+$ is the number of features assigned (columns with $m_k > 0$),
    and $K_h$ the number of columns with binary history $h$
  – $H_N = \sum_{j=1}^{N} \frac{1}{j}$ is the $N$-th harmonic number
  – this distribution is exchangeable, independent of the ordering
From Finite to Infinite Binary Matrices

[Figure: (a) a binary matrix on the left is transformed into the
left-ordered binary matrix on the right by the function lof();
(b) a left-ordered binary matrix (rows = customers, columns = dishes)
generated by the Indian Buffet Process]
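The lof() function itself is easy to sketch: sort columns in decreasing
order of their "history", the binary number obtained by reading a column
from the first customer down. A minimal NumPy version (ours, not the
authors' code):

  import numpy as np

  def lof(Z):
      """Left-ordered form: sort columns by decreasing history, the
      binary number read down each column (first customer = most
      significant bit). Assumes modest N to avoid integer overflow."""
      N = Z.shape[0]
      weights = 2 ** np.arange(N - 1, -1, -1)
      history = weights @ Z                        # one integer per column
      return Z[:, np.argsort(-history, kind="stable")]

  Z = np.array([[0, 1, 1, 0],
                [1, 0, 1, 1],
                [1, 1, 0, 0]])
  print(lof(Z))   # columns reordered by decreasing history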
Indian Buffet Process

"Many Indian restaurants offer lunchtime buffets with an apparently
infinite number of dishes."

• The first customer starts at the left of the buffet and takes a serving
  from each dish, stopping after a $\mathrm{Poisson}(\alpha)$ number of
  dishes as her plate becomes overburdened.
• The $i$-th customer moves along the buffet, sampling dishes in
  proportion to their popularity, taking dish $k$ with probability
  $m_k / i$, and then trying a $\mathrm{Poisson}(\alpha/i)$ number of
  new dishes.
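This generative story is straightforward to simulate. A minimal sketch of
an IBP sampler (our illustration, not the authors' code):

  import numpy as np

  def sample_ibp(N, alpha, rng=None):
      """Simulate the Indian Buffet Process for N customers: customer i
      takes existing dish k with probability m_k / i, then tries
      Poisson(alpha / i) new dishes."""
      rng = rng or np.random.default_rng()
      dishes = []                                  # customer indices per dish
      for i in range(1, N + 1):
          for takers in dishes:                    # existing dishes
              if rng.random() < len(takers) / i:   # popularity m_k / i
                  takers.append(i - 1)
          for _ in range(rng.poisson(alpha / i)):  # brand-new dishes
              dishes.append([i - 1])
      Z = np.zeros((N, len(dishes)), dtype=int)
      for k, takers in enumerate(dishes):
          Z[takers, k] = 1
      return Z

  print(sample_ibp(N=10, alpha=2.0, rng=np.random.default_rng(0)))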
Non-parametric Learning

Infinite attributes → Indian Buffet Process prior:

• prob(image n samples attribute k): $P(z_{nk} = 1) = \frac{m_k}{n}$
• sample $\mathrm{Poisson}(\alpha/n)$ new attributes
• likelihood (linear-Gaussian model):
  $p(X \mid Z, A) \propto \exp\!\left(-\frac{1}{2\sigma^2}\,\|X - ZA\|_F^2\right)$

[Figure: images encoded by binary attribute indicators — striped, white,
furry, bright, wheels]

(Griffiths & Ghahramani, 2006)
Asymptotic Model

• prob(image n samples attribute k): $P(z_{nk} = 1) = \frac{m_k}{n}$
• sample $\mathrm{Poisson}(\alpha/n)$ new attributes
• Asymptotics: $\sigma^2 \to 0$

[Figure: data X factorized into binary assignments Z and an attribute
dictionary A; the dictionary size K is determined automatically]

(Broderick, Kulis & Jordan, ICML 2013)
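In this small-variance limit, MAP inference in the linear-Gaussian IBP
model reduces (per Broderick, Kulis & Jordan, 2013) to a k-means-like
objective: squared reconstruction error plus a penalty of $\lambda^2$ per
dictionary atom. A sketch of that objective (ours), assuming X is (N, d),
Z is (N, K) binary, and A is (K, d):

  import numpy as np

  def bp_means_objective(X, Z, A, lam):
      """Squared reconstruction error plus lam^2 times the number of atoms."""
      K = A.shape[0]
      return np.linalg.norm(X - Z @ A, "fro") ** 2 + lam ** 2 * K

  # Perfect reconstruction with 3 atoms costs only the size penalty:
  X = np.eye(3)
  print(bp_means_objective(X, Z=np.eye(3), A=np.eye(3), lam=1.0))  # -> 3.0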
Asymptotics

[Figure: Mixture of Gaussians → (Bayesian, non-parametric) DP mixture:
flexible, principled; covariance → zero: k-means — simple, efficient,
"practical"]

Principled discrete criteria from BNP:
• Dirichlet Process → k-means + penalty (Kulis & Jordan, ICML 2012)
• Beta Process → squared loss + penalty (Broderick, Kulis & Jordan, ICML 2013)
• Dependent Dirichlet Process → ?? (Campbell, Liu, Kulis, How & Carin, NIPS 2013)
Sample Relatedness

• Related samples: relatedness is measured by path length in WordNet

[Figure: a polar bear is closer to a clown fish than to a motorbike in
the WordNet hierarchy]

(Christiane Fellbaum, WordNet, 1998)
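WordNet relatedness queries of this kind can be made through the `nltk`
interface; `path_similarity` is the reciprocal of one plus the shortest
hypernym-path length, so more related pairs score higher. A small sketch
with common synsets standing in for the slide's examples (the exact synset
names for polar bear and clown fish would need checking):

  # Requires: pip install nltk; then, once: import nltk; nltk.download("wordnet")
  from nltk.corpus import wordnet as wn

  dog = wn.synset("dog.n.01")
  cat = wn.synset("cat.n.01")
  bike = wn.synset("motorcycle.n.01")

  # path_similarity = 1 / (1 + shortest path length in the hypernym graph)
  print(dog.path_similarity(cat))    # two animals: relatively high (0.2)
  print(dog.path_similarity(bike))   # animal vs. artifact: much lower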
Full Model

[Figure: the full pipeline — input features → attributes (learned with
sample relatedness) → discriminative classifiers (e.g., Pug dog, Samoyed
dog, Cat). For the Pug dog classifier, Pug dog images are positive
samples, Samoyed dog images serve as related samples, and the remaining
images (e.g., Cat) are negative samples]
Joint Learning of Dictionary & Classifiers

• BCD: alternately update classifiers & dictionary

  while not converged do
    1: update $z_{ik} \in \{0, 1\}$ greedily
    2: $A \leftarrow X Z^\top (Z Z^\top)^{-1}$
    3: sample a new attribute $a_{K+1}$:
       $p(a_{K+1} = x_i) \propto \|x_i - A z_i\|^2$
    4: $A \leftarrow [A,\, a_{K+1}]$
  end
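A rough Python rendering of this BCD loop (our reading of the pseudocode;
the greedy z-update, the unconditional acceptance of every new atom, and
the fixed iteration count are simplifications):

  import numpy as np

  def bcd_dictionary_learning(X, n_iters=20, rng=None):
      """Sketch of the BCD loop above. X is (N, d); with the rows-as-samples
      convention used here, returns Z (N, K) and the dictionary A (K, d)."""
      rng = rng or np.random.default_rng()
      N, d = X.shape
      A = X[rng.integers(N)][None, :]              # start from a single atom
      Z = np.ones((N, 1))
      for _ in range(n_iters):
          # 1: update each z_ik in {0, 1} greedily (keep the better value)
          for i in range(N):
              for k in range(A.shape[0]):
                  errs = []
                  for v in (0.0, 1.0):
                      Z[i, k] = v
                      errs.append((np.linalg.norm(X[i] - Z[i] @ A), v))
                  Z[i, k] = min(errs)[1]
          # 2: least-squares dictionary update, A <- (Z^T Z)^-1 Z^T X
          #    (pseudo-inverse for robustness to rank-deficient Z)
          A = np.linalg.pinv(Z) @ X
          # 3: sample a new atom a_{K+1} = x_i with prob. ~ ||x_i - z_i A||^2
          resid = np.linalg.norm(X - Z @ A, axis=1) ** 2
          if resid.sum() > 1e-10:
              i_new = rng.choice(N, p=resid / resid.sum())
              # 4: A <- [A; a_{K+1}], with a fresh all-zero assignment column
              A = np.vstack([A, X[i_new]])
              Z = np.hstack([Z, np.zeros((N, 1))])
      return Z, A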
Does It Work?

[Plot: classification accuracy on ImageNet — sample-efficient: higher
accuracy with fewer training samples]

[Plot: generalization on the AwA data — better representation of new
classes]
Why Does It Work?

[Plot: accuracy vs. number of training samples (15, 20, 25, 30, 50) —
more related information acts like more data: using related samples
increases sample-efficiency]
Why Does It Work?

• non-parametric: adapts to the complexity of the data
  → representation-efficient
Conclusions

• Flexible attribute learning method
  – generalizes to new categories
  – adapts to the dataset complexity
• Efficiently learnable
  – sample efficiency
  – reduced user annotation effort
• Performs well
  – recognizes existing and new categories well
Thanks!
Q&A
