Support Vector Machine Active Learning for Image Retrieval

Authors: Simon Tong & Edward Chang
Presented By:
Navdeep Dandiwal
 Motivation
 Introduction
 Version Space
 Active Learning
 Image Characterization
 Experimental Data
 Conclusions
 Relevance feedback is often a critical component
when designing image databases.
 Interactively determines a user’s desired output
by asking the user to label images
Can it get boring for the user?
How to create effective relevance feedback?
The proposed Support Vector Machine Active Learning algorithm aims at
effective relevance feedback: grasping the user’s query concept accurately
and quickly while asking the user to label only a small number of images
 Selects the most informative images with which to query the user
 Quickly learns a boundary that separates the images
that satisfy the user’s query concept from the rest
of the dataset
 The user should be able to implicitly inform a
database of his or her desired output or query concept
 Relevance feedback can be used as a query
refinement scheme to learn the user’s query concept
 Based on the answers, another image set is
brought up for the user to label
We call such a refinement scheme a
query concept learner
Refinement Scheme
Fetches few image instances
User labels each instance
Relevant images
Irrelevant images
 Most machine learning algorithms are passive
 Passive in the sense that they are generally applied
using a randomly selected training set
 Key idea of active learning:
 The learner should be able to choose its next pool-query
based upon the past answers to previous pool-queries
Support Vector Machine Active
Learner (SVMActive). It works on the following ideas:
 Treats learning the query concept as learning an SVM binary classifier, where a
hyperplane separates relevant and irrelevant
images in a projected space.
 Learns the classifier quickly via active learning
 Returns the top-k most relevant images. These are
the ones farthest from the hyperplane on the relevant side
Support Vector Machines
 In their simplest form, SVMs are hyperplanes
that separate the training data by a maximal margin
 All vectors on one side of the hyperplane are labeled
as -1 and those on the other side as +1
 Training instances that lie closest to the
hyperplane are called support vectors
Support Vector Machines(contd…)
- Support vectors
Given training data $\{\mathbf{x}_1, \ldots, \mathbf{x}_n\}$ that are vectors in some space $\mathcal{X} \subseteq \mathbb{R}^d$
We are also given their labels $\{y_1, \ldots, y_n\}$ where $y_i \in \{-1, 1\}$
Support Vector Machines(contd…)
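 SVMs allow one to project the original training
data in space $\mathcal{X}$ to a higher-dimensional feature
space $\mathcal{F}$ via a Mercer kernel operator $K$.
 We consider classifiers of the form
$$f(\mathbf{x}) = \sum_{i=1}^{n} \alpha_i K(\mathbf{x}_i, \mathbf{x})$$
 When $f(\mathbf{x}) \ge 0$
we classify $\mathbf{x}$ as +1, otherwise as -1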
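The classification rule above can be sketched in a few lines of Python. This is only an illustration: the Gaussian RBF kernel, the `gamma` value, the toy points, and the hand-picked coefficients are all assumptions, not values from the slides.

```python
import math

def rbf_kernel(u, v, gamma=0.5):
    """Gaussian RBF kernel, one common Mercer kernel (gamma is an assumed setting)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(u, v)))

def decision_value(x, train_x, alphas, kernel=rbf_kernel):
    """f(x) = sum_i alpha_i * K(x_i, x)."""
    return sum(a * kernel(xi, x) for a, xi in zip(alphas, train_x))

def classify(x, train_x, alphas, kernel=rbf_kernel):
    """Label x as +1 when f(x) >= 0, otherwise as -1."""
    return 1 if decision_value(x, train_x, alphas, kernel) >= 0 else -1

# Hypothetical toy data: two training points with hand-picked coefficients.
train_x = [(0.0, 0.0), (2.0, 2.0)]
alphas = [1.0, -1.0]  # the sign of alpha_i encodes the label of x_i

print(classify((0.1, 0.2), train_x, alphas))
print(classify((1.9, 2.1), train_x, alphas))
```

A point near the first training vector gets a positive decision value, and a point near the second gets a negative one.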
Support Vector Machines(contd…)
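 When $K$ satisfies Mercer’s condition it can be
written as
$$K(\mathbf{u}, \mathbf{v}) = \Phi(\mathbf{u}) \cdot \Phi(\mathbf{v})$$
where $\Phi : \mathcal{X} \to \mathcal{F}$ and “$\cdot$” denotes inner product. We can write $f$ as:
$$f(\mathbf{x}) = \mathbf{w} \cdot \Phi(\mathbf{x}), \quad \text{where } \mathbf{w} = \sum_{i=1}^{n} \alpha_i \Phi(\mathbf{x}_i)$$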
 Thus by using K we are implicitly projecting the
training data into a different (often higher
dimensional) space F
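A quick numerical check of the implicit projection. It assumes a homogeneous degree-2 polynomial kernel on 2-D inputs, chosen here only because its feature map is easy to write out; the slides do not single out this kernel.

```python
import math

def poly2_kernel(u, v):
    """Homogeneous degree-2 polynomial kernel K(u, v) = (u . v)^2 on 2-D inputs."""
    return (u[0] * v[0] + u[1] * v[1]) ** 2

def phi(u):
    """Explicit feature map for poly2_kernel: 2-D input space -> 3-D feature space."""
    return (u[0] ** 2, math.sqrt(2) * u[0] * u[1], u[1] ** 2)

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

u, v = (1.0, 2.0), (3.0, -1.0)
# Both numbers agree: the kernel computes the inner product in feature space
# without ever building phi(u) or phi(v).
print(poly2_kernel(u, v), dot(phi(u), phi(v)))
```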
Version Space
Version Space(contd…)
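 Given a set of labeled training data and a Mercer
kernel $K$, the set of consistent hyperplanes
that separate the data in the induced feature
space is called the version space
 Our set of possible hypotheses is given as:
$$\mathcal{H} = \left\{ f \;\middle|\; f(\mathbf{x}) = \frac{\mathbf{w} \cdot \Phi(\mathbf{x})}{\|\mathbf{w}\|}, \ \mathbf{w} \in \mathcal{W} \right\}$$
 where the parameter space $\mathcal{W}$
is simply equal to $\mathcal{F}$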
Version Space(contd…)
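 The version space $\mathcal{V}$
is defined as:
$$\mathcal{V} = \{ f \in \mathcal{H} \mid \forall i \in \{1, \ldots, n\},\ y_i f(\mathbf{x}_i) > 0 \}$$
 Notice that since $\mathcal{H}$ is a set of hyperplanes, there is an
exact correspondence between unit vectors $\mathbf{w}$
and hypotheses $f$ in $\mathcal{H}$. Thus we will redefine $\mathcal{V}$ as:
$$\mathcal{V} = \{ \mathbf{w} \in \mathcal{W} \mid \|\mathbf{w}\| = 1,\ y_i (\mathbf{w} \cdot \Phi(\mathbf{x}_i)) > 0,\ i = 1, \ldots, n \}$$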
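Membership in the version space is easy to test directly. The sketch below assumes a linear kernel (so the feature map is the identity) and uses made-up 2-D data; `in_version_space` is a hypothetical helper name.

```python
import math

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def in_version_space(w, xs, ys, tol=1e-9):
    """True if w is a unit vector that puts every labeled instance on its correct side.

    Assumes a linear kernel, so Phi(x) = x (an assumption made for illustration).
    """
    is_unit = abs(math.sqrt(dot(w, w)) - 1.0) < tol
    return is_unit and all(y * dot(w, x) > 0 for x, y in zip(xs, ys))

# Hypothetical labeled data in 2-D.
xs = [(1.0, 0.0), (0.0, 1.0), (-1.0, -0.5)]
ys = [1, 1, -1]

w_good = (math.cos(math.pi / 4), math.sin(math.pi / 4))
w_bad = (1.0, 0.0)  # w . (0, 1) = 0, so the second instance is not strictly separated

print(in_version_space(w_good, xs, ys), in_version_space(w_bad, xs, ys))
```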
Version Space(contd…)
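 SVMs find the hyperplane that maximizes the
margin in feature space $\mathcal{F}$. One way to pose this
is as follows:
$$\text{maximize}_{\mathbf{w} \in \mathcal{F}} \ \min_i \{ y_i (\mathbf{w} \cdot \Phi(\mathbf{x}_i)) \}$$
subject to:
$$\|\mathbf{w}\| = 1 \quad \text{and} \quad y_i (\mathbf{w} \cdot \Phi(\mathbf{x}_i)) > 0, \ i = 1, \ldots, n$$
These conditions cause the solution
to lie in the version space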
Version Space(contd…)
 We want to find the point in the version space
that maximizes the minimum distance to any of
the delineating hyperplanes.
This is the largest sphere whose
center lies in the version space
and whose surface does not
intersect with the hyperplanes
Its center corresponds
to the SVM and its radius is
the margin of the SVM in
feature space
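A toy illustration of "center = SVM, radius = margin". It assumes a linear kernel, unit-length 2-D instances, and no bias term, so unit weight vectors are just angles on a circle and a brute-force grid search is enough. A real SVM solver would be used in practice.

```python
import math

def margin(theta, xs, ys):
    """min_i y_i (w . x_i) for the unit vector w at angle theta (linear kernel assumed)."""
    w = (math.cos(theta), math.sin(theta))
    return min(y * (w[0] * x[0] + w[1] * x[1]) for x, y in zip(xs, ys))

def max_margin_direction(xs, ys, steps=360):
    """Grid search over unit vectors; returns (angle in degrees, margin) of the best one."""
    best = max(range(steps), key=lambda k: margin(2 * math.pi * k / steps, xs, ys))
    return best * 360 // steps, margin(2 * math.pi * best / steps, xs, ys)

# Hypothetical labeled data: two unit-length relevant instances.
xs = [(1.0, 0.0), (0.0, 1.0)]
ys = [1, 1]

angle, radius = max_margin_direction(xs, ys)
print(angle, round(radius, 4))
```

The best direction sits midway between the two delineating hyperplanes, and the returned margin is its distance to each of them.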
Active Learning
 In pool-based active learning we have a pool of
unlabeled instances
 Instances x are independently and identically
distributed according to some underlying distribution
 Labels are distributed according to some
conditional distribution P(y|x)
Active Learning(contd…)
 Given an unlabeled pool U, an active learner l has three components: (f, q, X)
 Classifier f, trained on the current labeled data X
 Querying function q(X): given the current labeled set X,
decides which instance in U to query next
 Can also return a classifier f
after each pool-query or after a fixed number of pool-queries
 The querying function q is the main difference between
an active and a passive learner
Active Learning(contd…)
 How to choose the next unlabeled instance in the
pool to query?
 Use approach that queries points so as to
attempt to reduce the size of the version space as
much as possible
Active Learning(contd…)
-The surface of the hypersphere represents unit weight vectors
-Each of the two hyperplanes corresponds to a labeled training instance
-Version space is the surface segment closest to the camera
Active Learning(contd…)
-A large sphere could be embedded
-The center of this sphere lies in version space and surface does not
intersect with the hyperplanes
-Center is SVM, radius is margin
Active Learning(contd…)
 Reduce version space as fast as possible by
choosing a pool-query that halves V
[Figure: version space with the largest hypersphere that fits inside it, showing labeled instances, unlabeled instances, and the next pool-query]
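The halving idea can be checked numerically on a toy problem. The sketch assumes a linear kernel in 2-D, so the version space is an arc of the unit circle that can be sampled on a grid of angles. The data, the grid size, and the helper names are all hypothetical.

```python
import math

def dot(a, b):
    return a[0] * b[0] + a[1] * b[1]

def version_space(xs, ys, steps=3600):
    """Unit vectors (sampled on a grid of angles) consistent with all labeled data."""
    grid = [(math.cos(2 * math.pi * k / steps), math.sin(2 * math.pi * k / steps))
            for k in range(steps)]
    return [w for w in grid if all(y * dot(w, x) > 0 for x, y in zip(xs, ys))]

def positive_fraction(candidate, vs):
    """Fraction of the version space that would label the candidate +1."""
    return sum(1 for w in vs if dot(w, candidate) > 0) / len(vs)

def best_query(pool, vs):
    """Index of the pool instance whose label splits the version space most evenly."""
    return min(range(len(pool)), key=lambda i: abs(positive_fraction(pool[i], vs) - 0.5))

labeled_x, labeled_y = [(1.0, 0.0), (0.0, 1.0)], [1, 1]
pool = [(1.0, 0.1), (1.0, -3.0), (1.0, -1.0)]  # hypothetical unlabeled instances

vs = version_space(labeled_x, labeled_y)
print(best_query(pool, vs), [round(positive_fraction(x, vs), 2) for x in pool])
```

The first candidate tells us nothing (every consistent hypothesis already agrees on it), while the last one cuts the arc roughly in half whichever label the user gives.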
Active Learning(contd…)
 SVMActive takes a simple approach: it chooses a pool-query
of the twenty images closest to its separating hyperplane
 This can be unstable during the first round of relevance feedback (RF)
 Therefore it chooses random images for the first round
SVMActive Algorithm
 During each relevance feedback round:
 Learn an SVM on the current labeled data
 If it is the first round, ask the user to label 20 randomly selected images
 Otherwise, ask the user to label the 20 pool images closest to the SVM boundary
 After the relevance feedback rounds:
 Learn a final SVM on the labeled data
 Display the top-k relevant images, those farthest from the SVM boundary
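The loop above can be sketched end to end in plain Python. This is a rough sketch, not the authors' implementation: the trainer is a stand-in linear SVM fitted by Pegasos-style subgradient descent (the original system uses kernel SVMs), and `svm_active`, `ask_user`, and every parameter value are hypothetical.

```python
import random

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def train_linear_svm(xs, ys, lam=0.01, epochs=50, seed=0):
    """Stand-in trainer: linear SVM fitted by Pegasos-style subgradient descent."""
    rng = random.Random(seed)
    w, t = [0.0] * len(xs[0]), 0
    for _ in range(epochs):
        for i in rng.sample(range(len(xs)), len(xs)):
            t += 1
            eta = 1.0 / (lam * t)
            violated = ys[i] * dot(w, xs[i]) < 1       # hinge loss is active
            w = [(1 - eta * lam) * wj for wj in w]     # shrink (regularization)
            if violated:
                w = [wj + eta * ys[i] * xj for wj, xj in zip(w, xs[i])]
    return w

def svm_active(pool, ask_user, rounds=3, batch=20, k=10, seed=0):
    """Relevance feedback loop; ask_user(index) returns +1 (relevant) or -1."""
    rng = random.Random(seed)
    feats = [list(x) + [1.0] for x in pool]   # constant feature acts as the bias term
    labels, w = {}, None                      # labels: pool index -> user label
    for _ in range(rounds):
        unlabeled = [i for i in range(len(feats)) if i not in labels]
        if w is None:
            # First round (or no usable model yet): random images.
            query = rng.sample(unlabeled, batch)
        else:
            # Later rounds: the images closest to the current boundary.
            query = sorted(unlabeled, key=lambda i: abs(dot(w, feats[i])))[:batch]
        for i in query:
            labels[i] = ask_user(i)
        if len(set(labels.values())) == 2:    # need both classes to train
            idx = list(labels)
            w = train_linear_svm([feats[i] for i in idx], [labels[i] for i in idx])
    if w is None:
        return labels, []
    # Top-k: farthest from the boundary on the relevant (positive) side.
    ranked = sorted(range(len(feats)), key=lambda i: dot(w, feats[i]), reverse=True)
    return labels, ranked[:k]

# Hypothetical pool: 60 "relevant" points around (2, 2), 240 others around (-2, -2).
data_rng = random.Random(1)
truth = [1 if i < 60 else -1 for i in range(300)]
pool = [(data_rng.gauss(2 * t, 1.0), data_rng.gauss(2 * t, 1.0)) for t in truth]

labels, top = svm_active(pool, ask_user=lambda i: truth[i])
print(len(labels), top)
```

The model trained after the last round plays the role of the final SVM, since it is fitted on all labels collected so far.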
Image Characterization
 Our system employs a multi-resolution image
representation scheme.
 In this scheme, we characterize images by two
main features:
 Color
 Texture
 We consider shape as an attribute of these main features
Image Characterization(contd…)
Multi-resolution Color Features
Image Characterization(contd…)
Multi-resolution Texture Features
 Three characterizing texture features:
 Structuredness
 Orientation
 Scale
 Texture is analyzed with the Discrete Wavelet Transformation (DWT) using
quadrature mirror filters, chosen for its
computational efficiency
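To give a feel for a multi-resolution wavelet decomposition, here is a 1-D sketch. It assumes the Haar filter pair, the simplest quadrature mirror filter pair. The slides do not say which filters were used, and summarizing each level by its detail-band energy is only an illustrative choice.

```python
import math

def haar_step(signal):
    """One level of the 1-D Haar DWT: returns (approximation, detail) coefficients."""
    s = 1.0 / math.sqrt(2.0)
    pairs = list(zip(signal[0::2], signal[1::2]))
    approx = [s * (a + b) for a, b in pairs]
    detail = [s * (a - b) for a, b in pairs]
    return approx, detail

def energy(coeffs):
    return sum(c * c for c in coeffs)

def multilevel_energies(signal, levels):
    """Detail-band energy at each resolution level, finest level first."""
    energies, current = [], list(signal)
    for _ in range(levels):
        current, detail = haar_step(current)
        energies.append(energy(detail))
    return energies, current

row = [4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0]  # hypothetical row of pixel intensities
print(multilevel_energies(row, levels=3))
```

Each step halves the signal length, which is where the computational efficiency comes from. For images the same step is applied along rows and columns.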
Image Characterization(contd…)
144-dimensional vector
 The space for SVMActive is a 144-dimensional space
 Each image in the database corresponds to a point
in this space
 4-category; 10-category; 15-category datasets
 To enable an objective measure of performance, it
is assumed that a query concept is an image category
 Accuracy is computed as the fraction
of the k returned results that belong to the target
image category
 All SVM algorithms require at least one relevant
and one irrelevant image to function
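The accuracy measure described above is simple to compute. A minimal sketch, with made-up category labels and a made-up ranking (`top_k_accuracy` is a hypothetical helper name):

```python
def top_k_accuracy(returned, categories, target, k):
    """Fraction of the first k returned images whose category equals the target."""
    top = returned[:k]
    if not top:
        raise ValueError("no images returned")
    return sum(1 for i in top if categories[i] == target) / len(top)

categories = ["cat", "dog", "cat", "car", "cat"]  # hypothetical category of each image
returned = [0, 2, 3, 4, 1]                         # hypothetical ranking, best first

print(top_k_accuracy(returned, categories, "cat", k=4))
```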
4-category set
10-category set
15-category set
 SVMActive displays 20 images per pool-querying round
 The trade-off: number of images in one round versus number of querying rounds
[Figure: 20 random + 2 rounds of 10 versus 20 random + 1 round of 20]
The active learner has more
control and freedom to adapt when
asking two rounds of 10 images
than when asking one round of 20 images
[Figure: two settings of 20 random + 2 rounds, differing in the number of images per round]
- The increase in cost to the user of labeling 20
images per round is negligible, since the user can pick out
relevant images easily
- Virtually no additional computational cost in calculating
the 20 images to query
 SVMActive displays 20 images per pool-querying round
 The trade-off: keeping the number of images in one round constant, we can conduct more rounds
Active and regular passive learning on 15-category dataset
After three rounds of querying
After five rounds of querying
Average top-50 accuracy over the 4-category data set using a regular SVM
trained on 30 images
Accuracy on 4-category data set after three
querying rounds using various kernels
Scheme comparison
Other schemes: query point movement (QPM) and query expansion (QEX)
 Traditional information retrieval schemes
require a large number of image instances to
achieve any substantial refinement
 Tend to be fairly localized in their exploration of
the image space and hence rather slow in
exploring the entire space
Scheme comparison
 During relevance feedback, SVMActive takes both the
relevant and irrelevant images into account
when choosing the next pool-queries
 It asks the user to label the images that are
regarded as most informative for learning the query
concept, rather than those most likely to
be relevant
Average top-k accuracy over the 15-category dataset
In a nutshell, the contributions of this study are:
 SVMActive can produce a well-suited learner
that significantly outperforms traditional query refinement schemes
 Organizing image features in different
resolutions gives learner the flexibility to
model subjective perception and to satisfy a
variety of search tasks.
 The running time of the SVMActive algorithm scales
linearly with the size of the image database.
 Subsampling databases: using a few thousand
images as the pool with which to query the user
 Designing methods to seed the algorithm
 It would be beneficial to make SVMActive
independent of having a starting relevant image
