Large-Scale Object Recognition with Weak Supervision

Weiqiang Ren, Chong Wang, Yanhua Cheng,
Kaiqi Huang, Tieniu Tan
{wqren,cwang,yhcheng,kqhuang,[email protected]
Task 2: Classification + Localization
Task 2b: Classification + localization with additional
training data
— Ordered by classification error
1. Only classification labels are used
2. Full image as object location
Outline
• Motivation
• Method
• Results
Motivation
Why Weakly Supervised Localization (WSL)?
Knowing where to look makes recognizing objects easier!
However, in the classification-only task, no annotations of
object location are available.
Weakly Supervised
Localization
Current WSL Results on VOC07
[Bar chart: detection mAP (%) on VOC 2007 for WSL methods — 13.9, 15.0, 22.4, 22.7, 26.2, 26.4, 31.6; fully supervised DPM 5.0: 33.7]
13.9: Weakly supervised object detector learning with model drift detection, ICCV 2011
15.0: Object-centric spatial pooling for image classification, ECCV 2012
22.4: Multi-fold mil training for weakly supervised object localization, CVPR 2014
22.7: On learning to localize objects with minimal supervision, ICML 2014
26.2: Discovering Visual Objects in Large-scale Image Datasets with Weak
Supervision, submitted to TPAMI
26.4: Weakly supervised object detection with posterior regularization, BMVC 2014
31.6: Weakly supervised object localization with latent category learning, ECCV 2014
Sep 11, Poster Session 4A, #34
Our Work
Two of our previous WSL methods (mAP on VOC 2007; fully supervised DPM 5.0: 33.7):
• Weakly Supervised Object Localization with Latent Category Learning, ECCV 2014: 31.6
• Discovering Visual Objects in Large-scale Image Datasets with Weak Supervision, submitted to TPAMI: 26.2
For efficiency in large-scale tasks, we use the second one.
Method
Framework
[Pipeline: Input Images → (1) CNN (Conv Layers + FC Layers) → (2) Det Prediction → (3) Rescoring → (4) Cls Prediction]
1st: CNN Architecture
Chatfield et al. Return of the Devil in the Details: Delving Deep into Convolutional Nets
2nd: MILinear SVM
MILinear : Region Proposal
A good region proposal algorithm should offer:
• High recall
• High overlap
• A small number of windows
• Low computation cost
We use MCG pretrained on VOC 2012 (additional data):
• Training: 128 windows/image
• Testing: 256 windows/image
• Compared to Selective Search (~2000 windows/image)
MILinear: Feature Representations
• Low Level Features
– SIFT, LBP, HOG
– Shape context, Gabor, …
• Mid-Level Features
– Bag of Visual Words (BoVW)
• Deep Hierarchical Features
– Convolutional Networks
– Deep Auto-Encoders
– Deep Belief Nets
MILinear: Positive Window Mining
• Clustering
– KMeans
• Topic Model
– pLSA, LDA, gLDA
• CRF
• Multiple Instance Learning
– DD, EM-DD, APR
– MI-NN
– MI-SVM, mi-SVM
– MILBoost
MILinear: Objective Function and Optimization
• Multiple instance Linear SVM
• Optimization: trust-region Newton (TRON)
– A truncated Newton method
– Works in the primal
– Fast convergence
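The objective itself did not survive extraction; a standard multiple-instance linear SVM objective consistent with the description (each positive image is a bag of proposal windows, scored by its best window) would be:

```latex
\min_{w}\; \frac{1}{2}\lVert w\rVert^{2}
  + C \sum_{i} \max\!\Big(0,\; 1 - y_i \max_{r \in B_i} w^{\top} x_{i,r}\Big)
```

where $B_i$ is the bag of proposal windows of image $i$, $x_{i,r}$ the feature of window $r$, and $y_i \in \{-1, +1\}$ the image-level label. This is the generic mi/MI-SVM form, not necessarily the authors' exact formulation.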
MILinear: Optimization Efficiency
3rd: Detection Rescoring
• Rescoring with softmax
[Diagram: 128 boxes → per-class max over boxes → 1000-dim score vector → train a softmax over the 1000 classes]
Softmax considers all categories simultaneously in each minibatch of the optimization, suppressing the responses of other object categories with similar appearance.
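The max-pooling-plus-softmax step above can be sketched as follows. This is a minimal NumPy illustration, not the authors' training code: in the actual pipeline the softmax layer is trained, whereas here we only show the inference-time shape flow (128 boxes, 1000 classes, as in the slides).

```python
import numpy as np

def rescore(window_scores):
    """Rescore per-window detection outputs with a softmax over classes.

    window_scores: (num_boxes, num_classes) MILinear scores,
                   e.g. (128, 1000) as in the slides.
    Returns a (num_classes,) probability vector.
    """
    # Max-pool over boxes: the best-scoring window per class
    # yields one image-level score per class.
    image_scores = window_scores.max(axis=0)      # (num_classes,)
    # Softmax over all classes simultaneously, so categories with
    # similar appearance suppress each other.
    z = image_scores - image_scores.max()         # numerical stability
    p = np.exp(z)
    return p / p.sum()

scores = np.random.randn(128, 1000)
probs = rescore(scores)  # non-negative, sums to 1 over the 1000 classes
```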
4th: Classification Rescoring
• Linear Combination
S_cls ← α · S_cls + (1 − α) · S_WSL
[Diagram: two 1000-dim score vectors (classification and WSL) are blended into one final 1000-dim score]
One interesting observation: we tried several other score-combination strategies, but none worked better than this simple linear combination!
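The linear combination can be sketched in a few lines. The default α = 0.5 below is a hypothetical placeholder; the slides do not state the value used, and in practice it would be tuned on validation data.

```python
import numpy as np

def combine(s_cls, s_wsl, alpha=0.5):
    """Linearly blend classification and WSL score vectors.

    s_cls, s_wsl: (num_classes,) score vectors (e.g. 1000-dim).
    alpha: blending weight in [0, 1]; 0.5 is an assumed placeholder.
    """
    return alpha * np.asarray(s_cls) + (1.0 - alpha) * np.asarray(s_wsl)

# Toy usage: with alpha = 0.75 the classification score dominates.
blended = combine([1.0, 0.0], [0.0, 1.0], alpha=0.75)  # [0.75, 0.25]
```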
Results
1st: Classification without WSL
Method                      Top-5 Error (%)
Baseline with one CNN       13.7
Average of four CNNs        12.5
2nd: MILinear on ImageNet 2014
Method                  Detection Error (%)
Baseline (full image)   61.96
MILinear                40.96
Winner                  25.3
2nd: MILinear on VOC 2007
2nd: MILinear on ILSVRC 2013 detection
mAP: 9.63% vs. 8.99% (DPM 5.0)
2nd: MILinear for Classification
Method      Top-5 Error (%)
MILinear    17.1
3rd: WSL Rescoring (Softmax)
Method                      Top-5 Error (%)
Baseline with one CNN       13.7
Average of four CNNs        12.5
MILinear                    17.1
MILinear + Rescore          13.5
The softmax-based rescoring successfully suppresses the predictions of other object categories with similar appearance!
4th: Cls and WSL Combination
S_cls ← α · S_cls + (1 − α) · S_WSL
Method                              Top-5 Error (%)
Baseline with one CNN model         13.7
Average of four CNN models          12.5
MILinear                            17.1
MILinear + Rescore                  13.5
Cls (12.5) + MILinear (13.5)        11.5
WSL and classification are complementary to each other!
Russakovsky et al. ImageNet Large Scale Visual Recognition Challenge.
Conclusion
• WSL always helps classification
• WSL has large potential: WSL data is cheap
Thank You!
