AdaBoost Classifier
• A simple yet effective classifier
• AdaBoost (Adaptive Boosting; Y. Freund and R. Schapire, ICML, 1996):
• Supervised classifier
• Ensemble of classifiers
– Combines many low-accuracy classifiers (weak learners) to create a high-accuracy classifier (strong learner)
Example 1
• Strong classifier = linear combination of T weak classifiers
(1) Design of the weak classifier ht(x)
(2) Weight for each classifier (hypothesis weight) αt
(3) Update the weight for each data point (example distribution)
• Weak classifier: < 50% error over any distribution
• Strong classifier = linear combination of T weak classifiers
(1) Design of the weak classifier ht(x)
(2) Weight for each classifier (hypothesis weight) αt
(3) Update the weight for each data point (example distribution)
• Weak classifier: < 50% error over any distribution
AdaBoost: Design of weak classifier (1/2)
AdaBoost: Design of weak classifier (2/2)
• Select the weak classifier with the smallest weighted error:
  ht = argmin_{hj ∈ H} εj,  εj = Σ_{i=1}^{m} Dt(i) · [yi ≠ hj(xi)]
• Prerequisite: εt < 1/2
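The selection step above (pick the hypothesis minimizing the weighted error under the current distribution Dt) can be sketched as follows; the 1-D threshold "stumps" and the toy data set are illustrative assumptions, not from the slides:

```python
# Sketch of weak-classifier selection: choose h_t = argmin over candidates
# of eps_j = sum_i D_t(i) * [y_i != h_j(x_i)], the weighted error.
def stump(theta, polarity):
    """h(x) = +1 if polarity*x < polarity*theta else -1 (a 1-D threshold)."""
    return lambda x: 1 if polarity * x < polarity * theta else -1

def select_weak_classifier(X, y, D, candidates):
    """Return (best_h, eps_t) minimizing the weighted error over candidates."""
    best_h, best_eps = None, float("inf")
    for h in candidates:
        eps = sum(D[i] for i in range(len(X)) if h(X[i]) != y[i])
        if eps < best_eps:
            best_h, best_eps = h, eps
    return best_h, best_eps

# Toy data: positives below 0.5, negatives above; uniform distribution D_1.
X = [0.1, 0.2, 0.3, 0.7, 0.8, 0.9]
y = [1, 1, 1, -1, -1, -1]
D = [1 / 6] * 6
cands = [stump(t, p) for t in (0.25, 0.5, 0.75) for p in (1, -1)]
h1, eps1 = select_weak_classifier(X, y, D, cands)
print(eps1)   # 0.0 — the stump at theta = 0.5 separates the toy data
```

The prerequisite εt < 1/2 is trivially met here; on harder distributions the selected stump merely has to beat random guessing.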
• Strong classifier = linear combination of T weak classifiers
(1) Design of the weak classifier ht(x)
(2) Weight for each classifier (hypothesis weight) αt
(3) Update the weight for each data point (example distribution)
• Weak classifier: < 50% error over any distribution
• How to set αt?
Define the per-example training error indicator:
  εi = 1 if yi ≠ H(xi), 0 otherwise
     = 1 if yi·f(xi) ≤ 0, 0 otherwise
Then the training error of the strong classifier H is
  training error(H) = (1/N) Σi εi
                    ≤ (1/N) Σi exp(−yi·f(xi))    (since [z ≤ 0] ≤ exp(−z))
Because the final distribution satisfies DT+1(i) = exp(−yi·f(xi)) / (N Πt Zt), it follows that
  training error(H) ≤ Σi DT+1(i) · Πt Zt = Πt Zt
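The inequality step relies on the elementary bound [z ≤ 0] ≤ exp(−z), applied to the margin z = yi·f(xi). A quick numeric spot-check (the sampled margins are chosen arbitrarily for illustration):

```python
import math

# The bound used in the derivation: the 0/1 indicator [z <= 0] never
# exceeds exp(-z), where z = y_i * f(x_i) is the signed margin.
def indicator(z):
    return 1.0 if z <= 0 else 0.0

for z in [-2.0, -0.5, 0.0, 0.3, 1.0, 4.0]:
    assert indicator(z) <= math.exp(-z)
print("indicator(z) <= exp(-z) holds for all sampled margins")
```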
  αt = (1/2) · ln((1 − εt)/εt)
• Strong classifier = linear combination of T weak classifiers
(1) Design of the weak classifier ht(x)
(2) Weight for each classifier (hypothesis weight) αt
(3) Update the weight for each data point (example distribution)
• Weak classifier: < 50% error over any distribution
Reweighting
• Correctly classified examples: y · h(x) = +1 (weight decreased)
• Misclassified examples: y · h(x) = −1 (weight increased)
In this way, AdaBoost "focuses on" the informative or "difficult" examples.
Summary
For t = 1, …, T:
  αt = (1/2) · ln((1 − εt)/εt)
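The whole loop can be summarized in a few lines of code. This is a minimal sketch, assuming threshold stumps as weak learners and a toy 1-D data set (both illustrative, not from the slides); the run also checks the bound training error ≤ Πt Zt derived earlier:

```python
import math

# Minimal AdaBoost loop on toy 1-D data with labels in {-1, +1}.
X = [0.05, 0.15, 0.25, 0.35, 0.45, 0.65, 0.75, 0.85]
y = [1, 1, -1, 1, -1, -1, -1, -1]     # 0.25 is a deliberate outlier
N = len(X)
thresholds = sorted(X)

def h_of(theta, p):                   # threshold stump, output in {-1, +1}
    return lambda x: 1 if p * x < p * theta else -1

D = [1.0 / N] * N                     # initial example distribution
alphas, hs, Zs = [], [], []
for t in range(3):                    # T = 3 rounds
    # (1) weak classifier with the smallest weighted error
    eps, h = min(((sum(D[k] for k in range(N) if g(X[k]) != y[k]), g)
                  for th in thresholds for p in (1, -1)
                  for g in (h_of(th, p),)),
                 key=lambda pair: pair[0])
    eps = min(max(eps, 1e-12), 1 - 1e-12)   # guard against eps = 0 or 1
    # (2) hypothesis weight
    alpha = 0.5 * math.log((1 - eps) / eps)
    # (3) reweight the example distribution and normalize by Z_t
    D = [D[k] * math.exp(-alpha * y[k] * h(X[k])) for k in range(N)]
    Z = sum(D)
    D = [d / Z for d in D]
    alphas.append(alpha); hs.append(h); Zs.append(Z)

def H(x):                             # strong classifier: sign of the sum
    return 1 if sum(a * g(x) for a, g in zip(alphas, hs)) >= 0 else -1

train_err = sum(H(X[k]) != y[k] for k in range(N)) / N
bound = math.prod(Zs)
assert train_err <= bound + 1e-9      # the bound from the derivation
print(train_err, bound)
```

Each Zt is strictly below 1 whenever εt < 1/2, so the bound shrinks geometrically with the number of rounds.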
Example 2
Example (1/5)
Original training set: equal weights for all training samples
Taken from “A Tutorial on Boosting” by Yoav Freund and Rob Schapire
Example (2/5)
ROUND 1
Example (3/5)
ROUND 2
Example (4/5)
ROUND 3
Example (5/5)
Example 3
At each round t, the classifier weight is αt = (1/2) · ln((1 − εt)/εt).
Example 4
Application

Discussion
(Friedman's wording)
(Freund and Schapire's wording)
Predictions (RealAB)
• LogitBoost
• GentleBoost
Robust Real-time Object Detection
Keywords: feature extraction, …
Outline
1. Introduction
2. Features
 2.1 Feature Extraction
 2.2 Integral Image
 3.1 Training Process
 3.2 Testing Process
4. The Attentional Cascade
5. Experimental Results
6. Conclusion
7. Reference
1. Introduction
This paper brings together new algorithms and insights to construct a framework for robust and extremely rapid object detection.
The frontal face detection system achieves:
1. High detection rates
2. Low false positive rates
Three main contributions:
1. Integral image
2. AdaBoost: selecting a small number of important features
3. Attentional cascade: quickly discarding background sub-windows in early stages
2. Features
 The detector is based on simple feature values.
 Reasons:
1. A knowledge-based system is difficult to learn using a finite quantity of training data.
2. It is much faster than an image-based system.
P.S. Feature-based: uses extracted features such as eye and nose patterns.
Knowledge-based: uses rules about facial features.
Image-based: uses face segments and predefined face patterns.
[3] A.S.S. Mohamed, Ying Weng, S.S. Ipson, and Jianmin Jiang, "Face Detection based on Skin Color in Image by Neural Networks", ICIAS 2007, pp. 779-783, 2007.
2.1 Feature Extraction (1/2)
 Filter: e.g., a Haar-like filter (one of several filter types).
 Feature:
a. The coordinate position of the pattern (e.g., eye, nose).
b. The size of the pattern.
 Feature value:
Feature value = Filter ⊗ Feature (computed by convolution)
2.1 Feature Extraction (2/2)
• Haar-like filter: the sum of the pixels which lie within the white rectangles is subtracted from the sum of the pixels in the grey rectangles.
Figure 1: filter types (＋/－ rectangle patterns) applied to a feature within a 24×24 window to produce a feature value.
2.2 Integral Image (1/6)
 Integral image:
1. Enables rectangle features
2. To be computed very rapidly
 ii(x, y): sum of the pixels above and to the left of (x, y):
  ii(x, y) = Σ_{x′ ≤ x, y′ ≤ y} i(x′, y′)
2.2 Integral Image (2/6)
Known:
A: sum of the pixels within rectangle A.
B: sum of the pixels within rectangle B.
C: sum of the pixels within rectangle C.
D: sum of the pixels within rectangle D.
The integral image takes these values at locations 1-4:
1 = A
2 = A + B
3 = A + C
4 = A + B + C + D
Figure 2: Integral image
Q: The sum of the pixels within rectangle D = ?
A: 4 = A + B + C + D ⇒ D = 4 − (A + B) − C = 4 − 2 − (3 − A) = 4 − 2 − (3 − 1) = 4 + 1 − (2 + 3)
So the sum within D can be computed as 4 + 1 − (2 + 3).
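The corner identity above can be checked mechanically. The 4×4 image below is an illustrative assumption, with A, B, C, D taken as its four 2×2 quadrants and locations 1-4 at their bottom-right corners:

```python
# Verify the slide's identity: sum(D) = 4 + 1 - (2 + 3), where 1..4 are
# integral-image values at the bottom-right corners of rectangles A..D.
img = [[1, 2, 3, 4],
       [5, 6, 7, 8],
       [9, 10, 11, 12],
       [13, 14, 15, 16]]          # illustrative 4x4 image

h, w = len(img), len(img[0])
ii = [[0] * w for _ in range(h)]  # ii[y][x] = sum of pixels above-left, incl.
for yy in range(h):
    for xx in range(w):
        ii[yy][xx] = (img[yy][xx]
                      + (ii[yy - 1][xx] if yy > 0 else 0)
                      + (ii[yy][xx - 1] if xx > 0 else 0)
                      - (ii[yy - 1][xx - 1] if yy > 0 and xx > 0 else 0))

# Quadrants: A = rows 0-1, cols 0-1; B = rows 0-1, cols 2-3;
#            C = rows 2-3, cols 0-1; D = rows 2-3, cols 2-3.
one, two, three, four = ii[1][1], ii[1][3], ii[3][1], ii[3][3]
sum_D = sum(img[yy][xx] for yy in (2, 3) for xx in (2, 3))
assert sum_D == four + one - (two + three)
print(sum_D)   # 54
```

Any rectangle sum therefore costs four array references, independent of the rectangle's size.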
2.2 Integral Image (3/6)
(Figure: the sum of the pixels within a rectangle, computed from four integral-image values.)
2.2 Integral Image (4/6)
 Use the following pair of recurrences to compute the integral image:
  s(x, y) = s(x, y − 1) + i(x, y)      (1)
  ii(x, y) = ii(x − 1, y) + s(x, y)    (2)
where i(x, y) is the original image, ii(x, y) is the integral image, and s(x, y) is the cumulative row sum, with s(x, −1) = 0 and ii(−1, y) = 0.
P.S. ii(x, y) = Σ_{x′ ≤ x, y′ ≤ y} i(x′, y′)
2.2 Integral Image (5/6)
Applying recurrences (1) s(x, y) = s(x, y − 1) + i(x, y) and (2) ii(x, y) = ii(x − 1, y) + s(x, y) to a 3×3 example:
i(x, y) (original image):
1 1 4
2 2 3
1 3 1
s(x, y) (cumulative row sum):
1 1 4
3 3 7
4 6 8
ii(x, y) (integral image):
1 2 6
3 6 13
4 10 18
2.2 Integral Image (6/6)
For the 3×3 example:
s(x, y) = s(x, y − 1) + i(x, y):
  s(0,0) = s(0,−1) + i(0,0) = 0 + 1 = 1
  s(1,0) = s(1,−1) + i(1,0) = 0 + 1 = 1
  s(2,0) = s(2,−1) + i(2,0) = 0 + 4 = 4
  s(0,1) = s(0,0) + i(0,1) = 1 + 2 = 3
  s(1,1) = s(1,0) + i(1,1) = 1 + 2 = 3
  …
  s(2,2) = s(2,1) + i(2,2) = 7 + 1 = 8
ii(x, y) = ii(x − 1, y) + s(x, y):
  ii(0,0) = ii(−1,0) + s(0,0) = 0 + 1 = 1
  ii(1,0) = ii(0,0) + s(1,0) = 1 + 1 = 2
  ii(2,0) = ii(1,0) + s(2,0) = 2 + 4 = 6
  ii(0,1) = ii(−1,1) + s(0,1) = 0 + 3 = 3
  ii(1,1) = ii(0,1) + s(1,1) = 3 + 3 = 6
  …
  ii(2,2) = ii(1,2) + s(2,2) = 10 + 8 = 18
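The two recurrences can be run directly on a 3×3 example. The pixel values below are inferred from the partial computations shown above (an assumption where the slide is incomplete):

```python
# The pair of recurrences from the slide:
#   s(x, y)  = s(x, y-1)  + i(x, y)    with s(x, -1)  = 0
#   ii(x, y) = ii(x-1, y) + s(x, y)    with ii(-1, y) = 0
i = {(0, 0): 1, (1, 0): 1, (2, 0): 4,
     (0, 1): 2, (1, 1): 2, (2, 1): 3,
     (0, 2): 1, (1, 2): 3, (2, 2): 1}   # inferred 3x3 example image

s, ii = {}, {}
for y in range(3):
    for x in range(3):
        s[(x, y)] = s.get((x, y - 1), 0) + i[(x, y)]    # cumulative row sum
        ii[(x, y)] = ii.get((x - 1, y), 0) + s[(x, y)]  # integral image

assert s[(2, 2)] == 8                  # matches s(2,2) = 7 + 1 = 8
assert ii[(2, 2)] == 18                # matches ii(2,2) = 10 + 8 = 18
assert ii[(2, 2)] == sum(i.values())   # bottom-right holds the total sum
print(ii[(1, 1)], ii[(2, 2)])          # 6 18
```

One pass over the image suffices, so building the integral image is linear in the number of pixels.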
 AdaBoost works by choosing and combining weak classifiers to form a more accurate strong classifier.
– Weak classifier: a threshold θ on one feature value over the image set:
  h(X) = positive, if filter(X) < θ
         negative, otherwise
 Subsequent classifiers are tweaked in favor of those instances misclassified by previous classifiers. [4]
 The goal is to minimize the number of features that need to be computed when given a new image, while still achieving high identification rates.
3.1 Training Process - Flowchart
1. Input:
Training image set X: l face images (24×24) and m non-face images (24×24).
2. Feature Extraction:
Using Haar-like filters, extract N feature values per image; each feature value serves as a candidate threshold θ.
3-0. Image weight initialization:
  wi = 1/(2l) for positive images, 1/(2m) for negative images, i = 1, …, n
3-1. Normalize image weights:
  wt,i ← wt,i / Σ_{j=1}^{n} wt,j, i = 1, …, n
3-2. Error calculation:
  εi,j = Σ_{k=1}^{n} wt,k |hi,j(xk) − yk|, i = 1, …, n, j = 1, …, N
3-3. Select the weak classifier ht with the lowest error εt, and update the image weights:
  wt+1,i = wt,i βt, if xi is classified correctly; wt,i, otherwise
Repeat 3-1 to 3-3 for T rounds.
4. Output:
A strong classifier (24×24) combining the T weak classifiers with their weights:
  HT(x) = Σ_{t=1}^{T} αt ht(x)
3.2 Training Process - Input
1. Input:
The training data set is denoted X. There are l positive images and m negative images, n (n = l + m) images in total.
  X = {(x1, y1), (x2, y2), …, (xn, yn)},  yi = 1 for positive/face, 0 for negative/non-face
fj(xi) denotes the j-th feature value of image xi.
3.2 Training Process - Feature Extraction (1/2)
2. Feature Extraction: using Haar-like filters.
Convolving the Haar-like filters with the n images yields the candidate feature values (n × N of them).
P.S. Each image yields N feature values; with f features and 4 filter types, N = 4 × f.
3.2 Training Process - Feature Extraction (2/2)
Define the weak classifiers hi,j:
  hi,j(xk) = 1, if pi,j fj(xk) < pi,j θi,j; 0, otherwise
θi,j: the j-th local feature value of image i (used as a candidate threshold).
fj(xk): the j-th local feature value of image xk.
Polarity:
  pi,j = 1, if εi,j ≤ 0.5; −1, otherwise
Ex: 3 face and 1 non-face images, with values extracted by the 5th feature, give candidate thresholds θ1,5 θ2,5 θ3,5 θ4,5, weak classifiers h1,5 h2,5 h3,5 h4,5, and errors ε1,5 ε2,5 ε3,5 ε4,5.
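The thresholded weak classifier with polarity can be sketched on the slide's 3-face/1-non-face example; the feature values below are hypothetical, chosen only to illustrate the search over candidate thresholds θi,5:

```python
# Weak classifier from the slide: h_ij(x_k) = 1 if p*f_j(x_k) < p*theta
# else 0, with labels in {0, 1}. Feature values are made up for illustration.
def weak_classifier(theta, p):
    return lambda f: 1 if p * f < p * theta else 0

f5 = [10.0, 12.0, 11.0, 30.0]   # hypothetical f_5(x_k) for 4 images
y  = [1, 1, 1, 0]               # 3 faces, 1 non-face
w  = [0.25] * 4                 # uniform image weights

best = None
for theta in f5:                            # candidate thresholds theta_{i,5}
    for p in (1, -1):                       # both polarities
        h = weak_classifier(theta, p)
        eps = sum(w[k] for k in range(4) if h(f5[k]) != y[k])
        if best is None or eps < best[0]:
            best = (eps, theta, p)

print(best)   # (0.0, 30.0, 1): theta = 30 with p = 1 separates the example
```

Each image's own feature value is tried as a threshold, exactly as in the θ1,5 … θ4,5 table above.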
3.2 Training Process - AdaBoost Algorithm (1/4)
3-0. Image weight initialization:
  wi = 1/(2l), for positive/face images
  wi = 1/(2m), for negative/non-face images,  i = 1, …, n
l is the number of positive images.
m is the number of negative images.
3.2 Training Process - AdaBoost Algorithm (2/4)
Iterate for t = 1, …, T (T: number of weak classifiers):
3-1. Normalize image weights:
  wt,i ← wt,i / Σ_{j=1}^{n} wt,j, i = 1, …, n
3-2. Error calculation (for each candidate weak classifier over the training data set X):
  εi,j = Σ_{k=1}^{n} wt,k |hi,j(xk) − yk|, i = 1, …, n, j = 1, …, N
3-3. Select the weak classifier ht with the lowest error rate εt, and update the weights:
  wt+1,i = wt,i βt, if xi is classified correctly; wt,i, otherwise
where βt = εt / (1 − εt).
3.2 Training Process – Output (1/2)
  HT(x) = positive/face, if Σ_{t=1}^{T} αt ht(x) ≥ (1/2) Σ_{t=1}^{T} αt; negative/non-face, otherwise
where αt = log(1/βt) is the weak classifier weight and (1/2) Σt αt is the threshold.
3.2 Training Process – Output (2/2)
Weak classifier weight: αt = log((1 − εt)/εt) = log(1/βt).
Fig. B plots (1 − ε)/ε against ε; Fig. C plots log((1 − ε)/ε) against ε.
Even for equal changes in ε, the change in the classifier weight differs greatly between intervals (e.g., near 0.1 versus between 0.1 and 0.5); the log is taken to compress these differences between the weights.
Example values:
  ε = 0.001: (1 − ε)/ε = 999, log ≈ 2.99
  ε = 0.005: (1 − ε)/ε = 199, log ≈ 2.29
  ε = 0.101: (1 − ε)/ε ≈ 8.9, log ≈ 0.94
  ε = 0.105: (1 − ε)/ε ≈ 8.52, log ≈ 0.93
If ε1,3 = 0.167, then β = ε1,3 / (1 − ε1,3) = 0.167 / (1 − 0.167) ≈ 0.2.
At t = 1 (O = classified correctly, X = misclassified):
             W1         W2         W3         W4         W5      W6
  wt         0.167      0.167      0.167      0.167      0.167   0.167
             O          O          O          O          X       O
  Update wt+1: 0.167×0.2  0.167×0.2  0.167×0.2  0.167×0.2  0.167   0.167×0.2
  Normalize:   0.1        0.1        0.1        0.1        0.5     0.1
Weight change rule:
  wt+1,i = wt,i βt, if xi is classified correctly; wt,i, otherwise
After normalization, the weight of each misclassified image is relatively increased.
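The numbers in this example can be reproduced directly; this is a transcription of the update and normalization steps above:

```python
# Numeric check of the reweighting example: eps = 0.167,
# beta = eps / (1 - eps) ≈ 0.2, image 5 of 6 misclassified.
eps = 0.167
beta = eps / (1 - eps)                     # ≈ 0.2
w = [1 / 6] * 6                            # ≈ 0.167 each
correct = [True, True, True, True, False, True]

# Update: multiply correctly classified weights by beta, keep the rest.
w = [wi * beta if ok else wi for wi, ok in zip(w, correct)]
total = sum(w)
w = [wi / total for wi in w]               # normalize

print([round(wi, 2) for wi in w])   # [0.1, 0.1, 0.1, 0.1, 0.5, 0.1]
```

After normalization the misclassified image carries half the total weight, which is exactly how the next round is forced to focus on it.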
3.3 Testing Process - Flowchart
Test image
1. Extract sub-windows:
Downsample the test image repeatedly (e.g., 360×420 → 228×336 → … → 24×28) and take 24×24 sub-windows at each scale.
2. Strong classifier detection (24×24):
Apply the strong classifier h1, h2, h3, …, hT to each sub-window:
  HT(x) = positive, if Σ_{t=1}^{T} αt ht(x) ≥ (1/2) Σ_{t=1}^{T} αt; negative, otherwise
Accepted windows are kept; rejected windows are discarded.
3. Merge results:
For all accepted sub-windows, average the coordinates to produce the result image.
4. The Attentional Cascade
 Advantage: reduces testing computation time.
 Idea: reject as many negatives as possible at the earliest stage. More complex classifiers are then used in later stages.
 The detection process is that of a degenerate decision tree, a so-called "cascade".
(Figure: Stages 1-3 of a cascade; at each stage a threshold θ passes true and false positives on to the next stage and rejects true and false negatives.)
 True positive rate (detection rate):
  TPR = True Positive / (True Positive + False Negative)
 False positive rate (FP):
  FPR = False Positive / (False Positive + True Negative)
 True negative rate:
  TNR = True Negative / (False Positive + True Negative)
 False negative rate (FN):
  FNR = False Negative / (True Positive + False Negative)
FP + FN => Error Rate
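The four rates reduce to a few divisions over confusion-matrix counts; the counts below are illustrative:

```python
# Confusion-matrix counts (illustrative values, not from the paper).
TP, FP, TN, FN = 48, 10, 90, 2

tpr = TP / (TP + FN)     # true positive (detection) rate
fpr = FP / (FP + TN)     # false positive rate
tnr = TN / (FP + TN)     # true negative rate
fnr = FN / (TP + FN)     # false negative rate

# FP and FN together determine the overall error rate.
error_rate = (FP + FN) / (TP + FP + TN + FN)
print(tpr, fpr, error_rate)   # 0.96 0.1 0.08
```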
 Training a cascade of classifiers involves two types of tradeoffs:
1. Higher detection rates
2. Lower false positive rates
 More features achieve higher detection rates and lower false positive rates, but classifiers with more features require more time to compute.
 Define an optimization framework over:
1. the number of stages
2. the number of features in each stage
3. the strong classifier threshold in each stage
4. The Attentional Cascade - Algorithm (1/3)
4. The Attentional Cascade - Algorithm (2/3)
 f: maximum acceptable false positive rate per stage (the largest fraction of negatives that may be classified as positive).
 d: minimum acceptable detection rate per stage (the smallest fraction of positives that must be detected).
 F_target: target overall false positive rate (the false positive rate allowed for the full cascade).
 Initial values:
  P: set of positive images
  N: set of negative images
  f = 0.5
  d = 0.9999
  F_target = 10⁻⁶
  F0 = 1.0
  D0 = 1.0
  Threshold = 0.5
  Threshold_EPS = 10⁻⁴
  i = 0
4. The Attentional Cascade - Algorithm (3/3)
Notation: Fi and Di are the false positive rate and detection rate of the cascade after stage i; ni is the number of features in stage i.
 Iterate:
While (Fi > F_target)
{
  i = i + 1
  ni = 0; Fi = Fi−1
  While (Fi > f × Fi−1)
  {
    ni = ni + 1
    (Fi, Di) = AdaBoost(P, N, ni)
    While (Di < d × Di−1)
    {
      Threshold = Threshold − Threshold_EPS
      Di = re-compute the current strong classifier's detection rate with Threshold (this also affects Fi)
    }
    (Lowering the Threshold raises both Di and Fi.)
  }
  If (Fi > F_target)
    N = false detections of the current cascaded detector on the non-face images
}
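The control flow of this loop can be exercised end to end with a stand-in stage trainer. The mock_adaboost function below (each added feature multiplying the stage false positive rate by 0.7) is a made-up assumption, not the paper's trainer; only the loop structure mirrors the algorithm above:

```python
# Structural sketch of the cascade-training loop. The per-stage AdaBoost
# call is mocked so the nested loops can actually run.
F_TARGET = 1e-6   # target overall false positive rate
F_STAGE = 0.5     # max acceptable per-stage false positive rate f
D_STAGE = 0.99    # min acceptable per-stage detection rate d

def mock_adaboost(n_features):
    # Stand-in for training a stage with n_features features: returns
    # (stage false positive rate, stage detection rate). Illustrative only.
    return 0.7 ** n_features, 0.995

F, D = [1.0], [1.0]               # F_0 = D_0 = 1.0
features_per_stage = []
i = 0
while F[i] > F_TARGET:
    i += 1
    n_i, F_i = 0, F[i - 1]
    while F_i > F_STAGE * F[i - 1]:
        n_i += 1
        f_stage, d_stage = mock_adaboost(n_i)
        F_i = F[i - 1] * f_stage
        # (In the real algorithm the stage threshold would be lowered
        # here until the stage detection rate reaches d * D[i-1].)
    F.append(F_i)
    D.append(D[i - 1] * d_stage)
    features_per_stage.append(n_i)

print(i, features_per_stage)   # under the mock: 20 stages, 2 features each
```

With f = 0.5 each stage halves (or better) the running false positive rate, so reaching 10⁻⁶ takes on the order of twenty stages, which matches the loop's behavior here.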
5. Experimental Results (1/3)
 Face training set:
 Extracted from the world wide web.
 Use face and non-face training images.
 Consisted of 4916 hand labeled faces.
 Scaled and aligned to base resolution of 24 by 24 pixels.
 The non-face sub-windows come from 9544 images which were
manually inspected and found to not contain any faces.
Fig. 5: Example of frontal upright face
images used for training
5. Experimental Results (2/3)
 Use 4916 training faces.
 Use 10,000 non-face sub-windows.
 Use the AdaBoost training procedure.
 Evaluated on the MIT+CMU test set:
 An average of 10 features are evaluated per sub-window.
 This is possible because a large majority of sub-windows are rejected by
the first or second stage in the cascade.
 On a 700 MHz Pentium III processor, the face detector can process a 384 by 288 pixel image in about 0.067 seconds.
5. Experimental Results (3/3)
Fig. 6: The ROC curve (receiver operating characteristic) is created by adjusting the threshold θ of the final stage classifier from −∞ to +∞.
Reference
[1] P. Viola and M. Jones, "Rapid Object Detection Using a Boosted Cascade of Simple Features", Proc. IEEE Conf. Computer Vision and Pattern Recognition, vol. 1, pp. 511-518, 2001.
[2] P. Viola and M. Jones, "Robust Real-Time Face Detection", International Journal of Computer Vision, vol. 57, no. 2, pp. 137-154, 2004.
[3] A.S.S. Mohamed, Ying Weng, S.S. Ipson, and Jianmin Jiang, "Face Detection based on Skin Color in Image by Neural Networks", ICIAS 2007, pp. 779-783, 2007.
[4] AdaBoost - Wikipedia, the free encyclopedia.