Human-Machine Interface with KINECT

Report
Random Forest and Graph Cut based segmentation of human limbs
Nadezhda Zlateva, IICT-BAS
7 Sept. 2011
Outline
• Human Pose Recognition
• Case Study
• Randomized Decision Tree
• Random Forest
• Experimental results with RF
• Graph Cut
• Experimental results with GC
• Application to hand classification
• Conclusion
• References
Human Pose Recognition
Recognition via
• conventional intensity cameras
• depth cameras
Frame-to-frame point tracking – slow to re-initialize
Pose Recognition in parts:
• Body parts segmentation
- Per pixel classification
• 3D skeletal joints estimation
[1] Shotton et al., 11
Case Study
Upper limbs segmentation for hand gesture recognition
Application:
• Sign language interpretation
• Medical environments
- Robot medical assistants [Purdue University]
- CT & MRI review in sterile environments [Sunnybrook Hospital, Toronto]
Binary Decision Tree: Basics
[Figure: binary decision tree with numbered nodes 1–17. Split nodes test a feature value v against a threshold (branches labeled < and ≥); leaf nodes store a category c.]
DT over depth images: Training
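feature vector – pixel x = [x, y, z]T of depth image I
split function – depth comparison features fθ as a function of x, as defined in [1]:
$f_\theta(I, x) = d_I\left(x + \frac{u}{d_I(x)}\right) - d_I\left(x + \frac{v}{d_I(x)}\right)$
dI(x) – depth at pixel x; θ = (u, v) – a pair of pixel offsets, normalized by the depth at x
[Figure: two example features, θ1 and θ2]
[1] Shotton, 11
Combination of weak but computationally efficient features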
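A minimal sketch of one depth comparison feature, assuming the form published in [1] (two probe pixels whose offsets are divided by the depth at x). The constant `BACKGROUND`, the helper `depth_at`, and the list-of-rows image layout are hypothetical choices made only for this illustration.

```python
BACKGROUND = 1e6  # assumed large depth for background and out-of-image probes


def depth_at(depth, px):
    """Depth at pixel px = (col, row); BACKGROUND outside the image or where depth is 0."""
    c, r = px
    if 0 <= r < len(depth) and 0 <= c < len(depth[0]) and depth[r][c] > 0:
        return depth[r][c]
    return BACKGROUND


def depth_feature(depth, px, u, v):
    """f_theta(I, x) = d_I(x + u / d_I(x)) - d_I(x + v / d_I(x)), with theta = (u, v)."""
    d = depth_at(depth, px)
    # Dividing the offsets by the depth at x makes the feature depth invariant.
    probe_u = (round(px[0] + u[0] / d), round(px[1] + u[1] / d))
    probe_v = (round(px[0] + v[0] / d), round(px[1] + v[1] / d))
    return depth_at(depth, probe_u) - depth_at(depth, probe_v)


# Toy 3x3 depth image: a near surface (2.0) with a farther column (4.0) on the right.
toy_depth = [[2.0, 2.0, 4.0],
             [2.0, 2.0, 4.0],
             [2.0, 2.0, 4.0]]
response = depth_feature(toy_depth, (1, 1), u=(2, 0), v=(0, 0))
```

A single response like this is a weak cue on its own; the tree combines many of them, each compared against a threshold τ.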
Randomized DT: Training
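1. Randomly select a set of split candidates ϕ = (θ, τ), where θ are the feature parameters and τ is the set of split thresholds for each θ, for tree t.
2. Define the set of training pixels Q = {(I, x)} over all training images for tree t. Q is the set of pixels at the root node.
3. Find the best split candidate ϕ* at node n: the one with the largest information gain from splitting Q into Qleft & Qright (as in [1]):
$Q_{left}(\phi) = \{(I, x) \mid f_\theta(I, x) < \tau\}, \quad Q_{right}(\phi) = Q \setminus Q_{left}(\phi)$
$\phi^* = \arg\max_\phi G(\phi), \quad G(\phi) = H(Q) - \sum_{s \in \{left, right\}} \frac{|Q_s(\phi)|}{|Q|} H(Q_s(\phi))$
where H is the Shannon entropy of the label distribution.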
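The split selection in step 3 can be sketched as follows, assuming Shannon entropy over the labels as the impurity measure (as in [1]). The function names and the toy samples are hypothetical; each sample is a pair of a precomputed feature response and a body part label.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((k / n) * math.log2(k / n) for k in Counter(labels).values())


def information_gain(samples, tau):
    """Gain from splitting (feature_value, label) samples at threshold tau."""
    left = [c for f, c in samples if f < tau]
    right = [c for f, c in samples if f >= tau]
    n = len(samples)
    parent = entropy([c for _, c in samples])
    return parent - len(left) / n * entropy(left) - len(right) / n * entropy(right)


def best_threshold(samples, thresholds):
    """Pick the candidate threshold with the largest information gain."""
    return max(thresholds, key=lambda tau: information_gain(samples, tau))


toy_samples = [(0.1, "hand"), (0.2, "hand"), (0.8, "arm"), (0.9, "arm")]
chosen = best_threshold(toy_samples, [0.15, 0.5, 0.85])
```

In the toy data the threshold 0.5 separates the two labels perfectly, so it yields the maximum gain of one bit.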
Randomized DT: Training (continued)
4. Recurse on Qleft(ϕ*) & Qright(ϕ*) until a stop condition is reached:
- Maximum depth
- Minimum information gain
- Minimum number of node pixels
5. Estimate Pt(c|I,x) at each leaf node over the body part labels c, using a normalized histogram
Note:
• dependent on choice of parameters
• prone to over-fitting
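The recursion, the three stop conditions, and the leaf histograms can be sketched together. This is a simplified illustration, not the training code used for the reported experiments: samples carry precomputed feature vectors, a split candidate is a hypothetical (feature index, threshold) pair, and the default parameter values are placeholders.

```python
import math
from collections import Counter


def entropy(labels):
    n = len(labels)
    return -sum((k / n) * math.log2(k / n) for k in Counter(labels).values()) if n else 0.0


def histogram(labels):
    """Normalized label histogram stored at a leaf: an estimate of P_t(c | I, x)."""
    n = len(labels)
    return {c: k / n for c, k in Counter(labels).items()}


def train(samples, candidates, depth=0, max_depth=15, min_gain=1e-6, min_samples=2):
    """samples: list of (feature_vector, label); candidates: list of (index, tau)."""
    labels = [c for _, c in samples]
    # Stop conditions: maximum depth or too few pixels at this node.
    if depth >= max_depth or len(samples) < min_samples:
        return {"leaf": histogram(labels)}
    best = None
    for i, tau in candidates:
        left = [s for s in samples if s[0][i] < tau]
        right = [s for s in samples if s[0][i] >= tau]
        if not left or not right:
            continue
        gain = entropy(labels) - sum(
            len(part) / len(samples) * entropy([c for _, c in part])
            for part in (left, right))
        if best is None or gain > best[0]:
            best = (gain, i, tau, left, right)
    # Stop condition: no useful split, or information gain below the minimum.
    if best is None or best[0] < min_gain:
        return {"leaf": histogram(labels)}
    _, i, tau, left, right = best
    return {"feature": i, "tau": tau,
            "left": train(left, candidates, depth + 1, max_depth, min_gain, min_samples),
            "right": train(right, candidates, depth + 1, max_depth, min_gain, min_samples)}


def predict(tree, x):
    """Descend from the root to a leaf and return its label distribution."""
    while "leaf" not in tree:
        tree = tree["left"] if x[tree["feature"]] < tree["tau"] else tree["right"]
    return tree["leaf"]


toy = [((0.1,), "hand"), ((0.2,), "hand"), ((0.8,), "arm"), ((0.9,), "arm")]
toy_tree = train(toy, [(0, 0.5)])
```

The result depends directly on `max_depth`, `min_gain`, and `min_samples`, which is the parameter sensitivity the note above refers to.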
Random Forest
Forest – ensemble of T decision trees
• Divide the training (depth) images into T subsets, one unique subset for each tree t
• Train each tree on its subset
[3] Breiman 01
[1] Shotton et al. 11
Random Forest: Classification
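[Figure: pixel x is passed down every tree t1 … tT; each tree ends in a leaf holding a distribution over labels c]
• classification is the average of the per-tree distributions, as in [1]:
$P(c \mid I, x) = \frac{1}{T} \sum_{t=1}^{T} P_t(c \mid I, x)$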
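A small sketch of forest classification for a single pixel, assuming the per-tree leaf distributions are averaged as described in [1]. The function names and the three toy distributions are hypothetical.

```python
def forest_posterior(tree_posteriors):
    """Average the per-tree distributions P_t(c | I, x) over the T trees."""
    T = len(tree_posteriors)
    labels = set().union(*tree_posteriors)
    return {c: sum(p.get(c, 0.0) for p in tree_posteriors) / T for c in labels}


def classify(tree_posteriors):
    """Return the label with the highest averaged probability."""
    avg = forest_posterior(tree_posteriors)
    return max(avg, key=avg.get)


# Leaf distributions reached by one pixel in T = 3 trees (toy values).
leaves = [{"hand": 0.9, "forearm": 0.1},
          {"hand": 0.7, "forearm": 0.3},
          {"forearm": 1.0}]
pixel_posterior = forest_posterior(leaves)
pixel_label = classify(leaves)
```

One tree votes entirely for "forearm", yet the average still favors "hand", which illustrates how the ensemble dampens the error of a single tree.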
Random Forest: Toy demo
[2] Shotton et al. 09
Random Forest: Summary
• Improves generalization to new data
• Ensemble of trees gives robustness
• Good for multi-class problems
• Resistant to over-fitting
• Fast training on large data sets
• Efficient classifier
RF: Experiments and results
Ground truth: 500 labeled (upper limb) depth images (640x480)
Number of trees: T = 3
Tree depth: 15
Split candidates: |θ| = 100, |τ| = 20 for each θ
Random pixels per image: 1000
5-fold cross validation => 100 test images, 130 training images per tree
Table 1. Average per class accuracy with RF classification
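The slides do not define the metric, so the following sketch assumes "average per class accuracy" means the mean over classes of the fraction of correctly labeled pixels in each class, which is the metric used in [1]. Function names and toy labels are hypothetical.

```python
from collections import defaultdict


def per_class_accuracy(true_labels, predicted_labels):
    """Fraction of correctly classified pixels for each ground-truth class."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for t, p in zip(true_labels, predicted_labels):
        total[t] += 1
        correct[t] += int(t == p)
    return {c: correct[c] / total[c] for c in total}


def average_per_class_accuracy(true_labels, predicted_labels):
    """Unweighted mean of the per-class accuracies."""
    acc = per_class_accuracy(true_labels, predicted_labels)
    return sum(acc.values()) / len(acc)


truth = ["hand", "hand", "arm", "arm", "arm", "arm"]
predicted = ["hand", "arm", "arm", "arm", "arm", "hand"]
score = average_per_class_accuracy(truth, predicted)
```

Averaging per class keeps large body parts from dominating the score: here the overall pixel accuracy is 4/6, while the per-class average is 0.625.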
RF: Experiments and results
[Figure panels: ground truth & training; per pixel classification]
Segmentation by Graph Cut: Motivation
RF classification results:
• Fuzzy body part boundaries
• Left/Right uncertainty
Subsequent hand sign recognition – requires cleaner hand region
segmentation
Graph Cut framework:
• Energy minimization framework
• Binary and multi-label image segmentation
• Combines local and contextual information
Pixel labeling problem
Given
• Pixels P
• Assignment cost – U (unary potential)
• Separation cost – B (boundary potential)
• N – pairs of neighboring pixels
Find
Labels L = {Lp | p ∈ P}
that minimize
$E(L) = \sum_{p \in P} U_p(L_p) + \sum_{(p,q) \in N} B_{p,q}(L_p, L_q)$
[4] Boykov et al. 01
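To make the pixel labeling objective concrete, here is a minimal sketch that evaluates the energy of a given labeling as the sum of assignment costs plus separation costs over neighboring pixel pairs. The 4-connected neighborhood, the Potts separation cost, and all numeric values are assumptions for illustration only.

```python
def potts(a, b, penalty=1.0):
    """Assumed separation cost: constant penalty when neighboring labels differ."""
    return 0.0 if a == b else penalty


def labeling_energy(labels, unary, boundary=potts):
    """Sum of U over pixels plus B over 4-connected neighbor pairs.

    labels[r][c] is the label of a pixel; unary[r][c][label] is its assignment cost.
    """
    rows, cols = len(labels), len(labels[0])
    energy = 0.0
    for r in range(rows):
        for c in range(cols):
            energy += unary[r][c][labels[r][c]]
            if c + 1 < cols:  # right neighbor
                energy += boundary(labels[r][c], labels[r][c + 1])
            if r + 1 < rows:  # bottom neighbor
                energy += boundary(labels[r][c], labels[r + 1][c])
    return energy


# 1x3 image: the middle pixel weakly prefers label 1, its neighbors strongly prefer 0.
toy_unary = [[{0: 0.0, 1: 2.0}, {0: 1.0, 1: 0.9}, {0: 0.0, 1: 2.0}]]
noisy_energy = labeling_energy([[0, 1, 0]], toy_unary)
smooth_energy = labeling_energy([[0, 0, 0]], toy_unary)
```

The per-pixel optimum `[0, 1, 0]` has the higher total energy because of two label changes, so the minimizer prefers the spatially coherent labeling `[0, 0, 0]`.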
Graph Cut: Binary case
• Image as directed graph G(V, E)
• t-link – assignment cost
• n-link – separation cost
Energy minimization problem = min s-t cut on G = max-flow
Theorem:
In a graph G, the maximum source-to-sink flow possible
is equal to the capacity of the minimum cut in G.
[L. R. Foulds, Graph Theory Applications, 1992 Springer-Verlag New York Inc., 247-248]
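The equivalence between the minimum s-t cut and the maximum flow can be checked on a tiny graph. The sketch below uses the Edmonds-Karp algorithm (a standard max-flow method, not necessarily the solver used in this work) on a hypothetical two-pixel image. It assumes the t-link convention of [4]: the link from source s to a pixel carries the cost of the label associated with the sink, and vice versa, so the cut pays the cost of the label each pixel receives.

```python
from collections import deque


def max_flow(capacity, source, sink):
    """Edmonds-Karp max-flow. capacity: {u: {v: cap}}.

    Returns (flow value, set of nodes on the source side of a minimum cut).
    """
    residual = {u: dict(nbrs) for u, nbrs in capacity.items()}
    for u, nbrs in capacity.items():
        for v in nbrs:
            residual.setdefault(v, {}).setdefault(u, 0.0)  # reverse edges
    flow = 0.0
    while True:
        # Breadth-first search for a shortest augmenting path.
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v, cap in residual[u].items():
                if cap > 1e-12 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            return flow, set(parent)  # nodes still reachable from the source
        bottleneck = float("inf")
        v = sink
        while parent[v] is not None:
            bottleneck = min(bottleneck, residual[parent[v]][v])
            v = parent[v]
        v = sink
        while parent[v] is not None:
            u = parent[v]
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
            v = u
        flow += bottleneck


# Two pixels p and q; source side = "object", sink side = "background".
# Toy costs: U_p(obj)=1, U_p(bkg)=5, U_q(obj)=3, U_q(bkg)=2, n-link weight 2.
graph = {
    "s": {"p": 5.0, "q": 2.0},             # t-links: cut if the pixel goes to background
    "p": {"t": 1.0, "q": 2.0},             # p->t: cut if p stays object; p->q: n-link
    "q": {"t": 3.0, "p": 2.0},
}
flow_value, source_side = max_flow(graph, "s", "t")
```

Enumerating the four labelings by hand gives energies 4 (both object), 7 (both background), 5, and 10, so the minimum energy 4 matches the maximum flow, and both pixels end on the source side.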
Graph Cut: Multi-label case
Energy = cut cost ||C||:
$\|C\| = \sum_{e_{ij} \in C} |w_{ij}|$
Suboptimal approximation of the minimum energy
Graph Cut: Potentials
Energy function – unary and boundary potentials combined through an importance weight
• Unary potential – from the class probabilities given by the RF
• Boundary potential – from prior constraints
[5] Boykov et al. 06
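The exact form of the potentials is not recoverable from the slide text. A common choice, shown here purely as an assumption, is to turn the RF class probabilities into unary costs with a weighted negative log-likelihood. The function name, the `weight` parameter, and the `eps` floor are hypothetical.

```python
import math


def unary_from_posterior(posterior, weight=1.0, eps=1e-6):
    """Assumed unary cost: U(c) = -weight * log P(c | I, x).

    eps floors the probability so that labels the forest rules out
    still receive a finite (large) cost.
    """
    return {c: -weight * math.log(max(p, eps)) for c, p in posterior.items()}


rf_posterior = {"hand": 0.8, "forearm": 0.2, "upper arm": 0.0}
costs = unary_from_posterior(rf_posterior)
```

Likely labels get a low assignment cost and unlikely ones a high cost; `weight` stands in for the importance weight that balances this term against the boundary potential.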
Graph Cut: Results
Spatial coherence: [figure]
Graph Cut: Results (continued)
[Figure panels: RF classifications; GC segmentation]
RF & GC for hands
Setup: 63 frames, 500 random pixels, |Omax| = 45
[Figure panels: ground truth; Random Forest; Graph Cut]
Random Forest: 58.5% per class accuracy
Graph Cut: 70.9% per class accuracy
Conclusion
• RF – strong classifier
• RF + GC over depth maps – good object segmentation
Future Work
• Increase available data
• Improve pixel label inference
• Estimate upper limb/hand joints
• Recognize finger configuration
References
[1] Shotton, J., A. Fitzgibbon, M. Cook, T. Sharp, M. Finocchio, R. Moore, A. Kipman, and A. Blake. Real-time Human Pose Recognition in Parts from a Single Depth Image. CVPR, 2011.
[2] Shotton, J. Boosting and Random Forest for Visual Recognition. ICCV Tutorial, 2009. http://www.iis.ee.ic.ac.uk/~tkkim/iccv09_tutorial
[3] Breiman, L. Random forests. Mach. Learning, 45(1):5–32, 2001. http://www.stat.berkeley.edu/~breiman/RandomForests
[4] Boykov, Y., and M. P. Jolly. Interactive graph cuts for optimal boundary and region segmentation of objects in N-D images. In Proc. IEEE Int. Conf. on Computer Vision, 2001.
[5] Boykov, Y., and G. Funka-Lea. Graph cuts and efficient N-D image segmentation. IJCV, 70:109–131, 2006.
