2/13/2012 12 Structured Light

How Does Kinect Work?
Po-Hsiang Chen
Advisor: Sheng-Jyh Wang
2/13/2012
Major References
• Shotton, J., A. Fitzgibbon, et al. (2011). "Real-Time Human Pose Recognition in Parts from Single Depth Images." Microsoft Research Cambridge & Xbox Incubation
  • CVPR 2011 Best Paper
• Freedman, B., A. Shpunt, et al. (2008). Depth mapping using projected patterns, US 2010/0118123A1
  • PrimeSense Patent
Outline
• What is Kinect?
• Kinect Architecture
• From IR to depth image
  • History of Structured Light
  • PrimeSense Invented Structured Light
• From depth image to joint positions
  • Body Part Inference
  • Joint Proposals
• Experiments and Results
• Conclusion
• References
What is Kinect?
• Motion sensing input device by Microsoft
• Depth camera tech. developed by PrimeSense
  • Invented in 2005
• Software tech. developed by Rare
• First announced at E3 2009 as “Project Natal”
• Windows SDK Releases
  http://www.microsoft.com/en-us/kinectforwindows/discover/features.aspx
Kinect IR Structured Light
Kinect Architecture
IR Structured Light → Depth Image → Random Decision Forest → Body Parts → Mean Shift → Joint Position
3D Imaging of surface
Triangulation
• Main Problem
  • To recover shape from multiple views, we need CORRESPONDENCES between the images
  • The matching/correspondence problem is hard
    • Occlusions, texture, colors, etc.
• Solution: Structured light
  • Idea: Simplify matching
  • Strategy: Use illumination to create your own correspondences
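Once a correspondence is known, triangulation reduces to simple arithmetic. The sketch below assumes an idealized, rectified pinhole setup in which depth follows Z = f · b / d (focal length f in pixels, baseline b in meters, disparity d in pixels). The function name and the numbers are illustrative assumptions, not values from the slides or from Kinect.

```python
def depth_from_disparity(focal_px: float, baseline_m: float, disparity_px: float) -> float:
    """Depth Z = f * b / d for a rectified pair under the pinhole model."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive for a point in front of the rig")
    return focal_px * baseline_m / disparity_px


if __name__ == "__main__":
    # Hypothetical numbers chosen only to show the inverse relation:
    # halving the disparity doubles the estimated depth.
    for d in (40.0, 20.0, 10.0):
        print(d, depth_from_disparity(focal_px=580.0, baseline_m=0.075, disparity_px=d))
```

The inverse relation is why depth resolution degrades with distance: far away, a one-pixel change in disparity corresponds to a much larger change in Z.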
Structured Light
• Basic Principle
  • Use a projector to create unambiguous correspondences
  • Light projection
    • If we project a single point, matching is unique
Structured Light
• Line projection (Line Scan)
  • For calibrated cameras, the epipolar geometry is known
  • Project a line instead of a single point
Structured Light
• Project Multiple Stripes or Grids
  • Which stripe matches which?
  • Correspondence Again
Structured Light
• Answer 1: Assume Surface Continuity
  • Ordering Constraint
Structured Light
• Answer 2: Coloured stripes (De Bruijn)
  • Difficult to use for coloured surfaces
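The appeal of a De Bruijn sequence for stripe coding is that every window of n consecutive stripes is unique, so a stripe can be identified from its neighbours alone. The sketch below uses the standard recursive construction for an alphabet of k symbols (think of them as k stripe colours). The function names and the choice k = 3, n = 3 are illustrative assumptions, not parameters from the slides.

```python
def de_bruijn(k: int, n: int) -> list[int]:
    """Return a De Bruijn sequence B(k, n) as a list of symbols in range(k)."""
    a = [0] * (k * n)
    sequence: list[int] = []

    def db(t: int, p: int) -> None:
        if t > n:
            if n % p == 0:
                sequence.extend(a[1 : p + 1])
        else:
            a[t] = a[t - p]
            db(t + 1, p)
            for j in range(a[t - p] + 1, k):
                a[t] = j
                db(t + 1, t)

    db(1, 1)
    return sequence


def windows_are_unique(seq: list[int], n: int) -> bool:
    """Check that no window of n consecutive symbols occurs twice."""
    windows = [tuple(seq[i : i + n]) for i in range(len(seq) - n + 1)]
    return len(windows) == len(set(windows))


if __name__ == "__main__":
    stripes = de_bruijn(k=3, n=3)  # 3 colours, windows of 3 stripes
    print(len(stripes), windows_are_unique(stripes, 3))
```

With k colours and windows of length n, the sequence has k^n symbols, which bounds how many stripes can be labelled unambiguously.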
Structured Light
• Answer 2: Coloured dots (M-array)
  • Difficult to use for coloured surfaces
Structured Light
• Answer 3: Pattern dots (M-array)
  • Difficult for industrial manufacturing
Structured Light
• Answer 4: Time-coded light patterns (Time multiplexing)
  • Use a sequence of binary patterns → about log₂ N images for N stripes
  • Each stripe has a unique binary illumination code
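To make the time-multiplexing idea concrete, the sketch below builds the on/off state of each stripe in each projected pattern and decodes a stripe index from the observed bits. It uses plain binary codes with the most significant bit first; the function names are illustrative, and practical systems often prefer Gray codes, which is not shown here.

```python
import numpy as np


def binary_patterns(num_stripes: int) -> np.ndarray:
    """Return an array of shape (num_bits, num_stripes) with entries 0 or 1.

    Row b is the b-th projected pattern (most significant bit first);
    column s, read top to bottom, is the binary code of stripe s.
    """
    num_bits = max(1, (num_stripes - 1).bit_length())
    stripe_index = np.arange(num_stripes)
    shifts = np.arange(num_bits - 1, -1, -1)
    return (stripe_index[None, :] >> shifts[:, None]) & 1


def decode(bits: np.ndarray) -> np.ndarray:
    """Recover stripe indices from observed bits of shape (num_bits, ...)."""
    num_bits = bits.shape[0]
    weights = 1 << np.arange(num_bits - 1, -1, -1)
    return np.tensordot(weights, bits, axes=1)


if __name__ == "__main__":
    patterns = binary_patterns(8)  # 8 stripes need 3 patterns
    print(patterns)
    print(decode(patterns))
```

The cost of this scheme is temporal: the scene must stay still while all patterns are projected, which is one reason single-shot patterns are attractive for moving subjects.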
Structured Light
• All of the above are categorized as Discrete Methods
  • Salvi, J., S. Fernandez, et al. (2010). "A state of the art in structured light patterns for surface profilometry." Pattern Recognition 43(8): 2666-2680
• There are many more Continuous Structured Light Methods, such as phase shifting
Structured Light
• All of the above are human designed patterns.
• Random Speckle
  • Structured light using randomly generated patterns
  • May obtain denser depth information by solving the correspondence problem
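With a random pattern, the correspondence problem can be attacked by window matching: a small neighbourhood of speckles is locally distinctive, so it can be searched for in a stored reference image. The sketch below does this along a single image row using normalized cross-correlation. It is a simplified 1-D illustration with hypothetical names and parameters, not the matching procedure used in Kinect.

```python
import numpy as np


def _normalize(window: np.ndarray) -> np.ndarray:
    """Zero-mean, unit-norm version of a window (unchanged if it is flat)."""
    centered = window - window.mean()
    norm = np.linalg.norm(centered)
    return centered / norm if norm > 0 else centered


def match_shift(reference_row: np.ndarray, observed_row: np.ndarray,
                x: int, half_window: int, max_shift: int) -> int:
    """Return the shift s such that the observed window centred at x best
    matches the reference window centred at x + s."""
    observed = _normalize(observed_row[x - half_window : x + half_window + 1])
    best_shift, best_score = 0, -np.inf
    for s in range(-max_shift, max_shift + 1):
        lo, hi = x + s - half_window, x + s + half_window + 1
        if lo < 0 or hi > len(reference_row):
            continue  # candidate window falls outside the reference row
        score = float(np.dot(observed, _normalize(reference_row[lo:hi])))
        if score > best_score:
            best_shift, best_score = s, score
    return best_shift


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    reference = rng.random(200)        # stand-in for one row of a speckle image
    observed = np.roll(reference, 7)   # the same pattern displaced by 7 pixels
    print(match_shift(reference, observed, x=100, half_window=8, max_shift=15))
```

Because every pixel can be matched this way, the resulting depth map can be dense rather than limited to stripe or dot locations.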
What can we do better?
• A projector is just an inverse of a camera
  • Need calibration
• One projector and one camera are enough for triangulation
PrimeSense Patents
• US 2010/0118123
  • Projector-Camera system
  • Already calibrated structure
  • δZ results in δX in 32
PrimeSense Patents
• US 2010/0118123
  • Structured Light-1
    • Pseudo-random distribution
    • Local: Random
    • Global: Gray level decreases
    • Can make a rough estimate in a low resolution image
PrimeSense Patents
• US 2010/0118123
  • Structured Light-2
    • Quasi-periodic pattern
    • Five-fold symmetry
    • Results in distinct peaks in the frequency domain
    • Contains no unit cell that repeats over the spatial domain
    • Used to reduce noise and ambient light from the environment
Kinect IR Structured Light
PrimeSense Patents
• US 2010/0290698
PrimeSense Patents
• US 2010/0290698
  • Uses a special (“astigmatic”) lens with different focal lengths in the x- and y-directions
  • A projected circle becomes an ellipse whose orientation indicates depth
From depth to joints
• Shotton, J., A. Fitzgibbon, et al. (2011). "Real-Time Human Pose Recognition in Parts from Single Depth Images." Microsoft Research Cambridge & Xbox Incubation
  • Treats body segmentation as a per-pixel classification task (no pairwise term or CRF is used)
  • The algorithm runs in 5 ms per frame on the Xbox GPU
  • Novelty: Intermediate body parts representation
Body Part Inference
• Body part labeling
  • 31 body parts
  • Distinct parts for left and right allow the classifier to disambiguate the left and right sides of the body
Body Part Inference
• Depth image features
  • f_θ(I, x) = d_I(x + u / d_I(x)) − d_I(x + v / d_I(x))
  • d_I(x) is the depth at pixel x in image I
  • θ = (u, v) describes the offsets u and v
  • Each feature needs to read at most 3 image pixels and perform at most 5 arithmetic operations
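The depth comparison feature from Shotton et al. is f_θ(I, x) = d_I(x + u / d_I(x)) − d_I(x + v / d_I(x)): two probe offsets are scaled by the inverse depth at x, which makes the feature roughly invariant to how far the person stands from the camera. The sketch below is one way to compute it. The `LARGE_DEPTH` constant for probes that leave the image, the rounding to the nearest pixel, and all names are assumptions made for illustration.

```python
import numpy as np

LARGE_DEPTH = 1e6  # assumed value returned for probes outside the image


def probe(depth: np.ndarray, position: np.ndarray) -> float:
    """Read the depth at the nearest pixel, or LARGE_DEPTH if out of bounds."""
    row, col = int(round(position[0])), int(round(position[1]))
    height, width = depth.shape
    if 0 <= row < height and 0 <= col < width:
        return float(depth[row, col])
    return LARGE_DEPTH


def depth_feature(depth: np.ndarray, x, u, v) -> float:
    """Difference of two depth probes whose offsets are scaled by 1 / d_I(x).

    Assumes the depth at x is positive (x lies on the foreground).
    """
    d = float(depth[x[0], x[1]])                   # pixel read 1
    centre = np.asarray(x, dtype=float)
    probe_u = centre + np.asarray(u, dtype=float) / d
    probe_v = centre + np.asarray(v, dtype=float) / d
    return probe(depth, probe_u) - probe(depth, probe_v)  # pixel reads 2 and 3


if __name__ == "__main__":
    depth = np.full((10, 10), 2.0)   # flat surface 2 units away
    depth[5, 7] = 3.0                # one pixel lies further back
    print(depth_feature(depth, x=(5, 5), u=(0, 4), v=(0, -4)))
```

Counting the work confirms the slide's claim: three pixel reads (at x and at the two probes) and five arithmetic operations (two divisions, two additions, one subtraction, treating each 2-D offset as one operand).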
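Randomized Decision Forests
• Fast and effective multi-class classifier
• Each split node consists of a feature f_θ and a threshold τ
• At the leaf node reached in tree t, a learned distribution P_t(c | I, x) over body part labels c is given
• Final classification: P(c | I, x) = (1/T) Σ_{t=1..T} P_t(c | I, x)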
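Classification with a forest amounts to walking each tree from the root to a leaf and averaging the leaf distributions over all T trees. The sketch below shows that traversal with a minimal node type. The `Node` class, the convention that values below the threshold go left, and the toy trees are illustrative assumptions rather than the structure of the trained Kinect forest.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence

import numpy as np


@dataclass
class Node:
    """Split node (feature, threshold, children) or leaf (distribution)."""
    feature: Optional[Callable[[object], float]] = None
    threshold: float = 0.0
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    distribution: Optional[np.ndarray] = None  # set only on leaves


def tree_predict(root: Node, sample: object) -> np.ndarray:
    """Descend until a leaf is reached and return its class distribution."""
    node = root
    while node.distribution is None:
        go_left = node.feature(sample) < node.threshold  # assumed convention
        node = node.left if go_left else node.right
    return node.distribution


def forest_predict(trees: Sequence[Node], sample: object) -> np.ndarray:
    """Average the per-tree leaf distributions."""
    return np.mean([tree_predict(tree, sample) for tree in trees], axis=0)


if __name__ == "__main__":
    leaf_a = Node(distribution=np.array([1.0, 0.0]))
    leaf_b = Node(distribution=np.array([0.0, 1.0]))
    # Two toy one-split trees that disagree for samples between 0.5 and 1.5.
    tree_1 = Node(feature=lambda s: s, threshold=0.5, left=leaf_a, right=leaf_b)
    tree_2 = Node(feature=lambda s: s, threshold=1.5, left=leaf_a, right=leaf_b)
    print(forest_predict([tree_1, tree_2], 1.0))
```

Each pixel is classified independently, so the per-pixel work is one short root-to-leaf walk per tree, which is what makes the approach easy to parallelize.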
Combining Models
• Multiple classifiers work together
  • Committees
    • E.g. Averaging the predictions of a set of individual models
    • E.g. Majority votes
  • Boosting
    • Classifiers trained in sequence
    • E.g. AdaBoost
  • Decision Tree
    • Binary selection corresponding to the traversal of a tree
Decision Tree
• Three major aspects
  • A splitting criterion
  • A stop-splitting rule
  • A rule to assign each leaf to a specific class
• Decision Forests
  • A Decision Tree Committee
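Randomized Decision Forests
• Fast and effective multi-class classifier
• Each split node consists of a feature f_θ and a threshold τ
• At the leaf node reached in tree t, a learned distribution P_t(c | I, x) over body part labels c is given
• Final classification: P(c | I, x) = (1/T) Σ_{t=1..T} P_t(c | I, x)
How to train?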
Randomized Decision Forests
• Training
  • Each tree is trained on a different set of images
  • 2000 example pixels are picked from each image
  • Algorithm
Randomized Decision Forests
• Algorithm (cont.)
  • Shannon entropy given Z on Y
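The slide's formula did not survive extraction, so the sketch below shows the usual entropy-based criterion for choosing a split: pick the feature threshold that maximizes the information gain, that is, the drop in Shannon entropy of the label histogram after partitioning the examples. The function names and the toy data are assumptions, and the exact notation on the original slide may differ.

```python
import numpy as np


def entropy(labels: np.ndarray, num_classes: int) -> float:
    """Shannon entropy (in bits) of the normalized label histogram."""
    if len(labels) == 0:
        return 0.0
    counts = np.bincount(labels, minlength=num_classes)
    p = counts[counts > 0] / len(labels)
    return float(-(p * np.log2(p)).sum())


def information_gain(feature_values: np.ndarray, labels: np.ndarray,
                     threshold: float, num_classes: int) -> float:
    """Entropy before the split minus the size-weighted entropy after it."""
    left = labels[feature_values < threshold]
    right = labels[feature_values >= threshold]
    n = len(labels)
    return (entropy(labels, num_classes)
            - len(left) / n * entropy(left, num_classes)
            - len(right) / n * entropy(right, num_classes))


def best_split(feature_values: np.ndarray, labels: np.ndarray,
               candidate_thresholds, num_classes: int):
    """Return (threshold, gain) for the candidate with the largest gain."""
    gains = [information_gain(feature_values, labels, t, num_classes)
             for t in candidate_thresholds]
    best = int(np.argmax(gains))
    return candidate_thresholds[best], gains[best]


if __name__ == "__main__":
    values = np.array([0.1, 0.2, 0.8, 0.9])
    labels = np.array([0, 0, 1, 1])
    print(best_split(values, labels, [0.15, 0.5, 0.85], num_classes=2))
```

In training, this search is repeated at every node over many randomly proposed features and thresholds, which is where most of the training cost comes from.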
Randomized Decision Forests
• Algorithm (cont.)
  • Training takes a lot of effort
    • Training 3 trees to depth 20 from 1 million images takes about a day on a 1000-core cluster
Where does the training data come from?
Training Data
• Depth imaging
  • Simplifies the task of background subtraction
  • Most important: easy to synthesize!!!
Take real images → Learn synthesis parameters → Generate lots of training data
Kinect Architecture
IR Structured Light → Depth Image → Random Decision Forest → Body Parts → Mean Shift → Joint Position
Joint Position Proposals
• From the previous section,
• Use Mean Shift with a weighted Gaussian kernel
Mean Shift
• Kernel density estimator
  • Discrete points → continuous function
  • Calculate the gradient at the initial point and shift
  • Iterate until convergence
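The iteration can be written without an explicit gradient: moving to the kernel-weighted mean of the points is equivalent to a gradient ascent step on the density estimate. The sketch below finds one mode with a weighted Gaussian kernel. It uses the standard form exp(−‖x − xᵢ‖² / (2b²)), which may differ in scaling from the kernel in Shotton et al.; the names, bandwidth, and test data are assumptions.

```python
import numpy as np


def mean_shift_mode(points: np.ndarray, weights: np.ndarray, start,
                    bandwidth: float, tol: float = 1e-6,
                    max_iter: int = 100) -> np.ndarray:
    """Climb from `start` to a nearby mode of the weighted kernel density."""
    x = np.asarray(start, dtype=float)
    for _ in range(max_iter):
        squared_dist = ((points - x) ** 2).sum(axis=1)
        kernel = weights * np.exp(-squared_dist / (2.0 * bandwidth ** 2))
        total = kernel.sum()
        if total == 0.0:
            break  # no point is close enough to pull the estimate
        shifted = (kernel[:, None] * points).sum(axis=0) / total
        if np.linalg.norm(shifted - x) < tol:
            return shifted
        x = shifted
    return x


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical 2-D points clustered around (1, 2), all equally weighted.
    cluster = rng.normal(loc=[1.0, 2.0], scale=0.1, size=(200, 2))
    mode = mean_shift_mode(cluster, np.ones(len(cluster)),
                           start=[0.5, 1.5], bandwidth=0.5)
    print(mode)
```

In the joint proposal setting, the points would be pixels labelled with a given body part and the weights would reflect how confident the classifier is in each pixel, so the mode is pulled toward the confidently labelled region.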
Experiments and Results
• Synthetic
• Real
Experiments and Results
• Failure
Experiments and Results
• Training parameters vs. classification accuracy
Experiments and Results
• Comparisons
Conclusion
• Depth images may contain enough information to solve human pose problems
• Depth images are color and texture invariant, which greatly simplifies the correspondence problem
• A deep combined model with sufficient training data can become a good classifier even with simple features
• Buy a Kinect for the lab
References
• Shotton, J., A. Fitzgibbon, et al. (2011). "Real-Time Human Pose Recognition in Parts from Single Depth Images." Microsoft Research Cambridge & Xbox Incubation
• Freedman, B., A. Shpunt, et al. (2008). Depth mapping using projected patterns, US 2010/0118123A1
• Freedman, B., A. Shpunt, et al. (2008). Distance-Varying Illumination and Imaging Techniques for Depth Mapping, US 2010/0290698A1
References
• Salvi, J., S. Fernandez, et al. (2010). "A state of the art in structured light patterns for surface profilometry." Pattern Recognition 43(8): 2666-2680.
• Albitar, I., P. Graebling, et al. (2007). "Robust structured light coding for 3D reconstruction." IEEE.
• Scharstein, D. and R. Szeliski (2003). "High-accuracy stereo depth maps using structured light." IEEE.
• Breiman, L. (2001). "Random forests." Machine learning 45(1): 5-32.
• Amit, Y. and D. Geman (1997). "Shape quantization and recognition with randomized trees." Neural computation 9(7): 1545-1588.
References
• John MacCormick, "How does the Kinect work?" users.dickinson.edu/~jmac/selected-talks/kinect.pdf
• "Structured Light", www.igp.ethz.ch/photogrammetry/.../MV-SS2011structured.pdf
• http://en.wikipedia.org/wiki/Kinect
• http://en.wikipedia.org/wiki/Structured-light_3D_scanner
• http://en.wikipedia.org/wiki/Triangulation
• http://dms.irb.hr/tutorial/tut_dtrees.php
• http://www.anandtech.com/show/4057/microsoft-kinectthe-anandtech-review/2
• Chen, Y. S. and B. T. Chen (2003). "Measuring of a three-dimensional surface by use of a spatial distance computation." Applied optics 42(11): 1958-1972.