### Geometry Slides (part 4) - Weizmann Institute of Science

```Geometry 4:
Multiview Stereo
Introduction to Computer Vision
Ronen Basri
Weizmann Institute of Science
Material covered
• Pinhole camera model, perspective projection
• Two view geometry, general case:
• Epipolar geometry, the essential matrix
• Camera calibration, the fundamental matrix
• Two view geometry, degenerate cases
• Homography (planes, camera rotation)
• A taste of projective geometry
• Stereo vision: 3D reconstruction from two views
• Multi-view geometry, reconstruction through
factorization
Structure from motion
• Input:
• a set of point tracks
• Output:
• 3D location of each point (shape)
• camera parameters (motion)
• Assumptions:
• Rigid motion
• Orthographic projection (no scale)
• Method: SVD factorization (Tomasi & Kanade)
Setup
• 1 , 2 , … ,  : a collection of images (video frames)
depicting a rigid scene
•  point tracks in those  frames:  = ( ,  )
the location of  at frame
• Unknown 3D locations:
= ( ,  ,  ) ∈ ℝ3 ,  = 1, … ,
• Therefore,
=    +
=    +
,   are the two top rows of a 3 × 3 rotation
matrix
Objective
Find   ∈ ℝ3 and  ,  ∈ ℝ that minimize

(  +  ) −
2

+ (  +  ) −
=1 =1
Subject to
=  = 1
= 0
2
Eliminate translation
• We can eliminate translation by representing the
location of each point relative to the centroids of all
points
• Assume without loss of generality that the centroid
of 1 , … ,  coincides with the origin  ∈ ℝ3
• Translate each image point by setting
=  −
=  −
( ,  ) denotes the centroid of ( ,  )
Objective (no translation)
Find   ∈ ℝ3 that minimize

−
2

+   −
=1 =1
Subject to
=  = 1
= 0
2
Measurement matrix
=
11
…
1
11
..
1
12
.
.
.
2
12
.
.
.
.
.
.
2
.
.
.
1
…

1
…

2×
Transformation and shape matrices
=
1
…

1
…

1
= 1
1
=
2
2
2
11
…
1
11
…
1
.
12
13
…
3
13
…
3
2
12
2
.
.

2×3
3×
Objective: matrix notation
Find  and  that minimize
−

Subject to
=  = 1
= 0
is 2 × ,  is 2 × 3,  is 3 ×
=  + Noise
11 12
…
1 2
11 12
..
1 2
11 12
…
1 2
=
11 12
…
1 2
.
.
.
.
.
.
.
.
.
.
.
.
13
…
3
13
…
3
1
…

1
…

1
1
1
2×3
2×
…
…

+ Noise
3×
TK-Factorization
=  + Noise
Step 1: find rank 3 approximation to  using SVD
= Σ
where
is 2 × 2,    = ,
Σ = diag(1 , 2 , … ), size 2 × , and
1 ≥ 2 ≥ ⋯ ≥ 0
is  × ,    =
TK-Factorization
= Σ3
where Σ3 = diag(1 , 2 , 3 , 0, 0, … )
Note: this is a relaxation, only noise components
outside the 3D space are annihilated
Step 2: factorization
=  Σ3
Ambiguity:
=
Σ3
= ()(−1 )
for any non-singular, 3 × 3 matrix
TK-Factorization
Step 3: resolve ambiguity
=  = 1
= 0

Let  =

Let  =

, note that    =
2×3
be the corresponding rows in , then
2×3
=
Find a 3 × 3 symmetric matrix

=    =
TK-Factorization

=    =
• Equation is linear in
• There are 3 equations in 6 unknowns
• Find  by eigen-decomposition
= ∆
so that
= ∆
• Solution is obtained up to a rotation ambiguity

()(  )
such that  =

TK-Factorization: Summary
1. Eliminate translation, construct
2. () to get rank 3  and factorize
=  (3 × 3 ambiguity  remains)
3. Resolve ambiguity: estimate  by
exploiting orthonormality of each rotation,
then factorize to obtain
Final solution up to rotation and reflection
TK-Factorization: pros and cons
• Breaks a difficult, non-linear optimization into
simple optimization steps
• Works well with errors
• Orthographic projection
• Requires complete tracks
Factorization with incomplete tracks
• Need a way to approximate by a low rank
matrix with missing data
min  ⊙ ( − )
rank  =3
a mask,  = 1 wherever  is known
• This problem is NP-hard
• Surrogate: minimize the nuclear norm – sum
of singular values, 1 + 2 + 3 + ⋯
• Nuclear norm is convex, minimization often
achieves low rank
• Better iterative procedures exist
Perspective multiview stereo
• A point  = (, , ) is projected to

=
=

• A point rotated by  and translated by
projects to
(2   +  )
(1   +  )
=
=

3  +
3   +
denotes the rows of
• Given  points in  frames ( ,  ) find camera
matrices  and positions  that minimize

=1 =1

(1  +  )
−

3  +
2

(2  +  )
+
−

3  +
• Alternate optimization
• Given  and  , solve for
• Given  solve for  and
• Very good initial guess is required
2
Bundler (Photo Tourism)
(Snavely et al.)
Bundler (Photo Tourism)
• Given images, identify feature points,
describe them with SIFTs
• Match SIFTs, accept each match  ↔
whose score is at least twice of any other
match  ↔
• For every pair of images with sufficiently
many matches use RANSAC to recover
Essential matrices
• Starting with two images and adding one
image at a time: use essential matrix to
recover depth and apply bundle adjustment
Simultaneous solutions
•  =   : Essential matrix between  and  ,
×
,  = 1, … , , available on a subset of image pairs
• Objective: recover camera orientation  and
location  relative to a global coordinate system
• First step: recover rotations:

min  −

• This can be solved in various ways, for example
min   −  : least squares solution if we

ignore the orthonormality constraints for
Epipolar relation in global coordinates
• The epipolar line relation,    = 0 can be
written in a global coordinate system as follows
× −
= 0
×
• This generalizes the formula for the essential matrix
(plug in  = ,  = )
• Once camera orientations  are known we can
solve for camera locations (equation is linear and
homogeneous in the translation components)
• Solution suffers from shrinkage problems
Multiview reconsruction
```