Report

Geometry 4: Multiview Stereo Introduction to Computer Vision Ronen Basri Weizmann Institute of Science Material covered • Pinhole camera model, perspective projection • Two view geometry, general case: • Epipolar geometry, the essential matrix • Camera calibration, the fundamental matrix • Two view geometry, degenerate cases • Homography (planes, camera rotation) • A taste of projective geometry • Stereo vision: 3D reconstruction from two views • Multi-view geometry, reconstruction through factorization Structure from motion • Input: • a set of point tracks • Output: • 3D location of each point (shape) • camera parameters (motion) • Assumptions: • Rigid motion • Orthographic projection (no scale) • Method: SVD factorization (Tomasi & Kanade) Setup • 1 , 2 , … , : a collection of images (video frames) depicting a rigid scene • point tracks in those frames: = ( , ) the location of at frame • Unknown 3D locations: = ( , , ) ∈ ℝ3 , = 1, … , • Therefore, = + = + , are the two top rows of a 3 × 3 rotation matrix Objective Find ∈ ℝ3 and , ∈ ℝ that minimize ( + ) − 2 + ( + ) − =1 =1 Subject to = = 1 = 0 2 Eliminate translation • We can eliminate translation by representing the location of each point relative to the centroids of all points • Assume without loss of generality that the centroid of 1 , … , coincides with the origin ∈ ℝ3 • Translate each image point by setting = − = − ( , ) denotes the centroid of ( , ) Objective (no translation) Find ∈ ℝ3 that minimize − 2 + − =1 =1 Subject to = = 1 = 0 2 Measurement matrix = 11 … 1 11 .. 1 12 . . . 2 12 . . . . . . 2 . . . 1 … 1 … 2× Transformation and shape matrices = 1 … 1 … 1 = 1 1 = 2 2 2 11 … 1 11 … 1 . 12 13 … 3 13 … 3 2 12 2 . . 2×3 3× Objective: matrix notation Find and that minimize − Subject to = = 1 = 0 is 2 × , is 2 × 3, is 3 × = + Noise 11 12 … 1 2 11 12 .. 1 2 11 12 … 1 2 = 11 12 … 1 2 . . . . . . . . . . . . 13 … 3 13 … 3 1 … 1 … 1 1 1 2×3 2× … … + Noise 3× TK-Factorization = + Noise Step 1: find rank 3 approximation to using SVD = Σ where is 2 × 2, = , Σ = diag(1 , 2 , … ), size 2 × , and 1 ≥ 2 ≥ ⋯ ≥ 0 is × , = TK-Factorization = Σ3 where Σ3 = diag(1 , 2 , 3 , 0, 0, … ) Note: this is a relaxation, only noise components outside the 3D space are annihilated Step 2: factorization = Σ3 Ambiguity: = Σ3 = ()(−1 ) for any non-singular, 3 × 3 matrix TK-Factorization Step 3: resolve ambiguity = = 1 = 0 Let = Let = , note that = 2×3 be the corresponding rows in , then 2×3 = Find a 3 × 3 symmetric matrix = = TK-Factorization = = • Equation is linear in • There are 3 equations in 6 unknowns • Find by eigen-decomposition = ∆ so that = ∆ • Solution is obtained up to a rotation ambiguity ()( ) such that = TK-Factorization: Summary 1. Eliminate translation, construct 2. () to get rank 3 and factorize = (3 × 3 ambiguity remains) 3. Resolve ambiguity: estimate by exploiting orthonormality of each rotation, then factorize to obtain Final solution up to rotation and reflection TK-Factorization: pros and cons • Advantages: • Breaks a difficult, non-linear optimization into simple optimization steps • Works well with errors • Disadvantage: • Orthographic projection • Requires complete tracks Factorization with incomplete tracks • Need a way to approximate by a low rank matrix with missing data min ⊙ ( − ) rank =3 a mask, = 1 wherever is known • This problem is NP-hard • Surrogate: minimize the nuclear norm – sum of singular values, 1 + 2 + 3 + ⋯ • Nuclear norm is convex, minimization often achieves low rank • Better iterative procedures exist Perspective multiview stereo • A point = (, , ) is projected to = = • A point rotated by and translated by projects to (2 + ) (1 + ) = = 3 + 3 + denotes the rows of Bundle adjustment • Given points in frames ( , ) find camera matrices and positions that minimize =1 =1 (1 + ) − 3 + 2 (2 + ) + − 3 + • Alternate optimization • Given and , solve for • Given solve for and • Very good initial guess is required 2 Bundler (Photo Tourism) (Snavely et al.) Bundler (Photo Tourism) • Given images, identify feature points, describe them with SIFTs • Match SIFTs, accept each match ↔ whose score is at least twice of any other match ↔ • For every pair of images with sufficiently many matches use RANSAC to recover Essential matrices • Starting with two images and adding one image at a time: use essential matrix to recover depth and apply bundle adjustment Simultaneous solutions • = : Essential matrix between and , × , = 1, … , , available on a subset of image pairs • Objective: recover camera orientation and location relative to a global coordinate system • First step: recover rotations: min − • This can be solved in various ways, for example min − : least squares solution if we ignore the orthonormality constraints for Epipolar relation in global coordinates • The epipolar line relation, = 0 can be written in a global coordinate system as follows × − = 0 × • This generalizes the formula for the essential matrix (plug in = , = ) • Once camera orientations are known we can solve for camera locations (equation is linear and homogeneous in the translation components) • Solution suffers from shrinkage problems Multiview reconsruction