### Flow

Algorithms & Applications in Computer Vision
Lihi Zelnik-Manor
Lecture 9: Optical Flow
Last Time: Alignment
• Homographies
• Rotational Panoramas
• RANSAC
• Global alignment
• Warping
• Blending
Today: Motion and Flow
• Motion estimation
• Patch-based motion (optic flow)
• Regularization and line processes
• Parametric (global) motion
• Layered motion models
Optical Flow
Where did each pixel in image 1 go to in image 2?
(Demo: Pierre Kornprobst)
Introduction
• Given a video sequence with the camera/objects moving, we can better understand the scene if we find the motions of the camera/objects.
Scene Interpretation
• How is the camera moving?
• How many moving objects are there?
• Which directions are they moving in?
• How fast are they moving?
• Can we recognize their type of motion (e.g. walking, running, etc.)?
Applications
• Recover camera ego-motion.
  Result by MobilEye (www.mobileye.com)
• Motion segmentation.
  Result by: L. Zelnik-Manor, M. Machline, M. Irani, "Multi-body Segmentation: Revisiting Motion Consistency", to appear, IJCV
• Structure from Motion (input video to reconstructed shape).
  Result by: L. Zhang, B. Curless, A. Hertzmann, S.M. Seitz, "Shape and motion under varying illumination: Unifying structure from motion, photometric stereo, and multi-view stereo", ICCV'03
Examples of Motion Fields
[Figure: motion fields for forward motion, rotation, and horizontal translation]
Closer objects appear to move faster!
Motion Field & Optical Flow Field
• Motion Field = real-world 3D motion
• Optical Flow Field = projection of the motion field onto the 2D image
[Figure: a 3D motion vector projects through the lens onto the CCD as a 2D optical flow vector u = (u, v)]
When does it break?
• The screen is stationary yet displays motion.
• Homogeneous objects generate zero optical flow.
• Fixed sphere, changing light source.
• Non-rigid texture motion.
The Optical Flow Field
Still, in many cases it does work…
• Goal: find for each pixel a velocity vector u = (u, v) which says:
  – how quickly the pixel is moving across the image
  – in which direction it is moving
How do we actually do that?
Estimating Optical Flow
• Assume the image intensity I is constant: a point at (x, y) at time t moves to (x + dx, y + dy) at time t + dt, so
  I(x, y, t) = I(x + dx, y + dy, t + dt)
Brightness Constancy Equation
I(x, y, t) = I(x + dx, y + dy, t + dt)
First-order Taylor expansion:
I(x + dx, y + dy, t + dt) ≈ I(x, y, t) + (∂I/∂x) dx + (∂I/∂y) dy + (∂I/∂t) dt
Simplify notation:
I_x dx + I_y dy + I_t dt = 0
Divide by dt and denote u = dx/dt, v = dy/dt:
I_x u + I_y v = −I_t
(The solutions (u, v) lie on a line in velocity space whose normal is the image gradient: gradient and the line of possible flows are orthogonal.)
Problem I: one equation, two unknowns.
Problem II: "The Aperture Problem"
• For points on a line of fixed intensity we can only recover the normal flow (the component along the gradient).
[Figure: the line at time t and at time t + dt; where did the blue point move to?]
Use Local Information
Sometimes enlarging the aperture can help.
Local smoothness: assume constant (u, v) in a small neighborhood. Each pixel gives one constraint:
[I_x  I_y] [u; v] = −I_t
Stacking the constraints from all pixels in the window:
  [I_x1  I_y1]             [I_t1]
  [I_x2  I_y2] [u]    =  − [I_t2]
  [  ⋮     ⋮ ] [v]         [ ⋮  ]
i.e.  A u = b
Goal: minimize ‖Au − b‖²
Method: least squares, via the normal equations
AᵀA u = Aᵀb    (2×2 times 2×1 equals 2×1)
u = (AᵀA)⁻¹ Aᵀb
where
AᵀA = [Σ I_x²    Σ I_xI_y]        Aᵀb = −[Σ I_xI_t]
      [Σ I_xI_y  Σ I_y²  ]               [Σ I_yI_t]
We want the matrix AᵀA to be invertible, i.e., no zero eigenvalues.
• Edge: AᵀA becomes singular. All gradients point along (I_x, I_y), so
  [Σ I_x²    Σ I_xI_y] [−I_y]   [0]
  [Σ I_xI_y  Σ I_y²  ] [ I_x] = [0]
  i.e. (−I_y, I_x) is an eigenvector with eigenvalue 0.
• Homogeneous regions: low gradients, (I_x, I_y) ≈ 0, so AᵀA ≈ 0 and both eigenvalues are ≈ 0.
• Textured regions: gradients in many directions give two high eigenvalues.
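The least-squares solve above can be sketched in a few lines of NumPy (a minimal illustration, not code from the lecture; the synthetic image pair and the singularity threshold are assumptions):

```python
import numpy as np

def lucas_kanade_window(I0, I1):
    """Least-squares solve of I_x*u + I_y*v = -I_t over one window
    where (u, v) is assumed constant."""
    Iy, Ix = np.gradient(I0)          # np.gradient returns (d/dy, d/dx)
    It = I1 - I0
    A = np.stack([Ix.ravel(), Iy.ravel()], axis=1)   # one constraint per pixel
    b = -It.ravel()
    AtA = A.T @ A                     # 2x2 matrix of summed gradient products
    if np.linalg.eigvalsh(AtA).min() < 1e-6:         # aperture-problem check
        raise ValueError("A^T A is (near) singular: edge or homogeneous region")
    return np.linalg.solve(AtA, A.T @ b)             # u = (A^T A)^-1 A^T b

# Synthetic pair: a smooth pattern translated by (u, v) = (0.4, 0.2) pixels
ys, xs = np.mgrid[:32, :32].astype(float)
pattern = lambda x, y: np.sin(0.3 * x) + np.cos(0.25 * y)
I0 = pattern(xs, ys)
I1 = pattern(xs - 0.4, ys - 0.2)      # content moved by (0.4, 0.2)
u_est, v_est = lucas_kanade_window(I0, I1)
```

Because the Taylor expansion only holds for small motions, the estimate is accurate for this sub-pixel shift but degrades for large displacements, which is exactly what the iterative and multi-scale schemes on the following slides address.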
Iterative Refinement
• Estimate velocity at each pixel using one iteration of Lucas and Kanade estimation:
  u = (AᵀA)⁻¹ Aᵀb
• Warp one image toward the other using the estimated flow field (easier said than done).
• Refine the estimate by repeating the process.
Optical Flow: Iterative Estimation
Each iteration estimates an increment û around the current warp and updates the flow:
• Initial guess: u⁰ = 0.   Estimate: u¹ = u⁰ + û
• Initial guess: u¹.       Estimate: u² = u¹ + û
• Initial guess: u².       Estimate: u³ = u² + û
• After convergence the warped image matches the target: f₁(x + u³) ≈ f₂(x)
[Figure: 1D signals f₁ and f₂; each iteration warps f₁ by the current estimate and measures the residual shift û at x₀]
Optical Flow: Iterative Estimation
• Some implementation issues:
  – Warping is not easy (ensure that errors in warping are smaller than the estimate refinement).
  – Warp one image, take derivatives of the other, so you don't need to re-compute the gradient after each iteration.
  – Often useful to low-pass filter the images before motion estimation (for better derivative estimation and linear approximations to image intensity).
Other break-downs
• Brightness constancy is not satisfied: use correlation-based methods.
• A point does not move like its neighbors (what is the ideal window size?): use regularization-based methods.
• The motion is not small (Taylor expansion doesn't hold), or aliasing: use multi-scale estimation.
Optical Flow: Aliasing
Temporal aliasing causes ambiguities in optical flow because images can have many pixels with the same intensity, i.e., how do we know which 'correspondence' is correct?
[Figure: actual shift vs. estimated shift; with no aliasing the nearest match is correct, with aliasing the nearest match is incorrect]
To overcome aliasing: coarse-to-fine estimation.
Multi-Scale Flow Estimation
[Figure: Gaussian pyramids of images I_t and I_t+1; a motion of u = 10 pixels at full resolution appears as u = 5, 2.5, and 1.25 pixels at successively coarser levels]
Estimate the flow at the coarsest level; then at each finer level, warp & upsample the coarser estimate and refine it.
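A toy coarse-to-fine scheme can be sketched as follows (an illustrative sketch only: an integer SSD search stands in for Lucas-Kanade at each level, 2x2 block averaging stands in for Gaussian blurring, and periodic image borders are assumed):

```python
import numpy as np

def downsample(im):
    """2x2 block average: a crude stand-in for blur + subsample."""
    h, w = im.shape
    return im[:h//2*2, :w//2*2].reshape(h//2, 2, w//2, 2).mean(axis=(1, 3))

def ssd_search(I0, I1, center, radius):
    """Integer (u, v) minimizing sum (I0(x,y) - I1(x+u, y+v))^2 near center."""
    best_err, best = np.inf, center
    for u in range(center[0] - radius, center[0] + radius + 1):
        for v in range(center[1] - radius, center[1] + radius + 1):
            shifted = np.roll(np.roll(I1, -v, axis=0), -u, axis=1)
            err = np.sum((I0 - shifted) ** 2)
            if err < best_err:
                best_err, best = err, (u, v)
    return best

def coarse_to_fine(I0, I1, levels=2):
    if levels == 0:
        return ssd_search(I0, I1, (0, 0), radius=3)
    u, v = coarse_to_fine(downsample(I0), downsample(I1), levels - 1)
    # Warp & upsample: the motion doubles at the finer scale, then refine
    return ssd_search(I0, I1, (2 * u, 2 * v), radius=2)

# A periodic pattern shifted by (4, 2) pixels, found with only small searches
n = np.arange(64)
I0 = np.sin(2*np.pi*3*n/64)[None, :] + np.cos(2*np.pi*2*n/64)[:, None]
I1 = np.roll(np.roll(I0, 2, axis=0), 4, axis=1)
u, v = coarse_to_fine(I0, I1)
```

The point of the pyramid is visible in the search radii: each level only needs a small search around the upsampled coarse estimate, even though the full-resolution motion is larger than any single search window.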
Spatial Coherence
Assumption:
• Neighboring points in the scene typically belong to the same surface and hence typically have similar motions.
• Since they also project to nearby points in the image, we expect spatial coherence in image flow.
Black
Formalize this Idea
[Figure: noisy 1D signal; noisy measurements p(x) along x]
Black
Regularization: Membrane model
Find the "best fitting" smoothed function q(x) to the noisy measurements p(x).
[Figure: noisy measurements p(x) with a smooth fit q(x) drawn through them]
Black
Regularization
Minimize:
E(q) = Σ_{x=1}^{N} (q(x) − p(x))²  +  λ Σ_{x=1}^{N−1} (q(x+1) − q(x))²
(first term: faithful to the data; second term: spatial smoothness assumption)
Black
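Since E(q) is quadratic in q, the minimizer can be found exactly by solving one linear system; a small NumPy sketch (the signal, noise level, and λ below are illustrative assumptions):

```python
import numpy as np

def membrane_smooth(p, lam):
    """Minimize sum_x (q(x)-p(x))^2 + lam * sum_x (q(x+1)-q(x))^2.

    Setting the gradient to zero gives (I + lam * D^T D) q = p,
    where D is the forward-difference operator."""
    n = len(p)
    D = np.diff(np.eye(n), axis=0)        # (n-1) x n, rows: e_{x+1} - e_x
    return np.linalg.solve(np.eye(n) + lam * (D.T @ D), p)

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 100)
p = np.sin(2 * np.pi * x) + 0.3 * rng.standard_normal(100)  # noisy measurements
q = membrane_smooth(p, lam=5.0)
```

The data term pulls q toward the measurements p while the smoothness term penalizes large neighboring differences; λ trades the two off.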
Bayesian Interpretation
(here v plays the role of the fit q, and u the role of the data p)
E(v) = Σ_{x=1}^{N} (v(x) − u(x))²  +  λ Σ_{x=1}^{N−1} (v(x+1) − v(x))²
p(v | u) ∝ p(u | v) p(v)
Measurement model:  u(x) = v(x) + ε,     ε ~ N(0, σ₁)
Smoothness prior:   v(x) = v(x+1) + η,   η ~ N(0, σ₂)
p(u | v) = Π_{x=1}^{N}   (1 / (√(2π) σ₁)) exp( −(u(x) − v(x))² / (2σ₁²) )
p(v)     = Π_{x=1}^{N−1} (1 / (√(2π) σ₂)) exp( −(v(x+1) − v(x))² / (2σ₂²) )
Black
Discontinuities
[Figure: the smoothed fit q(x) blurs across a step in the data]
What is happening here? What can we do?
Black
Robust Estimation
Noise distributions are often non-Gaussian, having much heavier tails. Noise samples from the tails are called outliers.
• Sources of outliers (multiple motions):
  – specularities / highlights
  – jpeg artifacts / interlacing / motion blur
  – multiple motions (occlusion boundaries, transparency)
[Figure: velocity space (u₁, u₂) with two distinct clusters of constraint intersections]
Black
Occlusion
[Figure: occlusion, disocclusion, and shear: multiple motions within a finite region]
Coherent motion: possibly Gaussian. Multiple motions: definitely not Gaussian.
Black
Weak membrane model
E(q, l) = Σ_{x=1}^{N} (q(x) − p(x))²  +  λ Σ_{x=1}^{N−1} [ l(x) (q(x+1) − q(x))² + α (1 − l(x)) ]
l(x) ∈ {0, 1}
(the binary line process l(x) "turns off" smoothing across a discontinuity, at a fixed cost α)
Black
Analog line process
E(q, l) = Σ_{x=1}^{N} (q(x) − p(x))²  +  λ Σ_{x=1}^{N−1} [ l(x) (q(x+1) − q(x))² + ψ(l(x)) ]
0 ≤ l(x) ≤ 1
where ψ is a penalty function on the line process.
Black
Analog line process
The infimum over l defines a robust error function; the minima are the same:
E(q, l) = Σ_{x=1}^{N} (q(x) − p(x))²  +  λ Σ_{x=1}^{N−1} [ l(x) (q(x+1) − q(x))² + ψ(l(x)) ]
E(q)    = Σ_{x=1}^{N} (q(x) − p(x))²  +  λ Σ_{x=1}^{N−1} ρ(q(x+1) − q(x), σ₂)
Black
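With the weak-membrane penalty ψ(l) = α(1 − l) from the previous slide, the resulting robust function can be checked numerically to be the truncated quadratic min(d², α); a quick sanity check (the value of α is arbitrary):

```python
import numpy as np

def rho_from_line_process(d, alpha):
    """inf over l in [0,1] of  l*d^2 + alpha*(1-l),  evaluated on a grid."""
    l = np.linspace(0.0, 1.0, 1001)
    return (l * d**2 + alpha * (1.0 - l)).min()

# The objective is linear in l, so the infimum sits at an endpoint:
# l = 1 (stay smooth) when d^2 < alpha; l = 0 (break the membrane) when d^2 > alpha.
for d in (0.1, 0.5, 2.0, 10.0):
    assert np.isclose(rho_from_line_process(d, alpha=1.0), min(d**2, 1.0))
```

Small differences are penalized quadratically, while differences larger than √α pay only the flat cost α, which is why discontinuities survive the smoothing.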
Least Squares Estimation
Standard least-squares estimation allows too much influence for outlying points:
E(m) = Σ_i ρ(x_i),   ρ(x_i) = (x_i − m)²
Influence: ψ(x) = ∂ρ/∂x = 2 (x_i − m)
(the influence grows without bound as the residual grows)
Robust Regularization
[Figure: the robust fit q(x) preserves the step edge while smoothing the noise]
Treat large spatial derivatives as outliers. Minimize:
E(q) = Σ_{x=1}^{N} ρ(q(x) − p(x), σ₁)  +  λ Σ_{x=1}^{N−1} ρ(q(x+1) − q(x), σ₂)
Black
Black
Robust Estimation
Problem: Least-squares estimators penalize deviations between
data & model with quadratic error fn (extremely sensitive to outliers)
error penalty function
influence function
Redescending error functions (e.g., Geman-McClure) help to reduce
the influence of outlying measurements.
error penalty function
influence function
Black
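The Geman-McClure function mentioned above is ρ(x, σ) = x²/(x² + σ²); its influence function ψ(x) = ∂ρ/∂x = 2xσ²/(x² + σ²)² redescends, so ever-larger residuals get ever-smaller influence. A small check (σ = 1 is an arbitrary choice):

```python
import numpy as np

def gm_rho(x, sigma):
    """Geman-McClure penalty: saturates at 1 instead of growing quadratically."""
    return x**2 / (x**2 + sigma**2)

def gm_influence(x, sigma):
    """Derivative of gm_rho: rises near 0, then redescends toward 0."""
    return 2 * x * sigma**2 / (x**2 + sigma**2)**2

sigma = 1.0
xs = np.linspace(0.0, 10.0, 1001)
rho_vals = gm_rho(xs, sigma)
psi_vals = gm_influence(xs, sigma)
```

Compare with least squares, where ψ(x) = 2x grows without bound: a single gross outlier can drag the whole fit.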
Regularization
Horn and Schunck (1981)
Smoothness error:
E_s = ∫∫_D ( u_x² + u_y² + v_x² + v_y² ) dx dy
Error in brightness constancy equation:
E_c = ∫∫_D ( I_x u + I_y v + I_t )² dx dy
Minimize: E_c + λ E_s
Solve by calculus of variations.
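The calculus-of-variations solution leads to the classic coupled update u = ū − I_x(I_x ū + I_y v̄ + I_t)/(α² + I_x² + I_y²), and symmetrically for v, where ū, v̄ are local averages. A compact sketch (an illustration, not the lecture's code: α, the iteration count, periodic borders, and the synthetic test pattern are all assumptions):

```python
import numpy as np

def horn_schunck(I0, I1, alpha=0.3, iters=1000):
    Iy, Ix = np.gradient(I0)
    It = I1 - I0
    u = np.zeros_like(I0)
    v = np.zeros_like(I0)
    def avg(f):  # 4-neighbour average; periodic borders keep the sketch short
        return 0.25 * (np.roll(f, 1, 0) + np.roll(f, -1, 0) +
                       np.roll(f, 1, 1) + np.roll(f, -1, 1))
    for _ in range(iters):
        ub, vb = avg(u), avg(v)
        r = (Ix * ub + Iy * vb + It) / (alpha**2 + Ix**2 + Iy**2)
        u, v = ub - Ix * r, vb - Iy * r   # Jacobi-style coupled update
    return u, v

# Periodic pattern translated by (0.4, 0.2) pixels
ys, xs = np.mgrid[:32, :32].astype(float)
f = lambda x, y: np.sin(2*np.pi*2*x/32) + np.cos(2*np.pi*3*y/32)
u, v = horn_schunck(f(xs, ys), f(xs - 0.4, ys - 0.2))
```

Unlike the windowed Lucas-Kanade solve, the smoothness term fills in flow even where the local gradient only constrains one component.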
Robust Estimation
Black & Anandan (1993)
Regularization can over-smooth across edges; use "smarter" (robust) regularization. Minimize:
∫∫_D ρ₁( I_x u + I_y v + I_t )  +  λ [ ρ₂(u_x, u_y) + ρ₂(v_x, v_y) ] dx dy
(first term: brightness constancy; second term: smoothness)
Optimization
• Coarse-to-fine (pyramid)
• Deterministic annealing
Black
Example
Multi-resolution registration
Optical Flow Results
Parametric motion estimation
Global (parametric) motion models
• 2D Models:
  – Affine
  – Planar projective transform (Homography)
• 3D Models:
  – Instantaneous camera motion models
  – Homography + epipole
  – Plane + Parallax
Szeliski
Motion models
• Translation: 2 unknowns
• Affine: 6 unknowns
• Perspective: 8 unknowns
• 3D rotation: 3 unknowns
Szeliski
Example: Affine Motion
For a panning camera or planar surfaces:
u = p₁ + p₂x + p₃y
v = p₄ + p₅x + p₆y
Substituting into the brightness constancy equation:
I_x (p₁ + p₂x + p₃y) + I_y (p₄ + p₅x + p₆y) = −I_t
Each pixel contributes one linear constraint:
[ I_x   I_x·x   I_x·y   I_y   I_y·x   I_y·y ] p = −I_t
Only 6 parameters to solve for → better results.
Least-squares minimization (over all pixels):
Err(p) = Σ ( [ I_x   I_x·x   I_x·y   I_y   I_y·x   I_y·y ] p + I_t )²
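Stacking one such constraint per pixel gives an overdetermined linear system in the six parameters; a NumPy sketch (for a clean test, the second image below is synthesized to satisfy the linearized constraint exactly, which is an assumption, not how real image pairs behave):

```python
import numpy as np

def affine_flow_params(I0, I1):
    """Least squares on [Ix, Ix*x, Ix*y, Iy, Iy*x, Iy*y] p = -It over all pixels."""
    Iy, Ix = np.gradient(I0)
    It = I1 - I0
    ys, xs = np.mgrid[:I0.shape[0], :I0.shape[1]].astype(float)
    A = np.stack([Ix, Ix*xs, Ix*ys, Iy, Iy*xs, Iy*ys], axis=-1).reshape(-1, 6)
    p, *_ = np.linalg.lstsq(A, -It.ravel(), rcond=None)
    return p   # u = p[0] + p[1]*x + p[2]*y,  v = p[3] + p[4]*x + p[5]*y

rng = np.random.default_rng(1)
I0 = rng.standard_normal((40, 40))
Iy, Ix = np.gradient(I0)
ys, xs = np.mgrid[:40, :40].astype(float)
p_true = np.array([0.2, 0.003, -0.002, -0.1, 0.001, 0.004])
u = p_true[0] + p_true[1]*xs + p_true[2]*ys
v = p_true[3] + p_true[4]*xs + p_true[5]*ys
I1 = I0 - (Ix*u + Iy*v)    # satisfies Ix*u + Iy*v = -It by construction
p_est = affine_flow_params(I0, I1)
```

With only six unknowns and thousands of constraints, the estimate aggregates gradient information over the whole region, which is the "better results" point made above.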
Last lecture: Alignment / motion warping
• "Alignment": assuming we know the correspondences, how do we get the transformation?
e.g., affine model in absolute coordinates, mapping (xᵢ, yᵢ) to (x′ᵢ, y′ᵢ):
[x′ᵢ]   [m₁ m₂] [xᵢ]   [t₁]
[y′ᵢ] = [m₃ m₄] [yᵢ] + [t₂]
• Expressed in terms of absolute coordinates of corresponding points.
• Generally presumed features separately detected in each frame.
Today: "flow", "parametric motion"
• Two views presumed in temporal sequence; track (xᵢ, yᵢ) to (x′ᵢ, y′ᵢ), i.e., a displacement (uᵢ, vᵢ)
• Sparse or dense in first frame
• Search in second frame
• Motion models expressed in terms of position change
Previous alignment model:
[x′ᵢ]   [m₁ m₂] [xᵢ]   [t₁]
[y′ᵢ] = [m₃ m₄] [yᵢ] + [t₂]
Now, displacement model:
u(x, y) = a₁ + a₂x + a₃y
v(x, y) = a₄ + a₅x + a₆y
[uᵢ]   [a₂ a₃] [xᵢ]   [a₁]
[vᵢ] = [a₅ a₆] [yᵢ] + [a₄]
Other 2D Motion Models
Quadratic: instantaneous approximation to planar motion:
u = q₁ + q₂x + q₃y + q₇x² + q₈xy
v = q₄ + q₅x + q₆y + q₇xy + q₈y²
Projective: exact planar motion:
x′ = (h₁ + h₂x + h₃y) / (h₇ + h₈x + h₉y)
y′ = (h₄ + h₅x + h₆y) / (h₇ + h₈x + h₉y)
and u = x′ − x,  v = y′ − y
Szeliski
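The projective model can be evaluated directly; a small sketch computing the flow field induced by a 3×3 homography H (where h₁…h₉ fill H row by row):

```python
import numpy as np

def homography_flow(H, xs, ys):
    """u = x' - x, v = y' - y with x' = (h1 x + h2 y + h3)/(h7 x + h8 y + h9)."""
    w  = H[2, 0]*xs + H[2, 1]*ys + H[2, 2]
    xp = (H[0, 0]*xs + H[0, 1]*ys + H[0, 2]) / w
    yp = (H[1, 0]*xs + H[1, 1]*ys + H[1, 2]) / w
    return xp - xs, yp - ys

ys, xs = np.mgrid[:8, :8].astype(float)
# Identity homography: zero flow everywhere
u0, v0 = homography_flow(np.eye(3), xs, ys)
# Pure translation by (2, -1): constant flow
H_t = np.array([[1., 0., 2.], [0., 1., -1.], [0., 0., 1.]])
u1, v1 = homography_flow(H_t, xs, ys)
```

Note that only when the third row differs from (0, 0, 1) does the flow become non-affine; the translation and affine models above are the special cases with a constant denominator.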
3D Motion Models
Instantaneous camera motion:
Global parameters: Ω_X, Ω_Y, Ω_Z, T_X, T_Y, T_Z.  Local parameter: Z(x, y)
u = −xy Ω_X + (1 + x²) Ω_Y − y Ω_Z + (T_X − T_Z x) / Z
v = −(1 + y²) Ω_X + xy Ω_Y + x Ω_Z + (T_Y − T_Z y) / Z
Homography + epipole:
Global parameters: h₁, …, h₉, t₁, t₂, t₃.  Local parameter: γ(x, y)
x′ = (h₁x + h₂y + h₃ + γt₁) / (h₇x + h₈y + h₉ + γt₃)
y′ = (h₄x + h₅y + h₆ + γt₂) / (h₇x + h₈y + h₉ + γt₃)
and: u = x′ − x,  v = y′ − y
Residual planar parallax motion:
Global parameters: t₁, t₂, t₃.  Local parameter: γ(x, y)
u = x_w − x − (γ / (1 + γt₃)) (t₃x − t₁)
v = y_w − y − (γ / (1 + γt₃)) (t₃y − t₂)
where (x_w, y_w) is the point warped by the planar homography.
Segmentation of Affine Motion
[Figure: input sequence = segmented layers + residual]
Result by: L. Zelnik-Manor, M. Irani, "Multi-frame estimation of planar motion", PAMI 2000
Panoramas
[Figure: input sequence and resulting panorama]
Motion estimation by Andrew Zisserman's group
Stabilization
Result by: L. Zelnik-Manor, M. Irani, "Multi-frame estimation of planar motion", PAMI 2000
Discrete search vs. gradient-based estimation
• Consider image I translated by (u₀, v₀):
I₀(x, y) = I(x, y)
I₁(x + u₀, y + v₀) = I(x, y) + η₁(x, y)
E(u, v) = Σ_{x,y} ( I₀(x, y) − I₁(x + u, y + v) )²
        = Σ_{x,y} ( I(x, y) − I(x + u − u₀, y + v − v₀) − η₁(x, y) )²
• The discrete search method simply searches for the best estimate.
• The gradient method linearizes the intensity function and solves for the estimate.
Szeliski
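The discrete search can be sketched directly from E(u, v) (periodic shifts keep the sketch short; real implementations crop to the overlapping region instead):

```python
import numpy as np

def discrete_search(I0, I1, radius):
    """Brute-force integer (u, v) minimizing E(u,v) = sum (I0(x,y) - I1(x+u,y+v))^2."""
    best, best_err = (0, 0), np.inf
    for u in range(-radius, radius + 1):
        for v in range(-radius, radius + 1):
            shifted = np.roll(np.roll(I1, -v, axis=0), -u, axis=1)  # I1(x+u, y+v)
            err = np.sum((I0 - shifted) ** 2)
            if err < best_err:
                best_err, best = err, (u, v)
    return best

rng = np.random.default_rng(2)
I0 = rng.standard_normal((32, 32))
# I0 translated by (u0, v0) = (3, -2)
I1 = np.roll(np.roll(I0, -2, axis=0), 3, axis=1)
u0v0 = discrete_search(I0, I1, radius=4)
```

The search is exhaustive and handles large motions but costs O(radius²) image comparisons; the gradient method replaces the search with one linear solve, valid only for small motions.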
Correlation and SSD
• For larger displacements, do template matching:
  – Define a small area around a pixel as the template.
  – Match the template against each pixel within a search area in the next image.
  – Use a match measure such as correlation, normalized correlation, or sum-of-squared differences.
  – Choose the maximum (or minimum) as the match.
Szeliski
Shi-Tomasi feature tracker
1. Find good features (min eigenvalue of 2×2 Hessian)
2. Use Lucas-Kanade to track with pure translation
3. Use affine registration with first feature patch
4. Terminate tracks whose dissimilarity gets too large
5. Start new tracks when needed
Szeliski
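Step 1 can be sketched with the closed-form smallest eigenvalue of the 2×2 gradient matrix (an illustration only: the window size, the periodic window sums, and the test image are assumptions):

```python
import numpy as np

def min_eig_score(I, win=2):
    """Shi-Tomasi score: smallest eigenvalue of [[sum Ix^2, sum IxIy],
    [sum IxIy, sum Iy^2]] over a (2*win+1)^2 window at each pixel."""
    Iy, Ix = np.gradient(I)
    def window_sum(f):
        out = np.zeros_like(f)
        for dy in range(-win, win + 1):
            for dx in range(-win, win + 1):
                out += np.roll(np.roll(f, dy, 0), dx, 1)
        return out
    A, B, C = window_sum(Ix*Ix), window_sum(Ix*Iy), window_sum(Iy*Iy)
    # Closed-form smaller eigenvalue of the symmetric 2x2 matrix [[A, B], [B, C]]
    return (A + C) / 2 - np.sqrt(((A - C) / 2) ** 2 + B ** 2)

I = np.zeros((32, 32))
I[8:24, 8:24] = 1.0          # a bright square on a dark background
score = min_eig_score(I)
# Corners score high; edge midpoints and flat regions score ~0,
# matching the A^T A eigenvalue analysis earlier in the lecture.
```

This is the same invertibility criterion used for the Lucas-Kanade window: only points where both eigenvalues are large can be tracked reliably in both directions.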
Tracking result
How well do these techniques work?
A Database and Evaluation Methodology for Optical Flow
Simon Baker, Daniel Scharstein, J.P. Lewis, Stefan Roth, Michael Black, and Richard Szeliski
ICCV 2007
http://vision.middlebury.edu/flow/
Limitations of Yosemite
• Only sequence used for quantitative evaluation
[Figure: Yosemite image 7, image 8, ground-truth flow, and the flow color coding]
Limitations:
• Very simple and synthetic
• Small, rigid motion
• Minimal motion discontinuities/occlusions
Current challenges:
• Non-rigid motion
• Real sensor noise
• Complex natural scenes
• Motion discontinuities
→ Need more challenging and more realistic benchmarks
Szeliski
Realistic synthetic imagery
• Randomly generate scenes with "trees" and "rocks" (Grove, Rock)
• Significant occlusions, motion, texture, and blur
• Rendered using Mental Ray and "lens shader" plugin
Szeliski
Modified stereo imagery
• Re-crop and resample ground-truth stereo datasets (Moebius, Venus) to have appropriate motion for optical flow
Szeliski
Dense flow with hidden texture
• Paint scene with textured fluorescent paint
• Take 2 images: one in visible light, one in UV light
• Move scene in very small steps using a robot
• Generate ground truth by tracking the UV images
[Figure: capture setup and lights; visible and UV images; cropped result]
Szeliski
Experimental results
http://vision.middlebury.edu/flow/
Szeliski
Conclusions
• Difficulty: data substantially more challenging than Yosemite
• Diversity: substantial variation in difficulty across the various datasets
• Motion GT vs. interpolation: the best algorithms for one are not the best for the other
• Comparison with stereo: performance of existing flow algorithms appears weak
Szeliski
Layered Scene Representations
Motion representations
• How can we describe this scene?
Szeliski
Block-based motion prediction
• Break image up into square blocks
• Estimate translation for each block
• Use this to predict next frame, code difference (MPEG-2)
Szeliski
Layered motion
• Break image sequence up into "layers" (the sequence is composed as a combination of the layers)
• Describe each layer's motion
Szeliski
Layered Representation
For scenes with multiple affine motions:
1. Estimate dominant motion parameters
2. Reject pixels which do not fit
3. Iterate to convergence
4. Restart on the remaining pixels
Layered motion
Advantages:
• can represent occlusions / disocclusions
• each layer's motion can be smooth
• video segmentation for semantic processing
Difficulties:
• how do we determine the correct number of layers?
• how do we assign pixels?
• how do we model the motion?
Szeliski
How do we form them?
Szeliski
How do we estimate the layers?
1. compute coarse-to-fine flow
2. estimate affine motion in blocks (regression)
3. cluster with k-means
4. assign pixels to best fitting affine region
5. re-estimate affine motions in each region…
Szeliski
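Step 3 can be sketched with a tiny k-means on the per-block affine parameter vectors (illustrative only: the deterministic farthest-point initialization and the synthetic two-motion data are assumptions):

```python
import numpy as np

def kmeans(X, k, iters=20):
    # Deterministic farthest-point initialization keeps this sketch reproducible
    centers = [X[0]]
    for _ in range(k - 1):
        d = np.min([((X - c) ** 2).sum(1) for c in centers], axis=0)
        centers.append(X[d.argmax()])
    centers = np.array(centers)
    for _ in range(iters):
        labels = ((X[:, None, :] - centers[None]) ** 2).sum(-1).argmin(1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(0)
    return labels, centers

rng = np.random.default_rng(3)
# Two motions, 20 blocks each, 6 affine parameters per block
p1 = np.array([2.0, 0.01, 0.0, -1.0, 0.0, 0.02])
p2 = np.array([-3.0, 0.0, 0.03, 4.0, 0.01, 0.0])
X = np.vstack([p1 + 0.05 * rng.standard_normal((20, 6)),
               p2 + 0.05 * rng.standard_normal((20, 6))])
labels, centers = kmeans(X, k=2)
```

Blocks whose affine parameters cluster together are hypothesized to belong to the same layer; the per-pixel assignment and re-estimation steps then refine the layer supports.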
Layer synthesis
• For each layer:
  – stabilize the sequence with the affine motion
  – compute the median value at each pixel
• Determine occlusion relationships
Szeliski
Results
Szeliski
Recent results: SIFT Flow
Recent GPU Implementation
• http://gpu4vision.icg.tugraz.at/
• Real-time flow exploiting a robust norm + regularized mapping
Today: Motion and Flow
• Motion estimation
• Patch-based motion (optic flow)
• Regularization and line processes
• Parametric (global) motion
• Layered motion models
Slide Credits
• Trevor Darrell
• Rick Szeliski
• Michael Black