### OpenCV

```
Università del Salento
Faculty of Engineering
Image Processing
Academic Year 2012/2013
PART II – Two case studies
Dario Cazzato, INO – CNR
[email protected]


This lesson introduces the use of the OpenCV
library on real cases.
Two case studies:
◦ Stereo Correspondence Problem;
◦ Segmentation of Video Sequences.

Why OpenCV?
◦ Free, open source, built for real-time applications, cross-platform,
constantly updated, backed by strong partners and research.

C/C++:
◦ “cv::Mat vs CvMat”.

Basic components:
◦ Matrices, vectors, rectangles, sizes, images,
data types…

Some examples.

Humans:
◦ Binocular vision.
◦ Average distance between
eyes: 6 cm.
◦ An object/point seen with both
eyes is perceived as one,
although on the retinas we
have two images.

The combined image is
more than the sum of its
parts. It’s not trivial!

Other configurations:
◦ Animals:
▪ Predator: binocular sight.
▪ Prey: lateral eyes (to enlarge the field of view).
◦ Intersecting lines of sight (typical in stereo
vision!).

Let’s see some key concepts from a practical
point of view; the whole problem is
much broader!
◦ You will see more details about epipolar geometry
and matrix computation in class.




The simplest model of the camera.
Only a single ray enters from any particular point,
through the pinhole aperture.
This point is projected onto the image plane.
The focal length is the distance from the pinhole
aperture to the image plane.


A real point Q is
projected onto the
image plane by the ray
passing through the
center of projection.
This intersection gives
q.
Calibration Matrix (3X4)
From “Learning OpenCV”, G.Bradski,
A.Kaehler, O’Reilly.


Homogeneous
coordinate system.
For a space with N
dimensions, use N+1
coordinates.
Homography
Perspective Geometry
One camera is not enough!
With two (or more) cameras we can compute depth by triangulation,
if we are able to find homologous points in the two images.
Epipolar Geometry

Four steps:
1. Undistortion;
2. Rectification;
3. Disparity Map;
4. Triangulation.
1. Undistortion: removal of tangential and
radial lens distortion.
This problem concerns the single camera!
Distortion vector (1X5)
2. Rectify: output row-aligned images (coplanar,
with the same y-coordinate).
With rectified images, we
can search for a point of
one image along the same row
(y-coordinate) of the
second one!

Of course, a stereo calibration is needed
(extrinsic and intrinsic parameters):
◦ Intrinsic: focal length, distortion.
◦ Extrinsic: matrices R, T that align the two cameras
(Essential Matrix E, you will see more in class).
We can divide the procedure into:
◦ Stereo Calibration: computation of the geometric
relations between the two cameras in space;
◦ Stereo Rectification: “correction” of the individual images
so that they look as if taken with row-aligned image planes and parallel
optical axes.
Look at the OpenCV example and its
source, stereo_calib.cpp.
Rectification brings the
cameras into standard form!
Example 1
From “Learning OpenCV”, G.Bradski,
A.Kaehler, O’Reilly.
3. Disparity map: difference in x-coordinates of
the same point viewed in the 2 cameras.
A map is created by computing the disparity
for all the points. It’s encoded as a
grayscale image, where farther points are
darker.
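4. Triangulation: the disparity of each point is
turned into its distance (depth) from the cameras.
Idea: (d:T = f:Z), that is Z = f·T/d, with d the disparity,
T the distance between the cameras, f the focal length and Z the depth.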

How to find homologous points?
◦ Correlation-based: checking whether a location in one
image looks like a location in the other image;
◦ Feature-based: finding features in the images and
checking whether the layout of a subset of features is similar
in the two images.

Occlusions

Photometric transformations

Uniform regions





Noise
Specular surfaces
Perspective views
Repetition
◦ Sometimes we would just like to say: “No
corresponding point in the other image for this
point”

…
Local Algorithm
Winner-takes-all strategy

Sum of Absolute Differences:

Sum of Squared Differences:





Zero-mean Normalized Cross Correlation:
Not just the window, but a fast normalization;
ZNCC has range [-1,1];
We compute the ZNCC for each pixel center;
We take the max value.

What can we do to enhance the model?
◦ Ratio between first and second max;
▪ Idea behind: if the maximum and another local maximum
have similar values, the probability of error increases,
and we could reject these matches (by setting a threshold).
▪ Idea behind: a flat area means repetitive texture. Just
discard maxima on flat peaks.
◦ Multiple windows;
◦ Kernel shape based on segmentation.

Computation time increases!

Two enhancements:
1. Check the epipolar line ± size:
◦ We can deal with noise in the epipolar geometry;
◦ For a fast computation, keep size small (1, 2, 3)!
2. Inverse function:
◦ We take the maximum and run ZNCC again,
starting from the right image;
◦ If the new winner isn’t the starting point (or is farther
than a threshold from it), an error occurred, so discard the
point.
Demo 1



One of the video sequence segmentation
algorithms.
Good with fixed camera and static
background.
High level goal:
◦ People detection.
◦ Surveillance:
▪ Reactive;
▪ Proactive.


BS (Background Subtraction): subtract the current frame from the
background model.
Two phases:
◦ Background training;
◦ Foreground detection.

Improvements to the base version.




A codebook is built for every pixel;
A codebook is composed of codewords: boxes
that grow to cover the common values seen over
time;
Samples of each pixel are clustered into a set of
codewords;
Incoming pixel:
◦ Its brightness is within the brightness range AND its color
distortion is less than a threshold = BACKGROUND;
◦ Otherwise FOREGROUND.



MNRL (Maximum Negative Run Length): lets us
learn the background even while objects
are moving in the scene.
It refines the codebook, separating codewords
that may contain foreground from the real
background.
MNRL = 50%.
The foreground is simply detected computing
the distance of the sample from the nearest
cluster mean.

Left object in the scene:

Holes problem:

Layered Modeling/Detection: 3 classes of
codebook and 3 parameters that control the switch
between the categories:
◦ Permanent;
◦ Non-permanent;
◦ Training.

◦ Retraining is not the solution!!
◦ Global status updating at each frame;
◦ Periodic cleaning of the old codebook.

Median filter:

Opening and Closing:
Morphological Operators

Opening:

Closing:

Median Filter:



Why object detection?
Not all the white pixels are of real interest
(noise, holes not yet updated…).
Object detection and labeling algorithm
required.


A: when an external contour point is encountered for the first
time, a complete trace of the contour is made. This procedure
stops when A is found again. All those points will have the
same label as A;
B: when A' is encountered (an external contour point
already labeled), a scan of the entire line is made, marking
all the points encountered with the same label;
C: when an internal contour point B is encountered for the
first time, it takes the same label. Then a trace of this contour
is made, again giving the same label to all the points met;
D: when an already labeled point is found, like B', a scan of
the entire line is made, marking the detected points with the
same label.

We scan all the blobs and keep processing
only those whose area and ratio fall within
a range:
◦ [min Area, max Area], in pixels;
◦ [min Ratio, max Ratio].

Choice of the ranges:
◦ Average height (from 1.60 m to 2 m);
◦ Average width of the box (from 10 cm to 60 cm);
◦ Distance from the camera (from 1 m to 5 m).

A first necessary loss of generality.
Demo 2

Technical report (Camera Calibration):
◦ A Flexible New Technique for Camera Calibration,
Zhengyou Zhang, 1998

Papers (Stereo Vision):
◦ Chia-Hung Chen, Han-Pang Huang, and Sheng-Yen
Lo, Stereo-Based 3D Localization for Grasping
Known Objects with a Robotic Arm System,
Department of Mechanical Engineering, National
Taiwan University, 10647, Taipei, Taiwan

Papers (Codebook):
◦ Kim, Chalidabhongse, Harwood, Davis, Real-Time
foreground-background segmentation using Codebook
model, Computer Vision Lab, Department of Computer
Science, University of Maryland, College Park, MD 20742,
USA, Faculty of Information Technology, King Mongkut’s
Institute of Technology, Ladkrabang, Bangkok 10520,
Thailand, 2005.
◦ P. Fihl, R. Corlin, S. Park, T.B. Moeslund, M.M Trivedi,
Tracking of Individuals in Very Long Video Sequence,
Laboratory of Computer Vision and Media Technology,
Aalborg University, Denmark, Computer Vision and
Robotics Research Laboratory The University of
California, San Diego, USA, 2006

Stereo Images and disparities (ground truth):
◦ http://vision.middlebury.edu/stereo

Camera Calibration and 3D Reconstruction with
OpenCV:
◦ http://docs.opencv.org/modules/calib3d/doc/camera_calibration_and_3d_reconstruction.html

Motion Analysis with OpenCV:
◦ http://docs.opencv.org/modules/video/doc/motion_analysis_and_object_tracking.html (MOG)

Books:
◦ Codebook: “Learning OpenCV” (O’Reilly), Chapter 9.
◦ Stereo Vision: “Learning OpenCV” (O’Reilly), Chapter 12.

Me:
◦ [email protected]
```
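
The triangulation idea in the slides (d:T = f:Z) can be read as Z = f·T/d. The following is a minimal sketch of that relation only, assuming rectified cameras, a focal length expressed in pixels and a baseline in metres. The function name and the sample numbers are illustrative and do not come from the slides.

```python
def depth_from_disparity(disparity_px, focal_px, baseline_m):
    """Depth Z = f * T / d for a rectified stereo pair.

    disparity_px: difference in x-coordinates of the same point (pixels)
    focal_px:     focal length (pixels), assumed equal for both cameras
    baseline_m:   distance T between the two camera centres (metres)
    Returns None when the disparity is not positive (no valid match).
    """
    if disparity_px <= 0:
        return None
    return focal_px * baseline_m / disparity_px


# Illustrative values: a larger disparity means a closer point.
near = depth_from_disparity(40.0, 800.0, 0.25)
far = depth_from_disparity(10.0, 800.0, 0.25)
```

Since depth is inversely proportional to disparity, small disparities (far points) are the most sensitive to matching errors.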
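
The local matching strategy from the slides (window costs, winner-takes-all, and the inverse check from the right image) could look roughly like the sketch below. It is written in plain Python on small grayscale images stored as lists of rows, assumes rectified images so that the search stays on one row, and uses hypothetical helper names. It is not the code of the demo.

```python
import math


def window(img, row, col, half):
    """Flatten the (2*half+1) x (2*half+1) window centred on (row, col)."""
    return [img[r][c]
            for r in range(row - half, row + half + 1)
            for c in range(col - half, col + half + 1)]


def sad(a, b):
    """Sum of Absolute Differences between two flattened windows."""
    return sum(abs(x - y) for x, y in zip(a, b))


def ssd(a, b):
    """Sum of Squared Differences between two flattened windows."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def zncc(a, b):
    """Zero-mean Normalized Cross Correlation, in [-1, 1]."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    da = [x - mean_a for x in a]
    db = [y - mean_b for y in b]
    denom = math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    if denom == 0:
        return 0.0  # flat window: correlation undefined, treated as no match
    return sum(x * y for x, y in zip(da, db)) / denom


def best_match(src, dst, row, col, half=1):
    """Winner-takes-all: column in dst (same row) with the highest ZNCC."""
    ref = window(src, row, col, half)
    width = len(dst[0])
    scores = [(zncc(ref, window(dst, row, c, half)), c)
              for c in range(half, width - half)]
    score, c = max(scores)
    return c, score


def match_with_check(left, right, row, col, half=1, max_shift=0):
    """Match left -> right, then right -> left; None if they disagree."""
    c_right, _ = best_match(left, right, row, col, half)
    c_back, _ = best_match(right, left, row, c_right, half)
    if abs(c_back - col) > max_shift:
        return None  # inconsistent match, discard the point
    return c_right
```

The caller is expected to pass a `row` and `col` at least `half` pixels away from the image border. The disparity of an accepted match is `col - c_right`.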
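
The two phases of background subtraction named in the slides (background training, then foreground detection) can be illustrated with the base version only. This sketch assumes a per-pixel mean as the background model and a fixed threshold, both chosen here for illustration. It is not the codebook model, which keeps several codewords per pixel.

```python
def train_background(frames):
    """Background training: per-pixel mean over grayscale training frames.

    Each frame is a list of rows; all frames must have the same size.
    """
    n = len(frames)
    rows, cols = len(frames[0]), len(frames[0][0])
    return [[sum(f[r][c] for f in frames) / n for c in range(cols)]
            for r in range(rows)]


def detect_foreground(frame, background, threshold=25):
    """Foreground detection: 1 where |frame - background| > threshold."""
    return [[1 if abs(p - b) > threshold else 0
             for p, b in zip(frame_row, bg_row)]
            for frame_row, bg_row in zip(frame, background)]
```

With a fixed camera and a static background, pixels that stay close to the trained mean are labelled 0 (background) and the remaining ones 1 (foreground).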
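
The median filter listed among the post-processing steps removes isolated pixels from the foreground mask. A minimal 3x3 sketch on a binary mask follows, assuming border pixels are simply copied unchanged (a simplification made here, not something stated in the slides).

```python
from statistics import median


def median_filter3(mask):
    """Apply a 3x3 median filter to a 2D mask; borders are left untouched."""
    rows, cols = len(mask), len(mask[0])
    out = [row[:] for row in mask]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            neighbourhood = [mask[rr][cc]
                             for rr in (r - 1, r, r + 1)
                             for cc in (c - 1, c, c + 1)]
            out[r][c] = median(neighbourhood)
    return out
```

On a binary mask the median acts as a majority vote: a pixel keeps the value held by at least five of the nine pixels in its neighbourhood, so single noisy pixels and single-pixel holes disappear.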
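
The blob selection by area and ratio described in the slides might be sketched as follows. The slides do not define the ratio, so this example assumes it is the height/width ratio of the blob's bounding box. The dictionary keys and the threshold values are hypothetical.

```python
def filter_blobs(blobs, min_area, max_area, min_ratio, max_ratio):
    """Keep only blobs whose area and height/width ratio fall in range.

    Each blob is a dict with 'area' (pixels) and the 'width' and 'height'
    of its bounding box (pixels). The ratio definition is an assumption.
    """
    kept = []
    for blob in blobs:
        ratio = blob["height"] / blob["width"]
        if min_area <= blob["area"] <= max_area and min_ratio <= ratio <= max_ratio:
            kept.append(blob)
    return kept


# Illustrative candidates: a person-like blob, a noise speck, a wide object.
candidates = [
    {"area": 5000, "width": 40, "height": 120},
    {"area": 30, "width": 5, "height": 6},
    {"area": 5000, "width": 100, "height": 50},
]
people = filter_blobs(candidates, 1000, 20000, 1.5, 5.0)
```

In practice the ranges would be derived from the expected height and width of a person and the distance from the camera, as listed in the slides.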