ECE 661: Computer Vision

Other Course Material (from handouts and lecture)

Camera Calibration

Zhang

Pollefeys

Camera Matrix

$P= \begin{bmatrix} \vec{p_1} & \vec{p_2} & \vec{p_3} & \vec{p_4} \end{bmatrix}$ where each of $\vec{p_1}$ through $\vec{p_3}$ is the image of the point at infinity along the $\vec{x}$ through $\vec{z}$ direction.

$\vec{x}=K R \begin{bmatrix} I \Big| -\vec{\tilde{C}} \end{bmatrix} \vec{X}$

$\vec{x}=M \begin{bmatrix} I \big| M^{-1} \vec{p_4} \end{bmatrix} \vec{X}$

General Projective Camera

Affine Camera

The last row of $$ P $$ is $$ (0, 0, 0, 1) $$

Orthographic Camera

Always positioned at infinity $\begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ \end{bmatrix}$

Stratified reconstruction hierarchy

See section 2.4: "A hierarchy of transformations" and section 3.4: "The hierarchy of transformations"

projectivity
- Straight lines go to straight lines
- planar projectivity has 8 DOF
- 3-space projectivity has 15 DOF (every element of the $4 \times 4$ matrix except for the scale factor)
affinity
- parallel lines go to parallel lines
- the line at infinity is mapped to itself (in 3-space, the plane at infinity)
- planar affinity has 6 DOF (4 for arbitrary upper-left matrix, 2 for translation)
- 3-space affinity has 12 DOF (9 for arbitrary upper-left matrix, 3 for translation)
similarity (metric reconstruction, aka equi-form)
- preserves angles (not only right angles) and ratios of lengths
- planar similarity has 4 DOF (an additional DOF over isometry to account for isotropic scaling)
- 3-space similarity has 7 DOF (an additional DOF over isometry to account for isotropic scaling)
isometry (euclidean reconstruction)
- preserves euclidean distance
- planar isometry has 3 DOF (1 for rotation and 2 for translation)
- 3-space isometry has 6 DOF (3 for rotation and 3 for translation)

Conics

Dual conic

$C^* = C^{-1}$ (up to scale, for a non-singular symmetric $$ C $$ )

Polar lines

A point $$ x $$ and a conic $$ C $$ define a line $\textbf{l} = C x$ .

Epipolar geometry

$$ F $$ is rank 2 with 7 DOF
$ x'^T F x = 0 $
- "Most fundamental relationship"
$$ l' = F x $$
$$ l = F^T x' $$
$$ F e = 0 $$
$$ e'^T F = 0 $$
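The relations above can be checked numerically. The following is a minimal sketch, assuming normalized cameras $P = [I | 0]$ and $P' = [R | t]$ (so $K = I$ and $F = [t]_\times R$); the particular `R`, `t`, and the helper names are illustrative assumptions, not taken from the lecture.

```python
def skew(t):
    """Cross-product matrix [t]_x, so that skew(t) @ v == t x v."""
    return [[0, -t[2], t[1]],
            [t[2], 0, -t[0]],
            [-t[1], t[0], 0]]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def matvec(a, v):
    return [sum(a[i][k] * v[k] for k in range(3)) for i in range(3)]

def transpose(a):
    return [[a[j][i] for j in range(3)] for i in range(3)]

def dot(u, v):
    return sum(p * q for p, q in zip(u, v))

# Assumed second camera: 90 degree rotation about z, then translation t.
R = [[0, -1, 0], [1, 0, 0], [0, 0, 1]]
t = [1, 2, 3]
F = matmul(skew(t), R)

def epipolar_residual(X):
    """x'^T F x for the world point X seen by both cameras."""
    x = list(X)                                           # x  = [I | 0] X
    x_prime = [r + ti for r, ti in zip(matvec(R, X), t)]  # x' = [R | t] X
    return dot(x_prime, matvec(F, x))

e = matvec(transpose(R), t)   # image of the second camera centre, up to sign
e_prime = t                   # image of the first camera centre
```

With integer inputs the arithmetic is exact: `epipolar_residual` returns 0 for any world point, `matvec(F, e)` is the zero vector, and so is `matvec(transpose(F), e_prime)`.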

RANSAC

What is the expression for the probability of getting at least one trial with no outliers given $$ N $$ trials?

Let $\epsilon$ be the probability that a data element is an outlier
Let $\omega = 1 - \epsilon$ be the probability that a data element is an inlier
Let $ s $ be the minimum number of data elements needed for constructing an estimate
- Probability that all selected elements are inliers: $\omega^s$
- Probability that at least one element is an outlier: $1 - \omega^s$
Probability all $$ N $$ trials suffer from corrupted estimates: $(1 - \omega^s)^N$
Probability that at least one trial has no outliers: $ \Phi = 1 - (1 - \omega^s)^N = 1 - \left (1 - (1 - \epsilon)^s \right )^N $
- $\epsilon$ depends on the data (usually chosen empirically), $$ s $$ depends on what entity is being estimated
- We try to choose $$ N $$ such that $\Phi \ge 0.99$
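Solving $\Phi = 1 - (1 - \omega^s)^N$ for $$ N $$ gives $N = \log(1 - \Phi) / \log(1 - \omega^s)$ , rounded up. A minimal sketch of that calculation follows; the function names are illustrative, and the example values ( $s = 4$ , as for a homography from point correspondences, with half the data being outliers) are assumptions chosen for the demonstration.

```python
import math

def success_probability(epsilon, s, n):
    """Phi = 1 - (1 - (1 - epsilon)^s)^n, as derived above."""
    return 1.0 - (1.0 - (1.0 - epsilon) ** s) ** n

def trials_needed(epsilon, s, phi=0.99):
    """Smallest N for which success_probability(epsilon, s, N) >= phi."""
    p_clean = (1.0 - epsilon) ** s    # omega^s: one sample is all inliers
    if p_clean <= 0.0:
        raise ValueError("no all-inlier sample is possible")
    if p_clean >= 1.0:
        return 1                      # no outliers: a single trial suffices
    return math.ceil(math.log(1.0 - phi) / math.log(1.0 - p_clean))

print(trials_needed(0.5, 4))          # 72
```

Note how quickly $$ N $$ grows with $$ s $$ and $\epsilon$ , which is why $$ s $$ is kept at the minimum the estimator needs.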

Direct Linear Transform (DLT)

Levenberg Marquardt

Binary Images

Thresholding (Discriminant Analysis) - Otsu's algorithm

within-class variance: $\sigma_W^2 = \omega_0 \sigma_0^2 + \omega_1 \sigma_1^2$
between-class variance: $\sigma_B^2 = \omega_0 (\mu_0 - \mu_T)^2 + \omega_1 (\mu_1 - \mu_T)^2 = \omega_0 \omega_1 (\mu_1 - \mu_0)^2$
total variance: $\sigma_T^2=\sum_{i=1}^L (i-\mu_T)^2 p_i$

We wish to maximize the ratio of $\sigma_B^2$ to $\sigma_W^2$ .
A fast implementation uses aggregate results from the previous bin to calculate the next bin.
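Because $\sigma_W^2 + \sigma_B^2 = \sigma_T^2$ and $\sigma_T^2$ does not depend on the threshold, maximizing the ratio is equivalent to maximizing $\sigma_B^2$ alone. The sketch below uses that, together with running sums carried from one bin to the next. It is an illustration under assumptions: the histogram is a plain list of counts, gray levels are indexed from 0 rather than 1, and class 0 holds the levels at or below the threshold.

```python
def otsu_threshold(hist):
    """Return the gray level k that maximizes the between-class variance.

    hist[i] is the pixel count at gray level i; class 0 is levels <= k.
    """
    total = sum(hist)
    mu_T = sum(i * h for i, h in enumerate(hist)) / total
    best_k, best_var = 0, -1.0
    w0 = 0.0   # running omega_0(k)
    m0 = 0.0   # running sum of i * p_i over class 0
    for k, h in enumerate(hist[:-1]):
        p = h / total
        w0 += p            # update from the previous bin, no re-summation
        m0 += k * p
        w1 = 1.0 - w0
        if w0 <= 0.0 or w1 <= 0.0:
            continue       # one class is empty, variance undefined
        # sigma_B^2 = omega_0 * omega_1 * (mu_1 - mu_0)^2, rewritten
        var_b = (mu_T * w0 - m0) ** 2 / (w0 * w1)
        if var_b > best_var:
            best_k, best_var = k, var_b
    return best_k
```

For a bimodal histogram such as `[10, 10, 0, 0, 0, 10, 10]` the returned level falls in the gap between the two modes.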

Corner detection

Why use corners as features to track across image sequences?
- aperture problem

Geometric interpretation of eigenvectors of $$ C $$ :

eigenvectors encode edge directions
eigenvalues encode edge strength
$\lambda_1 = \lambda_2 = 0$ : uniform gray value, and $$ C $$ is a null matrix
$\lambda_1 > 0$ and $\lambda_2 = 0$ : $$ C $$ is rank-deficient, we have an edge
$\lambda_1 \ge \lambda_2 > 0$ , we have a corner.
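The three cases above can be sketched as a small classifier. This is an illustration, assuming $$ C $$ is the sum of outer products of the gradient $(I_x, I_y)$ over a window; the tolerance `tol` stands in for "equal to zero" and is a hypothetical parameter, as are the function names.

```python
import math

def structure_matrix(gradients):
    """Sum of outer products of (Ix, Iy) pairs: C = [[a, b], [b, c]]."""
    a = sum(ix * ix for ix, iy in gradients)
    b = sum(ix * iy for ix, iy in gradients)
    c = sum(iy * iy for ix, iy in gradients)
    return a, b, c

def eigenvalues(a, b, c):
    """Closed-form eigenvalues of a symmetric 2x2 matrix, largest first."""
    mean = 0.5 * (a + c)
    radius = math.hypot(0.5 * (a - c), b)
    return mean + radius, mean - radius

def classify(gradients, tol=1e-6):
    lam1, lam2 = eigenvalues(*structure_matrix(gradients))
    if lam1 <= tol:
        return "flat"     # lambda_1 = lambda_2 = 0
    if lam2 <= tol:
        return "edge"     # C is rank-deficient
    return "corner"       # both eigenvalues clearly positive

print(classify([(1, 0), (0, 1), (1, 0), (0, 1)]))   # corner
```

In practice the decision is usually made on the smaller eigenvalue against a threshold tuned to the image noise, rather than against exact zero.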

Edge Finding

Roberts Operator

$\begin{bmatrix} 0 & 1 \\ -1 & 0 \\ \end{bmatrix}$

$\begin{bmatrix} 1 & 0 \\ 0 & -1 \\ \end{bmatrix}$

Canny Edge Detector

Optimality criteria

Good detection
- minimize false positives (noise)
- minimize false negatives (don't miss real edges)
Good localization
- Detected edges should be as close as possible to true edges
Single response constraint
- return only one point for each true edge point (use non-maximum suppression; hysteresis thresholding then links weak edge points to strong ones)

Localization-detection tradeoff

Increasing filter size improves detection at the expense of localization

Chain codes / Crack codes

2 bits (3 bits) encode direction for 4-connectedness (8-connectedness)
Crack codes fall on the "cracks" between pixels; pixels are not interpreted as part of the boundary
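As a small illustration of the 2-bit case, the sketch below encodes a 4-connected pixel path as a chain code. The direction numbering (0 = +x, 1 = +y, 2 = -x, 3 = -y) is an assumed convention; the notes do not fix one.

```python
# Assumed convention: 0 = +x, 1 = +y, 2 = -x, 3 = -y (2 bits per step).
STEPS_4 = {(1, 0): 0, (0, 1): 1, (-1, 0): 2, (0, -1): 3}

def chain_code_4(path):
    """Encode consecutive 4-connected pixel coordinates as direction codes."""
    return [STEPS_4[(x1 - x0, y1 - y0)]
            for (x0, y0), (x1, y1) in zip(path, path[1:])]

square = [(0, 0), (1, 0), (1, 1), (0, 1), (0, 0)]
print(chain_code_4(square))   # [0, 1, 2, 3]
```

A step that is not 4-connected (for example a diagonal move) raises `KeyError`, which is where the 3-bit, 8-connected table would be needed instead.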

Hough Transform

Graph Cuts

Recall, "Rayleigh quotient": $\frac{x^T \textbf{A} x}{x^T x}$

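$\frac{y^T (\textbf{D} - \textbf{W}) y}{y^T \textbf{D} y}$ Minimizing this quotient with $$ y_i $$ relaxed to real values (it does not necessarily take on two discrete values) gives the generalized eigenproblem $(\textbf{D} - \textbf{W}) y = \lambda \textbf{D} y$ . The eigenvector with the second-smallest eigenvalue is used because the smallest eigenvalue is 0, with a constant eigenvector that partitions nothing.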

Machine Learning/Class Discrimination

Entropy

$H(x)=-\sum_{i=1}^n p_i \log_2(p_i)$

Conditional Entropy

$$ H(Y|X) = H(X,Y) - H(X) $$
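A minimal sketch of both definitions, assuming the joint distribution is given as a nested list `joint[i][j]` $= P(X = i, Y = j)$ (the function names and the table layout are illustrative):

```python
import math

def entropy(probs):
    """H = -sum p_i log2 p_i, skipping zero-probability outcomes."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def conditional_entropy(joint):
    """H(Y|X) = H(X,Y) - H(X) for joint[i][j] = P(X=i, Y=j)."""
    h_xy = entropy(p for row in joint for p in row)
    h_x = entropy(sum(row) for row in joint)   # marginal of X
    return h_xy - h_x

independent = [[0.25, 0.25], [0.25, 0.25]]   # two fair, independent bits
copy = [[0.5, 0.0], [0.0, 0.5]]              # Y is a copy of X
print(conditional_entropy(independent))      # 1.0
print(conditional_entropy(copy))             # 0.0
```

The two extremes bracket the general case: knowing $$ X $$ removes none of the uncertainty in $$ Y $$ when they are independent, and all of it when $$ Y $$ is determined by $$ X $$ .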

Fall 2006 Midterm

1

Given the identity $l \cdot (l\times l') = 0$ and the fact that $x = l \times l'$ , we know $$ l^T x = 0 $$ .
Therefore $$ x $$ is on $$ l $$ .

Similarly, $l'\cdot(l' \times l) = l'^T x = 0$ . Thus $$ x $$ is also on $$ l' $$ .

Since $$ x $$ lies on both lines it must be the point of intersection.
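A quick numeric check of this result, as a sketch with two example lines chosen for illustration ( $x = 1$ and $y = 2$ in homogeneous form):

```python
def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

l1 = (1, 0, -1)        # the line x = 1
l2 = (0, 1, -2)        # the line y = 2
x = cross(l1, l2)      # (1, 2, 1), i.e. the point (1, 2)

l3 = (1, 0, -3)        # the line x = 3, parallel to l1
ideal = cross(l1, l3)  # (0, 2, 0), a point at infinity
```

Both `dot(l1, x)` and `dot(l2, x)` are 0, so `x` lies on both lines. For the parallel pair the last coordinate of the intersection is 0: the lines meet at the point at infinity in their common direction.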

2

Given the two identities $x \cdot (x \times x') = 0$ and $x' \cdot (x \times x') = 0$ and the line $l = x \times x'$ (which equals $x' \times x$ up to sign, and so represents the same homogeneous line),

$$ l^T x = l^T x' = 0 $$

Thus l passes through both x and x' and is therefore the line joining the two points.

3

Given $$ x' = Hx, $$ start with $l' = H^{-T}l:$
$l'^T = l^T H^{-1}$ take the transpose of both sides
$l'^Tx' = l^TH^{-1}x'$ post multiply both sides by x'
$l'^Tx' = l^TH^{-1}Hx$ substitute $x' = Hx$
$$ l'^Tx' = l^Tx = 0 $$ since x lies on l
$\therefore$ true

Note that the previous exam asks you to prove a false statement.

Given $x = Hx' \implies x' = H^{-1}x$

$l' = H^{-T}l:$
$l'^T = l^T H^{-1}$
$l'^Tx' = l^TH^{-1}x'$
$l'^Tx' = l^TH^{-1}H^{-1}x$
For the claim to hold we would need $$ l'^Tx' = 0 $$
but $l^TH^{-1}H^{-1}x \neq 0$ in general, since $l^Tx = 0$ says nothing about $l^TH^{-1}H^{-1}x$
$\therefore$ false

6

Part (a)

$L^* = PQ^\textrm{T} - QP^\textrm{T}$

$\pi = L^*X$

$\pi^\textrm{T} = (L^*X)^\textrm{T} = X^\textrm{T}L^{*T}$

$\begin{align} \pi^\textrm{T} X & = X^\textrm{T}L^{*\textrm{T}}X \\ & = -X^\textrm{T}L^* X \\ & = -X^\textrm{T}(PQ^\textrm{T} - QP^\textrm{T})X \\ & = -X^\textrm{T}PQ^\textrm{T}X + X^\textrm{T}QP^\textrm{T}X \end{align}$

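The two terms are products of the same scalars, $X^\textrm{T}PQ^\textrm{T}X = (X^\textrm{T}P)(Q^\textrm{T}X) = X^\textrm{T}QP^\textrm{T}X$ , so they cancel: $\pi^\textrm{T}X = 0$ and $$ X $$ lies on $\pi$ .

To see that $\pi$ is the plane through $$ X $$ and the line, note that the line lies in that plane, so we can take $$ P $$ to be that plane without loss of generality (it can serve as one of the two planes defining the line). Then $P^\textrm{T}X = 0$ .

Therefore $L^*X = PQ^\textrm{T}X - QP^\textrm{T}X = (Q^\textrm{T}X)P$ , with $Q^\textrm{T}X \neq 0$ as long as $$ X $$ is not on the line.

Therefore $\pi = L^*X$ (up to scale) is the plane through $$ X $$ and the line.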

Part (b)

$L = AB^\textrm{T} - BA^\textrm{T}$

$X = L\pi$

$X^\textrm{T} = (L\pi)^\textrm{T} = \pi^\textrm{T}L^\textrm{T}$

$\begin{align} X^\textrm{T}\pi & = \pi^\textrm{T}L^\textrm{T}\pi \\ & = -\pi^\textrm{T}L\pi \\ & = -\pi^\textrm{T}(AB^\textrm{T} - BA^\textrm{T})\pi \\ & = -\pi^\textrm{T}AB^\textrm{T}\pi + \pi^\textrm{T}BA^\textrm{T}\pi \end{align}$

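The two terms are products of the same scalars, so they cancel: $X^\textrm{T}\pi = 0$ and $$ X $$ lies on $\pi$ .

To see that $$ X $$ is also on the line, take $$ A $$ to be the point where the line meets $\pi$ , without loss of generality. Then $\pi^\textrm{T}A = 0$ and $A^\textrm{T}\pi = 0$ .

Therefore $L\pi = AB^\textrm{T}\pi - BA^\textrm{T}\pi = (B^\textrm{T}\pi)A$ , with $B^\textrm{T}\pi \neq 0$ as long as the line does not lie in $\pi$ .

Therefore $X = L\pi$ (up to scale) is the intersection of the line with $\pi$ .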

8

If the brightness gradients in the x and y directions are thought of as random variables then C is a scaled version of their covariance matrix (treating the gradients as zero-mean).

The eigenvectors of a covariance matrix form the orthogonal basis that decorrelates the two variables and yields the highest variance (and, for Gaussian data, the highest entropy) along the first axis.

9

$P= \begin{bmatrix} \vec{p^1}^T \\ \vec{p^2}^T \\ \vec{p^3}^T \\ \end{bmatrix}$

Each row $$ p^i $$ represents a plane.

$\vec{p^i}^T \textbf{X} = 0$ means that point $\textbf{X}$ lies on plane $$ p^i $$ . Thus $\vec{p^3}^T \textbf{X} = 0$ means that $\textbf{X}$ lies on the principal plane, and lying on the plane $$ p^1 $$ or $$ p^2 $$ means that the projected point $$ x $$ will lie on the $\hat y$ or $\hat x$ image axis, respectively.

10

A world point lying on the principal axis will project to the principal point, which is the image coordinate (0,0) only when the principal point offset is zero. Does this help? Consult p. 158-159

Fall 2006 Final

2

If $\textbf{X}$ is on $\pi$ , then $\textbf{X}^T \pi = 0$ .
If $$ x $$ is on $\textbf{l}$ , then $x^T \textbf{l} = 0$ .

$\begin{align} \textbf{X}^T \pi & = \textbf{X}^T \left ( \textbf{P}^T \textbf{l} \right ) \\ & = \left ( \textbf{P} \textbf{X} \right )^T \textbf{l} \\ & = x^T \textbf{l} \\ \end{align}$
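The identity says that $\pi = \textbf{P}^T \textbf{l}$ is the plane back-projected from the image line $\textbf{l}$ : a world point is on $\pi$ exactly when its image is on $\textbf{l}$ . A small numeric sketch, with a camera matrix and points made up for illustration:

```python
def back_project_line(P, l):
    """pi = P^T l for a 3x4 camera matrix P and an image line l."""
    return [sum(P[i][j] * l[i] for i in range(3)) for j in range(4)]

def project(P, X):
    """x = P X for a homogeneous world point X."""
    return [sum(P[i][j] * X[j] for j in range(4)) for i in range(3)]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

# Hypothetical camera and image line, chosen only for the check.
P = [[2, 0, 1, 3],
     [0, 2, 1, -1],
     [0, 0, 1, 4]]
l = [1, 0, -1]
pi = back_project_line(P, l)   # [2, 0, 0, -1]

X_on = [1, 7, 9, 2]            # satisfies X^T pi = 0
X_off = [3, 5, 2, 1]           # does not
```

For both points `dot(X, pi)` equals `dot(project(P, X), l)`, and the common value is 0 only for `X_on`.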

3

Part (a)

Given camera matrix $\vec{x}=K R \begin{bmatrix} I \big| {-\vec{\tilde{C}}} \end{bmatrix} \vec{X}$ , a world point $\textbf{X}_\infty = \begin{bmatrix} \textbf{d}^T & 0 \end{bmatrix}^T$ , maps as...

$\begin{align} x & = \textbf{P} \textbf{X}_\infty \\ & = K R \begin{bmatrix} I \big| {-\vec{\tilde{C}}} \end{bmatrix} \begin{bmatrix} d_1 \\ d_2 \\ d_3 \\ 0 \end{bmatrix} \\ & = K R \textbf{d} \\ & = \textbf{H} \textbf{d} \end{align}$

Part (b)

Conics transform as $C' = \textbf{H}^{-T} C \textbf{H}^{-1}$ . Therefore, the IAC

$\begin{align} \omega &= \textbf{H}^{-T} \Omega_\infty \textbf{H}^{-1} \\ &= \textbf{H}^{-T} \textbf{I} \textbf{H}^{-1} \\ &= \textbf{H}^{-T} \textbf{H}^{-1} \\ &= \left ( \textbf{K R} \right ) ^{-T} \left ( \textbf{K R} \right )^{-1} \\ &= \left ( \textbf{R}^T \textbf{K}^T \right )^{-1} \left ( \textbf{K R} \right )^{-1} \\ &= \textbf{K}^{-T} \textbf{R}^{-T} \textbf{R}^{-1} \textbf{K}^{-1} \\ &= \textbf{K}^{-T} \textbf{R} \textbf{R}^{-1} \textbf{K}^{-1} \\ &= \textbf{K}^{-T} \textbf{K}^{-1}\\ &= \left ( \textbf{K} \textbf{K}^{T} \right )^{-1} \end{align}$ . I grok this.

4

This is the theoretical justification of Zhang's method for camera calibration.

Part (a)

Assume we have a homography $\textbf{H}$ that maps points $x_\pi$ on a probe plane $\pi$ to points $$ x $$ on the image.

The circular points $I, J = \begin{bmatrix} 1 \\ \pm j \\ 0 \end{bmatrix}$ lie on both our probe plane $\pi$ and on the absolute conic $\Omega_\infty$ . Lying on $\Omega_\infty$ of course means they are also projected onto the image of the absolute conic (IAC) $\omega$ , thus $x_1^T \omega x_1= 0$ and $x_2^T \omega x_2= 0$ . The circular points project as

$\begin{align} x_1 & = \textbf{H} I = \begin{bmatrix} h_1 & h_2 & h_3 \end{bmatrix} \begin{bmatrix} 1 \\ j \\ 0 \end{bmatrix} = h_1 + j h_2 \\ x_2 & = \textbf{H} J = \begin{bmatrix} h_1 & h_2 & h_3 \end{bmatrix} \begin{bmatrix} 1 \\ -j \\ 0 \end{bmatrix} = h_1 - j h_2 \end{align}$ .

We can actually ignore $$ x_2 $$ while substituting our new expression for $$ x_1 $$ as follows:

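$\begin{align} x_1^T \omega x_1 &= \left ( h_1 + j h_2 \right )^T \omega \left ( h_1 + j h_2 \right ) \\ &= \left ( h_1^T + j h_2^T \right ) \omega \left ( h_1 + j h_2 \right ) \\ &= h_1^T \omega h_1 - h_2^T \omega h_2 + j \left ( h_1^T \omega h_2 + h_2^T \omega h_1 \right ) \\ &= \left ( h_1^T \omega h_1 - h_2^T \omega h_2 \right ) + 2 j \left ( h_1^T \omega h_2 \right ) \\ &= 0 \end{align}$

where the last simplification holds since conics are symmetric matrices, $\omega = \omega^T$ , so $h_2^T \omega h_1 = h_1^T \omega h_2$ . Separating real and imaginary parts gives us

$\begin{align} h_1^T \omega h_2 &= 0 \\ h_1^T \omega h_1 &= h_2^T \omega h_2 \end{align}$

These are two linear constraints on $\omega$ for each view of the probe plane.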

Part (b)

8

We prove the Projective Reconstruction Theorem.

Say that the correspondence $x \leftrightarrow x'$ derives from the world point $\textbf{X}$ under the camera matrices $\left ( \textbf{P}, \textbf{P}' \right )$ as

$\begin{align} x & = \textbf{P} \textbf{X} \\ x' & = \textbf{P}' \textbf{X} \end{align}$ .

Say we transform space by a general homography matrix $\textbf{H}_{4 \times 4}$ such that $\textbf{X}_0 = \textbf{H} \textbf{X}$ .

The cameras then transform as

$\begin{align} \textbf{P}_0 & = \textbf{P} \textbf{H}^{-1} \\ \textbf{P}_0' & = \textbf{P}' \textbf{H}^{-1} \end{align}$ .

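Then $\textbf{P}_0 \textbf{X}_0 = \textbf{P} \textbf{H}^{-1} \textbf{H} \textbf{X} = \textbf{P} \textbf{X} = x$ , and likewise $\textbf{P}_0' \textbf{X}_0 = x'$ , so the transformed cameras and world point still give us the same image points. The reconstruction is therefore determined only up to the projective transformation $\textbf{H}$ .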

ECE661Fall2008Kak - Rhea
