Detectors and Descriptors

Applications To Have In Mind

  • Core computer vision applications.

  • Image alignment: determining whether different images contain parts of the same scene or object viewed from another angle.

  • Building 3D reconstructions from multiple images.

  • Stitching photos taken at different angles to create a panorama.

One Example: Image Alignment

  • Aligning two given images.

One Solution

  • Iterate through possible translations to find the best alignment.

  • Check alignment for all translations within a search range:

```python
for y in range(-ySearch, ySearch + 1):
    for x in range(-xSearch, xSearch + 1):
        check_alignment_with_images()
```
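The loop above can be fleshed out into a runnable sketch. This is a minimal illustration, not the slide's exact code: `best_translation` and its SSD scoring are assumed names, standing in for `check_alignment_with_images`.

```python
import numpy as np

def best_translation(img1, img2, y_search, x_search):
    """Exhaustively try integer translations and score each by the
    mean squared difference over the overlapping region. Returns the
    (dy, dx) shift to apply to img2 to align it with img1."""
    best_err, best_shift = np.inf, (0, 0)
    H, W = img1.shape
    for dy in range(-y_search, y_search + 1):
        for dx in range(-x_search, x_search + 1):
            # Overlap of img1 with img2 shifted by (dy, dx)
            y0, y1 = max(0, dy), min(H, H + dy)
            x0, x1 = max(0, dx), min(W, W + dx)
            a = img1[y0:y1, x0:x1]
            b = img2[y0 - dy:y1 - dy, x0 - dx:x1 - dx]
            err = np.mean((a - b) ** 2)
            if err < best_err:
                best_err, best_shift = err, (dy, dx)
    return best_shift

# Toy example: img2 is img1 shifted right by 2 pixels.
rng = np.random.default_rng(0)
img1 = rng.random((32, 32))
img2 = np.roll(img1, 2, axis=1)
```

Even on this toy example the cost is clear: the search is quadratic in the search range, which is exactly the motivation for feature-based alignment later.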

One Motivating Example

  • Aligning images with 3D rotation and translation.

An Alternate Approach

  • Find corners and features in the images.

  • Match these features based on local image data.

What Now?

  • Given pairs of correspondences (p1, p2), how do we align?

  • Consider translation-only case.

An Alternate Approach

  • Solve for a transformation T (e.g., such that p1 ≡ T p2) that fits the matches well.

  • Homogeneous coordinates are used.

  • Blend the aligned images together.

  • Key insight: operate on parts of the image rather than the entire image.

Today: Finding Edges and Corners

  • Focus on finding edges and corners in images.

  • Edges often occur at object boundaries or changes in surface properties.

Where do Edges Come From?

  • Edges can arise from depth/distance discontinuities.

Where do Edges Come From?

  • Edges can arise from surface normal / orientation discontinuities.

Where do Edges Come From?

  • Edges can arise from surface color / reflectance properties discontinuities.

Where do Edges Come From?

  • Edges can arise from illumination discontinuities.

Derivatives

  • Illustrates derivative filters I_x and I_y.

  • I_x filter: [-1, 0, 1] (horizontal derivative)

  • I_y filter: [-1, 0, 1]^T (vertical derivative)

Derivatives

  • Derivative: Rate of change of a function f(x) at a point, indicating the direction of increase.

  • Gradient: Vector of all partial derivatives.

What Should I Know?

  • Gradients are partial derivatives per dimension.

  • For an n-dimensional input, the gradient also has n dimensions.

  • Gradients point in the direction of steepest ascent, and their magnitude gives the rate of ascent.

  • If a is a minimum of f(x), then \nabla f(a) = 0.

  • The reverse is not always true, especially in high-dimensional spaces.

Why Does This Work?

  • Definition of a partial derivative:

  • \frac{\partial f(x, y)}{\partial x} = \lim_{\epsilon \to 0} \frac{f(x + \epsilon, y) - f(x, y)}{\epsilon}

  • Approximation using a finite difference:

  • \frac{\partial f(x, y)}{\partial x} \approx f(x + 1, y) - f(x, y)

  • Alternative approximation:

  • \frac{\partial f(x, y)}{\partial x} \approx \frac{f(x + 1, y) - f(x - 1, y)}{2}
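A quick numerical sanity check of the two approximations, on a 1D signal f(x) = x² whose exact derivative is 2x (the signal and variable names are illustrative):

```python
import numpy as np

# Compare the forward difference f(x+1) - f(x) with the central
# difference (f(x+1) - f(x-1)) / 2 on f(x) = x^2.
x = np.arange(10, dtype=float)
f = x ** 2

forward = f[2:] - f[1:-1]            # f(x+1) - f(x), evaluated at x = 1..8
central = (f[2:] - f[:-2]) / 2.0     # (f(x+1) - f(x-1)) / 2, at x = 1..8
exact = 2.0 * x[1:-1]
```

The central difference reproduces the derivative of a quadratic exactly, while the forward difference carries a constant bias here, which is one reason the symmetric [-1, 0, 1] filter is preferred.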

Other Differentiation Operations

  • Prewitt and Sobel operators are used for edge detection.

  • Prewitt filters:

  • Horizontal: [[-1, 0, 1], [-1, 0, 1], [-1, 0, 1]]

  • Vertical: [[1, 1, 1], [0, 0, 0], [-1, -1, -1]]

  • Sobel filters:

  • Horizontal: [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]

  • Vertical: [[1, 2, 1], [0, 0, 0], [-1, -2, -1]]
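A minimal sketch of applying the Sobel kernels above to a synthetic step edge (the image here is made up for illustration; `scipy.ndimage.correlate` slides the kernel without flipping it, matching how the filters are written):

```python
import numpy as np
from scipy.ndimage import correlate

sobel_x = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]], dtype=float)
sobel_y = np.array([[ 1,  2,  1],
                    [ 0,  0,  0],
                    [-1, -2, -1]], dtype=float)

# A vertical step edge: dark left half, bright right half.
img = np.zeros((8, 8))
img[:, 4:] = 1.0

gx = correlate(img, sobel_x, mode='nearest')   # strong response at the edge
gy = correlate(img, sobel_y, mode='nearest')   # zero: no vertical change
```

Only the horizontal-derivative filter responds to a vertical edge; the vertical one returns zero everywhere, since the image is constant along y.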

Images as Functions or Points

  • Treating an image as a point in R^{H \times W} or as a function of x, y.

  • Image gradient:

  • \nabla I(x, y) = \begin{bmatrix} \frac{\partial I}{\partial x}(x, y) \\ \frac{\partial I}{\partial y}(x, y) \end{bmatrix}

  • \frac{\partial I}{\partial x}(x, y) (often written I_x) represents the horizontal change in intensity at (x, y).

Image Gradient Direction

  • Examples of gradient directions.

  • \nabla f = [\frac{\partial f}{\partial x}, 0]

  • \nabla f = [0, \frac{\partial f}{\partial y}]

  • \nabla f = [\frac{\partial f}{\partial x}, \frac{\partial f}{\partial y}]

Image Gradient

  • Gradient: direction of maximum change; it is perpendicular to the edge direction.

  • Magnitude of the gradient: \sqrt{I_x^2 + I_y^2}

  • Orientation of the gradient: atan2(I_y, I_x)

  • Visualizations: lightness proportional to gradient magnitude; gradient orientations reveal structure in the image.
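A minimal numpy sketch of the magnitude and orientation formulas, on a synthetic step edge (`np.gradient` uses central differences in the interior; all names here are illustrative):

```python
import numpy as np

img = np.zeros((16, 16))
img[:, 8:] = 1.0          # vertical step edge: intensity increases with x

Iy, Ix = np.gradient(img)              # np.gradient returns d/drow, d/dcol
magnitude = np.sqrt(Ix**2 + Iy**2)     # sqrt(I_x^2 + I_y^2) per pixel
orientation = np.arctan2(Iy, Ix)       # atan2(I_y, I_x): direction of ascent
```

At the edge the gradient points in the +x direction (orientation 0), perpendicular to the vertical edge, and the magnitude is zero in the flat regions.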

Noise

  • Effect of noise on a row of an image f(x, y).

Noise

  • Convolution with a derivative filter amplifies noise.

  • D_{i,j} = (I_{i,j+1} + \epsilon_{i,j+1}) - (I_{i,j-1} + \epsilon_{i,j-1})

  • Where I_{i,j} is the true image, and \epsilon_{i,j} \sim N(0, \sigma^2).

  • D_{i,j} = (I_{i,j+1} - I_{i,j-1}) + \epsilon_{i,j+1} - \epsilon_{i,j-1}

  • \epsilon_{i,j} - \epsilon_{k,l} \sim N(0, 2\sigma^2): the variance doubles.
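The variance-doubling claim is easy to check empirically (a throwaway simulation, not part of the slides):

```python
import numpy as np

# The difference of two independent N(0, sigma^2) samples has variance
# 2 * sigma^2 -- the noise a [-1, 0, 1]-style derivative filter sees.
rng = np.random.default_rng(0)
sigma = 1.0
eps1 = rng.normal(0.0, sigma, size=1_000_000)
eps2 = rng.normal(0.0, sigma, size=1_000_000)
diff_var = np.var(eps1 - eps2)       # expect about 2 * sigma**2
```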

Noise

  • Consider a row of f(x, y) (i.e., make y constant).

  • How can we use the last class to fix this?

Handling Noise

  • Convolution with a Gaussian kernel followed by differentiation.

  • \frac{d}{dx}(f * g)

Noisy Input

  • Noise in 2D, visualized with a zoomed-in meme image.

  • Ix calculated via [-1, 0, 1] filter.

Noise + Smoothing

  • Smooth the input image first, then apply the [-1, 0, 1] derivative filter.

Let’s Make It One Pass (1D)

  • Differentiation of a convolution is equivalent to convolving with the derivative of the kernel.

  • \frac{d}{dx}(f * g) = f * \frac{d}{dx}g
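The identity can be verified discretely: smoothing with a Gaussian and then differencing gives the same result as a single pass with the derivative-of-Gaussian kernel, by associativity of convolution. A minimal sketch (the helper and signal are made up for illustration):

```python
import numpy as np

def gaussian_1d(sigma, radius):
    x = np.arange(-radius, radius + 1, dtype=float)
    g = np.exp(-x ** 2 / (2 * sigma ** 2))
    return g / g.sum()                 # normalized so weights sum to 1

g = gaussian_1d(2.0, 8)
d = np.array([0.5, 0.0, -0.5])         # central-difference kernel

rng = np.random.default_rng(1)
f = rng.random(100)                    # a noisy 1D signal

two_pass = np.convolve(np.convolve(f, g), d)   # smooth, then differentiate
dog = np.convolve(g, d)                        # derivative-of-Gaussian kernel
one_pass = np.convolve(f, dog)                 # single convolution
```

The one-pass version does the same work with one convolution per pixel instead of two, which is the point of the slide.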

Let's Make It One Pass (2D)

  • Gaussian Derivative Filter

  • Which one finds horizontal edges? (The one that is differentiating in the y direction)

Applying the Gaussian Derivative

  • Applying Gaussian derivative filters at different scales (1, 3, 7 pixels).

  • Larger filters remove more noise but blur the edges.

Compared with the Past

  • Comparing Gaussian derivative filter with Sobel filter.

  • Gaussian Derivative:

  • [[1, 0, -1], [2, 0, -2], [1, 0, -1]]

  • Sobel Filter:

  • [[1, 2, 1], [0, 0, 0], [-1, -2, -1]]

Filters We’ve Seen

  • Comparison of smoothing filters, derivative filters, and Gaussian derivative filters.

  • Smoothing filters:

  • Goal: Remove noise

  • Sums to: 1

  • Derivative filters:

  • Goal: Find edges

  • Sums to: 0

Problems

  • Still an unsolved problem.

Localizing Reliably

  • Metaphor for feature distinctiveness.

  • Guadalupe Street, Intersection of Speedway and Dean Keeton, On campus.

Desirables for Descriptor

  • Properties of a good descriptor:

  • Repeatable: Should find the same things even with distortion.

  • Saliency: Each feature should be distinctive.

  • Compactness: Shouldn’t just be all the pixels.

  • Locality: Should only depend on local image data.

Example

  • Find the correspondences between two images.

Example Matches

  • Look for the colored squares

Basic Idea

  • Edge: No change along the edge direction.

  • Corner: Significant change in all directions.

  • Flat region: No change in all directions.

  • Corners should exhibit large intensity changes with any small shift of the window.

Formalizing Corner Detection

  • Introduction to mathematical formalization.

Formalizing Corner Detection

  • Zoom-In at x,y, Original Image.

Formalizing Corner Detection

  • Window at (x, y) and (x+u, y+v).

  • Measuring Similarity? Zoom-In at x,y.

Formalizing Corner Detection

  • Error (Sum Sqs) for u,v offset:

  • E(u, v) = \sum_{(x, y) \in W} (I[x + u, y + v] - I[x, y])^2
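A direct transcription of E(u, v) as code, before any approximation (function and array names are illustrative):

```python
import numpy as np

def window_error(I, x0, y0, size, u, v):
    """E(u, v): sum of squared differences between the size-by-size
    window at (x0, y0) and the same window shifted by (u, v)."""
    w = I[y0:y0 + size, x0:x0 + size]
    w_shift = I[y0 + v:y0 + v + size, x0 + u:x0 + u + size]
    return float(np.sum((w_shift - w) ** 2))

rng = np.random.default_rng(0)
I = rng.random((32, 32))
```

By construction the error at u = 0, v = 0 is exactly zero, since the window is compared against itself.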

Formalizing Corner Detection

  • Sum of squared differences between the window shifted by (u = -2, v = -3) and the unshifted window.

Formalizing Corner Detection

  • Error at u=0, v=0 is always 0. Why?

Match The Location and Plot

  • Original Image and Zoom-In A, B Error Options

Ok But Back To Math

  • E(u, v) = \sum_{(x, y) \in W} (I[x + u, y + v] - I[x, y])^2

  • Shifting windows around is expensive! We’ll find a trick to approximate this.

  • Note: only need to get the gist

Aside: Taylor Series for Images

  • Taylor series for linearizing a function:

  • f(x + d) \approx f(x) + \frac{\partial f}{\partial x} d

  • Applying Taylor series to images:

  • I(x + u, y + v) \approx I(x, y) + I_x u + I_y v

  • I_x and I_y are the partial derivatives at (x, y).

Formalizing Corner Detection

  • E(u, v) = \sum_{(x, y) \in W} (I[x + u, y + v] - I[x, y])^2

  • \approx \sum_{(x, y) \in W} (I[x, y] + I_x u + I_y v - I[x, y])^2

  • = \sum_{(x, y) \in W} (I_x u + I_y v)^2

  • = \sum_{(x, y) \in W} (I_x^2 u^2 + 2 I_x I_y u v + I_y^2 v^2)

Formalizing Corner Detection

  • E(u, v) \approx \sum_{(x, y) \in W} (I_x^2 u^2 + 2 I_x I_y u v + I_y^2 v^2)

  • = [u, v] \, M \, [u, v]^T

  • Where M is the second moment matrix.

  • M = \begin{bmatrix} \sum_{(x, y) \in W} I_x^2 & \sum_{(x, y) \in W} I_x I_y \\ \sum_{(x, y) \in W} I_x I_y & \sum_{(x, y) \in W} I_y^2 \end{bmatrix}
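The entries of M are just windowed sums of products of gradients. A minimal sketch, computing M for one window of a synthetic edge image (the helper name and window parameters are illustrative):

```python
import numpy as np

def second_moment(Ix, Iy, x0, y0, size):
    """M summed over a size-by-size window with corner (x0, y0),
    from precomputed gradient images Ix, Iy."""
    wx = Ix[y0:y0 + size, x0:x0 + size]
    wy = Iy[y0:y0 + size, x0:x0 + size]
    return np.array([[np.sum(wx * wx), np.sum(wx * wy)],
                     [np.sum(wx * wy), np.sum(wy * wy)]])

# On a vertical step edge, only the sum-of-Ix^2 entry is nonzero.
img = np.zeros((16, 16))
img[:, 8:] = 1.0
Iy, Ix = np.gradient(img)
M = second_moment(Ix, Iy, 6, 6, 5)
```

This matches the "one big, one small" edge case discussed next: M is (approximately) diagonal with a single large entry.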

Intuitively what is M?

  • M = \begin{bmatrix} \sum_{(x, y) \in W} I_x^2 & \sum_{(x, y) \in W} I_x I_y \\ \sum_{(x, y) \in W} I_x I_y & \sum_{(x, y) \in W} I_y^2 \end{bmatrix} \approx \begin{bmatrix} a & 0 \\ 0 & b \end{bmatrix}

  • a, b both small: flat, e.g., [0.1 0; 0 0.1]

  • One big, the other small: edge, e.g., [50 0; 0 0.1] or [0.1 0; 0 50]

  • a, b both big: corner, e.g., [50 0; 0 50]

Intuitively what is M?

  • M = \begin{bmatrix} \sum_{(x, y) \in W} I_x^2 & \sum_{(x, y) \in W} I_x I_y \\ \sum_{(x, y) \in W} I_x I_y & \sum_{(x, y) \in W} I_y^2 \end{bmatrix}

  • But the image might be rotated by some angle θ!

  • If the image is rotated by θ, with corresponding rotation matrix V, then M will look like V^{-1} \begin{bmatrix} a & 0 \\ 0 & b \end{bmatrix} V

So What Now?

  • Calculate M at each pixel by summing nearby gradient products; but we still need access to a and b.

  • Given M, decompose it into eigenvectors V and eigenvalues \lambda_1, \lambda_2:

  • M = V^{-1} \begin{bmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{bmatrix} V
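A small check of the rotation story (the matrix values are the toy numbers from the edge example above; for a rotation matrix, V^{-1} = V^T):

```python
import numpy as np

# Start from an axis-aligned "edge-like" M and rotate it by theta:
# the entries of M change, but its eigenvalues a, b do not.
M = np.diag([50.0, 0.1])
theta = np.pi / 6
V = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
M_rot = V @ M @ V.T                  # what we would measure in a rotated image

eigvals = np.sort(np.linalg.eigvalsh(M_rot))   # eigvalsh: symmetric M
```

This is why eigenvalues, rather than the raw entries of M, are the right rotation-invariant quantities to classify a pixel.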

So What Now?

  • M = \begin{bmatrix} \sum_{(x, y) \in W} I_x^2 & \sum_{(x, y) \in W} I_x I_y \\ \sum_{(x, y) \in W} I_x I_y & \sum_{(x, y) \in W} I_y^2 \end{bmatrix} = V^{-1} \begin{bmatrix} a & 0 \\ 0 & b \end{bmatrix} V

  • Compute the quantity R from M:

  • R = \det(M) - \alpha \, \mathrm{trace}(M)^2 = \lambda_1 \lambda_2 - \alpha (\lambda_1 + \lambda_2)^2

  • \alpha is an empirical value, usually 0.04–0.06.

  • This is an easy, fast formula: for a 2×2 matrix, the determinant and trace (the sum of the diagonal) are cheap to compute, with no explicit eigendecomposition needed.

So What Now?

  • R tells us whether we’re at a corner, edge, or flat region.

  • R = \det(M) - \alpha \, \mathrm{trace}(M)^2 = \lambda_1 \lambda_2 - \alpha (\lambda_1 + \lambda_2)^2

  • If R >> 0: corner (\lambda_1 \approx \lambda_2 >> 0)

  • If R << 0: edge (\lambda_1 >> \lambda_2 \approx 0 or \lambda_2 >> \lambda_1 \approx 0)

  • If |R| ≈ 0: flat (\lambda_1, \lambda_2 \approx 0)
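The three cases can be checked directly on the toy diagonal matrices from earlier (the function name and α value are illustrative):

```python
import numpy as np

def harris_response(M, alpha=0.05):
    """R = det(M) - alpha * trace(M)^2 for a 2x2 second moment matrix."""
    return np.linalg.det(M) - alpha * np.trace(M) ** 2

flat   = np.diag([0.1, 0.1])    # both eigenvalues small
edge   = np.diag([50.0, 0.1])   # one big, one small
corner = np.diag([50.0, 50.0])  # both big
```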

What Do I Need To Know?

  • Need to be able to take derivatives of an image.

  • Need to be able to compute the entries of M at every pixel.

  • Should know that properties of M indicate whether a pixel is a corner or not.

  • M = \begin{bmatrix} \sum_{(x, y) \in W} I_x^2 & \sum_{(x, y) \in W} I_x I_y \\ \sum_{(x, y) \in W} I_x I_y & \sum_{(x, y) \in W} I_y^2 \end{bmatrix}

In Practice

  • Compute partial derivatives I_x, I_y per pixel.

  • Compute M at each pixel, using a Gaussian weighting w:

  • M = \begin{bmatrix} \sum_{(x, y) \in W} w(x, y) I_x^2 & \sum_{(x, y) \in W} w(x, y) I_x I_y \\ \sum_{(x, y) \in W} w(x, y) I_x I_y & \sum_{(x, y) \in W} w(x, y) I_y^2 \end{bmatrix}

  • Compute the response function R = \det(M) - \alpha \, \mathrm{trace}(M)^2.

  • Threshold R.

  • Take only local maxima (non-maximum suppression).
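The whole pipeline fits in a few lines of numpy/scipy. This is a minimal sketch under assumed parameter values (σ, α, threshold, and NMS window are all illustrative choices), not a tuned detector:

```python
import numpy as np
from scipy.ndimage import gaussian_filter, maximum_filter

def harris_corners(img, sigma=1.5, alpha=0.05, thresh=1e-4):
    """Gradients -> Gaussian-weighted M per pixel -> response R ->
    threshold -> non-maximum suppression."""
    Iy, Ix = np.gradient(img.astype(float))
    # Entries of M at every pixel, with Gaussian weighting w
    Sxx = gaussian_filter(Ix * Ix, sigma)
    Sxy = gaussian_filter(Ix * Iy, sigma)
    Syy = gaussian_filter(Iy * Iy, sigma)
    # R = det(M) - alpha * trace(M)^2, computed per pixel
    R = (Sxx * Syy - Sxy ** 2) - alpha * (Sxx + Syy) ** 2
    # Threshold, then keep only local maxima (non-maximum suppression)
    keep = (R > thresh) & (R == maximum_filter(R, size=5))
    return np.argwhere(keep)          # array of (row, col) corner locations

# A bright square on a dark background has exactly four corners.
img = np.zeros((40, 40))
img[10:30, 10:30] = 1.0
pts = harris_corners(img)
```

On the synthetic square, all detections land near the four true corners; along the edges R is negative and in flat regions R ≈ 0, so nothing else survives the threshold.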

Computing R

  • Visual representation of computing R.

Thresholded R

  • Visual representation of a thresholded R.

Thresholded, NMS R

  • Visual representation of a thresholded, NMS R.

Final Results

  • Visual representation of final results.