Detectors and Descriptors

Applications To Have In Mind

  • Core computer vision applications.

  • Image alignment: determining whether different images contain parts of the same scene or object viewed from another angle.

  • Building 3D reconstructions from multiple images.

  • Stitching photos taken at different angles to create a panorama.

One Example: Image Alignment

  • Aligning two given images.

One Solution

  • Iterate through possible translations to find the best alignment.

  • Check alignment for all translations within a search range:

```python
for y in range(-ySearch, ySearch + 1):
    for x in range(-xSearch, xSearch + 1):
        check_alignment_with_images()
```
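The loop above can be fleshed out into a runnable sketch. This is a minimal illustration, not the slide's exact code: `best_translation` and its SSD scoring are assumed names, standing in for `check_alignment_with_images`.

```python
import numpy as np

def best_translation(img1, img2, y_search, x_search):
    """Exhaustively try integer translations and score each by the
    mean squared difference over the overlapping region. Returns the
    (dy, dx) shift to apply to img2 to align it with img1."""
    best_err, best_shift = np.inf, (0, 0)
    H, W = img1.shape
    for dy in range(-y_search, y_search + 1):
        for dx in range(-x_search, x_search + 1):
            # Overlap of img1 with img2 shifted by (dy, dx)
            y0, y1 = max(0, dy), min(H, H + dy)
            x0, x1 = max(0, dx), min(W, W + dx)
            a = img1[y0:y1, x0:x1]
            b = img2[y0 - dy:y1 - dy, x0 - dx:x1 - dx]
            err = np.mean((a - b) ** 2)
            if err < best_err:
                best_err, best_shift = err, (dy, dx)
    return best_shift

# Toy example: img2 is img1 shifted right by 2 pixels.
rng = np.random.default_rng(0)
img1 = rng.random((32, 32))
img2 = np.roll(img1, 2, axis=1)
```

Even on this toy example the cost is clear: the search is quadratic in the search range, which is exactly the motivation for feature-based alignment later.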

One Motivating Example

  • Aligning images with 3D rotation and translation.

An Alternate Approach

  • Find corners and features in the images.

  • Match these features based on local image data.

What Now?

  • Given pairs of correspondences (p1, p2), how do we align?

  • Consider translation-only case.

An Alternate Approach

  • Solve for a transformation T (e.g., such that p1 ≡ T p2) that fits the matches well.

  • Homogeneous coordinates are used.

  • Blend the aligned images together.

  • Key insight: operate on parts of the image rather than the entire image.

Today: Finding Edges and Corners

  • Focus on finding edges and corners in images.

  • Edges often occur at object boundaries or changes in surface properties.

Where do Edges Come From?

  • Edges can arise from depth/distance discontinuities.

Where do Edges Come From?

  • Edges can arise from surface normal / orientation discontinuities.

Where do Edges Come From?

  • Edges can arise from surface color / reflectance properties discontinuities.

Where do Edges Come From?

  • Edges can arise from illumination discontinuities.

Derivatives

  • Illustrates derivative filters I_x and I_y.

  • I_x filter: [-1, 0, 1] (horizontal derivative)

  • I_y filter: [-1, 0, 1]^T (vertical derivative)

Derivatives

  • Derivative: Rate of change of a function f(x) at a point, indicating the direction of increase.

  • Gradient: Vector of all partial derivatives.

What Should I Know?

  • Gradients are partial derivatives per dimension.

  • For an n-dimensional input, the gradient also has n dimensions.

  • Gradients point in the direction of steepest ascent, and their magnitude gives the rate of ascent.

  • If a is a minimum of f(x), then \nabla f(a) = 0.

  • The reverse is not always true, especially in high-dimensional spaces.

Why Does This Work?

  • Definition of a partial derivative:

  • \frac{\partial f(x, y)}{\partial x} = \lim_{\epsilon \to 0} \frac{f(x + \epsilon, y) - f(x, y)}{\epsilon}

  • Approximation using a finite difference:

  • \frac{\partial f(x, y)}{\partial x} \approx f(x + 1, y) - f(x, y)

  • Alternative approximation:

  • \frac{\partial f(x, y)}{\partial x} \approx \frac{f(x + 1, y) - f(x - 1, y)}{2}
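A quick numerical sanity check of the two approximations, on a 1D signal f(x) = x² whose exact derivative is 2x (the signal and variable names are illustrative):

```python
import numpy as np

# Compare the forward difference f(x+1) - f(x) with the central
# difference (f(x+1) - f(x-1)) / 2 on f(x) = x^2.
x = np.arange(10, dtype=float)
f = x ** 2

forward = f[2:] - f[1:-1]            # f(x+1) - f(x), evaluated at x = 1..8
central = (f[2:] - f[:-2]) / 2.0     # (f(x+1) - f(x-1)) / 2, at x = 1..8
exact = 2.0 * x[1:-1]
```

The central difference reproduces the derivative of a quadratic exactly, while the forward difference carries a constant bias here, which is one reason the symmetric [-1, 0, 1] filter is preferred.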

Other Differentiation Operations

  • Prewitt and Sobel operators are used for edge detection.

  • Prewitt filters:

  • Horizontal: [[-1, 0, 1], [-1, 0, 1], [-1, 0, 1]]

  • Vertical: [[1, 1, 1], [0, 0, 0], [-1, -1, -1]]

  • Sobel filters:

  • Horizontal: [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]

  • Vertical: [[1, 2, 1], [0, 0, 0], [-1, -2, -1]]
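A minimal sketch of applying the Sobel kernels above to a synthetic step edge (the image here is made up for illustration; `scipy.ndimage.correlate` slides the kernel without flipping it, matching how the filters are written):

```python
import numpy as np
from scipy.ndimage import correlate

sobel_x = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]], dtype=float)
sobel_y = np.array([[ 1,  2,  1],
                    [ 0,  0,  0],
                    [-1, -2, -1]], dtype=float)

# A vertical step edge: dark left half, bright right half.
img = np.zeros((8, 8))
img[:, 4:] = 1.0

gx = correlate(img, sobel_x, mode='nearest')   # strong response at the edge
gy = correlate(img, sobel_y, mode='nearest')   # zero: no vertical change
```

Only the horizontal-derivative filter responds to a vertical edge; the vertical one returns zero everywhere, since the image is constant along y.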

Images as Functions or Points

  • Treating an image as a point in R^{H \times W} or as a function of x, y.

  • Image gradient:

  • \nabla I(x, y) = \begin{bmatrix} \frac{\partial I}{\partial x}(x, y) \\ \frac{\partial I}{\partial y}(x, y) \end{bmatrix}

  • \frac{\partial I}{\partial x}(x, y) (often written I_x) represents the horizontal change in intensity at (x, y).

Image Gradient Direction

  • Examples of gradient directions.

  • \nabla f = [\frac{\partial f}{\partial x}, 0]

  • \nabla f = [0, \frac{\partial f}{\partial y}]

  • \nabla f = [\frac{\partial f}{\partial x}, \frac{\partial f}{\partial y}]

Image Gradient

  • Gradient: direction of maximum change; it is perpendicular to the edge direction.

  • Magnitude of the gradient: \sqrt{I_x^2 + I_y^2}

  • Orientation of the gradient: atan2(I_y, I_x)

  • Visualizations: lightness proportional to gradient magnitude; gradient orientations reveal structure in the image.
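A minimal numpy sketch of the magnitude and orientation formulas, on a synthetic step edge (`np.gradient` uses central differences in the interior; all names here are illustrative):

```python
import numpy as np

img = np.zeros((16, 16))
img[:, 8:] = 1.0          # vertical step edge: intensity increases with x

Iy, Ix = np.gradient(img)              # np.gradient returns d/drow, d/dcol
magnitude = np.sqrt(Ix**2 + Iy**2)     # sqrt(I_x^2 + I_y^2) per pixel
orientation = np.arctan2(Iy, Ix)       # atan2(I_y, I_x): direction of ascent
```

At the edge the gradient points in the +x direction (orientation 0), perpendicular to the vertical edge, and the magnitude is zero in the flat regions.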

Noise

  • Effect of noise on a row of an image f(x, y).

Noise

  • Convolution with a derivative filter amplifies noise.

  • D_{i,j} = (I_{i,j+1} + \epsilon_{i,j+1}) - (I_{i,j-1} + \epsilon_{i,j-1})

  • Where I_{i,j} is the true image, and \epsilon_{i,j} \sim N(0, \sigma^2).

  • D_{i,j} = (I_{i,j+1} - I_{i,j-1}) + \epsilon_{i,j+1} - \epsilon_{i,j-1}

  • \epsilon_{i,j} - \epsilon_{k,l} \sim N(0, 2\sigma^2): the variance doubles.
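The variance-doubling claim is easy to check empirically (a throwaway simulation, not part of the slides):

```python
import numpy as np

# The difference of two independent N(0, sigma^2) samples has variance
# 2 * sigma^2 -- the noise a [-1, 0, 1]-style derivative filter sees.
rng = np.random.default_rng(0)
sigma = 1.0
eps1 = rng.normal(0.0, sigma, size=1_000_000)
eps2 = rng.normal(0.0, sigma, size=1_000_000)
diff_var = np.var(eps1 - eps2)       # expect about 2 * sigma**2
```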

Noise

  • Consider a row of f(x, y) (i.e., make y constant).

  • How can we use the last class to fix this?

Handling Noise

  • Convolution with a Gaussian kernel followed by differentiation.

  • \frac{d}{dx}(f * g)

Noisy Input

  • Noise in 2D, visualized with a zoomed-in meme image.

  • Ix calculated via [-1, 0, 1] filter.

Noise + Smoothing

  • Smooth the input image first, then apply the [-1, 0, 1] derivative filter.

Let’s Make It One Pass (1D)

  • Differentiation of a convolution is equivalent to convolving with the derivative of the kernel.

  • \frac{d}{dx}(f * g) = f * \frac{d}{dx}g
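The identity can be verified discretely: smoothing with a Gaussian and then differencing gives the same result as a single pass with the derivative-of-Gaussian kernel, by associativity of convolution. A minimal sketch (the helper and signal are made up for illustration):

```python
import numpy as np

def gaussian_1d(sigma, radius):
    x = np.arange(-radius, radius + 1, dtype=float)
    g = np.exp(-x ** 2 / (2 * sigma ** 2))
    return g / g.sum()                 # normalized so weights sum to 1

g = gaussian_1d(2.0, 8)
d = np.array([0.5, 0.0, -0.5])         # central-difference kernel

rng = np.random.default_rng(1)
f = rng.random(100)                    # a noisy 1D signal

two_pass = np.convolve(np.convolve(f, g), d)   # smooth, then differentiate
dog = np.convolve(g, d)                        # derivative-of-Gaussian kernel
one_pass = np.convolve(f, dog)                 # single convolution
```

The one-pass version does the same work with one convolution per pixel instead of two, which is the point of the slide.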

Let's Make It One Pass (2D)

  • Gaussian Derivative Filter

  • Which one finds horizontal edges? (The one that is differentiating in the y direction)

Applying the Gaussian Derivative

  • Applying Gaussian derivative filters at different scales (1, 3, 7 pixels).

  • Larger filters remove more noise but blur the edges.

Compared with the Past

  • Comparing Gaussian derivative filter with Sobel filter.

  • Gaussian Derivative:

  • [[1, 0, -1], [2, 0, -2], [1, 0, -1]]

  • Sobel Filter:

  • [[1, 2, 1], [0, 0, 0], [-1, -2, -1]]

Filters We’ve Seen

  • Comparison of smoothing filters, derivative filters, and Gaussian derivative filters.

  • Smoothing filters:

  • Goal: Remove noise

  • Sums to: 1

  • Derivative filters:

  • Goal: Find edges

  • Sums to: 0

Problems

  • Still an unsolved problem.

Localizing Reliably

  • Metaphor for feature distinctiveness.

  • Guadalupe Street, Intersection of Speedway and Dean Keeton, On campus.

Desirables for Descriptor

  • Properties of a good descriptor:

  • Repeatable: Should find the same things even with distortion.

  • Saliency: Each feature should be distinctive.

  • Compactness: Shouldn’t just be all the pixels.

  • Locality: Should only depend on local image data.

Example

  • Find the correspondences between two images.

Example Matches

  • Look for the colored squares

Basic Idea

  • Edge: No change along the edge direction.

  • Corner: Significant change in all directions.

  • Flat region: No change in all directions.

  • Corners should exhibit large intensity changes with any small shift of the window.

Formalizing Corner Detection

  • Introduction to mathematical formalization.

Formalizing Corner Detection

  • Zoom-In at x,y, Original Image.

Formalizing Corner Detection

  • Window at (x, y) and (x+u, y+v).

  • Measuring Similarity? Zoom-In at x,y.

Formalizing Corner Detection

  • Error (Sum Sqs) for u,v offset:

  • E(u, v) = \sum_{(x, y) \in W} (I[x + u, y + v] - I[x, y])^2
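A direct transcription of E(u, v) as code, before any approximation (function and array names are illustrative):

```python
import numpy as np

def window_error(I, x0, y0, size, u, v):
    """E(u, v): sum of squared differences between the size-by-size
    window at (x0, y0) and the same window shifted by (u, v)."""
    w = I[y0:y0 + size, x0:x0 + size]
    w_shift = I[y0 + v:y0 + v + size, x0 + u:x0 + u + size]
    return float(np.sum((w_shift - w) ** 2))

rng = np.random.default_rng(0)
I = rng.random((32, 32))
```

By construction the error at u = 0, v = 0 is exactly zero, since the window is compared against itself.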

Formalizing Corner Detection

  • Sum of squared differences between the window shifted by (u = -2, v = -3) and the unshifted window.

Formalizing Corner Detection

  • Error at u=0, v=0 is always 0. Why?

Match The Location and Plot

  • Original Image and Zoom-In A, B Error Options

Ok But Back To Math

  • E(u, v) = \sum_{(x, y) \in W} (I[x + u, y + v] - I[x, y])^2

  • Shifting windows around is expensive! We’ll find a trick to approximate this.

  • Note: only need to get the gist

Aside: Taylor Series for Images

  • Taylor series for linearizing a function:

  • f(x + d) \approx f(x) + \frac{\partial f}{\partial x} d

  • Applying Taylor series to images:

  • I(x + u, y + v) \approx I(x, y) + I_x u + I_y v

  • I_x and I_y are the partial derivatives at (x, y).

Formalizing Corner Detection

  • E(u, v) = \sum_{(x, y) \in W} (I[x + u, y + v] - I[x, y])^2

  • \approx \sum_{(x, y) \in W} (I[x, y] + I_x u + I_y v - I[x, y])^2

  • = \sum_{(x, y) \in W} (I_x u + I_y v)^2

  • = \sum_{(x, y) \in W} (I_x^2 u^2 + 2 I_x I_y u v + I_y^2 v^2)

Formalizing Corner Detection

  • E(u, v) \approx \sum_{(x, y) \in W} (I_x^2 u^2 + 2 I_x I_y u v + I_y^2 v^2)

  • = [u, v] \, M \, [u, v]^T

  • Where M is the second moment matrix.

  • M = \begin{bmatrix} \sum_{(x, y) \in W} I_x^2 & \sum_{(x, y) \in W} I_x I_y \\ \sum_{(x, y) \in W} I_x I_y & \sum_{(x, y) \in W} I_y^2 \end{bmatrix}
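The entries of M are just windowed sums of products of gradients. A minimal sketch, computing M for one window of a synthetic edge image (the helper name and window parameters are illustrative):

```python
import numpy as np

def second_moment(Ix, Iy, x0, y0, size):
    """M summed over a size-by-size window with corner (x0, y0),
    from precomputed gradient images Ix, Iy."""
    wx = Ix[y0:y0 + size, x0:x0 + size]
    wy = Iy[y0:y0 + size, x0:x0 + size]
    return np.array([[np.sum(wx * wx), np.sum(wx * wy)],
                     [np.sum(wx * wy), np.sum(wy * wy)]])

# On a vertical step edge, only the sum-of-Ix^2 entry is nonzero.
img = np.zeros((16, 16))
img[:, 8:] = 1.0
Iy, Ix = np.gradient(img)
M = second_moment(Ix, Iy, 6, 6, 5)
```

This matches the "one big, one small" edge case discussed next: M is (approximately) diagonal with a single large entry.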

Intuitively what is M?

  • M = \begin{bmatrix} \sum_{(x, y) \in W} I_x^2 & \sum_{(x, y) \in W} I_x I_y \\ \sum_{(x, y) \in W} I_x I_y & \sum_{(x, y) \in W} I_y^2 \end{bmatrix} \approx \begin{bmatrix} a & 0 \\ 0 & b \end{bmatrix}

  • a, b both small: flat, e.g., [0.1 0; 0 0.1]

  • One big, the other small: edge, e.g., [50 0; 0 0.1] or [0.1 0; 0 50]

  • a, b both big: corner, e.g., [50 0; 0 50]

Intuitively what is M?

  • M = \begin{bmatrix} \sum_{(x, y) \in W} I_x^2 & \sum_{(x, y) \in W} I_x I_y \\ \sum_{(x, y) \in W} I_x I_y & \sum_{(x, y) \in W} I_y^2 \end{bmatrix}

  • But the image might be rotated by some angle θ!

  • If the image is rotated by θ, with corresponding rotation matrix V, then M will look like V^{-1} \begin{bmatrix} a & 0 \\ 0 & b \end{bmatrix} V

So What Now?

  • Calculate M at each pixel by summing nearby gradient products; but we still need access to a and b.

  • Given M, decompose it into eigenvectors V and eigenvalues \lambda_1, \lambda_2:

  • M = V^{-1} \begin{bmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{bmatrix} V
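A small check of the rotation story (the matrix values are the toy numbers from the edge example above; for a rotation matrix, V^{-1} = V^T):

```python
import numpy as np

# Start from an axis-aligned "edge-like" M and rotate it by theta:
# the entries of M change, but its eigenvalues a, b do not.
M = np.diag([50.0, 0.1])
theta = np.pi / 6
V = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
M_rot = V @ M @ V.T                  # what we would measure in a rotated image

eigvals = np.sort(np.linalg.eigvalsh(M_rot))   # eigvalsh: symmetric M
```

This is why eigenvalues, rather than the raw entries of M, are the right rotation-invariant quantities to classify a pixel.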

So What Now?

  • M = \begin{bmatrix} \sum_{(x, y) \in W} I_x^2 & \sum_{(x, y) \in W} I_x I_y \\ \sum_{(x, y) \in W} I_x I_y & \sum_{(x, y) \in W} I_y^2 \end{bmatrix} = V^{-1} \begin{bmatrix} a & 0 \\ 0 & b \end{bmatrix} V

  • Compute the quantity R from M:

  • R = \det(M) - \alpha \, \mathrm{trace}(M)^2 = \lambda_1 \lambda_2 - \alpha (\lambda_1 + \lambda_2)^2

  • \alpha is an empirical value, usually 0.04–0.06.

  • This is an easy, fast formula: for a 2×2 matrix, the determinant and trace (the sum of the diagonal) are cheap to compute, with no explicit eigendecomposition needed.

So What Now?

  • R tells us whether we’re at a corner, edge, or flat region.

  • R = \det(M) - \alpha \, \mathrm{trace}(M)^2 = \lambda_1 \lambda_2 - \alpha (\lambda_1 + \lambda_2)^2

  • If R >> 0: corner (\lambda_1 \approx \lambda_2 >> 0)

  • If R << 0: edge (\lambda_1 >> \lambda_2 \approx 0 or \lambda_2 >> \lambda_1 \approx 0)

  • If |R| ≈ 0: flat (\lambda_1, \lambda_2 \approx 0)
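The three cases can be checked directly on the toy diagonal matrices from earlier (the function name and α value are illustrative):

```python
import numpy as np

def harris_response(M, alpha=0.05):
    """R = det(M) - alpha * trace(M)^2 for a 2x2 second moment matrix."""
    return np.linalg.det(M) - alpha * np.trace(M) ** 2

flat   = np.diag([0.1, 0.1])    # both eigenvalues small
edge   = np.diag([50.0, 0.1])   # one big, one small
corner = np.diag([50.0, 50.0])  # both big
```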

What Do I Need To Know?

  • Need to be able to take derivatives of an image.

  • Need to be able to compute the entries of M at every pixel.

  • Should know that properties of M indicate whether a pixel is a corner or not.

  • M = \begin{bmatrix} \sum_{(x, y) \in W} I_x^2 & \sum_{(x, y) \in W} I_x I_y \\ \sum_{(x, y) \in W} I_x I_y & \sum_{(x, y) \in W} I_y^2 \end{bmatrix}

In Practice

  • Compute partial derivatives I_x, I_y per pixel.

  • Compute M at each pixel, using a Gaussian weighting w:

  • M = \begin{bmatrix} \sum_{(x, y) \in W} w(x, y) I_x^2 & \sum_{(x, y) \in W} w(x, y) I_x I_y \\ \sum_{(x, y) \in W} w(x, y) I_x I_y & \sum_{(x, y) \in W} w(x, y) I_y^2 \end{bmatrix}

  • Compute the response function R = \det(M) - \alpha \, \mathrm{trace}(M)^2.

  • Threshold R.

  • Take only local maxima (non-maximum suppression).
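The whole pipeline fits in a few lines of numpy/scipy. This is a minimal sketch under assumed parameter values (σ, α, threshold, and NMS window are all illustrative choices), not a tuned detector:

```python
import numpy as np
from scipy.ndimage import gaussian_filter, maximum_filter

def harris_corners(img, sigma=1.5, alpha=0.05, thresh=1e-4):
    """Gradients -> Gaussian-weighted M per pixel -> response R ->
    threshold -> non-maximum suppression."""
    Iy, Ix = np.gradient(img.astype(float))
    # Entries of M at every pixel, with Gaussian weighting w
    Sxx = gaussian_filter(Ix * Ix, sigma)
    Sxy = gaussian_filter(Ix * Iy, sigma)
    Syy = gaussian_filter(Iy * Iy, sigma)
    # R = det(M) - alpha * trace(M)^2, computed per pixel
    R = (Sxx * Syy - Sxy ** 2) - alpha * (Sxx + Syy) ** 2
    # Threshold, then keep only local maxima (non-maximum suppression)
    keep = (R > thresh) & (R == maximum_filter(R, size=5))
    return np.argwhere(keep)          # array of (row, col) corner locations

# A bright square on a dark background has exactly four corners.
img = np.zeros((40, 40))
img[10:30, 10:30] = 1.0
pts = harris_corners(img)
```

On the synthetic square, all detections land near the four true corners; along the edges R is negative and in flat regions R ≈ 0, so nothing else survives the threshold.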

Computing R

  • Visual representation of computing R.

Thresholded R

  • Visual representation of a thresholded R.

Thresholded, NMS R

  • Visual representation of a thresholded, NMS R.

Final Results

  • Visual representation of final results.