Detectors and Descriptors


Administrivia

  • HW2 is out and students are encouraged to start early.

Last Time – Gradients

  • Image gradients treat the image as a function of x and y. Depending on which direction of change we measure: \nabla f = [\frac{\partial f}{\partial x}, 0], \nabla f = [0, \frac{\partial f}{\partial y}], or the full gradient \nabla f = [\frac{\partial f}{\partial x}, \frac{\partial f}{\partial y}].
  • Image gradients help find edges and corners.

Filters

  • Smoothing Filters remove noise and typically sum to 1.
  • Derivative Filters find edges and typically sum to 0.
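These two properties are easy to check numerically. A minimal NumPy sketch (kernel sizes and sigma are illustrative choices) builds a 1D Gaussian smoothing kernel and a central-difference derivative kernel and checks their sums:

```python
import numpy as np

x = np.arange(-2, 3)                      # 5-tap kernel (size is illustrative)
gauss = np.exp(-x**2 / (2 * 1.0**2))      # Gaussian smoothing filter
gauss = gauss / gauss.sum()               # normalized so it sums to 1

deriv = np.array([-1.0, 0.0, 1.0]) / 2.0  # central-difference derivative filter

# smoothing filter sums to 1; derivative filter sums to 0
```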

Motivation

  • The goal is to find corners and features in images.

Desirables for Descriptor

  • Repeatable: Should find the same features even with distortion.
  • Saliency: Each feature should be distinctive.
  • Compactness: Shouldn’t just be all the pixels.
  • Locality: Should only depend on local image data.

Basic Idea

  • Edge: No change along the edge direction.
  • Corner: Significant change in all directions.
  • Flat Region: No change in any direction.
  • At a corner, shifting a small window in any direction should produce a significant intensity change.

Formalizing Corner Detection

  • The error (sum of squared differences) for an offset (u, v) of a window W is: E(u, v) = \sum_{(x,y) \in W} (I[x + u, y + v] - I[x, y])^2
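As a sketch of what this error measures, the hypothetical `ssd_error` helper below computes E(u, v) for one window in plain NumPy; the 5x5 window size and the toy edge image are illustrative.

```python
import numpy as np

def ssd_error(I, x0, y0, u, v, half=2):
    """E(u, v): sum of squared differences between the window centered
    at (x0, y0) and the same window shifted by (u, v)."""
    W = I[y0 - half:y0 + half + 1, x0 - half:x0 + half + 1]
    Ws = I[y0 + v - half:y0 + v + half + 1, x0 + u - half:x0 + u + half + 1]
    return float(((Ws - W) ** 2).sum())

I = np.zeros((10, 10))
I[:, 5:] = 1.0                        # a vertical edge at column 5
# shifting across the edge changes the window; shifting along it does not
across = ssd_error(I, 5, 5, 1, 0)     # nonzero
along = ssd_error(I, 5, 5, 0, 1)      # zero
```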

Taylor Series

  • A Taylor series is used to linearize a function: f(x + d) \approx f(x) + \frac{\partial f}{\partial x} d
  • Applying the Taylor series to images, treating them as functions of x and y: I(x + u, y + v) \approx I(x, y) + I_x u + I_y v, where I_x and I_y are the partial derivatives of I with respect to x and y, both evaluated at (x, y).

Formalizing Corner Detection with Taylor Series

  • Using the Taylor series expansion to approximate the error function:
    • E(u, v) = \sum_{(x,y) \in W} (I[x + u, y + v] - I[x, y])^2 \approx \sum_{(x,y) \in W} (I[x, y] + I_x u + I_y v - I[x, y])^2
    • = \sum_{(x,y) \in W} (I_x u + I_y v)^2
    • = \sum_{(x,y) \in W} (I_x^2 u^2 + 2 I_x I_y uv + I_y^2 v^2)
  • By linearizing the image, E(u, v) can be approximated by a quadratic function of u and v: E(u, v) \approx [u, v] M [u, v]^T
  • Where M is the second moment matrix:
    • M = \begin{bmatrix} \sum_{(x,y) \in W} I_x^2 & \sum_{(x,y) \in W} I_x I_y \\ \sum_{(x,y) \in W} I_x I_y & \sum_{(x,y) \in W} I_y^2 \end{bmatrix}
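The entries of M come straight from image derivatives. The hypothetical `second_moment_matrix` helper below builds M for one pixel in plain NumPy; the window size and the use of `np.gradient` for derivatives are illustrative choices.

```python
import numpy as np

def second_moment_matrix(I, x, y, half=2):
    """Entries of M summed over a (2*half+1)^2 window around (x, y)."""
    Iy, Ix = np.gradient(I.astype(float))   # np.gradient returns (d/dy, d/dx)
    ys = slice(y - half, y + half + 1)
    xs = slice(x - half, x + half + 1)
    Sxx = (Ix[ys, xs] ** 2).sum()
    Syy = (Iy[ys, xs] ** 2).sum()
    Sxy = (Ix[ys, xs] * Iy[ys, xs]).sum()
    return np.array([[Sxx, Sxy], [Sxy, Syy]])

I = np.zeros((9, 9))
I[:, 4:] = 1.0                              # vertical edge
M = second_moment_matrix(I, 4, 4)
# for a pure vertical edge: sum Ix^2 is large, sum Iy^2 is 0
```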

Intuitively What is M?

  • Approximation:
    • M \approx \begin{bmatrix} a & 0 \\ 0 & b \end{bmatrix}
    • assuming gradients at each pixel are either purely vertical or purely horizontal (so I_x I_y = 0)
  • Cases:
    • a, b both small: flat
    • one big, other small: edge
    • a, b both big: corner
  • If the image is rotated by an angle \theta with rotation matrix V, M will look like V^{-1} \begin{bmatrix} a & 0 \\ 0 & b \end{bmatrix} V

Eigenvalues and M Matrix

  • Given M, we can decompose it into eigenvectors V and eigenvalues \lambda_1, \lambda_2 with
    • M = V^{-1} \begin{bmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{bmatrix} V
  • Compute the quantity R from M:
    • R = \det(M) - \alpha \, \mathrm{trace}(M)^2 = \lambda_1 \lambda_2 - \alpha (\lambda_1 + \lambda_2)^2
    • \alpha is an empirical constant, usually 0.04-0.06.

R and Corner/Edge/Flat Detection

  • R tells us whether we’re at a corner, edge, or flat region:
    • R \gg 0: corner (\lambda_1 and \lambda_2 both large, \lambda_1 \approx \lambda_2)
    • R \ll 0: edge (\lambda_1 \gg \lambda_2 or \lambda_2 \gg \lambda_1)
    • |R| \approx 0: flat (\lambda_1, \lambda_2 \approx 0)

What You Need To Know

  • Need to be able to take derivatives of the image.
  • Need to be able to compute the entries of M at every pixel.
  • Should know that some properties of M indicate whether a pixel is a corner or not.

In Practice: Harris Corner Detection

  1. Compute partial derivatives I_x, I_y per pixel.
  2. Compute M at each pixel, using Gaussian weighting w:
    • M = \sum_{(x,y) \in W} w(x, y) \begin{bmatrix} I_x^2 & I_x I_y \\ I_x I_y & I_y^2 \end{bmatrix}
  3. Compute the response function R:
    • R = \det(M) - \alpha \, \mathrm{trace}(M)^2 = \lambda_1 \lambda_2 - \alpha (\lambda_1 + \lambda_2)^2
  4. Threshold R.
  5. Take only local maxima (non-maxima suppression).
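The five steps above can be sketched in plain NumPy. This is a minimal sketch, not the canonical implementation: the `gaussian_blur` helper, sigma, alpha, and threshold values are all illustrative choices.

```python
import numpy as np

def gaussian_blur(I, sigma=1.0, radius=2):
    """Separable Gaussian filtering via two 1D convolutions."""
    x = np.arange(-radius, radius + 1)
    g = np.exp(-x**2 / (2 * sigma**2))
    g = g / g.sum()
    out = np.apply_along_axis(lambda r: np.convolve(r, g, mode='same'), 1, I)
    return np.apply_along_axis(lambda c: np.convolve(c, g, mode='same'), 0, out)

def harris(I, alpha=0.05, thresh=1e-3):
    """Steps 1-5: gradients, Gaussian-weighted entries of M, response R,
    threshold, non-maxima suppression over 3x3 neighborhoods."""
    Iy, Ix = np.gradient(I.astype(float))               # step 1
    Sxx = gaussian_blur(Ix * Ix)                        # step 2: entries of M
    Syy = gaussian_blur(Iy * Iy)
    Sxy = gaussian_blur(Ix * Iy)
    R = (Sxx * Syy - Sxy**2) - alpha * (Sxx + Syy)**2   # step 3
    corners = []
    H, W = R.shape
    for i in range(1, H - 1):                           # steps 4-5
        for j in range(1, W - 1):
            if R[i, j] > thresh and R[i, j] == R[i-1:i+2, j-1:j+2].max():
                corners.append((i, j))
    return corners

I = np.zeros((20, 20))
I[8:, 8:] = 1.0          # one corner of a bright square at (8, 8)
print(harris(I))
```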

Desirable Properties

  • If detectors are repeatable, they should be:
    • Invariant to some things: image is transformed and corners remain the same.
    • Equivariant with some things: image is transformed and corners transform with it.

Affine Intensity Change

  • Partially invariant to affine intensity changes: I_{new} = a I_{old} + b
  • M only depends on derivatives, so the offset b is irrelevant.
  • But a scales the derivatives (and hence R), and R is thresholded, so detections can change.

Image Translation

  • Equivariant with translation because convolution is translation equivariant.

Image Scaling

  • Not equivariant with scaling: when the image is scaled up, what was a corner can spread over many pixels and respond like an edge.

Key Idea: Scale Space

  • Try different scales to handle image scaling.

Blob Detection

  • Find maxima and minima of blob filter response in scale and space.

Gaussian Derivatives

  • First Derivative
    • \frac{\partial}{\partial x} g
    • \frac{\partial}{\partial y} g
  • Second Derivative
    • \frac{\partial^2}{\partial x^2} g
    • \frac{\partial^2}{\partial y^2} g

Laplacian of Gaussian (LoG)

  • \nabla^2 g = \frac{\partial^2 g}{\partial x^2} + \frac{\partial^2 g}{\partial y^2}
  • Scale-normalize the Laplacian of Gaussian if you want to compare responses across sigmas:
    • \nabla_{norm}^2 g = \sigma^2 \left( \frac{\partial^2 g}{\partial x^2} + \frac{\partial^2 g}{\partial y^2} \right)
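The scale-normalized LoG has a convenient closed form, \sigma^2 \cdot \frac{x^2 + y^2 - 2\sigma^2}{\sigma^4} g(x, y), which the sketch below uses to build a kernel; the default truncation radius of 4\sigma is an illustrative choice.

```python
import numpy as np

def scale_normalized_log(sigma, radius=None):
    """Scale-normalized LoG kernel: sigma^2 * (g_xx + g_yy), via the
    closed form (x^2 + y^2 - 2*sigma^2) / sigma^4 * g(x, y)."""
    if radius is None:
        radius = int(4 * sigma)
    y, x = np.mgrid[-radius:radius + 1, -radius:radius + 1]
    g = np.exp(-(x**2 + y**2) / (2 * sigma**2)) / (2 * np.pi * sigma**2)
    return sigma**2 * (x**2 + y**2 - 2 * sigma**2) / sigma**4 * g

kernel = scale_normalized_log(2.0)
# like any derivative filter, the kernel sums to (approximately) 0;
# its center is negative, so it responds strongly to bright blobs
```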

Edge Detection with LoG

  • Edge = zero-crossing of the response
  • Convolve with a second-derivative-of-Gaussian filter: f * \frac{\partial^2}{\partial x^2} g

Scale Selection

  • Characteristic scale of a blob is the scale that produces the maximum response.

Scale-space blob detector

  1. Convolve image with scale-normalized Laplacian at several scales.
  2. Find maxima of squared Laplacian response in scale-space.

Finding Maxima

  • Point (i, j) is a maximum (minimum if you flip the sign) in image I if it is bigger than all of its neighbors:
    • for y in range(i-1, i+2): for x in range(j-1, j+2): if (y, x) == (i, j): continue; check I[y, x] < I[i, j]

Scale Space

  • Blue lines are image-space neighbors.
  • Red lines are the scale-space neighbors.

Finding Maxima in Scale Space

  • Suppose I[:, :, k] is the image at scale k. Point (i, j, k) is a maximum (minimum if you flip the sign) in I if:
    • for y in range(i-1, i+2): for x in range(j-1, j+2): for c in range(k-1, k+2): if (y, x, c) == (i, j, k): continue; check I[y, x, c] < I[i, j, k]

Approximating the Laplacian with a Difference of Gaussians

  • (Laplacian) \approx (Difference of Gaussians), up to a scale factor
  • \frac{\partial^2 g}{\partial x^2} + \frac{\partial^2 g}{\partial y^2} \approx \frac{1}{2\pi\sigma_1^2} e^{-\frac{x^2+y^2}{2\sigma_1^2}} - \frac{1}{2\pi\sigma_2^2} e^{-\frac{x^2+y^2}{2\sigma_2^2}}
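This approximation can be checked numerically. The sketch below, assuming illustrative values \sigma_1 = k\sigma (broad), \sigma_2 = \sigma (narrow) with k = 1.6, compares the DoG kernel to a LoG kernel at an intermediate scale: the two are strongly correlated.

```python
import numpy as np

def gauss2d(sigma, radius):
    """Unit-mass 2D Gaussian kernel."""
    y, x = np.mgrid[-radius:radius + 1, -radius:radius + 1]
    return np.exp(-(x**2 + y**2) / (2 * sigma**2)) / (2 * np.pi * sigma**2)

def log2d(sigma, radius):
    """Closed-form Laplacian of Gaussian."""
    y, x = np.mgrid[-radius:radius + 1, -radius:radius + 1]
    return (x**2 + y**2 - 2 * sigma**2) / sigma**4 * gauss2d(sigma, radius)

sigma, k, radius = 2.0, 1.6, 12            # illustrative values
dog = gauss2d(k * sigma, radius) - gauss2d(sigma, radius)
log = log2d(np.sqrt(k) * sigma, radius)    # LoG at the intermediate scale

# both kernels sum to ~0 and have nearly identical shape (up to scale)
corr = np.corrcoef(dog.ravel(), log.ravel())[0, 1]
```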

SIFT (Scale-Invariant Feature Transform)

Problem 1: How do we deal with scales? Try them all!

  • Why is this efficient? Halving the image in each dimension quarters the pixel count, so the total work across scales is 1 + 1/4 + 1/16 + 1/64 + \dots + 1/4^i + \dots = 4/3 of the first scale. The vast majority of the effort is in the first and second scales.

Problem 2 – Describing Features

  • Once we’ve found a corner/blob, we can’t just use the nearby pixels directly as a descriptor. What about:
    1. Scale?
    2. Rotation?
    3. Additive light?

Handling Scale

  • Given characteristic scale (maximum Laplacian response), we can just rescale image.

Handling Rotation

  • Given window, can compute “dominant orientation” and then rotate image.

SIFT Features

  • SIFT extracts features at characteristic scales and dominant orientations, then rotates each patch to its dominant orientation and resamples it to a common scale.

SIFT Descriptors

  1. Compute gradients.
  2. Build orientation histograms over a grid of cells (4x4 in practice).
  • Gradients ignore additive global illumination changes.
  • Gaussian weighting: gives a smooth response to small shifts.
  • Normalization: reduces multiplicative illumination effects.
  • Clamping: limits the influence of any single large gradient.
  • Plus tons more engineering details.
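A simplified sketch of such a descriptor in NumPy (the hypothetical `sift_like_descriptor` below omits the bin/cell interpolation real SIFT uses, and the Gaussian window width is an illustrative choice):

```python
import numpy as np

def sift_like_descriptor(patch, n_cells=4, n_bins=8):
    """Simplified SIFT-style descriptor for a 16x16 patch: gradients,
    a 4x4 grid of 8-bin orientation histograms weighted by gradient
    magnitude and a Gaussian window, then normalize / clamp /
    renormalize."""
    gy, gx = np.gradient(patch.astype(float))
    mag = np.hypot(gx, gy)
    ang = np.arctan2(gy, gx) % (2 * np.pi)
    h, w = patch.shape
    yy, xx = np.mgrid[:h, :w]
    # Gaussian weighting centered on the patch
    mag = mag * np.exp(-((yy - h / 2) ** 2 + (xx - w / 2) ** 2) / (2 * (w / 2) ** 2))
    cell_h, cell_w = h // n_cells, w // n_cells
    hist = np.zeros((n_cells, n_cells, n_bins))
    for i in range(h):
        for j in range(w):
            b = int(ang[i, j] / (2 * np.pi) * n_bins) % n_bins
            ci = min(i // cell_h, n_cells - 1)
            cj = min(j // cell_w, n_cells - 1)
            hist[ci, cj, b] += mag[i, j]
    d = hist.ravel()
    d = d / (np.linalg.norm(d) + 1e-12)  # normalization
    d = np.minimum(d, 0.2)               # clamping
    return d / (np.linalg.norm(d) + 1e-12)

rng = np.random.default_rng(0)
d = sift_like_descriptor(rng.random((16, 16)))   # 4*4*8 = 128-D vector
```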

Properties of SIFT

  • Can handle up to ~60 degrees of out-of-plane rotation and changes of illumination.
  • Fast, efficient, and code is available (though it was patented).

Feature Descriptors

  • A 128-dimensional vector x.
  • Think of the feature extractor as a non-linear filter that maps pixels to a 128-D feature vector.

2nd Nearest Neighbor Trick

  • Given a query feature x_q, its nearest neighbor is a good match candidate, but raw distances can’t be reliably thresholded.
  • Instead, find the nearest neighbor x_{1NN} and the second nearest neighbor x_{2NN}. The ratio of distances is a good test for matches (accept when r is small):
    • r = \frac{\| x_q - x_{1NN} \|}{\| x_q - x_{2NN} \|}
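A minimal sketch of this ratio test (the 0.8 threshold is the commonly used value from Lowe's SIFT paper; the toy 2-D "descriptors" are illustrative):

```python
import numpy as np

def ratio_test_match(query, db, thresh=0.8):
    """Accept a match only if the distance to the nearest neighbor is
    much smaller than the distance to the second nearest neighbor."""
    dists = np.linalg.norm(db - query, axis=1)
    order = np.argsort(dists)
    nn1, nn2 = order[0], order[1]
    r = dists[nn1] / dists[nn2]
    return nn1 if r < thresh else None

db = np.array([[0.0, 0.0], [10.0, 0.0], [10.1, 0.0]])
print(ratio_test_match(np.array([0.1, 0.0]), db))    # distinctive -> matches 0
print(ratio_test_match(np.array([10.05, 0.0]), db))  # ambiguous -> None
```

The point of dividing by the second-nearest distance is that it calibrates "close" per query: a match is accepted only when it is much better than the best alternative.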