Describe Homography, H
Relates relative pose of 2 cameras viewing a planar scene. Estimate from feature correspondences using RANSAC.
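A minimal numpy sketch of the estimation step (the Direct Linear Transform on four exact correspondences; a real pipeline would wrap this in RANSAC and normalise the points, and all names here are illustrative):

```python
import numpy as np

def estimate_homography(src, dst):
    """Estimate the 3x3 homography H with dst ~ H @ src via the DLT.

    src, dst: (N, 2) arrays of matched points, N >= 4. In practice the
    matches come from a feature detector and RANSAC rejects bad ones.
    """
    A = []
    for (x, y), (u, v) in zip(src, dst):
        A.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        A.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    # H is the null vector of A: the smallest right singular vector.
    _, _, Vt = np.linalg.svd(np.asarray(A, dtype=float))
    H = Vt[-1].reshape(3, 3)
    return H / H[2, 2]                    # fix scale so H[2,2] == 1

# Four corners of a unit square mapped by a known translation (tx=2, ty=1).
src = np.array([[0, 0], [1, 0], [1, 1], [0, 1]], dtype=float)
dst = src + np.array([2.0, 1.0])
H = estimate_homography(src, dst)
```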
Describe the Essential Matrix, E
Relates relative pose of 2 cameras viewing a 3D scene. Estimate from feature correspondences using RANSAC.
Describe Bundle Adjustment, BA
Minimises reprojection error: min over camera parameters Ci and 3D points Xj of sum_ij || x_ij - proj(Ci, Xj) ||^2, where x_ij is the observed image position of point j in camera i.
What are the four steps in a Fiducial Marker Algorithm
How does ArUco find the marker outline?
How does AprilTags find the marker outline?
Name the stages of a CNN in the order that they operate from an input image
Name 3 commonly used deep learning frameworks
Torch, Caffe and TensorFlow (mnemonic: had a deep dream holding a torch in a cafe)
List eight common image transformations and distortions
List the five steps of the natural feature registration algorithm
Name three natural feature registration algorithms
How do pixels in a camera differ from the photoreceptors in the human eye (colour space, distribution of colour, sensitivity and resolution)?
Cameras use the RGB colour space and have evenly distributed square CCD pixel elements with approximately equal sensitivity to red, green and blue (sampled 25% red, 50% green, 25% blue), but with a much lower dynamic range and a much wider spectral range (they can detect IR and ultraviolet). Cameras also have a much higher frame rate.
Human vision resembles the CIE colour space. The eye is roughly a foveal 6.5 Mpixel three-colour camera with a narrow-angle lens, plus a peripheral, more sensitive 100 Mpixel monochrome camera with a wide-angle lens. It has a limited spatial resolution of 1-3 cm at a distance of 20 metres (and a blind spot). Cognitive processing limits the 10^8:1 dynamic range to distinguishing approximately 100 colours and 16-32 shades of black and white.
Draw the graph of the colour response of the human photopic vision - red, green and blue cones after correction for absorption by the lens and other inert pigments. Also draw the spectral response of the foveal/photopic vision and the peripheral/scotopic vision.
Write the formula for the gradient direction of an edge and the gradient strength of an edge in an image
angle = arctan((df/dy)/(df/dx))
magnitude = sqrt((df/dx)^2 + (df/dy)^2)
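These formulas can be checked on a toy image with numpy (arctan2 is used instead of arctan to avoid dividing by zero when df/dx is 0):

```python
import numpy as np

# Toy intensity image: a horizontal ramp, so df/dx is constant and df/dy is 0.
f = np.tile(np.arange(8, dtype=float), (8, 1))

# np.gradient returns derivatives along each axis: axis 0 is y (rows), axis 1 is x.
dfdy, dfdx = np.gradient(f)

magnitude = np.sqrt(dfdx**2 + dfdy**2)
angle = np.arctan2(dfdy, dfdx)
```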
Describe how the Canny edge detection algorithm accomplishes Good Detection (filter responds to an edge), Good localisation (detect an edge near a true edge) and single response (one per edge).
Edge detection is susceptible to noise, so the image is first convolved with a Gaussian filter; the result is a slightly blurred image that is not affected by a single noisy pixel. Edges can occur in a variety of directions, so Canny uses filters to detect horizontal, vertical and diagonal edges, using operators such as Sobel or Prewitt which return the first derivative in each direction. The edge gradient magnitude and direction can then be determined, with the angle rounded to one of four angles representing the vertical, horizontal and two diagonal directions.
How does the choice of Gaussian Kernel size affect the behaviour of the Canny edge detector?
Large sigma detects large scale edges and small sigma detects small scale edges
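A small numpy sketch of this trade-off on a 1-D noisy step edge (the signal and sigmas are illustrative): a larger sigma suppresses noise better but spreads its response over a larger scale.

```python
import numpy as np

def gaussian_kernel1d(sigma, radius=None):
    """Normalised 1-D Gaussian; a larger sigma spreads weight over more pixels."""
    if radius is None:
        radius = int(3 * sigma)
    x = np.arange(-radius, radius + 1, dtype=float)
    k = np.exp(-x**2 / (2 * sigma**2))
    return k / k.sum()

rng = np.random.default_rng(0)
signal = np.zeros(200)
signal[100:] = 1.0                                  # a step edge
noisy = signal + rng.normal(0, 0.2, signal.size)

small = np.convolve(noisy, gaussian_kernel1d(1.0), mode="same")
large = np.convolve(noisy, gaussian_kernel1d(5.0), mode="same")
```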
Describe how the Harris Detector works.
Harris captures the structure of the local neighbourhood using an autocorrelation matrix: two strong eigenvalues indicate a good local feature, one indicates a contour (edge) and none indicates a uniform region. Harris also gives a measure of feature quality, because the best feature points can be selected by thresholding on the eigenvalues.
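A rough numpy sketch of the response R = det(M) - k*trace(M)^2, using an unweighted 3x3 box window instead of the usual Gaussian weighting (an assumption made here for brevity):

```python
import numpy as np

def harris_response(img, k=0.05):
    """Harris corner response per pixel, where M is the (box-summed)
    autocorrelation matrix of image gradients."""
    dy, dx = np.gradient(img.astype(float))
    Ixx, Iyy, Ixy = dx * dx, dy * dy, dx * dy

    def box(a):
        # Sum over each pixel's 3x3 neighbourhood.
        p = np.pad(a, 1)
        return sum(p[i:i + a.shape[0], j:j + a.shape[1]]
                   for i in range(3) for j in range(3))

    Sxx, Syy, Sxy = box(Ixx), box(Iyy), box(Ixy)
    det = Sxx * Syy - Sxy**2
    trace = Sxx + Syy
    return det - k * trace**2

# Synthetic image: a bright square on black, so its corners are good features.
img = np.zeros((20, 20))
img[5:15, 5:15] = 1.0
R = harris_response(img)
```

R is large and positive at the square's corners, negative along its edges (one strong eigenvalue) and zero in flat regions.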
Describe how SIFT works
Compare the SIFT and the Harris Detector
SIFT and Harris are both illumination- and rotation-invariant because they are based on gradient operators, but neither is deformation-invariant because deformations change gradients. SIFT is scale-invariant because it samples at different scales, whereas Harris is not. Both are translation-invariant for x and y motion perpendicular to the camera axis.
Describe how correctly matched points in two images enable finding depth values in a stereo pair of images.
Describe how correctly matched points in two images enable finding optical flow in two successive frames of video using the Lucas-Kanade algorithm.
Describe how depth can be calculated from optical flow using a single camera.
Relative depth can be calculated from the velocity of optical flow points (which is larger when the depth is smaller). Absolute depth can be determined if the camera velocity and pose are known and the intrinsic camera parameters are known.
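The flow itself can be sketched as the Lucas-Kanade least-squares solve over one window. This toy version uses two synthetic smooth frames with a known sub-pixel shift; the image function and window are illustrative:

```python
import numpy as np

# Lucas-Kanade assumes brightness constancy and small, locally constant
# motion, giving the least-squares system  A [u v]^T = b  per window.
y, x = np.mgrid[0:40, 0:40].astype(float)
u_true, v_true = 0.4, 0.2                       # known sub-pixel motion
frame1 = np.sin(x / 6.0) + np.cos(y / 7.0)
frame2 = np.sin((x - u_true) / 6.0) + np.cos((y - v_true) / 7.0)

Iy, Ix = np.gradient(frame1)                    # spatial gradients
It = frame2 - frame1                            # temporal derivative

w = (slice(5, 35), slice(5, 35))                # one analysis window
A = np.array([[np.sum(Ix[w] * Ix[w]), np.sum(Ix[w] * Iy[w])],
              [np.sum(Ix[w] * Iy[w]), np.sum(Iy[w] * Iy[w])]])
b = -np.array([np.sum(Ix[w] * It[w]), np.sum(Iy[w] * It[w])])
u_est, v_est = np.linalg.solve(A, b)            # estimated flow (u, v)
```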
What are the advantages and disadvantages of the following for obtaining depth values:
(a) structured light camera
(b) time-of-flight camera
(c) stereo camera
(d) LiDAR
(a) very accurate and very fast, but sensitive to lighting conditions and to movement
(b) efficient distance algorithm, but sensitive to background lighting, and multiple reflections can make measured distances inaccurate
(c) cheap, but complex to set up and computationally expensive
(d) high accuracy and independent of weather and light, but generates large datasets that are hard to interpret
State the spectral resolution, dynamic range, spatial resolution at 20m and radiometric resolution (shades of colour and grey) of human vision.
400-700 nm, 10^8:1, 1-3 cm at 20 m, approximately 100 colours and 16-32 shades of grey.
How can the Hough Transform be generalised to detect curved lines in an image even when the curve doesn't have a simple analytic form?
Describe the following three steps of TextonBoost:
What are the characteristics of a good local feature to track?
A fiducial marker can be used to find the 6-DOF pose of a camera. What are the five steps to finding this pose? Describe each of them.
Briefly describe each of the four morphological operations and explain what effect they have on an image and why.
Erosion - places the structuring element on the pixel of an object and removes that pixel if the structuring element overlaps a non-object pixel shrinking the object.
Dilation - places the structuring element on an object pixel, if the neighbouring pixels are not part of the object it makes it part of the object - this grows the object.
Opening - An erosion operation followed by dilation. The erosion first removes noise and breaks narrow connections, then the dilation returns the remaining objects to their original size.
Closing - A dilation operation followed by erosion. The dilation first fills gaps and holes in the object, then the erosion returns it to its original size.
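A toy numpy implementation of all four operations with a 3x3 structuring element, showing opening removing an isolated noise speck and closing filling a one-pixel hole:

```python
import numpy as np

def erode(img, se=np.ones((3, 3), dtype=bool)):
    """Keep a pixel only if the structuring element fits entirely inside the object."""
    p = np.pad(img.astype(bool), 1)
    out = np.ones_like(img, dtype=bool)
    for i in range(3):
        for j in range(3):
            if se[i, j]:
                out &= p[i:i + img.shape[0], j:j + img.shape[1]]
    return out

def dilate(img, se=np.ones((3, 3), dtype=bool)):
    """Turn on every pixel the structuring element reaches from an object pixel."""
    p = np.pad(img.astype(bool), 1)
    out = np.zeros_like(img, dtype=bool)
    for i in range(3):
        for j in range(3):
            if se[i, j]:
                out |= p[i:i + img.shape[0], j:j + img.shape[1]]
    return out

opening = lambda img: dilate(erode(img))   # erosion then dilation: removes specks
closing = lambda img: erode(dilate(img))   # dilation then erosion: fills holes

img_noise = np.zeros((9, 9), dtype=bool)
img_noise[2:7, 2:7] = True                 # a solid square...
img_noise[0, 0] = True                     # ...plus an isolated noise pixel

img_hole = np.zeros((9, 9), dtype=bool)
img_hole[2:7, 2:7] = True                  # a solid square...
img_hole[4, 4] = False                     # ...with a one-pixel hole

opened = opening(img_noise)
closed = closing(img_hole)
```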
What does it mean to say the Fourier transform is a self-inverting transform?
The transform has (essentially) the same form as its inverse, so the same operation computes both directions; applying it twice returns the original input (mirrored), and applying it four times returns it exactly.
What does it mean if the centre of a Fourier transform is blacked out? What if only the centre wasn't blacked out?
(a) High-pass filter - the inverse Fourier transform would give an edge-detected image.
(b) Low-pass filter - the inverse Fourier transform would give a blurred image.
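Both cases can be demonstrated with numpy's FFT (the 8x8 centre block here stands in for "the centre of the transform"):

```python
import numpy as np

rng = np.random.default_rng(2)
img = rng.random((32, 32))

F = np.fft.fftshift(np.fft.fft2(img))       # put the zero frequency at the centre
mask = np.zeros((32, 32), dtype=bool)
cy, cx = 16, 16
mask[cy - 4:cy + 4, cx - 4:cx + 4] = True   # the "centre" block of the spectrum

# Centre blacked out -> low frequencies removed -> high-pass filter.
high = np.fft.ifft2(np.fft.ifftshift(np.where(mask, 0, F))).real
# Only the centre kept -> low-pass filter.
low = np.fft.ifft2(np.fft.ifftshift(np.where(mask, F, 0))).real
```

Because the filters are complementary, the two results sum back to the original image, and the high-pass result has zero mean (its DC term was removed).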
Describe the bundle adjustment algorithm.
What are the features of a good edge detector?
If a virtual 3D model of a body can be kept aligned with a person moving in a 2D image, then it is possible to find all the joint angles for each frame of video, from that 3D model. So describe how three chained homogeneous transformation matrices can project a point b on the ith body part of a virtual 3D model onto a pixel p in a 2D image.
The three chained homogeneous transformation matrices map a point through successive frames of reference: Bi(x, b) takes point b from the ith body part's frame into the 3D person frame, Ci takes that into the 3D camera/world frame, and I projects it onto the 2D image plane.
p(x,b) = I(x, Ci(x, Bi(x, b))) - 3D body part frame of reference to 3D person frame of reference to 3D camera/world frame of reference to 2D image pixel.
Give a strength, weakness and an application for the CIE colour space as well as sketch the space.
Strength: perceptually easy to understand and mix colours.
Weakness: Based on human perception so it is not easily applied to computer vision as it was originally intended for humans to subjectively compare colours
Application: Colour temperature lighting for photographers.
Give a strength, weakness and an application for the RGB colour space as well as sketch the space.
Strength: Designed to represent the colours used in media such as displays or cameras so is immediately available to computer vision algorithms.
Weakness: Not all colours are perceptually uniform so it is hard to determine colour differences in the RGB colour space. This space is also device specific.
Applications: Computer graphics
Give a strength, weakness and an application for the HSV colour space as well as sketch the space.
Strength: More useful than RGB for analysing colours for example performing colour range checking.
Weakness: Not perceptually uniform and is device specific.
Application: Used by artists in photoshop, used in light bulbs.
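Python's stdlib colorsys module illustrates the colour-range-checking advantage of HSV over RGB (the is_reddish helper and its thresholds are made up for this example):

```python
import colorsys

# colorsys works on floats in [0, 1]; convert a few RGB colours to HSV.
red = colorsys.rgb_to_hsv(1.0, 0.0, 0.0)
dark_red = colorsys.rgb_to_hsv(0.5, 0.0, 0.0)   # same hue, lower value

def is_reddish(r, g, b, tol=1 / 12):
    """A colour-range check needs only the hue channel (plus sanity
    thresholds on saturation and value), which is awkward in raw RGB."""
    h, s, v = colorsys.rgb_to_hsv(r, g, b)
    return (h < tol or h > 1 - tol) and s > 0.3 and v > 0.2
```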
What is meant by background subtraction, differencing, ghosting and foreground aperture?
How does the Laplacian of a Gaussian filter sharpen an image?
The Laplacian of a Gaussian filter subtracts the low frequencies (a blurred copy of the image) from the original image, leaving the high frequencies (edges); adding these back to the original gives a sharpened image with higher contrast.
Why does a sharpened image appear to have more content than its original image?
Although there is actually less content after the low frequencies are subtracted, the accentuated high-frequency edges give the illusion of more content: there appear to be more edges, and human perception is particularly sensitive to edges.
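The subtract-and-add-back idea (essentially unsharp masking; a box blur stands in for the Gaussian here, and the image is illustrative) in a few lines of numpy:

```python
import numpy as np

def box_blur(img, radius=1):
    """Cheap low-pass filter: mean over a (2r+1)^2 neighbourhood."""
    k = 2 * radius + 1
    p = np.pad(img, radius, mode="edge")
    out = np.zeros_like(img, dtype=float)
    for i in range(k):
        for j in range(k):
            out += p[i:i + img.shape[0], j:j + img.shape[1]]
    return out / (k * k)

img = np.zeros((8, 8))
img[:, 4:] = 1.0                        # a vertical step edge
high_freq = img - box_blur(img)         # original minus low frequencies
sharpened = img + high_freq             # add the edges back: higher contrast
```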
Name five object recognition methods.
Pose clustering
Geometric hashing
Generalised Hough Transform
Template matching
Direction histogram
What is an advantage and a disadvantage of Pose Clustering?
What is an advantage and a disadvantage of Geometric Hashing?
What is an advantage and a disadvantage of Generalised Hough Transform?
What is an advantage and a disadvantage of Template Matching?
What is an advantage and a disadvantage of Direction histogram?
What are the three main issues in tracking?
Describe the three main issues in tracking: Prediction, Data Association, Correction in the context of the Kalman filter.
Prediction: predict the state in the current frame based on the state in previous frames. The predicted state is a known constant multiplied by the old state, with some zero-mean noise added.
Data Association: Calculate the state from the current frame considering kinematic models and error minimisation.
Correction: if the measurement error is low (Gaussian noise), use the measured state from the current frame; otherwise place a higher weighting on the predicted state.
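A minimal 1-D Kalman filter showing the predict and correct steps (the constants a, q, r and the tracked value are illustrative):

```python
import numpy as np

# Track a constant value through noisy measurements.
a, q, r = 1.0, 1e-4, 0.5**2     # state transition, process noise, measurement noise
x, p = 0.0, 1.0                 # initial estimate and its variance

rng = np.random.default_rng(3)
true_value = 5.0
for z in true_value + rng.normal(0, 0.5, 200):
    x_pred = a * x                      # predict: constant times old state
    p_pred = a * p * a + q
    K = p_pred / (p_pred + r)           # correct: high measurement noise -> small K,
    x = x_pred + K * (z - x_pred)       # so more weight on the prediction
    p = (1 - K) * p_pred
```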
How can an improved and smoothed estimate be made using a Kalman filter?
By running two Kalman filters, one in the forwards direction and one in the backwards direction and then combining the estimates. Here the backwards estimates are used as yet another measurement for the forward filter.
What are the advantages of a particle filter over a Kalman filter?
What is the primary goal of Computer Vision?
To recognise objects and their motion.
What is the optic nerve? and what limitations does it place on our vision?
The optic nerve is the bundle of nerve fibres that relays signals from the eyes (via the thalamus) to the visual cortex.
It contains only approximately 800,000 nerve fibres, meaning rods and cones must be interconnected, compressing the roughly 10^8 photoreceptors onto roughly 10^6 output channels from the eye to the brain.
How would you measure a colour using a colour matching paradigm?
How does the Hough transform work?
Each edge point (x, y) votes for every line through it, i.e. every (d, theta) pair satisfying d = x cos(theta) + y sin(theta); a line in the image then appears as a peak of votes at a single point in Hough space.
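A brute-force numpy sketch of the voting (the accumulator resolution and the synthetic line are illustrative): points on the vertical line x = 30 all vote for the same (theta, d) bin, producing a peak there.

```python
import numpy as np

thetas = np.deg2rad(np.arange(0, 180))
d_max = 100
acc = np.zeros((len(thetas), 2 * d_max))        # accumulator; d is offset by d_max

# Synthetic edge points on the vertical line x = 30 (theta = 0, d = 30).
points = [(30, y) for y in range(0, 60)]
for x, y in points:
    for ti, t in enumerate(thetas):
        d = int(round(x * np.cos(t) + y * np.sin(t)))
        acc[ti, d + d_max] += 1                 # one vote per (point, theta)

ti, di = np.unravel_index(np.argmax(acc), acc.shape)
theta_peak, d_peak = np.rad2deg(thetas[ti]), di - d_max
```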
What is the general method for detecting corners?
What is a local invariant feature? List the 5 advantages of local invariant features.
Image content is transformed into local feature coordinates that are invariant to translation, rotation, scale and other imaging parameters.
Name two annoying phenomena associated with lenses and describe each.
Chromatic Aberration: light at different wavelengths follows different paths, so some wavelengths are defocused; for machine vision we can coat the lens to compensate.
Scattering at the lens surface: some light entering the lens system is reflected off each surface it encounters (Fresnel's law); coating the lens interior reduces this.
Define geometric calibration
The relationship between coordinates in the real world and coordinates in the image is found. This can correct for lens distortion and recover the focal length and the image centre.
Name two segmentation algorithms.
Mean Shift segmentation: better when only pixel colour is being used.
Graph-cut segmentation: Better when a derived feature is being used, such as texture.
Outline how K-means clustering works
What three other features of TextonBoost help it achieve accurate results?
Location: sky is normally at the top of an image, grass at the bottom; increase the probability when a texture occurs in its usual location.
Colour: a Gaussian mixture model - use K-means clustering to find the common colours of a texture and increase the probability when a pixel has the texture's usual colour.
Edges: decrease the probability if a neighbouring pixel is labelled differently.
What are the advantages of passive vision systems (images)?
What are the disadvantages to marker tracking?
Describe the following steps in particle filtering: Sample, Predict and Measure
What is a dyneme?
The smallest contrastive unit of movement. Human movement can be constructed from an alphabet of 35 dynemes
What is transfer learning?
Learning that builds on existing knowledge. We can throw away the final classification layer, or re-assemble the pre-trained parts into a larger model; we can also train only the new parts, or fine-tune with low global learning rates.
What is the Iterative Closest point algorithm
How can RANSAC (Random Sampling and Consensus) be used with point clouds?
Allows us to fit models (cylinders, planes, ...) to the data: pick the minimum number of points needed to define the model, then test the model against the remaining points to verify it. This leaves the model parameters, the inliers and the outliers.
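The same sample-fit-verify loop is easiest to sketch for a 2-D line (a minimal sample of 2 points defines the model); all names and the synthetic data here are illustrative:

```python
import numpy as np

def ransac_line(points, iters=200, threshold=0.1, rng=np.random.default_rng(4)):
    """Fit y = m*x + c: repeatedly pick the minimal sample (2 points) to
    define a line, count inliers, and keep the best-supported model."""
    best_inliers, best_model = None, None
    for _ in range(iters):
        (x1, y1), (x2, y2) = points[rng.choice(len(points), 2, replace=False)]
        if x1 == x2:
            continue                       # degenerate sample: skip
        m = (y2 - y1) / (x2 - x1)
        c = y1 - m * x1
        residuals = np.abs(points[:, 1] - (m * points[:, 0] + c))
        inliers = residuals < threshold
        if best_inliers is None or inliers.sum() > best_inliers.sum():
            best_inliers, best_model = inliers, (m, c)
    return best_model, best_inliers

rng = np.random.default_rng(5)
x = np.linspace(0, 10, 80)
y = 2.0 * x + 1.0 + rng.normal(0, 0.02, 80)    # true line with small noise
y[:20] = rng.uniform(0, 30, 20)                # 25% gross outliers
(m, c), inliers = ransac_line(np.column_stack([x, y]))
```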
What are some common incorrect matches that occur during feature matching?
Why are homogeneous coordinates used in feature tracking?
One extra dimension is added, which means both rotation and translation can be represented using matrix multiplication, and projection by normalisation (dividing by the extra coordinate).
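A tiny numpy example: one 3x3 matrix both rotates and translates a 2-D point, and the final divide-by-w is the projection by normalisation (the angle and translation are illustrative):

```python
import numpy as np

# In homogeneous coordinates the 2-D point (x, y) becomes (x, y, 1), so a
# rotation AND a translation combine into a single matrix multiplication.
theta = np.pi / 2
R_t = np.array([[np.cos(theta), -np.sin(theta), 3.0],   # rotate 90 degrees...
                [np.sin(theta),  np.cos(theta), 1.0],   # ...then translate by (3, 1)
                [0.0,            0.0,           1.0]])

p = np.array([1.0, 0.0, 1.0])           # the point (1, 0) in homogeneous form
q = R_t @ p
x, y = q[0] / q[2], q[1] / q[2]         # projection by normalisation (divide by w)
```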
How does RANSAC work?