Describe Homography, H
Relates relative pose of 2 cameras viewing a planar scene. Estimate from feature correspondences using RANSAC.
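A minimal numpy sketch of the estimation step (the Direct Linear Transform on four exact correspondences; a real pipeline would wrap this in RANSAC and normalise the points, and all names here are illustrative):

```python
import numpy as np

def estimate_homography(src, dst):
    """Estimate the 3x3 homography H with dst ~ H @ src via the DLT.

    src, dst: (N, 2) arrays of matched points, N >= 4. In practice the
    matches come from a feature detector and RANSAC rejects bad ones.
    """
    A = []
    for (x, y), (u, v) in zip(src, dst):
        A.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        A.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    # H is the null vector of A: the smallest right singular vector.
    _, _, Vt = np.linalg.svd(np.asarray(A, dtype=float))
    H = Vt[-1].reshape(3, 3)
    return H / H[2, 2]                    # fix scale so H[2,2] == 1

# Four corners of a unit square mapped by a known translation (tx=2, ty=1).
src = np.array([[0, 0], [1, 0], [1, 1], [0, 1]], dtype=float)
dst = src + np.array([2.0, 1.0])
H = estimate_homography(src, dst)
```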
Describe the Essential Matrix, E
Relates relative pose of 2 cameras viewing a 3D scene. Estimate from feature correspondences using RANSAC.
Describe Bundle Adjustment, BA
Minimises reprojection error: min over camera parameters Ci and 3D points Xj of sum_ij || x_ij - proj(Ci, Xj) ||^2, where x_ij is the observed image position of point j in camera i.
What are the four steps in a Fiducial Marker Algorithm
How does ArUco find the marker outline?
How does AprilTags find the marker outline?
Name the stages of a CNN in the order that they operate from an input image
Name 3 commonly used deep learning frameworks
Torch, Caffe and TensorFlow (mnemonic: had a deep dream holding a torch in a cafe)
List eight common image transformations and distortions
List the five steps of the natural feature registration algorithm
Name three natural feature registration algorithms
How do pixels in a camera differ from the photoreceptors in the human eye (colour space, distribution of colour, sensitivity and resolution)?
Cameras use the RGB colour space and have evenly distributed square CCD pixel elements with approximately equal sensitivity to red, green and blue (sampled 25% red, 50% green, 25% blue), but with a much lower dynamic range and a much wider spectral range (they can detect IR and ultraviolet). Cameras also have a much higher frame rate.
Human vision resembles the CIE colour space. The eye is roughly a foveal 6.5 Mpixel three-colour camera with a narrow-angle lens, plus a peripheral, more sensitive 100 Mpixel monochrome camera with a wide-angle lens. It has a limited spatial resolution of 1-3 cm at a distance of 20 metres (and a blind spot). Cognitive processing limits the 10^8:1 dynamic range to distinguishing approximately 100 colours and 16-32 shades of black and white.
Draw the graph of the colour response of the human photopic vision - red, green and blue cones after correction for absorption by the lens and other inert pigments. Also draw the spectral response of the foveal/photopic vision and the peripheral/scotopic vision.
Write the formula for the gradient direction of an edge and the gradient strength of an edge in an image
angle = arctan((df/dy)/(df/dx))
magnitude = sqrt((df/dx)^2 + (df/dy)^2)
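These formulas can be checked on a toy image with numpy (arctan2 is used instead of arctan to avoid dividing by zero when df/dx is 0):

```python
import numpy as np

# Toy intensity image: a horizontal ramp, so df/dx is constant and df/dy is 0.
f = np.tile(np.arange(8, dtype=float), (8, 1))

# np.gradient returns derivatives along each axis: axis 0 is y (rows), axis 1 is x.
dfdy, dfdx = np.gradient(f)

magnitude = np.sqrt(dfdx**2 + dfdy**2)
angle = np.arctan2(dfdy, dfdx)
```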
Describe how the Canny edge detection algorithm accomplishes Good Detection (filter responds to an edge), Good localisation (detect an edge near a true edge) and single response (one per edge).
Edge detection is susceptible to noise, so the image is first convolved with a Gaussian filter; the result is a slightly blurred image that is not affected by a single noisy pixel. Edges can occur in a variety of directions, so Canny uses filters to detect horizontal, vertical and diagonal edges, using operators such as Sobel or Prewitt which return the first derivative in each direction. The edge gradient magnitude and direction can then be determined, with the angle rounded to one of four angles representing the vertical, horizontal and two diagonal directions.
How does the choice of Gaussian Kernel size affect the behaviour of the Canny edge detector?
Large sigma detects large scale edges and small sigma detects small scale edges
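A small numpy sketch of this trade-off on a 1-D noisy step edge (the signal and sigmas are illustrative): a larger sigma suppresses noise better but spreads its response over a larger scale.

```python
import numpy as np

def gaussian_kernel1d(sigma, radius=None):
    """Normalised 1-D Gaussian; a larger sigma spreads weight over more pixels."""
    if radius is None:
        radius = int(3 * sigma)
    x = np.arange(-radius, radius + 1, dtype=float)
    k = np.exp(-x**2 / (2 * sigma**2))
    return k / k.sum()

rng = np.random.default_rng(0)
signal = np.zeros(200)
signal[100:] = 1.0                                  # a step edge
noisy = signal + rng.normal(0, 0.2, signal.size)

small = np.convolve(noisy, gaussian_kernel1d(1.0), mode="same")
large = np.convolve(noisy, gaussian_kernel1d(5.0), mode="same")
```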
Describe how the Harris Detector works.
Harris captures the structure of the local neighbourhood using an autocorrelation matrix: two strong eigenvalues indicate a good local feature, one indicates a contour (edge) and none indicates a uniform region. Harris also gives a measure of feature quality, because the best feature points can be selected by thresholding on the eigenvalues.
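A rough numpy sketch of the response R = det(M) - k*trace(M)^2, using an unweighted 3x3 box window instead of the usual Gaussian weighting (an assumption made here for brevity):

```python
import numpy as np

def harris_response(img, k=0.05):
    """Harris corner response per pixel, where M is the (box-summed)
    autocorrelation matrix of image gradients."""
    dy, dx = np.gradient(img.astype(float))
    Ixx, Iyy, Ixy = dx * dx, dy * dy, dx * dy

    def box(a):
        # Sum over each pixel's 3x3 neighbourhood.
        p = np.pad(a, 1)
        return sum(p[i:i + a.shape[0], j:j + a.shape[1]]
                   for i in range(3) for j in range(3))

    Sxx, Syy, Sxy = box(Ixx), box(Iyy), box(Ixy)
    det = Sxx * Syy - Sxy**2
    trace = Sxx + Syy
    return det - k * trace**2

# Synthetic image: a bright square on black, so its corners are good features.
img = np.zeros((20, 20))
img[5:15, 5:15] = 1.0
R = harris_response(img)
```

R is large and positive at the square's corners, negative along its edges (one strong eigenvalue) and zero in flat regions.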
Describe how SIFT works
Compare the SIFT and the Harris Detector
SIFT and Harris are both illumination- and rotation-invariant because they are based on gradient operators, but neither is deformation-invariant because deformations change gradients. SIFT is scale-invariant because it samples at different scales, whereas Harris is not. Both are translation-invariant for x and y motion perpendicular to the camera axis.
Describe how correctly matched points in two images enable finding depth values in a stereo pair of images.
Describe how correctly matched points in two images enable finding optical flow in two successive frames of video using the Lucas-Kanade algorithm.
Describe how depth can be calculated from optical flow using a single camera.
Relative depth can be calculated from the velocity of optical flow points (which is larger when the depth is smaller). Absolute depth can be determined if the camera velocity and pose are known and the intrinsic camera parameters are known.
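The flow itself can be sketched as the Lucas-Kanade least-squares solve over one window. This toy version uses two synthetic smooth frames with a known sub-pixel shift; the image function and window are illustrative:

```python
import numpy as np

# Lucas-Kanade assumes brightness constancy and small, locally constant
# motion, giving the least-squares system  A [u v]^T = b  per window.
y, x = np.mgrid[0:40, 0:40].astype(float)
u_true, v_true = 0.4, 0.2                       # known sub-pixel motion
frame1 = np.sin(x / 6.0) + np.cos(y / 7.0)
frame2 = np.sin((x - u_true) / 6.0) + np.cos((y - v_true) / 7.0)

Iy, Ix = np.gradient(frame1)                    # spatial gradients
It = frame2 - frame1                            # temporal derivative

w = (slice(5, 35), slice(5, 35))                # one analysis window
A = np.array([[np.sum(Ix[w] * Ix[w]), np.sum(Ix[w] * Iy[w])],
              [np.sum(Ix[w] * Iy[w]), np.sum(Iy[w] * Iy[w])]])
b = -np.array([np.sum(Ix[w] * It[w]), np.sum(Iy[w] * It[w])])
u_est, v_est = np.linalg.solve(A, b)            # estimated flow (u, v)
```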
What are the advantages and disadvantages of the following for obtaining depth values:
(a) structured light camera
(b) time-of-flight camera
(c) stereo camera
(d) LiDAR
(a) very accurate and very fast, but sensitive to lighting conditions and to movement
(b) efficient distance algorithm, but sensitive to background lighting, and multiple reflections can make measured distances inaccurate
(c) cheap, but complex to set up and computationally expensive
(d) high accuracy and independent of weather and light, but generates large datasets that are hard to interpret
State the spectral resolution, dynamic range, spatial resolution at 20m and radiometric resolution (shades of colour and grey) of human vision.
400-700 nm, 10^8:1, 1-3 cm at 20 m, approximately 100 colours and 16-32 shades of grey.
How can the Hough Transform be generalised to detect curved lines in an image even when the curve doesn't have a simple analytic form?
Describe the following three steps of TextonBoost:
What are the characteristics of a good local feature to track?
A fiducial marker can be used to find the 6-DOF pose of a camera. What are the five steps to finding this pose? Describe each of them.
Briefly describe each of the four morphological operations and explain what effect they have on an image and why.
Erosion - places the structuring element on the pixel of an object and removes that pixel if the structuring element overlaps a non-object pixel shrinking the object.
Dilation - places the structuring element on an object pixel, if the neighbouring pixels are not part of the object it makes it part of the object - this grows the object.
Opening - An erosion operation followed by dilation. The erosion first removes noise and breaks narrow connections, then the dilation returns the remaining objects to their original size.
Closing - A dilation operation followed by erosion. The dilation first fills gaps and holes in the object, then the erosion returns it to its original size.
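A toy numpy implementation of all four operations with a 3x3 structuring element, showing opening removing an isolated noise speck and closing filling a one-pixel hole:

```python
import numpy as np

def erode(img, se=np.ones((3, 3), dtype=bool)):
    """Keep a pixel only if the structuring element fits entirely inside the object."""
    p = np.pad(img.astype(bool), 1)
    out = np.ones_like(img, dtype=bool)
    for i in range(3):
        for j in range(3):
            if se[i, j]:
                out &= p[i:i + img.shape[0], j:j + img.shape[1]]
    return out

def dilate(img, se=np.ones((3, 3), dtype=bool)):
    """Turn on every pixel the structuring element reaches from an object pixel."""
    p = np.pad(img.astype(bool), 1)
    out = np.zeros_like(img, dtype=bool)
    for i in range(3):
        for j in range(3):
            if se[i, j]:
                out |= p[i:i + img.shape[0], j:j + img.shape[1]]
    return out

opening = lambda img: dilate(erode(img))   # erosion then dilation: removes specks
closing = lambda img: erode(dilate(img))   # dilation then erosion: fills holes

img_noise = np.zeros((9, 9), dtype=bool)
img_noise[2:7, 2:7] = True                 # a solid square...
img_noise[0, 0] = True                     # ...plus an isolated noise pixel

img_hole = np.zeros((9, 9), dtype=bool)
img_hole[2:7, 2:7] = True                  # a solid square...
img_hole[4, 4] = False                     # ...with a one-pixel hole

opened = opening(img_noise)
closed = closing(img_hole)
```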
What does it mean to say the Fourier transform is a self-inverting transform?
The transform has (essentially) the same form as its inverse, so the same operation computes both directions; applying it twice returns the original input (mirrored), and applying it four times returns it exactly.
What does it mean if the centre of a Fourier transform is blacked out? What if only the centre wasn't blacked out?
(a) High-pass filter - the inverse Fourier transform would give an edge-detected image.
(b) Low-pass filter - the inverse Fourier transform would give a blurred image.
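Both cases can be demonstrated with numpy's FFT (the 8x8 centre block here stands in for "the centre of the transform"):

```python
import numpy as np

rng = np.random.default_rng(2)
img = rng.random((32, 32))

F = np.fft.fftshift(np.fft.fft2(img))       # put the zero frequency at the centre
mask = np.zeros((32, 32), dtype=bool)
cy, cx = 16, 16
mask[cy - 4:cy + 4, cx - 4:cx + 4] = True   # the "centre" block of the spectrum

# Centre blacked out -> low frequencies removed -> high-pass filter.
high = np.fft.ifft2(np.fft.ifftshift(np.where(mask, 0, F))).real
# Only the centre kept -> low-pass filter.
low = np.fft.ifft2(np.fft.ifftshift(np.where(mask, F, 0))).real
```

Because the filters are complementary, the two results sum back to the original image, and the high-pass result has zero mean (its DC term was removed).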
Describe the bundle adjustment algorithm.
What are the features of a good edge detector?
If a virtual 3D model of a body can be kept aligned with a person moving in a 2D image, then it is possible to find all the joint angles for each frame of video, from that 3D model. So describe how three chained homogeneous transformation matrices can project a point b on the ith body part of a virtual 3D model onto a pixel p in a 2D image.
The three chained homogeneous transformation matrices map a point through successive frames of reference: Bi(x, b) takes point b from the ith body part's frame into the 3D person frame, Ci takes that into the 3D camera/world frame, and I projects it onto the 2D image plane.
p(x,b) = I(x, Ci(x, Bi(x, b))) - 3D body part frame of reference to 3D person frame of reference to 3D camera/world frame of reference to 2D image pixel.
Give a strength, weakness and an application for the CIE colour space as well as sketch the space.
Strength: perceptually easy to understand and mix colours.
Weakness: Based on human perception so it is not easily applied to computer vision as it was originally intended for humans to subjectively compare colours
Application: Colour temperature lighting for photographers.
Give a strength, weakness and an application for the RGB colour space as well as sketch the space.
Strength: Designed to represent the colours used in media such as displays or cameras so is immediately available to computer vision algorithms.
Weakness: Not all colours are perceptually uniform so it is hard to determine colour differences in the RGB colour space. This space is also device specific.
Applications: Computer graphics
Give a strength, weakness and an application for the HSV colour space as well as sketch the space.
Strength: More useful than RGB for analysing colours for example performing colour range checking.
Weakness: Not perceptually uniform and is device specific.
Application: Used by artists in photoshop, used in light bulbs.
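Python's stdlib colorsys module illustrates the colour-range-checking advantage of HSV over RGB (the is_reddish helper and its thresholds are made up for this example):

```python
import colorsys

# colorsys works on floats in [0, 1]; convert a few RGB colours to HSV.
red = colorsys.rgb_to_hsv(1.0, 0.0, 0.0)
dark_red = colorsys.rgb_to_hsv(0.5, 0.0, 0.0)   # same hue, lower value

def is_reddish(r, g, b, tol=1 / 12):
    """A colour-range check needs only the hue channel (plus sanity
    thresholds on saturation and value), which is awkward in raw RGB."""
    h, s, v = colorsys.rgb_to_hsv(r, g, b)
    return (h < tol or h > 1 - tol) and s > 0.3 and v > 0.2
```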
What is meant by background subtraction, differencing, ghosting and foreground aperture?
How does the Laplacian of a Gaussian filter sharpen an image?
The Laplacian of a Gaussian filter subtracts the low frequencies (a blurred copy of the image) from the original image, leaving the high frequencies (edges); adding these back to the original gives a sharpened image with higher contrast.
Why does a sharpened image appear to have more content than its original image?
Although there is actually less content after the low frequencies are subtracted, the accentuated high-frequency edges give the illusion of more content: there appear to be more edges, and human perception is particularly sensitive to edges.
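The subtract-and-add-back idea (essentially unsharp masking; a box blur stands in for the Gaussian here, and the image is illustrative) in a few lines of numpy:

```python
import numpy as np

def box_blur(img, radius=1):
    """Cheap low-pass filter: mean over a (2r+1)^2 neighbourhood."""
    k = 2 * radius + 1
    p = np.pad(img, radius, mode="edge")
    out = np.zeros_like(img, dtype=float)
    for i in range(k):
        for j in range(k):
            out += p[i:i + img.shape[0], j:j + img.shape[1]]
    return out / (k * k)

img = np.zeros((8, 8))
img[:, 4:] = 1.0                        # a vertical step edge
high_freq = img - box_blur(img)         # original minus low frequencies
sharpened = img + high_freq             # add the edges back: higher contrast
```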
Name five object recognition methods.
Pose clustering
Geometric hashing
Generalised Hough Transform
Template matching
Direction histogram
What is an advantage and a disadvantage of Pose Clustering?
What is an advantage and a disadvantage of Geometric Hashing?
What is an advantage and a disadvantage of Generalised Hough Transform?
What is an advantage and a disadvantage of Template Matching?
What is an advantage and a disadvantage of Direction histogram?
What are the three main issues in tracking?
Describe the three main issues in tracking: Prediction, Data Association, Correction in the context of the Kalman filter.
Prediction: predict the state in the current frame based on the state in previous frames. The predicted state is a known constant multiplied by the old state, with some zero-mean noise added.
Data Association: Calculate the state from the current frame considering kinematic models and error minimisation.
Correction: if the measurement error is low (Gaussian noise), use the measured state from the current frame; otherwise place a higher weighting on the predicted state.
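A minimal 1-D Kalman filter showing the predict and correct steps (the constants a, q, r and the tracked value are illustrative):

```python
import numpy as np

# Track a constant value through noisy measurements.
a, q, r = 1.0, 1e-4, 0.5**2     # state transition, process noise, measurement noise
x, p = 0.0, 1.0                 # initial estimate and its variance

rng = np.random.default_rng(3)
true_value = 5.0
for z in true_value + rng.normal(0, 0.5, 200):
    x_pred = a * x                      # predict: constant times old state
    p_pred = a * p * a + q
    K = p_pred / (p_pred + r)           # correct: high measurement noise -> small K,
    x = x_pred + K * (z - x_pred)       # so more weight on the prediction
    p = (1 - K) * p_pred
```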
How can an improved and smoothed estimate be made using a Kalman filter?
By running two Kalman filters, one in the forwards direction and one in the backwards direction and then combining the estimates. Here the backwards estimates are used as yet another measurement for the forward filter.
What are the advantages of a particle filter over a Kalman filter?
What is the primary goal of Computer Vision?
To recognise objects and their motion.
What is the optic nerve? and what limitations does it place on our vision?
The optic nerve is the bundle of nerve fibres that relays signals from the eyes (via the thalamus) to the visual cortex.
It contains only approximately 800,000 nerve fibres, meaning rods and cones must be interconnected, compressing the roughly 10^8 photoreceptors onto roughly 10^6 output channels from the eye to the brain.
How would you measure a colour using a colour matching paradigm?
How does the Hough transform work?
Each edge point (x, y) votes for every line through it, i.e. every (d, theta) pair satisfying d = x cos(theta) + y sin(theta); a line in the image then appears as a peak of votes at a single point in Hough space.
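A brute-force numpy sketch of the voting (the accumulator resolution and the synthetic line are illustrative): points on the vertical line x = 30 all vote for the same (theta, d) bin, producing a peak there.

```python
import numpy as np

thetas = np.deg2rad(np.arange(0, 180))
d_max = 100
acc = np.zeros((len(thetas), 2 * d_max))        # accumulator; d is offset by d_max

# Synthetic edge points on the vertical line x = 30 (theta = 0, d = 30).
points = [(30, y) for y in range(0, 60)]
for x, y in points:
    for ti, t in enumerate(thetas):
        d = int(round(x * np.cos(t) + y * np.sin(t)))
        acc[ti, d + d_max] += 1                 # one vote per (point, theta)

ti, di = np.unravel_index(np.argmax(acc), acc.shape)
theta_peak, d_peak = np.rad2deg(thetas[ti]), di - d_max
```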
What is the general method for detecting corners?
What is a local invariant feature? List the 5 advantages of local invariant features.
Image content is transformed into local feature coordinates that are invariant to translation, rotation, scale and other imaging parameters.
Name two annoying phenomena associated with lenses and describe each.
Chromatic Aberration: light at different wavelengths follows different paths, so some wavelengths are defocused; for machine vision we can coat the lens to compensate.
Scattering at the lens surface: some light entering the lens system is reflected off each surface it encounters (Fresnel's law); coating the lens interior reduces this.
Define geometric calibration
The relationship between coordinates in the real world and coordinates in the image is found. This can correct for lens distortion and recover the focal length and the image centre.
Name two segmentation algorithms.
Mean Shift segmentation: better when only pixel colour is being used.
Graph-cut segmentation: Better when a derived feature is being used, such as texture.
Outline how K-means clustering works
What three other features of TextonBoost help it achieve accurate results?
Location: sky is normally at the top of an image, grass at the bottom; increase the probability when a texture occurs in its usual location.
Colour: a Gaussian mixture model - use K-means clustering to find the common colours of a texture and increase the probability when a pixel has the texture's usual colour.
Edges: decrease the probability if a neighbouring pixel is labelled differently.
What are the advantages of passive vision systems (images)?
What are the disadvantages to marker tracking?
Describe the following steps in particle filtering: Sample, Predict and Measure
What is a dyneme?
The smallest contrastive unit of movement. Human movement can be constructed from an alphabet of 35 dynemes
What is transfer learning?
Learning that builds on existing knowledge. We can throw away the final classification layer, or re-assemble the pre-trained parts into a larger model; we can also train only the new parts, or fine-tune with low global learning rates.
What is the Iterative Closest point algorithm
How can RANSAC (Random Sampling and Consensus) be used with point clouds?
Allows us to fit models (cylinders, planes, ...) to the data: pick the minimum number of points needed to define the model, then test the model against the remaining points to verify it. This leaves the model parameters, the inliers and the outliers.
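The same sample-fit-verify loop is easiest to sketch for a 2-D line (a minimal sample of 2 points defines the model); all names and the synthetic data here are illustrative:

```python
import numpy as np

def ransac_line(points, iters=200, threshold=0.1, rng=np.random.default_rng(4)):
    """Fit y = m*x + c: repeatedly pick the minimal sample (2 points) to
    define a line, count inliers, and keep the best-supported model."""
    best_inliers, best_model = None, None
    for _ in range(iters):
        (x1, y1), (x2, y2) = points[rng.choice(len(points), 2, replace=False)]
        if x1 == x2:
            continue                       # degenerate sample: skip
        m = (y2 - y1) / (x2 - x1)
        c = y1 - m * x1
        residuals = np.abs(points[:, 1] - (m * points[:, 0] + c))
        inliers = residuals < threshold
        if best_inliers is None or inliers.sum() > best_inliers.sum():
            best_inliers, best_model = inliers, (m, c)
    return best_model, best_inliers

rng = np.random.default_rng(5)
x = np.linspace(0, 10, 80)
y = 2.0 * x + 1.0 + rng.normal(0, 0.02, 80)    # true line with small noise
y[:20] = rng.uniform(0, 30, 20)                # 25% gross outliers
(m, c), inliers = ransac_line(np.column_stack([x, y]))
```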
What are some common incorrect matches that occur during feature matching?
Why are homogeneous coordinates used in feature tracking?
One extra dimension is added, which means both rotation and translation can be represented using matrix multiplication, and projection by normalisation (dividing by the extra coordinate).
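A tiny numpy example: one 3x3 matrix both rotates and translates a 2-D point, and the final divide-by-w is the projection by normalisation (the angle and translation are illustrative):

```python
import numpy as np

# In homogeneous coordinates the 2-D point (x, y) becomes (x, y, 1), so a
# rotation AND a translation combine into a single matrix multiplication.
theta = np.pi / 2
R_t = np.array([[np.cos(theta), -np.sin(theta), 3.0],   # rotate 90 degrees...
                [np.sin(theta),  np.cos(theta), 1.0],   # ...then translate by (3, 1)
                [0.0,            0.0,           1.0]])

p = np.array([1.0, 0.0, 1.0])           # the point (1, 0) in homogeneous form
q = R_t @ p
x, y = q[0] / q[2], q[1] / q[2]         # projection by normalisation (divide by w)
```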
How does RANSAC work?