RGB colour space — model
A cube with orthogonal Red, Green, Blue axes; black at origin, white opposite, greys on the diagonal; colours combine additively.
RGB — strengths
Matches capture/display hardware (sensors and monitor sub-pixels are RGB); simple; native image storage format.
RGB — weaknesses
Not perceptually uniform; channels highly correlated so lighting changes shift all three; unintuitive to specify a colour.
RGB — typical use
Image acquisition, storage and display.
HSV colour space — model
Cylinder/cone: Hue = angle (the colour), Saturation = radius (vividness), Value = height (brightness).
HSV — key strength
Separates chromaticity from intensity, so hue is robust to lighting changes — good for colour segmentation.
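A minimal sketch of this separation, using Python's standard `colorsys` module. The sample colour and the 50% dimming factor are arbitrary illustrations, and real lighting changes are rarely a pure uniform scaling.

```python
import colorsys

def hue_degrees(r, g, b):
    """Return the HSV hue (in degrees) of an RGB colour with components in 0..1."""
    h, _s, _v = colorsys.rgb_to_hsv(r, g, b)
    return h * 360.0

bright = (0.8, 0.4, 0.2)                  # an orange surface under strong light
dim = tuple(0.5 * c for c in bright)      # same surface, half the illumination

hue_bright = hue_degrees(*bright)
hue_dim = hue_degrees(*dim)
value_bright = colorsys.rgb_to_hsv(*bright)[2]
value_dim = colorsys.rgb_to_hsv(*dim)[2]
```

All three RGB channels change between `bright` and `dim`, but only V changes in HSV; the hue stays at about 20°.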
HSV — weaknesses
Singularities (hue undefined when unsaturated, S undefined at V=0); hue wraps at 0/360°; not perceptually uniform.
HSV — typical use
Colour pickers, image editing, colour-based object segmentation.
CIE colour space — basis
Device-independent colour spaces derived from human colour-matching experiments: CIE 1931 XYZ is the reference space, and CIELAB is a transform of it designed to be approximately perceptually uniform.
CIE chromaticity diagram
The horseshoe of all visible colours; spectral colours on the curved edge; device gamuts are regions inside it.
CIELAB
L* lightness, a* green↔red, b* blue↔yellow; engineered so Euclidean distance ≈ perceived colour difference (ΔE).
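The Euclidean-distance idea can be sketched as the simplest ΔE formula (CIE76). The Lab values below are made up for illustration, and later formulas such as CIEDE2000 refine this distance.

```python
import math

def delta_e76(lab1, lab2):
    """CIE76 colour difference: Euclidean distance between two (L*, a*, b*) triples."""
    return math.dist(lab1, lab2)

reference = (50.0, 10.0, 10.0)   # illustrative Lab values, not measured data
sample = (53.0, 14.0, 10.0)
difference = delta_e76(reference, sample)   # sqrt(3**2 + 4**2 + 0**2)
```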
CIE — strengths
Device-independent standard; spans all visible colour; approximately perceptually uniform (CIELAB); basis of colour management.
CIE — weaknesses
Not directly displayable; abstract; needs conversion; computationally heavier.
Colour gamut
The set of colours a device can reproduce — a triangle/region inside the CIE horseshoe; differs per device.
One-line colour summary
RGB matches hardware, HSV matches human intuition (robust segmentation), CIE matches the science (device-independent, uniform).
Bayer filter
A camera mosaic: 50% green, 25% red, 25% blue photosites; full RGB per pixel is interpolated (demosaiced).
Why extra green in Bayer
Mimics the human eye's greater luminance sensitivity in the green region.
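The 50/25/25 split can be checked by counting photosites in a small tile. This sketch assumes the RGGB arrangement, which is one common layout of the Bayer pattern; other phase orders give the same proportions.

```python
from collections import Counter

def bayer_colour(row, col):
    """Colour of the photosite at (row, col) in an assumed RGGB Bayer mosaic."""
    if row % 2 == 0:
        return "R" if col % 2 == 0 else "G"
    return "G" if col % 2 == 0 else "B"

# Count colours over a 4x4 patch of the sensor.
counts = Counter(bayer_colour(r, c) for r in range(4) for c in range(4))
fractions = {colour: n / 16 for colour, n in counts.items()}
```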
Cones
Retinal colour receptors of three types (L/M/S, peaking at long/medium/short wavelengths, loosely red/green/blue); active in good light (photopic); concentrated in the fovea.
Rods
Retinal monochrome receptors; very light-sensitive (scotopic), good for motion/low light; dominate the periphery; no colour.
Fovea
Central retinal region packed with cones; gives sharp, colour, high-acuity vision.
Blind spot
Region with no receptors where the optic nerve exits the retina.
Camera colour distribution
Perfectly regular, uniform grid (Bayer pattern repeated identically everywhere).
Retina colour distribution
Non-uniform: foveal cones, peripheral rods, blind spot, very few S (blue) cones.
Camera sensitivity
Roughly linear response, fixed dynamic range, uniform across the sensor.
Retina sensitivity
Logarithmic/non-linear with huge adaptive dynamic range (pupil, photochemical, neural); rods vs cones split the range.
Camera resolution
Uniform across the whole sensor.
Retina resolution
Very high only in the fovea, falling off toward periphery; eye uses saccades to point the fovea at regions of interest.
Retinal compression
~126M receptors feed only ~1M optic-nerve fibres — heavy pre-processing in the retina.
Saccades
Rapid eye movements that point the high-acuity fovea at successive points of interest.
Convolution
Slide a kernel over an image; each output pixel = weighted sum of its neighbourhood.
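A minimal pure-Python sketch of the weighted-sum idea. It assumes a list-of-lists image and "valid" output (no border padding), and the 3×3 mean kernel is only an example.

```python
def convolve2d(image, kernel):
    """'Valid' 2-D convolution of a list-of-lists image with a small kernel."""
    kh, kw = len(kernel), len(kernel[0])
    # True convolution flips the kernel; for symmetric kernels this changes nothing.
    flipped = [row[::-1] for row in kernel[::-1]]
    out = []
    for y in range(len(image) - kh + 1):
        out_row = []
        for x in range(len(image[0]) - kw + 1):
            # Weighted sum of the neighbourhood under the kernel.
            total = sum(
                flipped[j][i] * image[y + j][x + i]
                for j in range(kh)
                for i in range(kw)
            )
            out_row.append(total)
        out.append(out_row)
    return out

box = [[1 / 9] * 3 for _ in range(3)]   # 3x3 mean (box) filter
image = [
    [0, 0, 0, 0],
    [0, 9, 9, 0],
    [0, 9, 9, 0],
    [0, 0, 0, 0],
]
smoothed = convolve2d(image, box)       # 2x2 result; every window covers all four 9s
```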
Gaussian filter
A low-pass smoothing kernel that blurs and suppresses noise/high frequencies.
Laplacian operator
A high-pass second-derivative operator (∇²) that responds strongly to edges; noise-sensitive on its own.
Laplacian of Gaussian (LoG)
Smooth with a Gaussian then take the Laplacian; finds edges as zero-crossings without amplifying noise.
Sharpening kernel structure
Large positive centre with negative surround (high-pass / unsharp mask).
Unsharp masking formula
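sharpened = original − k·∇²(Gaussian ∗ image), with k > 0: subtract a scaled Laplacian of the smoothed image, which boosts edge detail. Closely related to the classic form original + k·(original − blurred).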
Sharpening at an edge
Produces overshoot on the bright side and undershoot on the dark side, steepening the transition (raises local contrast).
Sharpening in flat regions
Laplacian ≈ 0, so the pixels are essentially unchanged.
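The edge and flat-region behaviour can be seen on a 1-D signal. This is a simplified sketch: the Gaussian pre-smoothing is omitted, the borders are left untouched, and `k = 0.5` is an arbitrary strength.

```python
def laplacian_1d(signal):
    """Discrete second derivative; the two border samples are left at 0."""
    out = [0.0] * len(signal)
    for i in range(1, len(signal) - 1):
        out[i] = signal[i - 1] - 2 * signal[i] + signal[i + 1]
    return out

def sharpen(signal, k=0.5):
    """sharpened = original - k * Laplacian (pre-smoothing omitted in this sketch)."""
    return [s - k * lap for s, lap in zip(signal, laplacian_1d(signal))]

edge = [10, 10, 10, 20, 30, 30, 30]   # soft ramp from dark (10) to bright (30)
result = sharpen(edge)
```

Here `result` is `[10.0, 10.0, 5.0, 20.0, 35.0, 30.0, 30.0]`: an undershoot (5) on the dark side, an overshoot (35) on the bright side, and unchanged values in the flat regions.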
Does sharpening add information?
No — a deterministic filter on the same pixels adds no new information; perceived detail rises but actual information doesn't (can even fall).
Why sharpened image looks more detailed
The visual system reads enhanced edge contrast (over/undershoot, like Mach bands) as more detail; the gain is perceptual (acutance).
Mach bands
Perceived over/undershoot at edges caused by lateral inhibition in the visual system.
Structuring element (SE)
A small shape with a marked origin used to probe a binary image in morphology.
Erosion rule
Keep a foreground pixel only if the SE fits entirely inside the object; otherwise remove it. Shrinks objects.
Dilation rule
Add a background pixel if the SE (origin on it) touches/overlaps the object. Grows objects.
Erosion effect
Shrinks objects, removes thin protrusions, deletes blobs smaller than the SE.
Dilation effect
Grows objects, fills small gaps/notches, can join nearby components.
3×3 square SE
Covers the 8-connected neighbourhood; erodes/dilates by one pixel in every direction, including diagonals.
Plus/cross SE
4-connected; up/down/left/right only, ignores diagonals.
Horizontal 1×3 SE
Anisotropic; erosion removes only left/right edge columns, dilation grows only horizontally.
Opening
Erosion then dilation; removes small specks while preserving larger objects' size (denoising).
Closing
Dilation then erosion; fills small holes/gaps and joins close components.
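The erosion, dilation, opening and closing rules can be sketched in pure Python. This assumes binary list-of-lists images, a symmetric SE given as (dy, dx) offsets from its origin, and pixels outside the image treated as background.

```python
SQUARE_3X3 = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1)]

def _is_fg(img, y, x):
    """True if (y, x) is inside the image and foreground; outside counts as background."""
    return 0 <= y < len(img) and 0 <= x < len(img[0]) and img[y][x] == 1

def erode(img, se):
    """Keep a pixel only if every SE offset lands on foreground (the SE fits)."""
    return [[int(all(_is_fg(img, y + dy, x + dx) for dy, dx in se))
             for x in range(len(img[0]))] for y in range(len(img))]

def dilate(img, se):
    """Set a pixel if any SE offset lands on foreground (the SE touches the object)."""
    return [[int(any(_is_fg(img, y + dy, x + dx) for dy, dx in se))
             for x in range(len(img[0]))] for y in range(len(img))]

def opening(img, se):
    return dilate(erode(img, se), se)

def closing(img, se):
    return erode(dilate(img, se), se)

speckled = [
    [0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0, 0, 0],
    [0, 1, 1, 1, 0, 1, 0],   # isolated speck at column 5
    [0, 1, 1, 1, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0],
]
holed = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 0, 1, 0],         # one-pixel hole in the middle
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
opened = opening(speckled, SQUARE_3X3)   # speck removed, 3x3 block kept
closed = closing(holed, SQUARE_3X3)      # hole filled, outline unchanged
```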
Tracking definition
Estimating an object's state (e.g. position + velocity) over time from noisy measurements.
Kalman prediction step
Use the motion model to project state and covariance forward → a-priori estimate; uncertainty grows.
Kalman data association
Decide which measurement belongs to which track via predicted measurement + validation gate / Mahalanobis distance.
Kalman correction step
Fuse prediction and measurement via the Kalman gain → a-posteriori estimate; uncertainty shrinks.
Kalman gain
Weights prediction vs measurement by their relative covariances; small when measurements are noisy.
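A scalar sketch of one predict/correct cycle, assuming a constant-position model and made-up noise variances `q` (process) and `r` (measurement). A real tracker would use state vectors and covariance matrices.

```python
def predict(x, p, q):
    """Constant-position model: state unchanged, variance grows by process noise q."""
    return x, p + q

def correct(x_prior, p_prior, z, r):
    """Fuse the prediction with a measurement z whose noise variance is r."""
    k = p_prior / (p_prior + r)              # Kalman gain
    x_post = x_prior + k * (z - x_prior)     # move towards the measurement
    p_post = (1 - k) * p_prior               # uncertainty shrinks
    return x_post, p_post, k

x, p = 0.0, 1.0
x_prior, p_prior = predict(x, p, q=1.0)                        # variance 1.0 -> 2.0
x_post, p_post, k = correct(x_prior, p_prior, z=3.0, r=2.0)    # k = 0.5
_, _, k_noisy = correct(x_prior, p_prior, z=3.0, r=18.0)       # noisier sensor, smaller gain
```

With equal prior and measurement variances the gain is 0.5 and the posterior variance (1.0) is below both inputs (2.0 each); raising `r` to 18 drops the gain to 0.1.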
Kalman smoothing
Filtering blends each measurement with the prediction as it arrives; a full smoother such as Rauch-Tung-Striebel (RTS) also runs a backward pass so that later data refines earlier estimates.
Kalman assumptions
Linear motion and measurement models with Gaussian noise; the extended Kalman filter (EKF) handles non-linear models only approximately, by local linearisation.
Particle filter
Represents the state distribution with a set of weighted samples (particles), as in the Condensation algorithm.
Particle filter advantage 1
Handles non-linear, non-Gaussian models that break the Kalman assumptions.
Particle filter advantage 2
Maintains multi-modal distributions / multiple hypotheses — robust to clutter, ambiguity and occlusion.
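The multi-modal point can be illustrated with just the measurement-weighting step of a particle filter. This is not a full filter: there is no motion model or resampling, evenly spaced particles stand in for random samples, and the measurement model z = x² is a made-up example of an ambiguous, non-linear sensor.

```python
import math

def likelihood(predicted, measured, sigma=1.0):
    """Unnormalised Gaussian likelihood of a measurement given a predicted value."""
    return math.exp(-0.5 * ((measured - predicted) / sigma) ** 2)

particles = [i * 0.5 for i in range(-10, 11)]   # candidate states from -5.0 to 5.0
z = 4.0                                         # measurement of x**2, so x = -2 and x = 2 both fit

weights = [likelihood(x * x, z) for x in particles]
total = sum(weights)
weights = [w / total for w in weights]          # normalise so the weights sum to 1

weighted_mean = sum(w * x for w, x in zip(weights, particles))
top_two = sorted(zip(weights, particles), reverse=True)[:2]
modes = sorted(x for _, x in top_two)
```

The weights peak at both −2 and +2, while the weighted mean sits near 0, a state the measurement barely supports. A single Gaussian would have to sit there; the particle set keeps both hypotheses.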
Why prediction raises uncertainty
Propagating through an imperfect motion model adds process noise, so covariance grows.
Why correction lowers uncertainty
Fusing an independent measurement adds information, shrinking covariance below either source alone.
Loss function
The differentiable objective optimised during training (drives gradient descent), e.g. cross-entropy, MSE.
Evaluation metric
The (often non-differentiable) measure of real-world quality judged on held-out data, e.g. accuracy, F1, mAP.
Loss vs metric — same example
Regression using MSE (or MAE) as both the training loss and the reported metric.
Loss vs metric — different example
Classification: cross-entropy loss but accuracy/F1 metric; detection: box+class loss but mAP metric.
Why not optimise accuracy directly
Accuracy is step-like / non-differentiable, giving no usable gradients; use a smooth surrogate like cross-entropy.
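A small sketch of why the surrogate helps, using made-up probabilities assigned to the correct class of a single example and an assumed 0.5 decision threshold.

```python
import math

def cross_entropy(p_correct):
    """Loss for one example, given the probability assigned to its correct class."""
    return -math.log(p_correct)

probabilities = [0.6, 0.7, 0.9]   # the model becoming steadily more confident
losses = [cross_entropy(p) for p in probabilities]
accuracies = [1.0 if p > 0.5 else 0.0 for p in probabilities]
```

The loss falls at every step, so it provides a gradient to follow, while accuracy stays flat at 1.0 and would give the optimiser no signal.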
IoU
Intersection-over-Union = overlap area ÷ union area of predicted and ground-truth boxes; measures localisation.
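A minimal sketch of the calculation, assuming axis-aligned boxes given as `(x1, y1, x2, y2)` corner coordinates; the example boxes are arbitrary.

```python
def iou(box_a, box_b):
    """Intersection-over-Union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)   # zero if the boxes do not overlap
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

predicted = (0, 0, 10, 10)
truth = (5, 0, 15, 10)
overlap = iou(predicted, truth)   # intersection 50, union 150
```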
mAP
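Mean Average Precision: for each class, compute Average Precision (area under the precision-recall curve) with detections counted as correct at a chosen IoU threshold, then average over classes (and over several IoU thresholds in COCO-style evaluation); the standard detection metric.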
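A rough sketch of the calculation using uninterpolated Average Precision; benchmark toolkits such as PASCAL VOC and COCO use interpolated variants. The true/false-positive flags are made up and are assumed to have already been decided by IoU matching.

```python
def average_precision(is_true_positive, num_ground_truth):
    """Uninterpolated AP for detections already sorted by descending confidence."""
    hits, precision_sum = 0, 0.0
    for rank, tp in enumerate(is_true_positive, start=1):
        if tp:
            hits += 1
            precision_sum += hits / rank   # precision at this recall step
    return precision_sum / num_ground_truth if num_ground_truth else 0.0

per_class = {
    "car": average_precision([True, False, True], num_ground_truth=2),
    "person": average_precision([True, True], num_ground_truth=4),
}
mean_ap = sum(per_class.values()) / len(per_class)
```

The "person" class has perfect precision but finds only 2 of 4 objects, so its AP is 0.5: missed objects lower the score even when every detection is correct.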
Accuracy for object detection
Poor: ignores localisation (IoU), mishandles FP/FN, inflated by class imbalance (background ≫ objects).
Class imbalance problem
A trivial majority-class predictor scores high accuracy while missing the rare class of interest.
90% accuracy trap
With rare objects, a detector that labels almost everything as background still scores ~90% accuracy while finding few objects; use mAP/precision-recall instead.
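The trap is easy to reproduce with made-up counts: 10 objects among 100 samples and a predictor that always answers "background".

```python
labels = [1] * 10 + [0] * 90      # 1 = object (rare), 0 = background
predictions = [0] * 100           # trivial majority-class predictor

accuracy = sum(p == y for p, y in zip(predictions, labels)) / len(labels)
true_positives = sum(1 for p, y in zip(predictions, labels) if p == 1 and y == 1)
recall = true_positives / sum(labels)
```

Accuracy comes out at 0.9 while recall is 0.0, so the headline number hides the fact that no object was found.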
Oscillating loss — overnight?
No. A loss that oscillates without trending downward means the model isn't learning, so extra training time won't fix it. Likely causes: learning rate too high, or data/label/architecture problems.
Learning rate too high
Updates overshoot the minimum, causing the loss to bounce/oscillate instead of converging.
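The overshoot can be seen with gradient descent on the toy function f(x) = x², chosen here only because its behaviour is easy to follow; the learning rates are illustrative.

```python
def gradient_descent(lr, steps=5, x=1.0):
    """Minimise f(x) = x**2 (gradient 2x) and return the loss after each step."""
    losses = []
    for _ in range(steps):
        x -= lr * 2 * x
        losses.append(x * x)
    return losses

converging = gradient_descent(lr=0.1)    # x shrinks by a factor of 0.8 per step
oscillating = gradient_descent(lr=1.0)   # x jumps between +1 and -1, loss never drops
diverging = gradient_descent(lr=1.1)     # x flips sign and grows by 1.2 per step
```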
Partial annotation problem
Un-annotated objects are treated as background, so correct detections of them are penalised as false positives, which corrupts training. Prefer annotating fewer images completely over many images partially.
Overfitting
Training loss falls while validation loss rises; fix with early stopping, regularisation, augmentation, or more data.
Self/unsupervised — when not to use 1
When ample high-quality labels exist and supervised learning is simpler and better.
Self/unsupervised — when not to use 2
When the pretext features don't transfer, compute cost is too high, or guaranteed/interpretable performance is needed.
Image correspondence self-supervision 1
Free supervisory signal from known transforms, stereo geometry, or adjacent video frames — no manual labels.
Image correspondence self-supervision 2
Abundant unlabelled data (video, stereo, multi-view).
Image correspondence self-supervision 3
Built-in consistency constraints (photometric, cycle-consistency, epipolar geometry) act as the loss.
CV pipeline template
Acquire → Pre-process → Segment/Detect → Extract features → Classify/Track/Measure → Output.
Viola–Jones / Haar cascade
Fast classical face/object detector using Haar features + cascade of classifiers.
HOG + SVM
Histogram-of-Oriented-Gradients features classified by an SVM; classic pedestrian/object detector.
YOLO / SSD / Faster R-CNN
CNN-based object detectors producing bounding boxes + class labels.
Hough transform
Detects parametric shapes (lines, circles) by voting in parameter space.
Otsu thresholding
Automatically picks a global threshold that separates foreground/background by maximising between-class variance.
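A pure-Python sketch of the threshold search, assuming integer grey levels in 0..255 supplied as a flat list. The pixel values are invented to form two clear clusters.

```python
def otsu_threshold(pixels, levels=256):
    """Return the level t maximising between-class variance (class 0 = pixels <= t)."""
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(level * n for level, n in enumerate(hist))
    best_t, best_between = 0, -1.0
    w0 = sum0 = 0
    for t in range(levels):
        w0 += hist[t]              # pixel count in class 0
        sum0 += t * hist[t]        # intensity sum in class 0
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue               # one class is empty, variance undefined
        mu0, mu1 = sum0 / w0, (sum_all - sum0) / w1
        # Between-class variance up to a constant factor (counts, not probabilities).
        between = w0 * w1 * (mu0 - mu1) ** 2
        if between > best_between:
            best_t, best_between = t, between
    return best_t

pixels = [10, 12, 11, 13] * 5 + [200, 210, 205] * 5   # dark cluster and bright cluster
threshold = otsu_threshold(pixels)
foreground = [p > threshold for p in pixels]
```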
Connected components
Labels groups of connected foreground pixels into separate blobs/objects.
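A breadth-first labelling sketch for small binary images given as lists of lists. The `connectivity` parameter (4 or 8) is an illustrative choice of this sketch, not a specific library API.

```python
from collections import deque

def label_components(img, connectivity=4):
    """Label foreground blobs; returns (label image, number of blobs)."""
    h, w = len(img), len(img[0])
    steps = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    if connectivity == 8:
        steps += [(-1, -1), (-1, 1), (1, -1), (1, 1)]
    labels = [[0] * w for _ in range(h)]
    count = 0
    for y in range(h):
        for x in range(w):
            if img[y][x] != 1 or labels[y][x]:
                continue                      # background or already labelled
            count += 1
            labels[y][x] = count
            queue = deque([(y, x)])
            while queue:                      # flood-fill this blob
                cy, cx = queue.popleft()
                for dy, dx in steps:
                    ny, nx = cy + dy, cx + dx
                    if (0 <= ny < h and 0 <= nx < w
                            and img[ny][nx] == 1 and not labels[ny][nx]):
                        labels[ny][nx] = count
                        queue.append((ny, nx))
    return labels, count

img = [
    [1, 1, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 1, 1],
]
_, blobs4 = label_components(img, connectivity=4)   # diagonal contact does not join
_, blobs8 = label_components(img, connectivity=8)   # diagonal contact joins
```

The two groups touch only diagonally, so 4-connectivity reports two blobs and 8-connectivity reports one.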
Homography
A perspective transform between two planes; used to warp/rectify (e.g. plate correction, AR overlay).
Optical flow (Lucas–Kanade)
Estimates per-point motion between frames; used for tracking.
Pose estimation tools
OpenPose / MediaPipe / PoseNet output body keypoints/skeletons.
Dynamic Time Warping (DTW)
Aligns and compares time sequences of differing speed; used for pose/gesture similarity.
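A sketch of the classic dynamic-programming form of DTW, assuming 1-D sequences and absolute difference as the local cost. Pose sequences would use a distance between keypoint vectors instead.

```python
def dtw_distance(a, b):
    """Classic O(len(a) * len(b)) DTW with |a_i - b_j| as the local cost."""
    inf = float("inf")
    n, m = len(a), len(b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of: step in a only, step in b only, step in both.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]

slow = [0, 0, 1, 1, 2, 2, 1, 1, 0, 0]   # a gesture performed slowly
fast = [0, 1, 2, 1, 0]                  # the same gesture at double speed
same_gesture = dtw_distance(slow, fast)
other_gesture = dtw_distance(fast, [0, 2, 4, 2, 0])
```

The slow and fast versions align perfectly (distance 0.0) despite their different lengths, while the larger-amplitude sequence gives a positive distance.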
Drone face-tracking pipeline
Face detect (Haar/CNN) → init tracker (KCF/Kalman) → centroid offset from centre → PID drone control.
Licence plate pipeline
Detect plate (edges/YOLO) → perspective-correct (homography) → segment chars → OCR (CNN/Tesseract).
Colour-segmentation pipeline
RGB→HSV → hue threshold → morphological cleanup → connected components/contours → measure or classify.
Mussel/size measurement pipeline
Segment object → fit ellipse / major axis → calibrate pixels→mm with a reference → output size.
Seven-segment reading pipeline
Threshold display → perspective-correct → split digit cells → test which of 7 segments are on → decode.
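The final decode step can be sketched as a lookup table. It assumes the conventional a-g segment naming and that an earlier stage has already produced a 0/1 state per segment; the helper name `decode_digit` is hypothetical.

```python
# Segment order (a, b, c, d, e, f, g):
# top, top-right, bottom-right, bottom, bottom-left, top-left, middle.
SEGMENTS_TO_DIGIT = {
    (1, 1, 1, 1, 1, 1, 0): 0,
    (0, 1, 1, 0, 0, 0, 0): 1,
    (1, 1, 0, 1, 1, 0, 1): 2,
    (1, 1, 1, 1, 0, 0, 1): 3,
    (0, 1, 1, 0, 0, 1, 1): 4,
    (1, 0, 1, 1, 0, 1, 1): 5,
    (1, 0, 1, 1, 1, 1, 1): 6,
    (1, 1, 1, 0, 0, 0, 0): 7,
    (1, 1, 1, 1, 1, 1, 1): 8,
    (1, 1, 1, 1, 0, 1, 1): 9,
}

def decode_digit(segment_states):
    """Map seven on/off segment states to a digit, or None if the pattern is unknown."""
    return SEGMENTS_TO_DIGIT.get(tuple(segment_states))

reading = [decode_digit(s) for s in ([0, 1, 1, 0, 0, 0, 0], [1, 1, 1, 0, 0, 0, 0])]
```

Returning `None` for unknown patterns lets the caller flag a misread cell instead of guessing.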
Why HSV beats RGB outdoors
Hue is largely invariant to brightness; RGB clusters shift with lighting because channels are correlated.
Coasting during dropouts
When no measurement arrives, run prediction only (propagate by motion model) and let uncertainty grow until re-acquisition.
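Coasting can be sketched with a scalar tracker in which `None` marks a missed measurement. The velocity is assumed known and the noise variances `q` and `r` are made up, which keeps the example to a single state variable.

```python
def track(measurements, x=0.0, p=1.0, velocity=1.0, q=0.5, r=1.0):
    """1-D tracker with an assumed known velocity; None marks a dropout."""
    history = []
    for z in measurements:
        x, p = x + velocity, p + q            # prediction always runs
        if z is not None:                     # correction only when data arrives
            k = p / (p + r)
            x, p = x + k * (z - x), (1 - k) * p
        history.append((x, p))
    return history

history = track([1.0, 2.0, None, None, None, 6.0])
positions = [x for x, _ in history]
variances = [p for _, p in history]
```

During the three dropouts the position keeps advancing by the motion model while the variance climbs each step; it drops again as soon as the measurement at the last step is fused.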