What course/topic is this deck introducing?
Computer Vision (CSE4310) and course overview.
What is computer vision (CV)?
Teaching machines to interpret/understand visual data (images/video).
What’s the basic CV pipeline?
Image/video → sensing (camera/eye) → interpretation (computer/brain) → meaning.
Why is computer vision interdisciplinary?
It combines CS + math + physics + engineering + perception/biology + ML/AI.
What’s the “core gap” CV tries to bridge?
Pixels/numbers → real-world meaning (objects, actions, scenes).
Why is vision considered “important” for humans?
Huge portion of the brain processes visual information.
Why use machines for vision?
Automation, precision, speed, and sensing beyond human vision.
What is “machine vision” in industry?
Automated inspection/quality control using cameras + algorithms.
Example of CV in shopping/retail?
Recognizing products (like produce) for automated checkout.
Face detection vs face analysis—what’s the difference?
Detection finds faces; analysis classifies attributes (smile/age/etc.).
What does a “face makeover” app need to do?
Locate face, align features/landmarks, then modify appearance realistically.
What’s the core idea behind leaf identification apps?
Segment the leaf + extract features + classify against a database.
What does Word Lens-style translation require?
Detect text (OCR) + track the scene + translate + overlay results.
What is the “selling point” of real-time translation overlays?
Translation appears directly on the live camera view (AR feel).
How is the football first-down line possible?
Camera calibration + tracking + field model + consistent overlay.
What does car “night vision” use CV for?
Detecting pedestrians/objects using sensors like infrared.
What is an “around view” parking system?
Stitching multiple cameras into a composite bird’s-eye view.
What’s a key challenge in bird’s-eye camera systems?
Aligning/stitching views correctly (geometry + calibration).
Why did vision in cars accelerate (around 2015)?
Better sensors + compute + deep learning improved detection/understanding.
What is image stitching used for?
Building panoramas or merged views by aligning overlapping images.
What is Photosynth-style reconstruction about?
Using many photos to build navigable/3D-like scene representations.
What does “virtual fitting” require?
Body/pose estimation + realistic overlay/warping of clothing.
Why is CV important for VR interaction?
Tracking hands/objects/body for control in 3D environments.
What’s the big point of “DeepFace”-type systems?
Deep learning learns strong face features for recognition.
What does CVPR attendance growth suggest?
The field is rapidly growing (more researchers/industry interest).
What does CVPR paper growth suggest?
More published research → fast progress + fast-moving topics.
What is “low-level vision”?
Pixel-level operations (filtering, edges, corners, noise reduction).
What is edge detection?
Finding intensity discontinuities often corresponding to boundaries.
What is circle detection an example of?
Detecting geometric shapes/features (often via voting methods like Hough).
What is “mid-level vision”?
Grouping pixels into regions/objects (segmentation, tracking).
What is segmentation?
Partitioning an image into meaningful regions (object vs background, etc.).
What is tracking?
Following an object’s position/motion across video frames.
What is “high-level vision”?
Understanding: recognition, 3D geometry, pose, and scene reasoning.
What is object recognition?
Identifying/classifying objects (labeling what is in the image).
What is scene understanding?
Interpreting the whole scene: objects + relationships + context.
What is the image formation pipeline?
Light → lens → sensor → electrical signal → digitization → image.
What does the pinhole camera model explain?
How 3D points project onto a 2D image plane (basic geometry).
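The pinhole projection above can be sketched in a few lines. This is a minimal illustration under the ideal pinhole model (the function name and focal length are my own choices, not from the deck): a camera-frame point (X, Y, Z) maps to image coordinates (fX/Z, fY/Z).

```python
def project(point3d, f):
    """Project a 3D camera-frame point onto the image plane
    of an ideal pinhole camera with focal length f."""
    X, Y, Z = point3d
    # Perspective division: points twice as far away project half as large.
    return (f * X / Z, f * Y / Z)

print(project((1.0, 2.0, 4.0), f=1.0))   # (0.25, 0.5)
print(project((1.0, 2.0, 8.0), f=1.0))   # (0.125, 0.25)
```

Doubling Z halves both image coordinates, which is why distant objects appear smaller.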
Film vs digital—main difference?
Film chemically records light; digital uses sensor arrays of cells.
How is a grayscale image represented?
2D grid (matrix) of intensity values.
How is a color image represented?
3 channels (R,G,B) → 3 matrices (or one 3D array).
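A tiny sketch of the "3 matrices vs. one 3D array" idea, using plain nested lists (the variable names are illustrative, not from the deck):

```python
# A 2x2 RGB image as a (height, width, 3) nested list:
# each pixel is an [R, G, B] triple of 0-255 values.
red   = [255, 0, 0]
green = [0, 255, 0]
image = [
    [red,   green],
    [green, red],
]

# The same image split into three separate channel matrices:
R = [[px[0] for px in row] for row in image]
G = [[px[1] for px in row] for row in image]
B = [[px[2] for px in row] for row in image]
print(R)  # [[255, 0], [0, 255]]
```

Either view holds the same numbers; libraries typically store the single 3D-array form.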
Name common color models.
RGB, CMYK, HSV (and others).
RGB is additive—what does that mean?
Colors add as light; combining channels increases brightness.
CMYK is subtractive—what does that mean?
Ink removes light; used in printing (K improves blacks).
Why use HSV sometimes?
Separates hue/saturation/brightness, often easier for color-based rules.
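To see why HSV simplifies color-based rules, the standard-library `colorsys` module converts RGB (as floats in [0, 1]) to HSV; the wrapper below for 8-bit values is my own:

```python
import colorsys

def rgb8_to_hsv(r, g, b):
    """Convert 8-bit RGB to HSV using colorsys (expects floats in [0, 1])."""
    return colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)

# Bright red and dark red share the same hue (0.0) and saturation;
# only the value (brightness) channel differs.
print(rgb8_to_hsv(255, 0, 0))   # (0.0, 1.0, 1.0)
print(rgb8_to_hsv(128, 0, 0))   # hue still 0.0, lower value
```

An "is this reddish?" rule can threshold hue alone, instead of juggling all three RGB channels.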
What is raster graphics?
Pixel grid images; scaling up can pixelate.
What is vector graphics?
Shapes defined mathematically; scale without pixelation.
Lossy vs lossless compression?
Lossy throws away info (smaller); lossless keeps exact data.
What is an alpha channel?
Transparency information (RGBA) used for compositing.
What’s the main idea of image file formats?
Different tradeoffs: size, quality, transparency, compression, purpose.
JPEG—best for what, and what’s the downside?
Great for photos; lossy and no transparency.
GIF—what are its key traits?
Limited colors (palette), supports animation; good for simple graphics.
PNG—why is it popular?
Lossless + supports transparency; good for web graphics/screenshots.
TIFF—why use it?
High-quality, flexible; common in professional imaging/printing.
RAW images—what are they?
Sensor data (“digital negative”), high bit depth, needs processing.
What’s the purpose of the credits slide?
Sources/attribution for slide materials and authors.
What’s the main topic of this deck?
Image filtering: transforming images via math operations.
Conceptually, what is an image in CV?
Data that can be modeled and manipulated mathematically.
How can we represent an image digitally?
As an array/grid of numbers (pixel values).
If values go 0–255, how many bits is that?
8 bits (because 2^8 = 256).
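The arithmetic in one line:

```python
import math

levels = 256                 # intensities 0..255
bits = math.log2(levels)     # bits needed to index 256 distinct levels
print(bits)  # 8.0
```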
Filtering vs warping?
Filtering changes pixel values; warping changes pixel locations.
Image as a function—what does f(x) mean?
x = [u,v] (pixel coordinate) → intensity value.
What does “range of f(x)” refer to?
The set of possible intensity outputs (e.g., 0–255).
Typical range for an 8-bit grayscale image?
256 levels (0–255).
What does “discrete domain + discrete range” mean?
Pixel coordinates are integer indices; intensities are quantized values.
How do we model a color image as functions?
Separate functions for R, G, B channels.
What is point processing?
Each pixel is transformed independently: y = T(x).
What is neighborhood filtering?
New pixel value depends on nearby pixels (convolution-like).
What defines point processing in one sentence?
Output pixel uses only its own input pixel value (no neighbors).
Two broad categories of image changes shown so far?
Point processing and neighborhood filtering.
What does identity mapping do?
Leaves pixel values unchanged: y = x.
Darkening transform example?
Subtract a constant and clamp to valid range.
How does linear “lower contrast” work?
Compress the intensity range, e.g. y = x/2 (values cluster closer together, so contrast drops).
Why use nonlinear contrast transforms?
They adjust tones differently in shadows/mids/highlights.
Inversion formula?
y = 255 - x (for 8-bit images).
Lightening transform example?
Add a constant (then clamp).
Linear “raise contrast” example?
Multiply intensities (then clamp), e.g., y = 2x.
What does gamma-like mapping do?
Nonlinear remap that changes perceived brightness/contrast.
Examples of stylized point processing?
Sepia, posterization, and other tone remaps.
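The point-processing transforms above (identity, darken, invert, raise contrast) can be sketched as per-pixel maps with clamping; this is a minimal illustration and the helper names are my own:

```python
def clamp(v):
    """Keep an intensity inside the valid 8-bit range [0, 255]."""
    return max(0, min(255, v))

def point_process(image, T):
    """Apply transform T to each pixel independently (no neighbors used)."""
    return [[T(x) for x in row] for row in image]

row = [[0, 64, 128, 255]]
print(point_process(row, lambda x: x))              # identity: unchanged
print(point_process(row, lambda x: clamp(x - 50)))  # darken:   [[0, 14, 78, 205]]
print(point_process(row, lambda x: 255 - x))        # invert:   [[255, 191, 127, 0]]
print(point_process(row, lambda x: clamp(2 * x)))   # contrast: [[0, 128, 255, 255]]
```

Note how clamping saturates: 2 × 255 = 510 is cut back to 255, which is why strong contrast boosts lose highlight detail.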
What is a box filter?
A neighborhood average filter (simple blur).
What’s the 3×3 box filter kernel?
All ones scaled by 1/9 (equal weights).
What is convolution in images (big idea)?
Slide a kernel over the image, compute weighted sums.
Why do early outputs become 0 in the example?
The neighborhood overlaps mostly zeros/background.
What changes as the kernel moves toward a bright region?
Averages increase as more bright pixels enter the window.
Why do we get intermediate values like 10, 20, etc.?
Neighborhood contains a mix of dark and bright pixels.
What does “smoothing an edge” mean numerically?
Sharp step becomes a gradual ramp via averaging.
Why does box filtering blur images?
It replaces each pixel with the mean of its neighbors.
Why do edges become less sharp after blur?
Neighbor mixing reduces high-frequency detail.
What happens deep inside a uniform bright region after box filter?
Output stays close to the original brightness (average ≈ same value).
Why do boundaries look different from interiors?
Boundary neighborhoods include outside/background values.
What is a common border-handling issue in convolution?
Padding choice affects results near edges.
Box filtering is 2D—what does that imply?
It smooths in both directions at once; the 2D kernel also mixes in diagonal neighbors.
One sentence: what does a blur filter do to noise?
Reduces random variation by averaging (but may smear details).
Why is box blur sometimes considered “crude”?
Equal weights can look less natural than tapered weighting.
Big takeaway of the long numeric example sequence?
Convolution = repeated weighted averaging across the grid.
What’s the visual signature of a box filter?
Uniform smoothing; edges soften with a “flat” averaging feel.
What’s a typical use of box blur?
Quick smoothing/noise reduction (cheap computation).
What’s a typical downside of box blur?
Removes fine details and smears edges.
In the example, why does intensity “spread” outward?
Averaging blends bright pixels into neighbors over space.
When the kernel leaves the bright area, what happens?
Output decreases gradually (reverse of entering).
Final box-filter result described in words?
A blurred version where sharp boundaries become smooth gradients.
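The numeric walkthrough above can be reproduced with a 1-D analogue of the box filter (a sketch with zero padding at the borders; the function name and numbers are my own, not the deck's example):

```python
def box_filter_1d(signal, k=3):
    """Replace each sample with the average of its k-wide neighborhood,
    treating out-of-range neighbors as 0 (zero padding)."""
    half = k // 2
    out = []
    for i in range(len(signal)):
        window = [signal[j] if 0 <= j < len(signal) else 0
                  for j in range(i - half, i + half + 1)]
        out.append(sum(window) // k)
    return out

# A sharp step becomes a gradual ramp:
step = [0, 0, 0, 90, 90, 90]
print(box_filter_1d(step))  # [0, 0, 30, 60, 90, 60]
```

Values rise as the window enters the bright region (0 → 30 → 60 → 90), and the final 60 shows the border effect: zero padding drags the average down at the edge.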