Ch4: Computer Vision Techniques

Last updated 9:10 PM on 6/5/26

9 Terms

New cards

What is the main operational difference between Image Segmentation and Object Detection?

Object Detection locates objects using a rectangular bounding box, whereas Image Segmentation assigns a label to every single pixel to outline the precise shape of the object.
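To make the contrast concrete, here is a minimal sketch that starts from a per-pixel binary mask (the segmentation view) and derives the tightest rectangle around it (the detection view). The mask values and the helper name `mask_to_box` are illustrative assumptions, not part of the original card.

```python
def mask_to_box(mask):
    """Return the tightest (x1, y1, x2, y2) box around the nonzero pixels.

    Coordinates are inclusive pixel indices; returns None for an empty mask.
    """
    ys = [y for y, row in enumerate(mask) if any(row)]
    xs = [x for row in mask for x, value in enumerate(row) if value]
    if not ys:
        return None
    return (min(xs), min(ys), max(xs), max(ys))


# Hypothetical 4x5 mask of an L-shaped object (1 = object pixel).
mask = [
    [0, 0, 0, 0, 0],
    [0, 1, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]

box = mask_to_box(mask)
object_pixels = sum(map(sum, mask))                         # pixels labeled as object
box_pixels = (box[2] - box[0] + 1) * (box[3] - box[1] + 1)  # pixels inside the box
print(box, object_pixels, box_pixels)
```

The box covers more pixels than the mask labels as object, which is exactly the shape detail a bounding box cannot express.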

Where is the origin (0,0) of coordinates located on an image in computer vision?

At the upper-left corner of the image.
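A small sketch of what this means for indexing, assuming the image is stored as a list of rows (the usual row-major layout). The helper `pixel_at` and the pixel values are made up for illustration.

```python
# A tiny 3-row by 4-column grayscale "image" stored as a list of rows.
image = [
    [10, 11, 12, 13],  # y = 0 (top row)
    [20, 21, 22, 23],  # y = 1
    [30, 31, 32, 33],  # y = 2 (bottom row)
]


def pixel_at(image, x, y):
    """Look up the pixel at horizontal position x and vertical position y."""
    return image[y][x]  # row index (y) comes first, then column index (x)


top_left = pixel_at(image, 0, 0)      # the origin
bottom_right = pixel_at(image, 3, 2)  # x grows rightward, y grows downward
print(top_left, bottom_right)
```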

What are the two main ways to mathematically represent a bounding box?

Two-corner coordinates (x_1, y_1, x_2, y_2), giving the upper-left and lower-right corners, and center coordinates with box width and height (x, y, w, h).
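The two representations carry the same information, so converting between them is simple arithmetic. A minimal sketch follows; the function names and the sample box are assumptions chosen for illustration.

```python
def corner_to_center(box):
    """Convert (x1, y1, x2, y2) to (cx, cy, w, h)."""
    x1, y1, x2, y2 = box
    w = x2 - x1
    h = y2 - y1
    return (x1 + w / 2, y1 + h / 2, w, h)


def center_to_corner(box):
    """Convert (cx, cy, w, h) back to (x1, y1, x2, y2)."""
    cx, cy, w, h = box
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)


corners = (60, 45, 378, 516)           # hypothetical box in pixel coordinates
center = corner_to_center(corners)     # (219.0, 280.5, 318, 471)
round_trip = center_to_corner(center)  # recovers the original corners
print(center, round_trip)
```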

What is an Anchor Box?

A reference bounding box generated around a pixel with a specific scale and aspect ratio, used by models to sample potential object regions.
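As a rough sketch of how such boxes can be generated, the code below assumes one common convention: the scale `s` is a fraction of the image size and the aspect ratio `r` is width over height, so an anchor has width `img_w * s * sqrt(r)` and height `img_h * s / sqrt(r)`. Other frameworks define scale and ratio differently, and the function names here are hypothetical.

```python
import math


def anchor_box(cx, cy, img_w, img_h, scale, ratio):
    """Return one anchor (x1, y1, x2, y2) centered on pixel (cx, cy).

    Assumed convention: width = img_w * scale * sqrt(ratio),
    height = img_h * scale / sqrt(ratio).
    """
    w = img_w * scale * math.sqrt(ratio)
    h = img_h * scale / math.sqrt(ratio)
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)


def anchors_at_pixel(cx, cy, img_w, img_h, scales, ratios):
    """Generate every scale/ratio combination around a single pixel."""
    return [anchor_box(cx, cy, img_w, img_h, s, r)
            for s in scales for r in ratios]


square = anchor_box(50, 50, 100, 100, scale=0.5, ratio=1.0)
wide = anchor_box(50, 50, 100, 100, scale=0.5, ratio=4.0)
all_anchors = anchors_at_pixel(50, 50, 100, 100, [0.5, 0.25], [1.0, 2.0, 0.5])
print(square, wide, len(all_anchors))
```

Generating every combination at every pixel grows quickly, which is why practical detectors usually restrict themselves to a small subset of scale and ratio pairs.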

Why is the original R-CNN model too slow for real-world applications?

Because it extracts thousands of region proposals per image and has to run a complete, independent CNN forward pass on each proposal.

How did Fast R-CNN solve the speed bottleneck of the original R-CNN?

It runs the CNN forward pass only once on the entire image to extract a global feature map, and then shares that computation across all regions of interest (RoIs) using an RoI pooling layer.
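The sharing can be sketched with a simplified RoI max pooling over one single-channel feature map. This is an illustration under stated assumptions, not the layer as any particular library implements it: RoIs are given in integer feature-map coordinates with exclusive `x2`/`y2`, and bin edges use floor for the start and ceiling for the end.

```python
def ceil_div(a, b):
    """Integer ceiling division for non-negative a and positive b."""
    return -(-a // b)


def roi_max_pool(feature_map, roi, out_h, out_w):
    """Max-pool one RoI (x1, y1, x2, y2; x2 and y2 exclusive) to out_h x out_w."""
    x1, y1, x2, y2 = roi
    roi_h, roi_w = y2 - y1, x2 - x1
    pooled = []
    for i in range(out_h):
        y_start = y1 + (i * roi_h) // out_h
        y_end = y1 + ceil_div((i + 1) * roi_h, out_h)
        row = []
        for j in range(out_w):
            x_start = x1 + (j * roi_w) // out_w
            x_end = x1 + ceil_div((j + 1) * roi_w, out_w)
            row.append(max(feature_map[y][x]
                           for y in range(y_start, y_end)
                           for x in range(x_start, x_end)))
        pooled.append(row)
    return pooled


# One feature map, computed once for the whole image (values are made up).
feature_map = [
    [0, 1, 2, 3],
    [4, 5, 6, 7],
    [8, 9, 10, 11],
    [12, 13, 14, 15],
]

# Two differently sized RoIs reuse the same map and yield same-sized outputs.
pooled_a = roi_max_pool(feature_map, (0, 0, 3, 3), 2, 2)
pooled_b = roi_max_pool(feature_map, (1, 0, 4, 4), 2, 2)
print(pooled_a, pooled_b)
```

Every RoI is reduced to the same fixed output shape, so the layers after pooling can process all proposals in one batch without rerunning the CNN.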

What major component did Faster R-CNN introduce to replace Selective Search?

The region proposal network (RPN), which lets the network learn to generate data-driven region proposals efficiently and end-to-end.

What technique does Mask R-CNN introduce to handle precise spatial locations, and what does it replace?

It replaces the RoI Pooling layer with an RoI Align layer, which utilizes bilinear interpolation to preserve precise, pixel-level spatial details.
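The core operation behind RoI Align is sampling a feature map at fractional coordinates instead of rounding them to the nearest cell. Below is a minimal sketch of bilinear interpolation at a single point, assuming a single-channel map and coordinates that already lie inside it; the name `bilinear_sample` is hypothetical, and a full RoI Align layer would additionally average several such samples per output bin.

```python
import math


def bilinear_sample(feature_map, x, y):
    """Interpolate feature_map at fractional (x, y), with 0 <= x <= W-1, 0 <= y <= H-1."""
    h, w = len(feature_map), len(feature_map[0])
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)  # clamp at the border
    dx, dy = x - x0, y - y0
    # Blend horizontally along the two neighboring rows, then vertically.
    top = (1 - dx) * feature_map[y0][x0] + dx * feature_map[y0][x1]
    bottom = (1 - dx) * feature_map[y1][x0] + dx * feature_map[y1][x1]
    return (1 - dy) * top + dy * bottom


feature_map = [
    [0.0, 10.0],
    [20.0, 30.0],
]

center_value = bilinear_sample(feature_map, 0.5, 0.5)  # equal blend of all four cells
edge_value = bilinear_sample(feature_map, 0.25, 0.0)   # a quarter of the way along the top row
print(center_value, edge_value)
```

Rounding (0.25, 0.0) to the nearest cell would return 0.0 and discard the sub-pixel offset, which is the misalignment RoI Align is meant to avoid.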

What are the three distinct outputs generated by a Mask R-CNN model for a single object?

1) Class label, 2) Bounding box rectangle, and 3) A pixel-level binary segmentation mask outlining the object's exact shape.