What is image segmentation?
A task in which an image is partitioned into groups of pixels called segments or regions. Doing this reduces the image's complexity, making analysis simpler.
What applications exist for image segmentation?
Self-driving cars (used to identify lanes and other necessary information)
Virtual try-on (requires the exact pixels of a given garment, which can then be replaced)
Background subtraction
Handwritten text recognition
Biomedical image understanding: MRI and cancerous cell segmentation
Semantic Segmentation
Each segment belongs to one of a set of pre-established classes, i.e. each subregion has a semantic meaning (a specific class)
Required to find a class for each pixel
What type of data do we have?
In supervised image segmentation there are two pieces of data:
An image, I, which can be grayscale or RGB
A segmentation map (same size as I), in which each entry is the label of the corresponding pixel in I
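To make the image / segmentation map pairing concrete, here is a minimal sketch in plain Python. The tiny 3x4 image, the class ids (0 = background, 1 = object) and the helper names are all made up for illustration.

```python
# Hypothetical 3x4 grayscale image (intensities 0-255).
image = [
    [12, 15, 200, 210],
    [10, 180, 220, 205],
    [9, 11, 14, 198],
]

# Matching segmentation map; assumed class ids: 0 = background, 1 = object.
seg_map = [
    [0, 0, 1, 1],
    [0, 1, 1, 1],
    [0, 0, 0, 1],
]


def same_size(img, labels):
    """Return True when the label map has exactly one entry per pixel."""
    return len(img) == len(labels) and all(
        len(img_row) == len(label_row) for img_row, label_row in zip(img, labels)
    )


def pixels_per_class(labels):
    """Count how many pixels carry each label."""
    counts = {}
    for row in labels:
        for label in row:
            counts[label] = counts.get(label, 0) + 1
    return counts
```

Checking `same_size(image, seg_map)` before training is a cheap way to catch misaligned image/label pairs.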
What type of Neural Network do we need here?
Our output should have the same height and width as the input image
A Fully Convolutional Network is needed
Here, our input tensor is an RGB image of shape (3, W, H) and the output tensor will be of shape (K, W, H), where K is the number of classes the network can predict
Dense layers are not needed because the output is no longer flattened; the network works on tensors end to end
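As a rough illustration of how a (K, W, H) output becomes a segmentation map, the sketch below picks the highest-scoring class for every pixel. The function name and the score values are hypothetical, and nested lists stand in for tensors.

```python
def scores_to_label_map(scores):
    """Turn per-class scores of shape (K, W, H) into a (W, H) label map.

    Each output entry is the index of the class with the highest score
    at that pixel (an argmax over the K channels).
    """
    num_classes = len(scores)
    width = len(scores[0])
    height = len(scores[0][0])
    return [
        [
            max(range(num_classes), key=lambda k: scores[k][x][y])
            for y in range(height)
        ]
        for x in range(width)
    ]


# Made-up scores for K = 3 classes on a 2x2 image.
scores = [
    [[0.9, 0.2], [0.1, 0.3]],    # class 0
    [[0.05, 0.7], [0.2, 0.3]],   # class 1
    [[0.05, 0.1], [0.7, 0.4]],   # class 2
]
label_map = scores_to_label_map(scores)
```

The resulting map has the same width and height as the input, which is exactly what a dense (flattening) layer would not preserve.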
U-Net
Gets its shape from the process of downsampling and then upsampling that results in a U shape
Number on top of each strip is number of channels and numbers on sides are heights and widths
We start with an RGB image (3 channels) and end up with a tensor of K channels (one channel per class)
Downsampling
The feature maps get smaller in height and width but thicker in number of channels
Here, the image is being compressed: the visual information of its most important features is encoded
Upsampling
Process of getting thinner and larger, ensuring that only the useful information of the image is being "decompressed"/decoded
Done via Transpose Convolutions
Skip connections
Why are they used?
Mitigate the issue of vanishing gradients
Add extra information to the decoder that might be lost because of downsampling on the encoder side of the network…
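One common way a skip connection hands encoder information to the decoder is channel-wise concatenation. The sketch below shows only the shape bookkeeping, using nested lists as (C, H, W) tensors. The helper name is hypothetical, and it assumes both feature maps already share the same height and width (some U-Net variants crop the encoder features first).

```python
def concat_channels(decoder_feat, encoder_feat):
    """Concatenate two (C, H, W) nested lists along the channel axis."""

    def spatial_shape(tensor):
        return (len(tensor[0]), len(tensor[0][0]))

    if spatial_shape(decoder_feat) != spatial_shape(encoder_feat):
        raise ValueError("feature maps must share height and width")
    # List concatenation stacks the channels: (C1 + C2, H, W).
    return decoder_feat + encoder_feat


# Made-up feature maps: 2 decoder channels and 3 encoder channels, each 2x2.
decoder_feat = [[[0.0, 0.0], [0.0, 0.0]] for _ in range(2)]
encoder_feat = [[[1.0, 1.0], [1.0, 1.0]] for _ in range(3)]
merged = concat_channels(decoder_feat, encoder_feat)
```

After the merge, the next convolution sees both the upsampled decoder features and the higher-resolution encoder features.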
Transpose Convolution Operation
The typical way of increasing the size of a tensor is a transpose convolution
Given an input matrix A and a kernel B, multiply each element of A by every element of B, giving one scaled copy of B per element of A
The stride is the number of cells by which each successive scaled copy is shifted in the output, along both width and height
Where copies overlap, sum the overlapping values
Usually, the easiest transpose convolution to perform is with a stride of 2
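The procedure can be sketched in plain Python for a single-channel input with no padding. The function name and the example matrices are hypothetical. With a 2x2 kernel and a stride of 2 the scaled copies never overlap, which is why that case is easy to work by hand.

```python
def transpose_conv2d(inp, kernel, stride=1):
    """Single-channel transpose convolution without padding.

    Every input element scales the whole kernel; the scaled copy is
    written into the output at an offset of `stride` cells per input
    step, and overlapping contributions are summed.
    """
    in_h, in_w = len(inp), len(inp[0])
    k_h, k_w = len(kernel), len(kernel[0])
    out_h = (in_h - 1) * stride + k_h
    out_w = (in_w - 1) * stride + k_w
    out = [[0] * out_w for _ in range(out_h)]
    for i in range(in_h):
        for j in range(in_w):
            for a in range(k_h):
                for b in range(k_w):
                    out[i * stride + a][j * stride + b] += inp[i][j] * kernel[a][b]
    return out


A = [[1, 2], [3, 4]]   # input
B = [[1, 1], [1, 1]]   # kernel

no_overlap = transpose_conv2d(A, B, stride=2)
# [[1, 1, 2, 2],
#  [1, 1, 2, 2],
#  [3, 3, 4, 4],
#  [3, 3, 4, 4]]

with_overlap = transpose_conv2d(A, B, stride=1)
# [[1, 3, 2],
#  [4, 10, 6],
#  [3, 7, 4]]
```

With a stride of 1 the centre cell receives a contribution from all four input elements (1 + 2 + 3 + 4 = 10), which shows the "sum on overlap" rule.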
Transpose Convolution and U-Net in PyTorch
PyTorch has the module nn.ConvTranspose2d() for transpose convolutions
The weights learned in this layer are the kernel entries; their number does not change with the input's height or width, only with the input's number of channels (along with the kernel size and the number of output channels)
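The point about the weight count can be checked with a little arithmetic. The sketch below avoids importing PyTorch. The helper names are hypothetical, the parameter count assumes `groups=1`, and the output-size formula follows the one given in the `nn.ConvTranspose2d` documentation.

```python
def conv_transpose2d_num_params(in_channels, out_channels, kernel_size, bias=True):
    """Learnable parameters of a transpose-convolution layer (groups=1 assumed).

    Note that the input height and width do not appear anywhere.
    """
    k_h, k_w = kernel_size
    weights = in_channels * out_channels * k_h * k_w
    return weights + (out_channels if bias else 0)


def conv_transpose2d_out_size(size_in, kernel, stride=1, padding=0,
                              output_padding=0, dilation=1):
    """Output height (or width) along one spatial dimension."""
    return ((size_in - 1) * stride - 2 * padding
            + dilation * (kernel - 1) + output_padding + 1)


# Example values chosen for illustration: 64 -> 32 channels, 2x2 kernel, stride 2.
params = conv_transpose2d_num_params(64, 32, (2, 2))      # 64*32*4 + 32 = 8224
small = conv_transpose2d_out_size(28, kernel=2, stride=2)  # 56
large = conv_transpose2d_out_size(56, kernel=2, stride=2)  # 112
```

The same 8224 parameters serve both the 28-wide and the 56-wide input; only the output size changes.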
Instance Segmentation
Special type of segmentation
Not only is the image divided into relevant subgroups; individual instances of the same object class are also told apart
Applications of Instance Segmentation
Useful when distinct objects of a similar type are present and need to be monitored separately
Self-driving cars: keeping track of individual pedestrians and cars in videos
Medical scans: segment different nuclei
Satellite imagery: detection and counting of cars, etc.
Data in Instance Segmentation
Following data needed:
images
object-bounding boxes
instance-level segmentation ground-truth (MS COCO)
Revisiting Faster R-CNN
A CNN-based object detection network
Mask R-CNN
Adapts Faster R-CNN to instance segmentation
The main difference is that Mask R-CNN adds a CNN module for segmentation
It also uses an RoIAlign module
Advantages of Mask R-CNN
Simplicity
Efficiency
Flexibility
Panoptic Segmentation
Task of detecting and segmenting all objects in a picture, including the background, while also distinguishing different instances