Concept of Translation Invariance
- Translation invariance refers to the property where the outcome remains unchanged regardless of where a pattern appears in the input: shifting the input shifts the response but does not change what is detected.
- This principle is essential for handling positional shifts that arise during image transformations; note that it covers translation only, so rotational and other variances must be handled separately.
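A minimal 1-D sketch of this idea, assuming NumPy (the kernel and signals are invented for illustration): the same detector yields the same peak response wherever the pattern sits, only at a shifted position.

```python
import numpy as np

kernel = np.array([1.0, 2.0, 1.0])                 # toy pattern detector
sig_a = np.array([0, 0, 1, 2, 1, 0, 0, 0], float)  # pattern starts at index 2
sig_b = np.array([0, 0, 0, 0, 1, 2, 1, 0], float)  # same pattern, shifted right

# Cross-correlate (np.convolve flips the kernel, so flip it back first)
resp_a = np.convolve(sig_a, kernel[::-1], mode="valid")
resp_b = np.convolve(sig_b, kernel[::-1], mode="valid")

print(resp_a.max(), resp_a.argmax())  # 6.0 at position 2
print(resp_b.max(), resp_b.argmax())  # 6.0 at position 4: same response, shifted
```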
Introduction to Convolutions
- Convolutions are fundamental in digital image processing, used for filtering images and extracting features from them.
- Early methods relied on manual feature engineering: experts decided which features mattered and designed the filters to extract them by hand.
Shift from Manual to Automated Feature Engineering
- With neural networks and convolution operations, feature engineering has become automated: the network learns which features to extract on its own.
- Backpropagation plays a crucial role here, adjusting the filter (kernel) weights to reduce the output error, so the filters themselves are learned from the input data.
Convolution Operations Overview
- When performing convolution, a filter (kernel) is applied to an image:
- Image and filter example: let \(W\) denote the kernel (filter) values.
- The filter is slid over the image; at each position the overlapping values are multiplied and summed, producing a neuron activation that reflects that segment of the image (see the sketch below).
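A minimal sketch of this operation, assuming NumPy (the 4x4 image and the 2x2 kernel \(W\) are invented values):

```python
import numpy as np

def conv2d(image, kernel):
    """Naive 'valid' convolution (cross-correlation): slide the kernel over
    the image, multiply the overlapping values, and sum each window."""
    ih, iw = image.shape
    kh, kw = kernel.shape
    out = np.zeros((ih - kh + 1, iw - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

image = np.arange(16, dtype=float).reshape(4, 4)  # toy 4x4 "image"
W = np.array([[1.0, 0.0],
              [0.0, -1.0]])                       # 2x2 kernel W
print(conv2d(image, W))  # 3x3 feature map: each entry is image[i,j] - image[i+1,j+1]
```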
Sliding Operation in Convolutions
- The term "sliding" refers to moving the filter across the image one step at a time to generate output for each neuron layer.
- This sliding generates feature maps that reflect various aspects of the input image.
Manual Feature Engineering Example
- Demonstrates how multiple kernels can be applied to extract different features from the same image, resulting in unique feature maps.
- Each kernel produces different neuron activations because each one detects a specific feature, as the sketch below illustrates.
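An illustration of this point, assuming NumPy and SciPy; Sobel kernels are a classic hand-engineered choice, used here as an example rather than taken from the source:

```python
import numpy as np
from scipy.signal import correlate2d

sobel_x = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]], dtype=float)  # responds to vertical edges
sobel_y = sobel_x.T                            # responds to horizontal edges

image = np.zeros((6, 6))
image[:, 3:] = 1.0                             # image with one vertical edge

print(correlate2d(image, sobel_x, mode="valid"))  # strong activations at the edge
print(correlate2d(image, sobel_y, mode="valid"))  # all zeros: no horizontal edge
```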
Automated Feature Extraction through Backpropagation
- In automated feature extraction, the initial weights for kernels are set randomly.
- The backpropagation algorithm adjusts these weights based on the output error, gradually refining the extracted features so that the error decreases.
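A small sketch of this mechanism, assuming PyTorch; the input and target feature map are synthetic, chosen only to show the weight-update loop:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
conv = nn.Conv2d(in_channels=1, out_channels=1, kernel_size=3, bias=False)  # random kernel
x = torch.randn(1, 1, 8, 8)        # one 8x8 single-channel input
target = torch.randn(1, 1, 6, 6)   # synthetic "desired" feature map

opt = torch.optim.SGD(conv.parameters(), lr=0.1)
for step in range(5):
    loss = nn.functional.mse_loss(conv(x), target)
    opt.zero_grad()
    loss.backward()                # gradients flow back into the kernel weights
    opt.step()                     # weights move to reduce the output error
    print(step, loss.item())      # loss shrinks as the kernel is refined
```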
Padding in Convolutions
- Padding involves adding extra layers (often zeros) around the input image to maintain edge features during convolution.
- It ensures that corner and edge pixels, which otherwise fall under far fewer filter positions than interior pixels, are captured adequately in the feature maps.
- Padding thereby mitigates the boundary problem, where interior features are covered many times while features at the edges of the image are neglected.
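A short sketch, assuming NumPy and SciPy, showing zero padding with \(p = 1\) preserving the spatial size under a 3x3 kernel (the image values are invented):

```python
import numpy as np
from scipy.signal import correlate2d

image = np.arange(16, dtype=float).reshape(4, 4)
kernel = np.ones((3, 3)) / 9.0                     # simple averaging kernel

# Without padding, the 4x4 image shrinks to 2x2 and each corner pixel
# falls under only a single filter position.
print(correlate2d(image, kernel, mode="valid").shape)   # (2, 2)

# Zero padding with p=1 keeps the output at 4x4, so pixels near the
# border contribute to more filter positions.
padded = np.pad(image, pad_width=1, mode="constant", constant_values=0)
print(correlate2d(padded, kernel, mode="valid").shape)  # (4, 4)
```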
Stride in Convolutions
- The stride is the number of pixels by which the filter moves across the input image.
- A stride of 1 moves the filter one pixel at a time, while a stride of 2 moves it two pixels, skipping every other position.
- The stride value affects the dimensions of the resulting feature map: larger strides produce smaller maps.
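A sketch extending the naive convolution above with a stride parameter (assuming NumPy; the sizes are invented):

```python
import numpy as np

def conv2d_strided(image, kernel, stride=1):
    """Naive 'valid' convolution where the filter jumps `stride` pixels."""
    ih, iw = image.shape
    kh, kw = kernel.shape
    oh = (ih - kh) // stride + 1
    ow = (iw - kw) // stride + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            r, c = i * stride, j * stride
            out[i, j] = np.sum(image[r:r + kh, c:c + kw] * kernel)
    return out

image = np.arange(36, dtype=float).reshape(6, 6)
k = np.ones((3, 3))
print(conv2d_strided(image, k, stride=1).shape)  # (4, 4)
print(conv2d_strided(image, k, stride=2).shape)  # (2, 2): larger stride, smaller map
```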
Calculating Feature Map Size
- The general formula for calculating the feature map dimensions involves the input size, filter dimensions, padding, and stride:
- For the height of the feature map: \[ h_{\text{out}} = \left\lfloor \frac{n_h + 2p - f_h}{s} \right\rfloor + 1 \] where \(n_h\) is the input height, \(f_h\) the filter height, \(p\) the padding, and \(s\) the stride.
- The width follows the same pattern with \(n_w\) and \(f_w\).
- Understanding this formula makes it possible to predict how each convolution changes the spatial dimensions of an image as it passes through successive operations.
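A one-function check of the formula in Python (floor division `//` implements the \(\lfloor\cdot\rfloor\); the example numbers are invented and match the stride sketch above):

```python
def feature_map_size(n, f, p=0, s=1):
    """Output dimension: floor((n + 2p - f) / s) + 1."""
    return (n + 2 * p - f) // s + 1

print(feature_map_size(6, 3, p=0, s=1))  # (6 - 3) // 1 + 1 = 4
print(feature_map_size(6, 3, p=0, s=2))  # (6 - 3) // 2 + 1 = 2
print(feature_map_size(6, 3, p=1, s=2))  # (6 + 2 - 3) // 2 + 1 = 3
```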