P5: Convolutional Neural Networks: Key Concepts and Calculations
Understanding Convolutional Neural Networks (CNNs)
- CNNs consist of several layers including convolutional layers, activation functions, and pooling layers.
- The design and implementation of CNNs involve understanding parameters such as kernel size, strides, and padding.
Convolutional Layers
- Each convolution layer applies a series of filters (or kernels) to an input tensor.
- Kernel Size: Typically represented as $K \times K \times C$, where $K$ is the height/width of the kernel and $C$ is the number of channels.
- Example: For a filter size of $3 \times 3$ and $3$ input channels, the number of parameters per filter is $3 \times 3 \times 3 + 1 = 28$ (including the bias).
- The output feature map is generated by elementwise multiplying each input patch with the kernel weights, summing the products, adding the bias, and applying an activation function.
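The convolution step above can be sketched as a minimal NumPy loop (the function name and shapes are illustrative, not from the notes): slide a $K \times K \times C$ kernel over the input, multiply elementwise, sum, add the bias, and apply ReLU.

```python
import numpy as np

def conv2d_single_filter(x, kernel, bias, stride=1):
    """x: (H, W, C) input tensor; kernel: (K, K, C); returns a 2D feature map."""
    H, W, C = x.shape
    K = kernel.shape[0]
    out_h = (H - K) // stride + 1
    out_w = (W - K) // stride + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = x[i*stride:i*stride+K, j*stride:j*stride+K, :]
            # elementwise multiply with kernel weights, sum, add bias
            out[i, j] = np.sum(patch * kernel) + bias
    # activation (ReLU) applied after the linear step
    return np.maximum(out, 0)

x = np.random.randn(5, 5, 3)   # 5x5 input with 3 channels
k = np.random.randn(3, 3, 3)   # one 3x3x3 filter
fm = conv2d_single_filter(x, k, bias=0.1)
print(fm.shape)  # (3, 3)
```

A real layer would apply $N$ such filters, stacking the resulting feature maps along the depth axis.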
Weights and Biases
- The total learnable parameters in a convolutional layer are calculated as:
$$\text{Total Learnable Parameters} = (K \times K \times C + 1) \times N$$
- Where $C$ is the number of input channels and $N$ is the number of filters (kernels); the $+1$ accounts for each filter's bias.
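As a quick check, the parameter count per layer can be computed directly (a small helper, assuming $K \times K$ filters, $C$ input channels, and $N$ filters):

```python
def conv_params(K, C_in, N_filters):
    # each filter has K*K*C_in weights plus one bias; N_filters filters in total
    return (K * K * C_in + 1) * N_filters

# 3x3 filters over a 3-channel input, 10 filters:
print(conv_params(3, 3, 10))  # 280
```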
Activation Functions
- After convolution operations, an activation function (commonly ReLU) is applied to introduce non-linearity into the model.
- ReLU is defined as $f(x) = \max(0, x)$.
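In code, ReLU is a one-line elementwise operation:

```python
import numpy as np

def relu(x):
    # f(x) = max(0, x), applied elementwise
    return np.maximum(0, x)

print(relu(np.array([-2.0, 0.0, 3.0])))  # [0. 0. 3.]
```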
Pooling Layers
- The pooling operation helps reduce dimensionality and maintain computational efficiency.
- Common types: Max Pooling and Average Pooling.
- Max Pooling: Selects the maximum value from a segment of the feature map.
- Parameters:
- Max pooling size can be defined (e.g., $2 \times 2$) with a specified stride (movement of the window over the feature map).
Understanding Pooling Operations
- Pooling Operation Characteristics:
- No weights are involved; pooling is a fixed aggregation (taking the maximum or the average of each window) rather than a learned operation.
- Pooling layers maintain the depth but reduce the spatial dimensions (height and width).
- Dimensionality reduction also makes the representation more robust to small local distortions and shifts in the input image.
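The pooling behavior described above can be sketched for a single 2D feature map (non-overlapping $2 \times 2$ max pooling; the function name is illustrative):

```python
import numpy as np

def max_pool2d(x, size=2, stride=2):
    """x: (H, W) feature map; selects the max of each pooling window."""
    H, W = x.shape
    out_h = (H - size) // stride + 1
    out_w = (W - size) // stride + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # no weights: just select the maximum in the window
            out[i, j] = x[i*stride:i*stride+size, j*stride:j*stride+size].max()
    return out

x = np.arange(16, dtype=float).reshape(4, 4)
print(max_pool2d(x))
# [[ 5.  7.]
#  [13. 15.]]
```

Applied per channel, this halves the height and width while leaving the depth unchanged.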
Flattening and Fully Connected Layers
- After pooling, the output is flattened to create a single vector input into a fully connected layer.
- Fully Connected Layer connects every neuron in one layer to every neuron in the next, allowing complex interactions between features.
- The output of the final fully connected layer is typically passed through a softmax to produce the final probabilities for class predictions.
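A minimal sketch of this last stage, with assumed shapes (a $4 \times 4 \times 8$ pooled output and 3 classes — both hypothetical):

```python
import numpy as np

rng = np.random.default_rng(0)
pooled = rng.standard_normal((4, 4, 8))        # (H, W, depth) after pooling
flat = pooled.reshape(-1)                      # flatten: length 4*4*8 = 128
W = rng.standard_normal((3, flat.size)) * 0.01 # fully connected weights (3 classes, assumed)
b = np.zeros(3)                                # biases
logits = W @ flat + b                          # every input connects to every output
probs = np.exp(logits) / np.exp(logits).sum()  # softmax -> class probabilities
print(probs.sum())  # 1.0
```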
Calculating Feature Map Size
- When applying convolution:
$$\text{Output Size} = \frac{W + 2P - F}{S} + 1$$
- Where:
- $W$ = Input size (width/height)
- $P$ = Padding
- $F$ = Filter size
- $S$ = Stride
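The formula above translates directly into a small helper function:

```python
def conv_output_size(W, F, P, S):
    # (W + 2P - F) / S + 1; valid configurations divide evenly
    assert (W + 2 * P - F) % S == 0, "filter does not tile the input evenly"
    return (W + 2 * P - F) // S + 1

# 25x25 input, 3x3 filter, stride 1, padding 1 ("same" convolution):
print(conv_output_size(25, 3, 1, 1))  # 25
```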
Sample Calculation
- For an input of $25 \times 25$ with a $3 \times 3$ filter, stride $1$, and padding $1$:
$$\text{Output Size} = \frac{25 + 2 \times 1 - 3}{1} + 1 = 25$$
- Applying multiple filters changes the output depth: it equals the number of filters used.
Summary of Learnable Parameters in CNNs
- Important to remember the learned parameters from each convolution and fully connected layer:
- Example Parameters include:
- First Layer (assuming a single-channel input and $10$ filters): $3 \times 3 \times 1 \times 10 + 10 = 100$
- Second Layer ($10$ input channels from the first layer, $15$ filters): $3 \times 3 \times 10 \times 15 + 15 = 1365$
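These per-layer totals can be verified with a short helper (the single-channel input for the first layer is an assumption, as above):

```python
def conv_layer_params(K, C_in, N):
    # K*K*C_in weights per filter, plus one bias per filter, times N filters
    return K * K * C_in * N + N

print(conv_layer_params(3, 1, 10))   # first layer: 100
print(conv_layer_params(3, 10, 15))  # second layer: 1365
```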
Downsampling
- Pooling operations can also be referred to as downsampling due to their nature of reducing feature dimensions while preserving important characteristics of the input data.