P5: Convolutional Neural Networks: Key Concepts and Calculations

  • Understanding Convolutional Neural Networks (CNNs)

    • CNNs are built from several layer types, including convolutional layers, activation functions, pooling layers, and fully connected layers.
    • The design and implementation of CNNs involve understanding parameters such as kernel size, strides, and padding.
  • Convolutional Layers

    • Each convolutional layer applies a set of filters (kernels) to an input tensor.
    • Kernel Size: Typically represented as $K \times K \times C$, where $K$ is the height/width of the kernel and $C$ is the number of input channels.
    • Example: For a $3 \times 3$ filter over $3$ input channels, a single filter has $3 \times 3 \times 3 + 1 = 28$ parameters (including its bias).
    • The output feature map is generated by multiplying each input segment elementwise with the kernel weights, summing, adding the bias, and applying an activation function (a worked step follows this list).
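    • A minimal NumPy sketch of this single step at one spatial position (all values are made up for illustration):

      ```python
      import numpy as np

      # Hypothetical 3x3 single-channel input segment and 3x3 kernel.
      segment = np.array([[1., 0., 2.],
                          [3., 1., 0.],
                          [0., 2., 1.]])
      kernel  = np.array([[ 1., 0., -1.],
                          [ 1., 0., -1.],
                          [ 1., 0., -1.]])
      bias = 0.5

      z = np.sum(segment * kernel) + bias   # elementwise multiply, sum, add bias
      out = max(0.0, z)                     # ReLU activation
      print(z, out)                         # 1.5 1.5
      ```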
  • Weights and Biases

    • The total learnable parameters in a convolutional layer are calculated as:
      \text{Total Learnable Parameters} = (K \times K \times C + 1) \times N
    • Where $C$ is the number of input channels and $N$ is the number of filters (kernels): each filter contributes $K \times K \times C$ weights plus one bias.
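    • A quick check of this count with PyTorch, for a hypothetical layer with $K = 3$, $C = 3$, and $N = 10$:

      ```python
      import torch.nn as nn

      # Hypothetical layer: K = 3, C = 3 input channels, N = 10 filters.
      conv = nn.Conv2d(in_channels=3, out_channels=10, kernel_size=3)

      n_params = sum(p.numel() for p in conv.parameters())
      print(n_params, (3 * 3 * 3 + 1) * 10)   # 280 280
      ```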
  • Activation Functions

    • After convolution operations, an activation function (commonly ReLU) is applied to introduce non-linearity into the model.
    • ReLU is defined as $f(x) = \max(0, x)$.
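    • Applied elementwise to a hypothetical vector:

      ```python
      import numpy as np

      x = np.array([-2.0, -0.5, 0.0, 1.5, 3.0])
      print(np.maximum(0, x))   # [0.  0.  0.  1.5 3. ]  -- f(x) = max(0, x), elementwise
      ```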
  • Pooling Layers

    • The pooling operation helps reduce dimensionality and maintain computational efficiency.
    • Common types: Max Pooling and Average Pooling.
    • Max Pooling: Selects the maximum value from a segment of the feature map.
    • Parameters: the pooling window size (e.g., $2 \times 2$) and the stride (how far the window moves over the feature map); a worked example follows this list.
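    • A minimal NumPy sketch of $2 \times 2$ max pooling with stride $2$ on a hypothetical $4 \times 4$ feature map:

      ```python
      import numpy as np

      # Hypothetical 4x4 feature map; 2x2 max pooling with stride 2.
      fmap = np.array([[1., 3., 2., 0.],
                       [4., 6., 1., 2.],
                       [0., 2., 5., 7.],
                       [1., 0., 3., 4.]])

      # Split into non-overlapping 2x2 blocks, then take each block's maximum.
      pooled = fmap.reshape(2, 2, 2, 2).max(axis=(1, 3))
      print(pooled)   # [[6. 2.]
                      #  [2. 7.]]
      ```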
  • Understanding Pooling Operations

    • Pooling Operation Characteristics:
    • No learnable weights are involved; the operation simply takes the maximum (or average) over each window.
    • Pooling layers maintain the depth but reduce the spatial dimensions (height and width).
    • Reducing the spatial dimensions also makes the representation less sensitive to small local shifts and distortions in the image; the sketch below checks the shape behavior.
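    • A shape check with PyTorch, using a hypothetical feature map of depth 10:

      ```python
      import torch
      import torch.nn as nn

      x = torch.randn(1, 10, 24, 24)                # hypothetical: depth 10, 24x24 spatial
      pool = nn.MaxPool2d(kernel_size=2, stride=2)
      print(pool(x).shape)                          # torch.Size([1, 10, 12, 12]) -- depth kept
      ```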
  • Flattening and Fully Connected Layers

    • After pooling, the output is flattened into a single vector that serves as the input to a fully connected layer.
    • A fully connected layer connects every neuron in one layer to every neuron in the next, allowing complex interactions between features.
    • The final fully connected layer produces one score per class; these scores are typically passed through a softmax to obtain the class probabilities (sketched below).
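    • A minimal PyTorch sketch of flatten, fully connected layer, and softmax, with hypothetical sizes (15 feature maps of $5 \times 5$, 10 classes):

      ```python
      import torch
      import torch.nn as nn

      x = torch.randn(1, 15, 5, 5)             # hypothetical pooled output: 15 x 5 x 5

      flat = nn.Flatten()(x)                   # shape (1, 375)
      logits = nn.Linear(375, 10)(flat)        # 10 hypothetical classes
      probs = torch.softmax(logits, dim=1)     # probabilities that sum to 1
      print(flat.shape, round(probs.sum().item(), 4))   # torch.Size([1, 375]) 1.0
      ```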
  • Calculating Feature Map Size

    • When applying convolution:
      \text{Output Size} = \frac{W + 2P - F}{S} + 1
    • Where:
      • $W$ = Input size (width/height)
      • $P$ = Padding
      • $F$ = Filter size
      • $S$ = Stride
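    • A small helper (the function name is ours, not from the notes) that applies this formula, assuming the division is exact:

      ```python
      def conv_output_size(w: int, f: int, s: int = 1, p: int = 0) -> int:
          """Spatial output size: (W + 2P - F) / S + 1 (assumes an exact division)."""
          return (w + 2 * p - f) // s + 1

      print(conv_output_size(w=25, f=3, s=1, p=1))   # 25 (size preserved)
      print(conv_output_size(w=25, f=3, s=2, p=0))   # 12
      ```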
  • Sample Calculation

    • For an input of $25 \times 25$ with a $3 \times 3$ filter, stride $1$, and padding $1$:
      \text{Output Size} = \frac{25 + 2 \times 1 - 3}{1} + 1 = 25
    • The depth of the output feature map equals the number of filters applied; the sketch below verifies both the spatial size and the depth.
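    • Verifying the calculation with PyTorch; the input depth of 3 and the choice of 8 filters are arbitrary:

      ```python
      import torch
      import torch.nn as nn

      x = torch.randn(1, 3, 25, 25)   # hypothetical 25x25 RGB input

      # 3x3 filters, stride 1, padding 1 preserve the 25x25 spatial size;
      # out_channels (here 8, chosen arbitrarily) sets the output depth.
      conv = nn.Conv2d(in_channels=3, out_channels=8, kernel_size=3, stride=1, padding=1)
      print(conv(x).shape)            # torch.Size([1, 8, 25, 25])
      ```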
  • Summary of Learnable Parameters in CNNs

    • Keep track of the learnable parameters contributed by each convolutional and fully connected layer.
    • Example (kernel size $3 \times 3$, 10 filters in the first layer, 15 in the second, and, consistent with the first-layer count, a single-channel input):
      • First Layer: $3 \times 3 \times 1 \times 10 + 10 = 100$
      • Second Layer: $3 \times 3 \times 10 \times 15 + 15 = 1365$
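    • Checking both counts with PyTorch, under the single-channel-input assumption noted above:

      ```python
      import torch.nn as nn

      conv1 = nn.Conv2d(1, 10, kernel_size=3)    # (3*3*1 + 1) * 10  = 100
      conv2 = nn.Conv2d(10, 15, kernel_size=3)   # (3*3*10 + 1) * 15 = 1365

      for layer in (conv1, conv2):
          print(sum(p.numel() for p in layer.parameters()))   # 100, then 1365
      ```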
  • Downsampling

    • Pooling operations are also referred to as downsampling because they reduce the spatial dimensions of the feature maps while preserving their most important characteristics.