What is a CNN?
A neural network designed to process grid-like data, such as images, using convolutional layers.
What is the main advantage of CNNs over fully connected networks for images?
They preserve spatial relationships and dramatically reduce parameters through weight sharing.
What is a filter/kernel in a CNN?
A small matrix of weights that slides across the image to detect specific features.
How does 2D convolution work on a 3D RGB image?
The 3D filter slides horizontally and vertically only (not along depth) because its depth matches the input depth, so each filter produces a single 2D feature map.
What happens when multiple filters are applied to an input?
They produce an output volume where depth equals the number of filters used.
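The two cards above can be illustrated with a small NumPy sketch. It is a naive loop, not an optimized implementation, and the function name `conv2d`, the `(K, F, F, C)` filter layout, and the 32×32 input are assumptions chosen for illustration. Like most deep learning libraries, it computes cross-correlation (no kernel flip).

```python
import numpy as np

def conv2d(image, filters, stride=1, padding=0):
    """Cross-correlate an (H, W, C) image with (K, F, F, C) filters.

    Returns an (H_out, W_out, K) volume: one feature map per filter.
    """
    if padding > 0:
        # Zero-pad height and width only, never the depth axis.
        image = np.pad(image, ((padding, padding), (padding, padding), (0, 0)))
    h, w, c = image.shape
    k, f, _, filter_depth = filters.shape
    assert filter_depth == c, "filter depth must match input depth"
    h_out = (h - f) // stride + 1
    w_out = (w - f) // stride + 1
    out = np.zeros((h_out, w_out, k))
    for i in range(h_out):
        for j in range(w_out):
            patch = image[i * stride:i * stride + f, j * stride:j * stride + f, :]
            # Each filter spans the full depth, so it yields one number per position.
            out[i, j, :] = np.tensordot(filters, patch, axes=([1, 2, 3], [0, 1, 2]))
    return out

rng = np.random.default_rng(0)
image = rng.random((32, 32, 3))      # hypothetical RGB input
filters = rng.random((8, 5, 5, 3))   # 8 filters of size 5x5x3
print(conv2d(image, filters).shape)  # (28, 28, 8)
```

The output depth (8) equals the number of filters, while the height and width depend only on the filter size, stride, and padding.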
What is the formula for output size after convolution?
Output Size = floor((Input Size - Filter Size + 2×Padding) / Stride) + 1, where the division is rounded down when it is not exact.
What is padding?
Adding zeros around the input image to control output size and preserve border information.
What is stride?
The step size (number of pixels) the filter moves each time it slides across the image.
What happens when stride increases?
The output size decreases and computations are reduced.
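The output-size formula and the effect of padding and stride can be checked with a few lines of Python. The helper name `conv_output_size` and the example sizes are assumptions made for this sketch; integer division stands in for rounding down.

```python
def conv_output_size(input_size, filter_size, padding=0, stride=1):
    """Spatial output size of a convolution along one dimension."""
    return (input_size - filter_size + 2 * padding) // stride + 1

print(conv_output_size(32, 5))                         # 28
print(conv_output_size(32, 3, padding=1))              # 32
print(conv_output_size(32, 3, padding=1, stride=2))    # 16
print(conv_output_size(224, 7, padding=3, stride=2))   # 112
```

Doubling the stride roughly halves each spatial dimension, which is why a larger stride cuts the number of filter positions (and the computation) so sharply.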
What is the purpose of pooling?
To downsample feature maps, reducing their spatial dimensions and therefore the parameters and computation needed in later layers. Pooling itself has no learnable weights.
What is max pooling?
Taking the maximum value from each region of the feature map.
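A minimal NumPy sketch of max pooling on a single feature map follows. The 2×2 window with stride 2 is a common choice but is assumed here, as are the function name `max_pool2d` and the sample values.

```python
import numpy as np

def max_pool2d(feature_map, size=2, stride=2):
    """Keep the maximum of each size x size window of a 2D feature map."""
    h, w = feature_map.shape
    h_out = (h - size) // stride + 1
    w_out = (w - size) // stride + 1
    out = np.empty((h_out, w_out), dtype=feature_map.dtype)
    for i in range(h_out):
        for j in range(w_out):
            window = feature_map[i * stride:i * stride + size,
                                 j * stride:j * stride + size]
            out[i, j] = window.max()
    return out

fm = np.array([[1, 3, 2, 4],
               [5, 6, 1, 2],
               [7, 2, 9, 0],
               [1, 8, 3, 4]])
print(max_pool2d(fm))
# [[6 4]
#  [8 9]]
```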
What is the typical structure of a CNN?
Conv → ReLU → Pooling → Conv → ReLU → Pooling → Flatten → Fully Connected → Output.
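To see how this pipeline changes the data, the sketch below traces tensor shapes only; it does not run a network. Every size in it is an assumption for illustration: a 32×32×3 input, 3×3 convolutions with padding 1, 16 then 32 filters, 2×2 pooling with stride 2, and 10 classes.

```python
def conv_shape(shape, num_filters, filter_size=3, padding=1, stride=1):
    """Shape after a convolution; ReLU leaves the shape unchanged."""
    h, w, _ = shape
    size = lambda n: (n - filter_size + 2 * padding) // stride + 1
    return (size(h), size(w), num_filters)

def pool_shape(shape, size=2, stride=2):
    """Shape after pooling; depth is untouched."""
    h, w, c = shape
    return ((h - size) // stride + 1, (w - size) // stride + 1, c)

def trace(input_shape=(32, 32, 3), num_classes=10):
    steps = [("input", input_shape)]
    shape = conv_shape(input_shape, 16)
    steps.append(("conv1 + ReLU", shape))
    shape = pool_shape(shape)
    steps.append(("pool1", shape))
    shape = conv_shape(shape, 32)
    steps.append(("conv2 + ReLU", shape))
    shape = pool_shape(shape)
    steps.append(("pool2", shape))
    h, w, c = shape
    steps.append(("flatten", (h * w * c,)))
    steps.append(("fully connected + output", (num_classes,)))
    return steps

for name, shape in trace():
    print(f"{name:26s} {shape}")
```

With these assumed sizes the volume goes from 32×32×3 to 8×8×32 before flattening into a 2048-element vector.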
What makes VGG architecture unique?
It uses only small 3×3 filters stacked very deep in a simple, uniform pattern.
What is the main drawback of VGG?
It has a huge number of parameters (~138 million in VGG-16), making it expensive in both computation and memory.
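The ~138 million figure can be reproduced by counting weights and biases layer by layer. The sketch assumes the commonly cited VGG-16 configuration (13 convolutional layers with 3×3 filters, five 2×2 max pools, three fully connected layers) with a 224×224×3 input and 1000 classes; the deck itself does not say which VGG variant it means.

```python
# Assumed VGG-16 layout: numbers are conv output channels, "M" is a 2x2 max pool.
VGG16_CONV = [64, 64, "M", 128, 128, "M", 256, 256, 256, "M",
              512, 512, 512, "M", 512, 512, 512, "M"]

def vgg16_param_count(input_hw=224, in_channels=3, num_classes=1000):
    total = 0
    channels, hw = in_channels, input_hw
    for layer in VGG16_CONV:
        if layer == "M":
            hw //= 2                                  # pooling halves height and width
        else:
            total += 3 * 3 * channels * layer + layer  # 3x3 weights + biases
            channels = layer
    fc_in = channels * hw * hw                        # flattened conv output
    for fc_out in (4096, 4096, num_classes):
        total += fc_in * fc_out + fc_out
        fc_in = fc_out
    return total

print(f"{vgg16_param_count():,}")  # 138,357,544
```

Under these assumptions the first fully connected layer alone accounts for about 103 million parameters, so most of the cost sits in the dense layers rather than in the stacked 3×3 convolutions.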
What is the key innovation of GoogLeNet (Inception)?
The Inception module, which applies multiple filter sizes (1×1, 3×3, 5×5) in parallel and concatenates their outputs along the depth dimension.
What is the key innovation of ResNet?
Skip connections (residual connections) that allow gradients to flow directly through very deep networks.
What problem do skip connections solve?
The vanishing gradient problem (and the related accuracy degradation seen in very deep plain networks), enabling training of networks with over 100 layers.
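A skip connection computes y = ReLU(F(x) + x). The sketch below is a simplification: it uses small dense matrices `w1` and `w2` (hypothetical names) in place of the convolutions a real residual block would use, purely to keep the example short.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0)

def residual_block(x, w1, w2):
    """y = ReLU(F(x) + x), where F is a two-layer transform.

    Dense layers stand in for convolutions here (an assumption for brevity).
    """
    fx = relu(x @ w1) @ w2   # the residual branch F(x)
    return relu(fx + x)      # the skip connection adds the input back

x = np.array([1.0, 2.0, 3.0])
zeros = np.zeros((3, 3))
print(residual_block(x, zeros, zeros))  # [1. 2. 3.]
```

Even when the residual branch contributes nothing, the input passes through unchanged. The same identity path carries gradients backward, since the derivative of F(x) + x with respect to x always includes an identity term.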
What is transfer learning?
Reusing a model pre-trained on a large dataset as a starting point for a new, related task.
Why is transfer learning effective?
It leverages pre-learned features, requiring less data and training time.
What are the three main CNN models covered?
VGG (simple and deep), GoogLeNet (multi-scale filters), and ResNet (skip connections).
What is the receptive field?
The region of the input image that influences a particular output neuron.
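The receptive field of a stack of layers can be computed with a standard recurrence: each layer adds (filter size - 1) times the product of all earlier strides. The function name and the layer lists below are assumptions for illustration.

```python
def receptive_field(layers):
    """Receptive field (in input pixels) of one output unit.

    layers: list of (filter_size, stride) tuples, ordered from input to output.
    """
    rf, jump = 1, 1          # jump = spacing, in input pixels, between adjacent units
    for filter_size, stride in layers:
        rf += (filter_size - 1) * jump
        jump *= stride
    return rf

print(receptive_field([(3, 1)]))                  # 3
print(receptive_field([(3, 1), (3, 1)]))          # 5
print(receptive_field([(3, 1), (3, 1), (3, 1)]))  # 7
print(receptive_field([(3, 1), (2, 2), (3, 1)]))  # 8
```

Two stacked 3×3 convolutions see the same 5×5 region as one 5×5 filter, which is the reasoning behind stacking small filters in architectures such as VGG.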
What is a feature map?
The output produced by applying a single filter to the input.
What is the role of the activation function (ReLU) after convolution?
To introduce non-linearity, allowing the network to learn complex patterns.
What is flattening in a CNN?
Converting the 3D output of convolutional layers into a 1D vector for fully connected layers.
What is the relationship between filter size and padding to maintain output size?
Padding needed = (Filter Size - 1) / 2 for the same output size, assuming stride 1 and an odd filter size (e.g. 3×3 needs padding 1, 5×5 needs padding 2).
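This padding rule follows directly from the output-size formula. In the short derivation below, the symbols I, F, P, S, and O are shorthand introduced here for input size, filter size, padding, stride, and output size.

```latex
\begin{aligned}
O &= \frac{I - F + 2P}{S} + 1, \qquad S = 1,\quad O = I \\
I &= I - F + 2P + 1 \\
P &= \frac{F - 1}{2}
\end{aligned}
```

The result is a whole number only when F is odd, which is one reason odd filter sizes such as 3×3 and 5×5 are the usual choice.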
How do early CNN layers differ from later layers?
Early layers detect simple features (edges, colors); later layers detect complex features (shapes, objects).
What is weight sharing in CNNs?
The same filter weights are used across all positions of the input, reducing parameters.
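The savings from weight sharing can be made concrete by comparing a convolutional layer with a fully connected layer that produces an output of the same size. The 32×32×3 input and 16 filters are assumed values for this sketch.

```python
def conv_params(filter_size, in_channels, num_filters):
    """Weights plus one bias per filter; independent of image size."""
    return filter_size * filter_size * in_channels * num_filters + num_filters

def dense_params(num_inputs, num_outputs):
    """One weight per input-output pair, plus one bias per output."""
    return num_inputs * num_outputs + num_outputs

h, w, c = 32, 32, 3      # hypothetical input volume
num_filters = 16         # 3x3 filters, padding 1, so the output is 32x32x16

conv = conv_params(3, c, num_filters)
dense = dense_params(h * w * c, h * w * num_filters)

print(f"conv:  {conv:,}")    # conv:  448
print(f"dense: {dense:,}")   # dense: 50,348,032
```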
What is translation invariance in CNNs?
The ability to recognize an object regardless of its position in the image. Shared filter weights combined with pooling make this approximately true in practice.
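A toy demonstration: the same filter slid over two images containing an identical pattern at different positions gives the same maximum response. Global max pooling is used here as a simplifying assumption; real networks pool over small windows, so their invariance is only approximate. All names and values are illustrative.

```python
import numpy as np

def max_response(image, kernel):
    """Slide the kernel over every valid position and return the largest response."""
    f = kernel.shape[0]
    h, w = image.shape
    responses = [
        np.sum(image[i:i + f, j:j + f] * kernel)
        for i in range(h - f + 1)
        for j in range(w - f + 1)
    ]
    return max(responses)

pattern = np.array([[1.0, 2.0],
                    [3.0, 4.0]])
kernel = pattern.copy()          # a filter tuned to this pattern

a = np.zeros((8, 8))
a[1:3, 1:3] = pattern            # pattern near the top-left
b = np.zeros((8, 8))
b[4:6, 3:5] = pattern            # same pattern, shifted

print(max_response(a, kernel) == max_response(b, kernel))  # True
```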
What is the output of a CNN for classification?
A probability distribution over classes, typically using Softmax activation.
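Softmax turns the final layer's raw scores (logits) into probabilities that sum to 1. The sketch below subtracts the maximum logit before exponentiating, a common numerical-stability trick; the example scores are made up.

```python
import numpy as np

def softmax(logits):
    """Convert a 1D array of class scores into a probability distribution."""
    shifted = logits - np.max(logits)   # avoids overflow; does not change the result
    exp = np.exp(shifted)
    return exp / exp.sum()

probs = softmax(np.array([2.0, 1.0, 0.1]))
print(probs.round(3))        # [0.659 0.242 0.099]
print(int(np.argmax(probs))) # 0, the predicted class index
```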
What is the difference between valid padding and same padding?
Valid padding adds no zeros, so the output is smaller than the input; same padding adds enough zeros that the output size equals the input size when stride = 1.