In nn.Linear(64, 256), what does 256 represent?
The number of output neurons.
How many trainable parameters does nn.Linear(10, 5) contain?
55.
True or false: A model with more parameters is always more accurate.
False. Extra parameters add capacity, but they can also lead to overfitting.
How many trainable parameters does nn.Linear(10, 5) contain? (Formula)
(10 × 5) + 5 = 55.
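The formula can be checked with a small helper. This is a minimal sketch in plain Python: `linear_param_count` is a hypothetical name, and it assumes the layer has a bias term, which is the default for nn.Linear.

```python
def linear_param_count(in_features: int, out_features: int, bias: bool = True) -> int:
    """Count the trainable parameters of a fully connected layer."""
    weights = in_features * out_features   # one weight per input/output pair
    biases = out_features if bias else 0   # one bias per output neuron
    return weights + biases

print(linear_param_count(10, 5))    # 55
print(linear_param_count(64, 256))  # 16640
```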
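True or false: LayerNorm normalizes each feature across samples, while BatchNorm normalizes all features within a single sample.
False. It is the other way around: BatchNorm normalizes each feature across the samples in a batch, while LayerNorm normalizes across the features within a single sample.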
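To make the two normalization axes concrete, here is a rough sketch in plain Python. It assumes a tiny batch stored as a list of rows (samples) with columns as features, and it leaves out the learnable scale and shift that real layers add. The helper `normalize` is a hypothetical name.

```python
from statistics import mean, pstdev

def normalize(values, eps=1e-5):
    """Shift to zero mean and scale to (roughly) unit variance."""
    m = mean(values)
    s = (pstdev(values) ** 2 + eps) ** 0.5
    return [(v - m) / s for v in values]

# Rows are samples, columns are features: 3 samples, 2 features.
batch = [[1.0, 10.0],
         [2.0, 20.0],
         [3.0, 30.0]]

# BatchNorm-style: normalize each feature (column) across the samples.
columns = [normalize(list(col)) for col in zip(*batch)]
batch_norm = [list(row) for row in zip(*columns)]

# LayerNorm-style: normalize across the features (row) of each single sample.
layer_norm = [normalize(row) for row in batch]
```

After the BatchNorm-style pass every column has zero mean; after the LayerNorm-style pass every row does.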
What happens if we stack two linear layers without an activation function?
The model is equivalent to a single linear layer.
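The reason is that composing two affine maps gives another affine map: W2(W1 x + b1) + b2 = (W2 W1) x + (W2 b1 + b2). A small sketch in plain Python, with arbitrary illustrative weights and hypothetical helper names, checks this numerically.

```python
def matvec(W, x):
    """Matrix-vector product for a list-of-rows matrix."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def matmul(A, B):
    """Matrix-matrix product for list-of-rows matrices."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def linear(W, b, x):
    """Affine map y = W x + b."""
    return [y + bi for y, bi in zip(matvec(W, x), b)]

# Layer 1 maps 2 inputs to 3 outputs; layer 2 maps 3 inputs to 1 output.
W1, b1 = [[1.0, 2.0], [0.0, -1.0], [3.0, 1.0]], [0.5, 0.0, -1.0]
W2, b2 = [[2.0, 1.0, -1.0]], [4.0]

# Collapse both layers into a single weight matrix and bias.
W = matmul(W2, W1)
b = [y + bi for y, bi in zip(matvec(W2, b1), b2)]

x = [3.0, -2.0]
stacked = linear(W2, b2, linear(W1, b1, x))
collapsed = linear(W, b, x)
```

Both `stacked` and `collapsed` hold the same output, so the second layer adds no expressive power unless a non-linearity sits between the two.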
Why are convolutional layers better suited for images?
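They preserve spatial structure and reuse the same small filter across the whole image, so they need far fewer parameters than fully connected layers.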
A batch size of 100 is used on a dataset of 1000 samples. How many gradient updates occur in one epoch?
10 (1000 ÷ 100).
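The count is the number of batches per epoch. A minimal sketch, assuming one gradient update per batch; `updates_per_epoch` and its `drop_last` flag are hypothetical names chosen for illustration.

```python
import math

def updates_per_epoch(num_samples: int, batch_size: int, drop_last: bool = False) -> int:
    """Number of gradient updates in one pass over the dataset."""
    if drop_last:
        # Discard the final, smaller batch if the sizes do not divide evenly.
        return num_samples // batch_size
    return math.ceil(num_samples / batch_size)

print(updates_per_epoch(1000, 100))  # 10
print(updates_per_epoch(1050, 100))  # 11
```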
What is the main role of a ReLU activation?
Introduce non-linearity.
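ReLU itself is just max(0, x). The sketch below, in plain Python, also shows why it counts as non-linear: applying it to a sum is not the same as summing its outputs.

```python
def relu(x: float) -> float:
    """Pass positive values through unchanged; clamp negatives to zero."""
    return max(0.0, x)

print([relu(v) for v in [-2.0, -0.5, 0.0, 1.5]])  # [0.0, 0.0, 0.0, 1.5]

# A linear function f would satisfy f(a + b) == f(a) + f(b). ReLU does not.
a, b = -1.0, 1.0
print(relu(a + b), relu(a) + relu(b))  # 0.0 1.0
```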
What does MaxPool2D keep from each local window?
The strongest (maximum) value.
What is the purpose of pooling layers?
Reduce spatial dimensions while retaining important features.
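A rough sketch of 2 × 2 max pooling in plain Python, assuming a single-channel feature map stored as a list of rows. The function name and defaults are illustrative and not a library API.

```python
def max_pool2d(image, size=2, stride=2):
    """Keep the maximum of each size x size window, moving by `stride`."""
    rows, cols = len(image), len(image[0])
    return [
        [max(image[r + dr][c + dc] for dr in range(size) for dc in range(size))
         for c in range(0, cols - size + 1, stride)]
        for r in range(0, rows - size + 1, stride)
    ]

feature_map = [[1, 3, 2, 0],
               [4, 2, 1, 5],
               [0, 1, 7, 2],
               [3, 2, 1, 6]]

pooled = max_pool2d(feature_map)
print(pooled)  # [[4, 5], [3, 7]]
```

The 4 × 4 input shrinks to 2 × 2, and each output cell keeps only the strongest value from its window.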
What do convolutional filters learn?
Local patterns/features such as edges and textures.
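To illustrate what an edge-detecting filter does, here is a sketch in plain Python. The kernel is set by hand for illustration; in a trained network the filter values are learned from data. The function name is hypothetical, and it computes a "valid" cross-correlation (no padding, stride 1), which is what deep learning libraries usually call convolution.

```python
def conv2d_valid(image, kernel):
    """Slide the kernel over the image and sum the element-wise products."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [sum(image[r + i][c + j] * kernel[i][j]
             for i in range(kh) for j in range(kw))
         for c in range(out_w)]
        for r in range(out_h)
    ]

# Dark region on the left, bright region on the right.
image = [[0, 0, 1, 1],
         [0, 0, 1, 1],
         [0, 0, 1, 1]]

# Responds where intensity increases from left to right.
kernel = [[-1, 1],
          [-1, 1]]

response = conv2d_valid(image, kernel)
print(response)  # [[0, 2, 0], [0, 2, 0]]
```

The response is non-zero only in the column where the vertical edge sits.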
Which of the following is NOT an activation function?
SoftmaxPool.
What is the role of an optimizer?
Minimize the loss function by updating parameters.
What can happen if the learning rate is too high?
Training becomes unstable and may not converge.
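A toy illustration, assuming plain gradient descent on f(x) = x², whose gradient is 2x. Each step multiplies x by (1 − 2·lr), so a small rate shrinks x toward the minimum at 0, while a rate above 1 makes it grow in magnitude. The function name and the two rates are chosen for illustration.

```python
def gradient_descent(lr: float, steps: int = 20, x: float = 1.0) -> float:
    """Minimize f(x) = x**2 with a fixed learning rate; return the final x."""
    for _ in range(steps):
        grad = 2.0 * x       # derivative of x**2
        x = x - lr * grad    # parameter update
    return x

stable = gradient_descent(lr=0.1)    # |x| shrinks by a factor of 0.8 per step
unstable = gradient_descent(lr=1.1)  # |x| grows by a factor of 1.2 per step

print(abs(stable) < abs(unstable))  # True
```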
What is overfitting?
The model memorizes training data and performs poorly on unseen data.
What is the purpose of the validation set?
Tune hyperparameters and monitor generalization.
What is the purpose of the test set?
Provide an unbiased final evaluation of the model.
Which metric is commonly used for classification?
Accuracy.
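Accuracy is the fraction of predictions that match the true labels. A minimal sketch in plain Python, with made-up labels for illustration:

```python
def accuracy(predictions, targets) -> float:
    """Fraction of predictions that equal the corresponding target."""
    if len(predictions) != len(targets) or not targets:
        raise ValueError("predictions and targets must be non-empty and equal in length")
    correct = sum(p == t for p, t in zip(predictions, targets))
    return correct / len(targets)

print(accuracy([1, 0, 2, 1], [1, 0, 1, 1]))  # 0.75
```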
What is the goal of gradient descent?
Minimize the loss function.
Which learning paradigm uses unlabeled data?
Unsupervised learning.
Predicting house prices is an example of:
Regression.
What does a low loss value generally indicate?
Predictions are close to the target values.
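As one concrete case, mean squared error (a common regression loss, used here as an assumed example) is small when predictions sit near the targets and large when they do not. The numbers below are made up for illustration.

```python
def mse(predictions, targets) -> float:
    """Mean of the squared differences between predictions and targets."""
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)

targets = [3.0, 5.0, 7.0]
close = mse([2.9, 5.1, 7.0], targets)  # predictions near the targets
far = mse([0.0, 9.0, 2.0], targets)    # predictions far from the targets

print(close < far)  # True
```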
What is Deep Learning?
Machine learning using neural networks with multiple layers.
What advantage do mini-batches provide?
A balance between efficiency and stable gradient estimates.
What is transfer learning?
Using knowledge from a pretrained model on a new task.
What is Adam?
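An adaptive optimization algorithm that uses running averages of the gradients and their squares to scale each parameter's update.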
What is the main goal of Machine Learning?
To enable systems to learn patterns from data and make predictions or decisions.
What is the purpose of Dropout?
Reduce overfitting by randomly deactivating neurons during training.
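A sketch of "inverted" dropout in plain Python: during training each activation is zeroed with probability p and the survivors are scaled by 1 / (1 − p) so the expected value stays the same; at evaluation time the input passes through unchanged. The function name and the `rng` parameter are hypothetical.

```python
import random

def dropout(activations, p=0.5, training=True, rng=random):
    """Randomly zero activations during training; identity at evaluation."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    if not training or p == 0.0:
        return list(activations)
    keep = 1.0 - p
    # Scale kept values so the expected activation is unchanged.
    return [a / keep if rng.random() < keep else 0.0 for a in activations]

train_out = dropout([1.0] * 8, p=0.5, rng=random.Random(0))
eval_out = dropout([1.0] * 8, p=0.5, training=False)
```

With p = 0.5, every entry of `train_out` is either 0.0 (dropped) or 2.0 (kept and rescaled), while `eval_out` equals the input.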
What is the purpose of Batch Normalization?
Normalize activations to stabilize and speed up training.
What is the purpose of Softmax?
Convert outputs into class probabilities that sum to 1.
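A minimal softmax sketch in plain Python. Subtracting the maximum logit before exponentiating is a common numerical-stability step and does not change the result.

```python
import math

def softmax(logits):
    """Map raw scores to positive values that sum to 1."""
    m = max(logits)                            # shift for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([2.0, 1.0, 0.1])
print(abs(sum(probs) - 1.0) < 1e-9)  # True
```

The largest logit always receives the largest probability, so the predicted class is unchanged.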
What is the role of the output layer?
Produce the final prediction.
What is the purpose of padding in CNNs?
Preserve border information and control output size.
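The effect on output size follows the standard formula out = floor((n + 2p − k) / s) + 1 for input size n, kernel size k, padding p, and stride s. A small sketch (the helper name is hypothetical; dilation is assumed to be 1):

```python
def conv_output_size(n: int, kernel: int, padding: int = 0, stride: int = 1) -> int:
    """Output length along one spatial dimension of a convolution."""
    return (n + 2 * padding - kernel) // stride + 1

print(conv_output_size(28, 3))             # 26: borders shrink without padding
print(conv_output_size(28, 3, padding=1))  # 28: padding of 1 keeps the size
```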