Mid-Term Multiple Choice Questions
In executing a neural network (inference), the operation typically responsible for the majority of the computational cost is:
Matrix-vector multiplication
Which of the following accurately describes the “ReLU” activation function?
Returns positive inputs unchanged, returns zero for negative inputs.
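A minimal NumPy sketch of ReLU (NumPy is an assumption here; any array library behaves the same way):

```python
import numpy as np

def relu(x):
    """ReLU: positive inputs pass through unchanged, negative inputs become zero."""
    return np.maximum(x, 0)

print(relu(np.array([-2.0, -0.5, 0.0, 1.5, 3.0])))
```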
Increasing the size of an input image to be processed by a convolutional neural network will have which effect?
Increases the number of MACs required to run the model
A layer of a convolutional neural network has an input tensor that is 120×120×8 (120×120 in the X, Y dimensions with 8 feature maps). The layer has 16 filters, each with a 3×3 kernel and a bias term. Including biases, how many parameters are involved in this layer?
1,168 (3 × 3 × 8 = 72 weights per filter, × 16 filters = 1,152, plus 16 biases)
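For the convolutional layer described in the question, the parameter and MAC counts can be checked with a short sketch. The helper names are hypothetical, and the MAC count assumes "same" padding with stride 1:

```python
def conv2d_params(kernel_h, kernel_w, in_channels, num_filters, use_bias=True):
    """Trainable parameters in a standard 2D convolution layer."""
    weights = kernel_h * kernel_w * in_channels * num_filters  # one kernel_h x kernel_w x in_channels block per filter
    biases = num_filters if use_bias else 0                    # one bias per filter
    return weights + biases


def conv2d_macs(out_h, out_w, kernel_h, kernel_w, in_channels, num_filters):
    """MACs for one forward pass: every output value needs one MAC per weight in its filter."""
    return out_h * out_w * num_filters * kernel_h * kernel_w * in_channels


params = conv2d_params(3, 3, 8, 16)
macs = conv2d_macs(120, 120, 3, 3, 8, 16)     # assumes "same" padding and stride 1
macs_2x = conv2d_macs(240, 240, 3, 3, 8, 16)  # same layer, input doubled in X and Y
print(params)           # 1168
print(macs)             # 16588800
print(macs_2x // macs)  # 4
```

The parameter count does not depend on the 120×120 spatial size, while the MAC count grows with the number of output positions; this is why a larger input image costs more computation even though the model itself is unchanged.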
If you change the “padding” setting on a convolutional layer from “same” to “valid”, what happens?
The spatial size of the output tensor will decrease (for any kernel larger than 1×1)
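The usual output-size rules for the two padding modes, sketched for one spatial axis. This assumes dilation 1 and follows the common Keras-style conventions; the function name is hypothetical:

```python
import math

def conv_output_size(n, kernel, stride=1, padding="same"):
    """Output length along one spatial axis (dilation 1 assumed)."""
    if padding == "same":
        # zero-padding keeps the size at ceil(n / stride)
        return math.ceil(n / stride)
    if padding == "valid":
        # no padding: the kernel must fit entirely inside the input
        return (n - kernel) // stride + 1
    raise ValueError(f"unknown padding mode: {padding!r}")

print(conv_output_size(120, 3, padding="same"))   # 120
print(conv_output_size(120, 3, padding="valid"))  # 118
```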
What problem did residual connections help solve?
Vanishing gradients
By utilizing skip connections, the ResNet architecture enabled what?
Models with more layers than had previously been effectively trainable
What pattern in neural network design is illustrated in the diagram on the right?
Residual connections
Which of the following best describes quantization?
Numbers are rounded to nearby values so that they can be represented with integers
What is one advantage of quantizing a model?
The model takes up less storage space than a floating-point model
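A sketch of one simple quantization scheme (symmetric, per-tensor int8) to show both the storage saving and the rounding error behind the accuracy trade-off. Real toolchains such as TensorFlow Lite use their own, more elaborate schemes; the helpers below are hypothetical:

```python
import numpy as np

def quantize_int8(x):
    """Symmetric per-tensor int8 quantization: x is approximated by q * scale."""
    max_abs = float(np.max(np.abs(x)))
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)  # round to the nearest integer level
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
weights = rng.normal(size=1000).astype(np.float32)
q, scale = quantize_int8(weights)

print(weights.nbytes, q.nbytes)  # 4000 1000: int8 storage is 4x smaller than float32
max_err = float(np.max(np.abs(dequantize(q, scale) - weights)))
print(max_err, scale / 2)  # rounding error is bounded by about half a quantization step
```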
What is one advantage of using a spectrogram over time-domain inputs?
Both a and c:
a) Most of the information is carried in the frequency domain, so a spectrogram extracts the key features
c) A spectrogram can represent the important information with fewer values than a time-domain waveform
Which of the following describes the Mel Frequency scale?
A distortion of the frequency scale that roughly matches human perception
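One commonly used Hz-to-mel formula (the HTK-style variant; other variants exist), sketched to show how equal steps in Hz shrink on the mel scale at high frequencies:

```python
import math

def hz_to_mel(f_hz):
    # 2595 * log10(1 + f / 700): roughly linear below ~1 kHz, logarithmic above
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(m):
    # inverse of hz_to_mel
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

for f in (100, 1000, 4000, 8000):
    print(f, round(hz_to_mel(f)))
```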
What is the purpose of a windowing function in converting time-domain samples to a spectrogram?
To reduce spectral leakage by tapering the signal at segment boundaries
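A minimal NumPy sketch of the framing, windowing, and FFT steps, comparing a rectangular window (no tapering) with a Hann window on a test tone that falls between FFT bins. The 16 kS/s rate, 256-sample frames, 128-sample hop, and the `leakage` measure are assumptions chosen for illustration:

```python
import numpy as np

def spectrogram(signal, window, hop):
    """Magnitude spectrogram: slice into frames, multiply by the window, FFT each frame."""
    frame_len = len(window)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop : i * hop + frame_len] for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames * window, axis=1))  # shape: (n_frames, frame_len // 2 + 1)

def leakage(S, guard=3):
    """Fraction of energy more than `guard` bins away from each frame's peak bin."""
    power = S ** 2
    peak = np.argmax(power, axis=1)
    bins = np.arange(power.shape[1])
    far = np.abs(bins[None, :] - peak[:, None]) > guard
    return float((power * far).sum() / power.sum())

fs = 16000                             # sampling rate assumed for this example
t = np.arange(fs) / fs                 # one second of samples
tone = np.sin(2 * np.pi * 1234.0 * t)  # frequency chosen to fall between FFT bins

S_rect = spectrogram(tone, np.ones(256), hop=128)     # no tapering
S_hann = spectrogram(tone, np.hanning(256), hop=128)  # Hann window tapers frame edges
print(S_hann.shape)                      # (124, 129)
print(leakage(S_rect), leakage(S_hann))  # Hann leaks far less energy into distant bins
```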
Which of the following is a reasonable sampling rate for processing speech?
44.1 kS/s
What is one advantage of Mel-frequency cepstral coefficients (MFCCs) over log filter-bank energy features (LFBEs)?
Most of the information in the spectrum can be represented with fewer coefficients
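MFCCs are typically obtained by applying a DCT to the log filter-bank energies and keeping only the first few coefficients. The sketch below uses an orthonormal DCT-II written in NumPy and a made-up, smooth 40-band LFBE vector; the band count, the data, and the choice of 13 coefficients are illustrative assumptions:

```python
import numpy as np

def dct_ii(x):
    """Orthonormal DCT-II along the last axis (the transform typically applied to get MFCCs)."""
    n = x.shape[-1]
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    basis = np.cos(np.pi * k * (2 * i + 1) / (2 * n))
    basis[0] /= np.sqrt(2.0)  # scale the DC row so the transform is orthonormal
    return np.sqrt(2.0 / n) * (x @ basis.T)

# Hypothetical smooth log filter-bank energies for 40 mel bands (made-up data)
bands = np.arange(40)
lfbe = np.log(1.0 + 10.0 * np.exp(-((bands - 12) / 6.0) ** 2))

coeffs = dct_ii(lfbe)
energy_first_13 = np.sum(coeffs[:13] ** 2) / np.sum(coeffs ** 2)
print(energy_first_13)  # close to 1: a few coefficients capture the spectral shape
```

Because the transform is orthonormal, no energy is lost; it is simply concentrated in the low-order coefficients, so the rest can be dropped.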
Which of the following describes what makes a recurrent neural network (RNN) different from other neural networks?
A model where the output of a layer or block at one timestep provides part of the input to the layer/block at the next timestep
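A minimal sketch of a simple (Elman-style) RNN layer in NumPy, showing the hidden state from one timestep feeding into the next. The weight names, sizes, and random initialization are hypothetical:

```python
import numpy as np

def simple_rnn(xs, W_x, W_h, b):
    """Run a simple RNN over a sequence; returns the hidden state at every step."""
    h = np.zeros(W_h.shape[0])
    states = []
    for x_t in xs:
        # the previous step's output h is part of this step's input
        h = np.tanh(W_x @ x_t + W_h @ h + b)
        states.append(h)
    return np.stack(states)

rng = np.random.default_rng(0)
T, n_in, n_hidden = 5, 3, 4
xs = rng.normal(size=(T, n_in))
W_x = rng.normal(scale=0.5, size=(n_hidden, n_in))
W_h = rng.normal(scale=0.5, size=(n_hidden, n_hidden))
b = np.zeros(n_hidden)

H = simple_rnn(xs, W_x, W_h, b)
print(H.shape)  # (5, 4)
```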
For which types of problems are recurrent neural networks well-suited?
Sequences
Which of the following is a problem with simple RNNs?
All of the above (vanishing gradients, difficulty learning long-range dependencies between items, exploding gradients)
What structural feature allows LSTMs to solve some of RNNs’ problems?
Gated recurrent connections
How does a “Gated Recurrent Unit” (GRU) compare to an LSTM?
Similar but slightly simpler, with one fewer gate (two gates instead of three)
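One way to see "slightly simpler" is to count weights. In the classic formulations, an LSTM has four weight blocks (input, forget, and output gates plus the candidate cell state) and a GRU has three (update and reset gates plus the candidate state). The sketch below assumes those classic formulations with one bias vector per block; the sizes are arbitrary:

```python
def lstm_params(input_dim, units):
    # 4 blocks, each with input weights, recurrent weights, and a bias
    return 4 * (units * (input_dim + units) + units)

def gru_params(input_dim, units):
    # 3 blocks, each with input weights, recurrent weights, and a bias
    return 3 * (units * (input_dim + units) + units)

print(lstm_params(40, 64), gru_params(40, 64))  # 26880 20160
```

Framework counts can differ slightly: for example, Keras' `GRU` layer with its default `reset_after=True` keeps separate input and recurrent biases, adding another `3 * units` parameters.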
What is the key mechanism that allows Transformers to handle long-range dependencies in sequences?
Attention mechanism
In the Transformer architecture, what is the purpose of positional encoding?
To ensure that the order of words in a sentence is considered
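A sketch of the sinusoidal positional encoding from the original Transformer paper, one of several possible schemes (learned position embeddings are another). It assumes an even model dimension:

```python
import numpy as np

def sinusoidal_positional_encoding(seq_len, d_model):
    """PE[pos, 2i] = sin(pos / 10000^(2i/d)), PE[pos, 2i+1] = cos(pos / 10000^(2i/d))."""
    assert d_model % 2 == 0, "this sketch assumes an even d_model"
    pos = np.arange(seq_len)[:, None]      # (seq_len, 1)
    i = np.arange(0, d_model, 2)[None, :]  # even dimension indices, i.e. 2i
    angles = pos / np.power(10000.0, i / d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)  # even dimensions
    pe[:, 1::2] = np.cos(angles)  # odd dimensions
    return pe

pe = sinusoidal_positional_encoding(10, 16)
print(pe.shape)  # (10, 16)
```

Each position gets a distinct vector that is added to its token embedding, so otherwise order-agnostic attention can tell positions apart.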
Which component(s) in a Transformer help determine how much attention one token should pay to another?
Query and Key vectors
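A single-head scaled dot-product attention sketch in NumPy. In a real Transformer, Q, K, and V come from learned projections of the token embeddings; random matrices stand in for them here:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """softmax(Q K^T / sqrt(d_k)) V for a single head."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                          # (n_queries, n_keys): query-key relevance
    scores = scores - scores.max(axis=-1, keepdims=True)     # subtract row max for numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)  # softmax over keys
    return weights @ V, weights

rng = np.random.default_rng(0)
n_tokens, d_k, d_v = 6, 8, 8
Q = rng.normal(size=(n_tokens, d_k))
K = rng.normal(size=(n_tokens, d_k))
V = rng.normal(size=(n_tokens, d_v))

out, attn = scaled_dot_product_attention(Q, K, V)
print(out.shape, attn.shape)  # (6, 8) (6, 6)
```

Every token scores every other token in one matrix product, regardless of distance, and all positions are computed at once rather than step by step.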
What advantage do Transformers have over RNNs for NLP tasks?
They process input sequences in parallel.
What structure extends the notion of attention to encode different relationships with different contexts?
Multi-head attention
What inductive bias is built into convolutional layers?
Translation Equivariance
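A small 1D check of translation equivariance: convolving a shifted input gives the shifted output. The signal and kernel values are arbitrary, and zeros at the ends keep `np.roll` equivalent to a plain shift here:

```python
import numpy as np

x = np.array([0, 0, 1, 2, 3, 0, 0, 0, 0], dtype=float)
kernel = np.array([1.0, -1.0, 0.5])

y = np.convolve(x, kernel, mode="full")
y_from_shifted_input = np.convolve(np.roll(x, 2), kernel, mode="full")

# shifting the input by 2 samples shifts the output by 2 samples
print(np.allclose(y_from_shifted_input, np.roll(y, 2)))  # True
```

Deep-learning "convolution" layers actually compute cross-correlation, but the same property holds.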
Consider a fully-connected layer with 32 inputs and 32 outputs that uses 16-bit integers for all parameters and activations and has no bias terms. How many bytes are required to store the layer parameters?
2048
Consider a fully-connected layer with 8 inputs and 16 outputs and no bias terms. How many multiply-accumulate (MAC) operations are required to process the layer?
128
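A bias-free fully-connected layer is a matrix-vector product: it needs one MAC per weight (inputs × outputs), and its storage is the weight count times the bytes per weight. The sketch below checks both fully-connected examples with an explicit loop; the helper name and the int16 test values are hypothetical:

```python
import numpy as np

def dense_forward(x, W):
    """Bias-free fully-connected layer as a matrix-vector product, counting MACs."""
    n_out, n_in = W.shape
    y = np.zeros(n_out, dtype=np.int64)
    macs = 0
    for o in range(n_out):
        for i in range(n_in):
            y[o] += int(W[o, i]) * int(x[i])  # one multiply-accumulate
            macs += 1
    return y, macs

# 8 inputs, 16 outputs: 8 * 16 = 128 MACs
W = np.ones((16, 8), dtype=np.int16)
x = np.arange(8, dtype=np.int16)
y, macs = dense_forward(x, W)
print(macs)  # 128

# 32 x 32 weights stored as 16-bit integers: 1024 * 2 bytes
W32 = np.zeros((32, 32), dtype=np.int16)
print(W32.nbytes)  # 2048
```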
What framework(s) will we primarily use in this class for building and training ML models?
Keras/TensorFlow
What is one risk of allowing a model to fit a more complex decision boundary?
The model may fit the noise in the data rather than the true structure
Which of the following best describes “generalization” in the context of machine learning?
A model performs well on data different from the training data
An imbalanced dataset refers to what?
A dataset where one class is much more frequent than another
In the context of accuracy/performance metrics, what does the “precision” of a model describe?
The probability that a positive prediction is in fact a positive event
Consider a fraud-detection dataset with 490 negatives (clean transaction, no fraud) and 10 positives (fraudulent transaction). Which of the following conclusions is suggested if the model achieves 98% accuracy on the training set?
The model is likely just returning all negative predictions
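Computing the metrics for a model that always predicts "negative" on the fraud-detection dataset shows why 98% accuracy is suspicious here. A minimal sketch (the `metrics` helper is hypothetical; precision is reported as NaN when there are no positive predictions, since it is undefined):

```python
import math

def metrics(y_true, y_pred):
    """Accuracy, precision, and recall for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) > 0 else math.nan  # undefined with no positive predictions
    recall = tp / (tp + fn) if (tp + fn) > 0 else math.nan
    return accuracy, precision, recall

y_true = [0] * 490 + [1] * 10
all_negative = [0] * 500
acc, prec, rec = metrics(y_true, all_negative)
print(acc, rec)  # 0.98 0.0
```

Accuracy matches the 98% figure while recall is zero: the model never catches a single fraudulent transaction.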
What is one disadvantage of quantizing a model?
The model may have lower accuracy
Converting a model so that it can use 8-bit integers instead of floating-point numbers is called
Quantization
Choose the best description of transfer learning
Combining part of a model that was pre-trained for one task with a new output layer to perform another task
What is meant by a “backbone” in the context of transfer learning?
The main part of a pre-trained model, excluding the final classification step
What is meant by a “head” in the context of transfer learning?
The final part of a model, attached to a pre-trained backbone and trained for a specific task
What is a potential disadvantage of fine-tuning a model in the transfer learning process?
The early layers might over-fit the fine-tuning dataset
Which of the following is NOT an advantage of transfer learning relative to from-scratch training?
Features computed in early layers will be perfectly optimized for your specific use case
Which of the following components can typically be reused across multiple applications (classification, detection and localization, segmentation, etc.)?
The backbone
What is an advantage of a one-stage detector over a two-stage detector?
A one-stage detector is typically faster than a two-stage detector
YOLO is an example of
A one-stage object detection model
What is the primary difference between object detection and image segmentation?
Object detection identifies locations with bounding boxes, while segmentation provides pixel-wise object masks
Which of the following techniques is best suited for distinguishing overlapping objects in an image?
Instance segmentation