ML IOT Midterm

Mid-Term Multiple Choice Questions

46 Terms

New cards

In executing a neural network (inference), the operation typically responsible for the majority of the computational cost is:

Matrix-vector multiplication

New cards

Which of the following accurately describes the “ReLU” activation function?

Returns positive inputs unchanged, returns zero for negative inputs.
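A minimal sketch of ReLU in plain Python, for illustration only. The function name `relu` is chosen here and is not taken from the course materials.

```python
def relu(x: float) -> float:
    """Return x unchanged when it is positive, otherwise return 0.0."""
    return x if x > 0 else 0.0


print([relu(v) for v in (-2.0, 0.0, 3.5)])  # [0.0, 0.0, 3.5]
```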

New cards

Increasing the size of an input image to be processed by a convolutional neural network will have which effect?

Increase the number of MACs to process the model
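To see why, here is a rough MAC count for a single convolutional layer. It assumes stride 1 and "same" padding (so the output is the same height and width as the input) and ignores biases. The helper name `conv_macs` and the layer sizes are illustrative.

```python
def conv_macs(height, width, c_in, c_out, kernel=3):
    """MACs for a stride-1, 'same'-padded 2D convolution (biases ignored).

    Each output position in each of the c_out feature maps needs
    kernel * kernel * c_in multiply-accumulates.
    """
    return height * width * kernel * kernel * c_in * c_out


small = conv_macs(120, 120, c_in=8, c_out=16)
large = conv_macs(240, 240, c_in=8, c_out=16)
print(small, large, large // small)  # doubling each side gives a ratio of 4
```

The parameter count does not depend on the image size, only the amount of computation does.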

New cards

A layer of a convolutional neural network has an input tensor that is 120×120×8 (120×120 in the X, Y dimensions with 8 feature maps). The layer has 16 filters, each with a 3×3 kernel and a bias term. Including biases, how many parameters are involved in this layer?

1,168 (each filter has 3×3×8 = 72 weights plus 1 bias, and 73 × 16 = 1,168)
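The count can be checked with a short calculation. This is a sketch under the usual convention that every filter spans all input feature maps; the helper name `conv_params` is made up for the example.

```python
def conv_params(kernel, c_in, c_out, use_bias=True):
    """Parameters in a 2D conv layer: one kernel x kernel x c_in filter per output map."""
    per_filter = kernel * kernel * c_in + (1 if use_bias else 0)
    return per_filter * c_out


print(conv_params(kernel=3, c_in=8, c_out=16))    # 1168
print(conv_params(kernel=3, c_in=128, c_out=16))  # 18448
```

The 120×120 spatial size does not enter the calculation. For comparison, the second line shows that 18,448 is the count the same layer would have with 128 input feature maps.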

New cards

If you change the “padding” setting on a convolutional layer from “same” to “valid”

The size of the output tensor will decrease
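A small sketch of the output-size arithmetic along one spatial dimension, following the convention Keras uses for "same" and "valid" padding. The function name `conv_output_size` is an assumption for illustration.

```python
import math


def conv_output_size(n, kernel, stride=1, padding="same"):
    """Output length along one dimension of a convolution."""
    if padding == "same":
        # Input is zero-padded so only the stride shrinks the output.
        return math.ceil(n / stride)
    if padding == "valid":
        # No padding: the kernel must fit entirely inside the input.
        return (n - kernel) // stride + 1
    raise ValueError(f"unknown padding: {padding!r}")


print(conv_output_size(120, kernel=3, padding="same"))   # 120
print(conv_output_size(120, kernel=3, padding="valid"))  # 118
```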

New cards

What problem did residual connections help solve?

Vanishing gradients

New cards

By utilizing skip connections, the ResNet architecture enabled what?

Models with more layers than had previously been effectively trainable

New cards

What pattern in neural network design is illustrated in the diagram on the right

Residual connections

New cards

Which of the following best describes quantization?

Numbers are rounded to nearby values so that they can be represented with integers
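As a rough illustration, the sketch below applies symmetric 8-bit quantization to a few weights. It is one simple scheme among several, not necessarily the one used in the course or in any particular toolchain. The names `quantize_int8` and `dequantize` are hypothetical.

```python
def quantize_int8(values):
    """Map floats to integers in [-127, 127] using a single shared scale."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float values from the stored integers."""
    return [q * scale for q in quantized]


weights = [0.1, -1.0, 0.73]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
print(q)         # [13, -127, 93]
print(restored)  # close to the original weights, but not identical
```

Each stored value now fits in one byte instead of the four bytes of a 32-bit float, at the cost of a rounding error of at most half the scale per value.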

New cards

What is one advantage of quantizing a model?

The model takes up less storage space than a floating-point model

New cards

What is one advantage of using a spectrogram over time-domain inputs?

Both a and c:
a) Most of the information is carried in the frequency domain, so a spectrogram extracts the key features
c) A spectrogram can represent the important information with fewer values than a time-domain waveform

New cards

Which of the following describes the Mel Frequency scale?

A distortion of the frequency scale that roughly matches human perception
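One widely used formula for the Mel scale is m = 2595 · log10(1 + f / 700). The sketch below assumes that formula; other variants exist, and the course may use a different one.

```python
import math


def hz_to_mel(f_hz: float) -> float:
    """Convert a frequency in Hz to mels using a common log-based formula."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


# Equal 1000 Hz steps cover fewer and fewer mels as frequency rises,
# mirroring the ear's coarser resolution at high frequencies.
low_step = hz_to_mel(1000.0) - hz_to_mel(0.0)
high_step = hz_to_mel(8000.0) - hz_to_mel(7000.0)
print(round(low_step), round(high_step))
```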

New cards

What is the purpose of a windowing function in converting time-domain samples to a spectrogram?

To reduce spectral leakage by tapering the signal at segment boundaries
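A minimal sketch of the tapering, using a Hann window as an example (the card does not say which window the course uses). The helper names `hann_window` and `apply_window` are illustrative.

```python
import math


def hann_window(length: int) -> list:
    """Symmetric Hann window: 0 at both ends, 1 in the middle."""
    if length < 2:
        return [1.0] * length
    return [0.5 - 0.5 * math.cos(2 * math.pi * n / (length - 1))
            for n in range(length)]


def apply_window(frame):
    """Taper one frame of samples before taking its FFT."""
    window = hann_window(len(frame))
    return [sample * w for sample, w in zip(frame, window)]


w = hann_window(5)
tapered = apply_window([1.0, 1.0, 1.0, 1.0, 1.0])
print(w)
```

Because the frame is scaled toward zero at its edges, the abrupt jump where one segment ends and the next begins is removed before the transform.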

New cards

Which of the following is a reasonable sampling rate for processing speech?

44.1 kS/s

New cards

What is one advantage of Mel-frequency cepstral coefficients (MFCCs) over log filter-bank energy features (LFBEs)?

Most of the information in the spectrum can be represented with fewer coefficients

New cards

Which of the following describes what makes a recurrent neural network (RNN) different from other neural networks?

A model where the output of a layer or block at one timestep provides part of the input to the layer/block at the next timestep

New cards

For which types of problems are recurrent neural networks well-suited?

Sequences

New cards

Which of the following is a problem with simple RNNs?

All of the above (Vanishing gradients, Long-range connections between items, Exploding gradients)

New cards

What structural feature allows LSTMs to solve some of RNNs’ problems?

Gated recurrent connections

New cards

How does a “Gated Recurrent Unit” (GRU) compare to an LSTM?

Similar but slightly simpler, with one fewer gate

New cards

What is the key mechanism that allows Transformers to handle long-range dependencies in sequences?

Attention mechanism

New cards

In the Transformer architecture, what is the purpose of positional encoding?

To ensure that the order of words in a sentence is considered
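One common choice is the sinusoidal encoding from the original Transformer paper, sketched below. This is an assumption for illustration; learned position embeddings are another option, and the card does not say which one the course covers. The function name `positional_encoding` is hypothetical.

```python
import math


def positional_encoding(position: int, d_model: int) -> list:
    """Sinusoidal encoding vector for one sequence position."""
    encoding = []
    for i in range(0, d_model, 2):
        angle = position / (10000 ** (i / d_model))
        encoding.append(math.sin(angle))      # even index
        if i + 1 < d_model:
            encoding.append(math.cos(angle))  # odd index
    return encoding


first = positional_encoding(0, 4)
second = positional_encoding(1, 4)
print(first)  # [0.0, 1.0, 0.0, 1.0]
print(second)
```

Each position gets a distinct vector that is added to the token embedding, so the otherwise order-agnostic attention layers can tell positions apart.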

New cards

Which component(s) in a Transformer help determine the relevance of one token to the processing of another, that is, how much attention one token should pay to another?

Query and Key vectors
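The sketch below shows how a query is compared against keys with scaled dot products followed by a softmax. It is a plain-Python illustration for a single query; the vectors are made-up toy values and `attention_weights` is a hypothetical helper name.

```python
import math


def attention_weights(query, keys):
    """Softmax over scaled dot products between one query and each key."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
              for key in keys]
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


weights = attention_weights(
    query=[1.0, 0.0],
    keys=[[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]],
)
print(weights)  # the key most aligned with the query gets the largest weight
```

The resulting weights are then used to mix the corresponding value vectors, which is outside the scope of this card.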

New cards

What advantage do Transformers have over RNNs for NLP tasks?

They process input sequences in parallel.

New cards

What structure extends the notion of attention to encode different relationships with different contexts?

Multi-head attention

New cards

What inductive bias is built into convolutional layers?

Translation Equivariance

New cards

Consider a fully-connected layer with 32 inputs and 32 outputs that uses 16-bit integers for all parameters and activations and does not use bias terms. How many bytes are required to store the layer parameters?

2048
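The arithmetic behind this answer, as a short sketch. The helper name `dense_param_bytes` is made up for the example.

```python
def dense_param_bytes(n_in, n_out, bits_per_param=16, use_bias=False):
    """Storage in bytes for the parameters of a fully-connected layer."""
    n_params = n_in * n_out + (n_out if use_bias else 0)
    return n_params * bits_per_param // 8


print(dense_param_bytes(32, 32))  # 32 * 32 weights * 2 bytes = 2048
```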

New cards

Consider a fully-connected layer with 8 inputs and 16 outputs and no bias terms. How many multiply-accumulate (MAC) operations are required to process the layer?

128
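Counting the operations explicitly gives the same result: every output is a dot product over all inputs. The function name `dense_macs` is illustrative.

```python
def dense_macs(n_in: int, n_out: int) -> int:
    """Count multiply-accumulates in a bias-free fully-connected layer."""
    macs = 0
    for _ in range(n_out):      # one dot product per output
        for _ in range(n_in):   # one multiply-accumulate per input
            macs += 1
    return macs


print(dense_macs(8, 16))  # 128, the same as 8 * 16
```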

New cards

What framework(s) will we primarily use in this class for building and training ML models?

Keras/TensorFlow

New cards

What is one risk of allowing a model to fit a more complex decision boundary?

The model may fit the noise in the data rather than the true structure

New cards

Which of the following best describes “generalization” in the context of machine learning?

A model performs well on data that differs from the training data

New cards

An imbalanced dataset refers to what?

A dataset where one class is much more frequent than another

New cards

In the context of accuracy/performance metrics, what does the “precision” of a model describe?

The probability that a positive prediction is in fact a positive event
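In terms of counts, precision is TP / (TP + FP). A minimal sketch follows; the counts are invented, and returning 0.0 when there are no positive predictions is a convention chosen for this example.

```python
def precision(true_positives: int, false_positives: int) -> float:
    """Fraction of positive predictions that were actually positive."""
    predicted_positive = true_positives + false_positives
    if predicted_positive == 0:
        return 0.0  # assumed convention when nothing is predicted positive
    return true_positives / predicted_positive


print(precision(true_positives=8, false_positives=2))  # 0.8
```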

New cards

Consider a fraud-detection dataset with 490 negatives (clean transactions, no fraud) and 10 positives (fraudulent transactions). Which of the following conclusions is suggested if the model achieves 98% accuracy on the training set?

The model is likely just returning all negative predictions
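The numbers can be checked directly: a model that always predicts "negative" on this dataset scores exactly 490 / 500 = 98% accuracy while catching no fraud at all. The sketch below builds that trivial predictor.

```python
# 0 = clean transaction, 1 = fraudulent transaction
labels = [0] * 490 + [1] * 10
predictions = [0] * len(labels)  # trivial model: always predict "no fraud"

correct = sum(p == y for p, y in zip(predictions, labels))
accuracy = correct / len(labels)

caught = sum(p == 1 and y == 1 for p, y in zip(predictions, labels))
recall = caught / sum(labels)

print(accuracy, recall)  # 0.98 0.0
```

This is why accuracy alone is a poor metric on imbalanced datasets.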

New cards

What is one disadvantage of quantizing a model?

The model may have lower accuracy

New cards

Converting a model so that it can use 8-bit integers instead of floating-point numbers is called

Quantization

New cards

Choose the best description of transfer learning

Combining part of a model that was pre-trained for one task with a new output layer to perform another task

New cards

What is meant by a “backbone” in the context of transfer learning?

The main part of a pre-trained model, excluding the final classification step

New cards

What is meant by a “head” in the context of transfer learning?

The final part of a model, attached to a pre-trained backbone and trained for a specific task

New cards

What is a potential disadvantage of fine-tuning a model in the transfer learning process?

The early layers might over-fit the fine-tuning dataset

New cards

Which of the following is NOT an advantage of transfer learning relative to from-scratch training?

Features computed in early layers will be perfectly optimized for your specific use case

New cards

Which of the following components can typically be reused across multiple applications (classification, detection and localization, segmentation, etc.)?

The backbone

New cards

What is an advantage of a one-stage detector over a two-stage detector?

A one-stage detector is typically faster than a two-stage detector

New cards

YOLO is an example of

A one-stage object detection model

New cards

What is the primary difference between object detection and image segmentation?

Object detection identifies locations with bounding boxes, while segmentation provides pixel-wise object masks

New cards

Which of the following techniques is best suited for distinguishing overlapping objects in an image?