ML IoT Midterm


Description and Tags

Mid-Term Multiple Choice Questions

46 Terms

1

In executing a neural network (inference), the operation typically responsible for the majority of the computational cost is:

Matrix-vector multiplication

2

Which of the following accurately describes the “ReLU” activation function?

Returns positive inputs unchanged, returns zero for negative inputs.
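
As a minimal sketch of the definition above (the function name `relu` is chosen here for illustration), ReLU can be written as a one-line scalar function:

```python
def relu(x: float) -> float:
    """Return x unchanged when it is positive, and 0.0 otherwise."""
    return x if x > 0 else 0.0


print([relu(v) for v in (-2.0, 0.0, 3.5)])  # [0.0, 0.0, 3.5]
```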

3

Increasing the size of an input image to be processed by a convolutional neural network will have which effect?

Increase the number of MACs to process the model
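
A rough way to see this is to count MACs for a single convolutional layer. The sketch below assumes stride 1 and "same" padding, so the output has the same height and width as the input; the helper name `conv_macs` and the example sizes are illustrative and not taken from the course.

```python
def conv_macs(height, width, in_channels, filters, kernel=3):
    """MACs for one conv layer, assuming stride 1 and 'same' padding.

    Each output element needs kernel * kernel * in_channels MACs, and
    there are height * width * filters output elements.
    """
    return height * width * filters * kernel * kernel * in_channels


small = conv_macs(96, 96, 3, 16)
large = conv_macs(192, 192, 3, 16)
print(small, large, large // small)
```

Under these assumptions, doubling both image dimensions multiplies the MAC count by four, while the number of parameters stays the same.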

4

A layer of a convolutional neural network has an input tensor that is 120×120×8 (120×120 in the X, Y dimensions with 8 feature maps). The layer has 16 filters, each with a 3×3 kernel and a bias term. Including biases, how many parameters are involved in this layer?

1,168
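
The count can be checked with a short calculation. In this sketch (the helper name `conv2d_params` is hypothetical), each filter has one weight per kernel position per input channel, plus one bias; the 120×120 spatial size of the input does not enter the parameter count.

```python
def conv2d_params(kernel_h, kernel_w, in_channels, filters, use_bias=True):
    """Number of trainable parameters in a standard 2-D conv layer."""
    weights = kernel_h * kernel_w * in_channels * filters
    biases = filters if use_bias else 0
    return weights + biases


# 3x3 kernel, 8 input feature maps, 16 filters, with biases
print(conv2d_params(3, 3, 8, 16))  # 3*3*8*16 + 16 = 1168
```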

5

If you change the “padding” setting on a convolutional layer from “same” to “valid”

The size of the output tensor will decrease
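
The effect on output size can be sketched with the usual output-size formulas for one spatial dimension. The helper name `conv_output_size` is illustrative; the formulas follow the common Keras/TensorFlow convention, which is assumed here.

```python
import math


def conv_output_size(input_size, kernel, stride=1, padding="same"):
    """Output length along one spatial dimension of a conv layer."""
    if padding == "same":
        # Input is zero-padded so the output covers every input position.
        return math.ceil(input_size / stride)
    if padding == "valid":
        # No padding: the kernel must fit entirely inside the input.
        return (input_size - kernel) // stride + 1
    raise ValueError("padding must be 'same' or 'valid'")


print(conv_output_size(120, 3, padding="same"))   # 120
print(conv_output_size(120, 3, padding="valid"))  # 118
```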

6

What problem did residual connections help solve?

Vanishing gradients

7

By utilizing skip connections, the ResNet architecture enabled what?

Models with more layers than had previously been effectively trainable

8

What pattern in neural network design is illustrated in the diagram on the right?

Residual connections

9

Which of the following best describes quantization?

Numbers are rounded to nearby values so that they can be represented with integers
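
A minimal sketch of this idea, assuming a simple symmetric 8-bit scheme with a single scale factor (the function names, the scale, and the example weights are illustrative, not a specific library's API):

```python
def quantize(values, scale, zero_point=0, qmin=-128, qmax=127):
    """Map floats to integers: round(v / scale) + zero_point, clipped to int8."""
    return [max(qmin, min(qmax, round(v / scale) + zero_point)) for v in values]


def dequantize(q_values, scale, zero_point=0):
    """Recover approximate floats from the stored integers."""
    return [(q - zero_point) * scale for q in q_values]


weights = [-0.51, 0.0, 0.26, 1.0]
scale = 1.0 / 127  # assumed: the largest magnitude (1.0) maps to 127
q = quantize(weights, scale)
restored = dequantize(q, scale)
print(q)
print(restored)
```

The restored values differ from the originals by at most half a quantization step, which is the rounding error that quantization trades for smaller storage.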

10

What is one advantage of quantizing a model?

The model takes up less storage space than a floating-point model

11

What is one advantage of using a spectrogram over time-domain inputs?

Both a and c:
a) Most of the information is carried in the frequency domain, so a spectrogram extracts the key features
c) A spectrogram can represent the important information with fewer values than a time-domain waveform

12

Which of the following describes the Mel Frequency scale?

A distortion of the frequency scale that roughly matches human perception
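
One widely used formula for the mel scale is mel = 2595 · log10(1 + f / 700). The card does not say which variant the course uses, so the sketch below is an assumption; the function names are illustrative.

```python
import math


def hz_to_mel(freq_hz):
    """Convert frequency in Hz to mels (one common formula, assumed here)."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


# Equal steps in Hz cover fewer and fewer mels as frequency rises.
for f in (500, 1000, 2000, 4000, 8000):
    print(f, round(hz_to_mel(f), 1))
```

The scale is roughly linear at low frequencies and compresses high frequencies, so the step from 4000 Hz to 8000 Hz spans fewer mels than the step from 0 Hz to 4000 Hz.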

13

What is the purpose of a windowing function in converting time-domain samples to a spectrogram?

To reduce spectral leakage by tapering the signal at segment boundaries
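
As an illustration of tapering, the sketch below applies a Hann window to one frame of samples. The Hann window is one common choice and is an assumption here (the card does not name a window); the frame values are made up.

```python
import math


def hann_window(n):
    """Symmetric Hann window of length n (n > 1): zero at both ends."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


frame = [1.0] * 8  # hypothetical frame cut from a longer signal
window = hann_window(len(frame))
tapered = [s * w for s, w in zip(frame, window)]
print([round(v, 3) for v in tapered])
```

Multiplying by the window pulls the samples at the frame edges toward zero, which removes the abrupt jump at the segment boundary that would otherwise smear energy across frequency bins.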

14

Which of the following is a reasonable sampling rate for processing speech?

16 kS/s

15

What is one advantage of Mel-frequency cepstral coefficients (MFCCs) over log filter-bank energy features (LFBEs)?

Most of the information in the spectrum can be represented with fewer coefficients

16

Which of the following describes what makes a recurrent neural network (RNN) different from other neural networks?

A model where the output of a layer or block at one timestep provides part of the input to the layer/block at the next timestep
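
A minimal scalar sketch of this recurrence, assuming a tanh activation and made-up weights (`w_x`, `w_h`, and `b` are illustrative parameters, not values from the course):

```python
import math


def rnn_step(x, h_prev, w_x, w_h, b):
    """One timestep: the new state depends on the input and the previous state."""
    return math.tanh(w_x * x + w_h * h_prev + b)


def run_rnn(inputs, w_x=0.5, w_h=0.8, b=0.0):
    """Feed a sequence through the cell, carrying the hidden state forward."""
    h = 0.0
    states = []
    for x in inputs:
        h = rnn_step(x, h, w_x, w_h, b)
        states.append(h)
    return states


states = run_rnn([1.0, 0.0, 0.0])
print(states)
```

The second and third states are nonzero even though their inputs are zero, because the state from the first timestep is fed back in at each later step.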

17

For which types of problems are recurrent neural networks well-suited?

Sequences

18

Which of the following is a problem with simple RNNs?

All of the above (Vanishing gradients, Long-range connections between items, Exploding gradients)

19

What structural feature allows LSTMs to solve some of RNNs’ problems?

Gated recurrent connections

20

How does a “Gated Recurrent Unit” (GRU) compare to an LSTM?

Similar but slightly simpler, with one fewer gated connection

21

What is the key mechanism that allows Transformers to handle long-range dependencies in sequences?

Attention mechanism

22

In the Transformer architecture, what is the purpose of positional encoding?

To ensure that the order of words in a sentence is considered

23

Which component(s) in a Transformer help determine how much attention one token should pay to another (that is, the relevance of one token to the processing of another)?

Query and Key vectors
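
A small sketch of how query and key vectors produce attention weights, using scaled dot-product scores followed by a softmax. The vectors are made up and the function names are illustrative; a full attention layer would then use these weights to mix the value vectors.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(query, keys):
    """Scaled dot product of one query with every key, then softmax."""
    d = len(query)
    scores = [
        sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys
    ]
    return softmax(scores)


query = [1.0, 0.0]
keys = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
weights = attention_weights(query, keys)
print(weights)
```

The key that points in the same direction as the query receives the largest weight, and the key that points the opposite way receives the smallest.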

24

What advantage do Transformers have over RNNs for NLP tasks?

They process input sequences in parallel.

25

What structure extends the notion of attention to encode different relationships with different contexts?

Multi-head attention

26

What inductive bias is built into convolutional layers?

Translation Equivariance

27

Consider a fully-connected layer with 32 inputs and 32 outputs that uses 16-bit integers for all parameters and activations and does not use bias terms. How many bytes are required to store the layer parameters?

2048
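
The arithmetic behind this answer, as a short sketch (the helper name `dense_param_bytes` is hypothetical): a fully-connected layer stores one weight per input-output pair, and each 16-bit weight takes 2 bytes.

```python
def dense_param_bytes(inputs, outputs, bits_per_param, use_bias=False):
    """Bytes needed to store the parameters of a fully-connected layer."""
    n_params = inputs * outputs + (outputs if use_bias else 0)
    return n_params * bits_per_param // 8


print(dense_param_bytes(32, 32, 16))  # 32 * 32 * 2 = 2048
```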

28

Consider a fully-connected layer with 8 inputs and 16 outputs and no bias terms. How many multiply-accumulate (MAC) operations are required to process the layer?

128
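
To make the count concrete, the sketch below runs a fully-connected layer as a plain matrix-vector product and counts every multiply-accumulate as it happens. The weight and input values are arbitrary placeholders.

```python
def dense_forward(weights, x):
    """Matrix-vector product that also counts multiply-accumulate operations."""
    macs = 0
    outputs = []
    for row in weights:  # one row of weights per output
        acc = 0.0
        for w, xi in zip(row, x):  # one MAC per input
            acc += w * xi
            macs += 1
        outputs.append(acc)
    return outputs, macs


weights = [[0.1] * 8 for _ in range(16)]  # 16 outputs, 8 inputs each
x = [1.0] * 8
outputs, macs = dense_forward(weights, x)
print(len(outputs), macs)  # 16 128
```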

29

What framework(s) will we primarily use in this class for building and training ML models?

Keras/TensorFlow

30

What is one risk of allowing a model to fit a more complex decision boundary?

The model may fit the noise in the data rather than the true structure

31

Which of the following best describes “generalization” in the context of machine learning?

A model performs well on data different from the training data

32

An imbalanced dataset refers to what?

A dataset where one class is much more frequent than another

33

In the context of accuracy/performance metrics, what does the “precision” of a model describe?

The probability that a positive prediction is in fact a positive event
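
In terms of counts, precision is TP / (TP + FP). A minimal sketch with made-up binary labels (1 = positive, 0 = negative); the function name is illustrative:

```python
def precision(y_true, y_pred):
    """Fraction of positive predictions that are actually positive."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    if tp + fp == 0:
        return 0.0  # no positive predictions were made
    return tp / (tp + fp)


y_true = [1, 0, 1, 0, 1]
y_pred = [1, 1, 1, 0, 0]
print(precision(y_true, y_pred))  # 2 true positives, 1 false positive
```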

34

Consider a fraud-detection dataset with 490 negatives (clean transaction, no fraud) and 10 positives (fraudulent transaction). Which of the following conclusions is suggested if the model achieves 98% accuracy on the training set?

The model is likely just returning all negative predictions
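
The reasoning can be checked by scoring a hypothetical model that predicts "negative" for every transaction in the dataset described above:

```python
negatives, positives = 490, 10
y_true = [0] * negatives + [1] * positives
y_pred = [0] * len(y_true)  # always predict "no fraud"

correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
accuracy = correct / len(y_true)

true_positives = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
recall = true_positives / positives

print(accuracy, recall)  # 0.98 0.0
```

The all-negative baseline reaches exactly 98% accuracy while catching none of the fraud, which is why accuracy alone is a poor metric on an imbalanced dataset.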

35

What is one disadvantage of quantizing a model?

The model may have lower accuracy

36

Converting a model so that it can use 8-bit integers instead of floating-point numbers is called

Quantization

37

Choose the best description of transfer learning

Combining part of a model that was pre-trained for one task with a new output layer to perform another task

38

What is meant by a “backbone” in the context of transfer learning?

The main part of a pre-trained model, excluding the final classification step

39

What is meant by a “head” in the context of transfer learning?

The final part of a model, attached to a pre-trained backbone and trained for a specific task

40

What is a potential disadvantage of fine-tuning a model in the transfer learning process?

The early layers might over-fit the fine-tuning dataset

41

Which of the following is NOT an advantage of transfer learning relative to from-scratch training?

Features computed in early layers will be perfectly optimized for your specific use case

42

Which of the following components can typically be reused across multiple applications (classification, detection and localization, segmentation, etc.)?

The backbone

43

What is an advantage of a one-stage detector over a two-stage detector?

A one-stage detector is typically faster than a two-stage detector

44

YOLO is an example of

A one-stage object detection model

45

What is the primary difference between object detection and image segmentation?

Object detection identifies locations with bounding boxes, while segmentation provides pixel-wise object masks

46

Which of the following techniques is best suited for distinguishing overlapping objects in an image?

Instance segmentation