AML SA1


Last updated 9:27 PM on 6/14/26

40 Terms

1
What evaluation metric would you most likely avoid in an imbalanced binary classification problem?

Accuracy
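
A small illustration of why accuracy can mislead on imbalanced data. The labels below are made-up toy data (95 negatives, 5 positives), and the always-negative predictor is a hypothetical stand-in for a poor model, not something from the card set.

```python
# Hypothetical toy data: 95 negatives and 5 positives.
y_true = [0] * 95 + [1] * 5
# A degenerate "model" that always predicts the majority class.
y_pred = [0] * 100

# Accuracy: fraction of predictions that match the label.
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Recall: fraction of actual positives that were found.
true_pos = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
recall = true_pos / sum(y_true)

print(accuracy)  # 0.95
print(recall)    # 0.0
```

Accuracy looks strong even though every positive case is missed, which is why recall, precision, or AUC are usually preferred here.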

2
Dropout is used to increase model complexity.

False

3
Adam optimizer uses both first and second moment estimates.

True
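
A minimal sketch of one Adam update on a single scalar weight, to show where the two moment estimates appear. The function name `adam_step` and the default hyperparameters are assumptions chosen for illustration (the defaults are commonly quoted values), not part of the card set.

```python
import math

def adam_step(w, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a scalar weight; t is the 1-based step count."""
    m = beta1 * m + (1 - beta1) * grad        # first moment: running mean of gradients
    v = beta2 * v + (1 - beta2) * grad ** 2   # second moment: running mean of squared gradients
    m_hat = m / (1 - beta1 ** t)              # bias correction for the zero initialisation
    v_hat = v / (1 - beta2 ** t)
    w = w - lr * m_hat / (math.sqrt(v_hat) + eps)
    return w, m, v

# First step on f(w) = w**2 starting from w = 1.0, where the gradient is 2.0.
w, m, v = adam_step(w=1.0, grad=2.0, m=0.0, v=0.0, t=1)
print(w)  # close to 0.999: the first step has size of roughly lr
```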

4
L1 regularization uses which mathematical approach?

Absolute value of weights
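
As a sketch, the L1 penalty is the sum of absolute weight values scaled by a strength coefficient. The name `l1_penalty`, the coefficient `lam`, and the example weights are illustrative assumptions.

```python
def l1_penalty(weights, lam):
    """L1 penalty: lam times the sum of absolute weight values."""
    return lam * sum(abs(w) for w in weights)

# Hypothetical weights; the penalty is added to the data loss during training.
weights = [0.5, -2.0, 0.0, 1.5]
print(l1_penalty(weights, lam=0.1))  # 0.4
```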

5
Regularization is most helpful when:

The model is overfitting

6
Training loss should always be lower than test loss in a good model.

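True in the typical case: training loss is usually somewhat lower than test loss. "Always" is too strong, since regularization that is active only during training (such as dropout) can leave training loss above test loss.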

7
A confusion matrix provides insight into:

Classification performance
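
A minimal sketch of the four confusion-matrix counts for a binary problem, and two metrics derived from them. The helper name `confusion_counts` and the toy labels are assumptions for illustration.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels where 1 is the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    return tp, fp, fn, tn

# Hypothetical labels and predictions.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0, 0, 1]

tp, fp, fn, tn = confusion_counts(y_true, y_pred)
precision = tp / (tp + fp)  # how many predicted positives were right
recall = tp / (tp + fn)     # how many actual positives were found
print(tp, fp, fn, tn)       # 3 1 1 3
print(precision, recall)    # 0.75 0.75
```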

8
What type of validation reduces the variance of the performance estimate by rotating the validation set across the data?

K-fold cross-validation
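
A sketch of how the validation fold rotates. The generator name `k_fold_indices` is hypothetical, and shuffling is omitted for brevity; in practice the indices are usually shuffled first.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, val_indices) for each of k contiguous folds."""
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most one.
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        val = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, val
        start += size

for train, val in k_fold_indices(10, 3):
    print(val)
# [0, 1, 2, 3]
# [4, 5, 6]
# [7, 8, 9]
```

Each sample is used for validation exactly once, and the k scores are averaged.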

9
Mini-batch gradient descent typically converges faster than batch gradient descent.

True

10
When is learning rate scheduling particularly useful?

When training loss plateaus
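
One simple form of scheduling is to cut the learning rate when the loss stops improving. The sketch below is a hand-rolled illustration; the function name `reduce_on_plateau`, its parameters, and the loss values are all assumptions and do not mirror any particular library's API.

```python
def reduce_on_plateau(losses, lr, factor=0.5, patience=2, min_delta=1e-3):
    """Return the learning rate in effect after each epoch's loss is observed."""
    best = float("inf")
    waited = 0
    history = []
    for loss in losses:
        if loss < best - min_delta:
            best = loss          # meaningful improvement: reset the counter
            waited = 0
        else:
            waited += 1
            if waited >= patience:
                lr *= factor     # plateau detected: shrink the learning rate
                waited = 0
        history.append(lr)
    return history

# Hypothetical per-epoch training losses with a plateau at 0.7.
print(reduce_on_plateau([1.0, 0.8, 0.7, 0.7, 0.7, 0.69, 0.5], lr=0.1))
# [0.1, 0.1, 0.1, 0.1, 0.05, 0.05, 0.05]
```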

11
Batch Gradient Descent differs from Mini-Batch in that it:

Uses the entire dataset to compute a single update
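
A sketch contrasting the two on a one-parameter model `y_hat = w * x`. The data, learning rate, and function names are illustrative assumptions, and shuffling between epochs is left out to keep the example short.

```python
def gradient(w, batch):
    """Gradient of mean squared error for the model y_hat = w * x."""
    return sum(2 * (w * x - y) * x for x, y in batch) / len(batch)

def run_epoch(w, data, batch_size, lr):
    """One pass over the data; returns the new weight and the number of updates."""
    updates = 0
    for start in range(0, len(data), batch_size):
        batch = data[start:start + batch_size]
        w -= lr * gradient(w, batch)
        updates += 1
    return w, updates

# Hypothetical data generated from the true relationship y = 2x.
xs = [0.5, 1.0, 1.5, 2.0, 0.5, 1.0, 1.5, 2.0]
data = [(x, 2.0 * x) for x in xs]

w_batch, n_batch = run_epoch(0.0, data, batch_size=len(data), lr=0.05)
w_mini, n_mini = run_epoch(0.0, data, batch_size=2, lr=0.05)
print(n_batch, n_mini)  # 1 4
```

In this toy run the mini-batch weight ends the epoch closer to 2.0, because it received four updates where full-batch descent made one.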

12
What does increasing the dropout rate typically do?

Reduces overfitting by randomly disabling more neurons, which discourages co-adaptation

13
In gradient descent, a smaller learning rate generally leads to:

Slower, more stable convergence

14
AUC measures the model's ability to classify correctly at various thresholds.

True

15
Which technique disables random neurons during training?

Dropout
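
A minimal sketch of dropout applied to a list of activations. It assumes the common "inverted dropout" variant, which rescales the surviving activations during training so nothing changes at inference time; the function name and arguments are illustrative.

```python
import random

def dropout(activations, rate, rng, training=True):
    """Zero each activation with probability `rate` during training only."""
    if not training or rate == 0.0:
        return list(activations)  # inference: every neuron stays active
    keep = 1.0 - rate
    # Survivors are divided by `keep` so the expected activation is unchanged.
    return [a / keep if rng.random() < keep else 0.0 for a in activations]

rng = random.Random(0)  # seeded only to make the sketch repeatable
hidden = [1.0] * 8
print(dropout(hidden, rate=0.5, rng=rng))                  # mix of 0.0 and 2.0
print(dropout(hidden, rate=0.5, rng=rng, training=False))  # unchanged
```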

16
The area under the ROC curve indicates:

Discriminative ability of a model
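
One way to read AUC is as the probability that a randomly chosen positive is scored above a randomly chosen negative. The sketch below computes it by brute-force pair counting; the function name `roc_auc` and the toy scores are assumptions, and the approach is only practical for small inputs.

```python
def roc_auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    # A tie between a positive and a negative counts as half a correct ranking.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical labels and model scores.
print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect separation.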

17
Regularization increases the model's training accuracy.

False

18
Feature engineering is not part of the ML pipeline.

False

19
The Adam optimizer is considered superior to vanilla SGD because it:

Adapts learning rates and includes momentum

20
What does early stopping monitor to determine when to halt training?

Validation performance
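
A sketch of patience-based early stopping over a recorded list of validation losses. The function name, the `patience` parameter, and the loss values are illustrative assumptions; a real training loop would compute each loss as it goes.

```python
def early_stopping_epoch(val_losses, patience):
    """Return (stop_epoch, best_epoch), both 0-based."""
    best = float("inf")
    best_epoch = -1
    waited = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best, best_epoch, waited = loss, epoch, 0
        else:
            waited += 1
            if waited >= patience:
                return epoch, best_epoch  # no improvement for `patience` epochs
    return len(val_losses) - 1, best_epoch

# Hypothetical validation losses that start rising after epoch 2.
losses = [0.90, 0.70, 0.60, 0.62, 0.65, 0.70, 0.80]
print(early_stopping_epoch(losses, patience=2))  # (4, 2)
```

Training halts at epoch 4, and the weights saved at epoch 2 would normally be the ones restored.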

21
R-squared is a metric used in classification.

False

22
Cross-validation helps detect if a model is underfitting.

True

23
Which loss function penalizes larger errors more significantly?

MSE
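
A small comparison showing how squaring amplifies a single large error. The target and prediction values are made up for illustration.

```python
def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def mse(y_true, y_pred):
    """Mean squared error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

# Hypothetical data: three errors of size 1 and one outlier error of size 5.
y_true = [3.0, 5.0, 2.0, 7.0]
y_pred = [2.0, 6.0, 3.0, 12.0]

print(mae(y_true, y_pred))  # 2.0
print(mse(y_true, y_pred))  # 7.0
```

The outlier contributes 5 of the 8 total absolute error but 25 of the 28 total squared error.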

24
What is the primary trade-off involved in setting a learning rate too high?

Risk of overshooting the minimum
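
A sketch of the effect on the simplest possible loss, f(w) = w**2, whose gradient is 2w. The function name `descend`, the step count, and the three learning rates are assumptions chosen to make the behaviour visible.

```python
def descend(lr, steps=20, w=1.0):
    """Run plain gradient descent on f(w) = w**2 and return the final w."""
    for _ in range(steps):
        w -= lr * 2 * w  # each step multiplies w by (1 - 2 * lr)
    return w

print(abs(descend(0.01)))  # still far from 0: stable but slow
print(abs(descend(0.1)))   # close to 0: converges
print(abs(descend(1.1)))   # far larger than the start: overshoots and diverges
```

With `lr=1.1` every step jumps past the minimum and lands farther away on the other side, so the iterates oscillate with growing magnitude.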

25
A regularization technique that results in feature selection is:

L1 regularization (Lasso)

26
Validation loss is often used to trigger early stopping.

True

27
Which of the following describes a characteristic of supervised learning?

It predicts outputs using labeled datasets

28
A low learning rate can result in slow but stable convergence.

True

29
ReLU is a commonly used loss function in classification.

False

30
Overfitting usually occurs when the model is too simple.

False

31
What is the role of the test set in model development?

To estimate real-world performance

32
Which of the following optimizers maintains a running average of past squared gradients?

RMSProp
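
A minimal sketch of the RMSProp update for a scalar weight, showing the running average of squared gradients. The function name and the default hyperparameters are illustrative assumptions.

```python
import math

def rmsprop_step(w, grad, avg_sq, lr=0.01, decay=0.9, eps=1e-8):
    """One RMSProp update; avg_sq is the running average of squared gradients."""
    avg_sq = decay * avg_sq + (1 - decay) * grad ** 2
    # The step is divided by the root of the running average,
    # so dimensions with large recent gradients take smaller steps.
    w = w - lr * grad / (math.sqrt(avg_sq) + eps)
    return w, avg_sq

# Minimise f(w) = w**2 (gradient 2w) from a hypothetical start of w = 1.0.
w, avg_sq = 1.0, 0.0
for _ in range(50):
    w, avg_sq = rmsprop_step(w, 2 * w, avg_sq)
print(w)  # smaller than the starting value of 1.0
```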

33
What component in optimization helps reduce oscillation and improve directionality?

Momentum
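
A sketch of why momentum damps oscillation: the velocity accumulates gradients, so components that keep flipping sign largely cancel while consistent components build up. This uses one common formulation (velocity = beta * velocity + gradient); the function name and the gradient sequences are assumptions.

```python
def momentum_step(velocity, grad, beta=0.9):
    """Update the velocity as a decayed sum of past gradients."""
    return beta * velocity + grad

v_oscillating = 0.0  # direction whose gradient flips sign every step
v_consistent = 0.0   # direction whose gradient always points the same way
for step in range(20):
    zigzag_grad = 1.0 if step % 2 == 0 else -1.0
    v_oscillating = momentum_step(v_oscillating, zigzag_grad)
    v_consistent = momentum_step(v_consistent, 1.0)

print(abs(v_oscillating))  # stays below 1: the zigzag mostly cancels
print(v_consistent)        # grows well above 1: steady progress is amplified
```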

34
In classification tasks, which metric is most concerned with minimizing false negatives?

Recall

35
Which metric is best suited for regression tasks?

Mean Absolute Error

36
In the machine learning workflow, what is the main purpose of feature engineering?

Enhance data representation for better learning

37
The main goal of optimization is to increase accuracy on the test set.

False

38
RMSProp improves over SGD by:

Scaling each parameter's learning rate by a running average of past squared gradients

39
RMSProp maintains a history of past gradients.

True

40
What is the primary reason for using L2 regularization?

It penalizes large weights to reduce overfitting
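
As a sketch, the L2 penalty is the sum of squared weights scaled by a strength coefficient, and its gradient pulls each weight toward zero in proportion to its size. The function names, the coefficient `lam`, and the example weights are illustrative assumptions.

```python
def l2_penalty(weights, lam):
    """L2 penalty: lam times the sum of squared weights."""
    return lam * sum(w * w for w in weights)

def l2_penalty_gradient(weights, lam):
    """Gradient of the L2 penalty with respect to each weight."""
    return [2 * lam * w for w in weights]

# Hypothetical weights: the largest one receives the strongest pull toward zero.
weights = [0.5, -2.0, 0.0, 1.5]
grads = l2_penalty_gradient(weights, lam=0.1)
print(grads)
```

Unlike L1, this shrinks weights smoothly and rarely sets them exactly to zero.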