DAT255 Lecture 5 - Optimizers and Learning Rates

Flashcards about optimizers and learning rates in deep learning, based on lecture notes.

14 Terms

1

What are the two conditions a loss function must satisfy?

Differentiable and bounded from below (e.g., L ≥ 0).

2

For regression tasks, which loss function is primarily used in this course?

Mean squared error (MSE) loss.
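
A minimal NumPy sketch of the MSE computation (the function name and toy values are just for illustration):

```python
import numpy as np

def mse_loss(y_pred, y_true):
    # Mean of the squared differences between predictions and targets
    return np.mean((y_pred - y_true) ** 2)

print(mse_loss(np.array([2.5, 0.0]), np.array([3.0, -0.5])))  # 0.25
```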

3

For classification tasks, which loss function is primarily used?

Cross-entropy loss (or log loss).
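
A minimal sketch for a single example, assuming the model outputs class probabilities (e.g., from a softmax):

```python
import numpy as np

def cross_entropy_loss(probs, target):
    # Negative log-probability assigned to the true class
    return -np.log(probs[target])

print(cross_entropy_loss(np.array([0.7, 0.2, 0.1]), target=0))  # ~0.36
```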

4

In gradient descent, what does η represent?

The learning rate.

5

What is the primary strategy used in gradient descent to find the optimal solution?

Repeatedly take steps downhill, i.e., along the negative gradient of the loss.
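
The update rule is θ ← θ − η · ∇L(θ). A minimal sketch on a toy quadratic loss L(θ) = θ² (not from the lecture), whose gradient is 2θ:

```python
def grad(theta):
    return 2 * theta  # gradient of the toy loss L(theta) = theta**2

eta = 0.1                        # learning rate
theta = 5.0
for _ in range(100):
    theta -= eta * grad(theta)   # step along the negative gradient
print(theta)  # very close to the minimum at 0
```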

6

What is a potential issue with local minima in the context of optimization?

Gradient descent can get stuck in a local minimum instead of reaching the global minimum, which can lead to bad predictions.

7

What is the purpose of momentum optimization?

To improve on regular gradient descent by keeping track of past gradients, which smooths the updates and speeds up convergence.

8

What does the β hyperparameter represent in momentum optimization?

The momentum coefficient, which controls the amount of 'friction' applied to the accumulated gradients.
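
A minimal sketch of one common formulation, v ← βv − η∇L(θ), θ ← θ + v, again on the toy loss L(θ) = θ²:

```python
def grad(theta):
    return 2 * theta  # gradient of the toy loss L(theta) = theta**2

eta, beta = 0.1, 0.9                  # learning rate and momentum
theta, v = 5.0, 0.0
for _ in range(200):
    v = beta * v - eta * grad(theta)  # decaying accumulation of past gradients
    theta = theta + v                 # move by the accumulated velocity
print(theta)  # oscillates in toward the minimum at 0
```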

9

What is the key feature of AdaGrad?

It introduces an adaptive learning rate, adjusted independently for different parameters.

10

How does AdaGrad adjust the learning rate for parameters with steep gradients?

It reduces their effective learning rate quickly, because large gradients make the accumulated sum of squared gradients grow fast.
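
A minimal sketch of the AdaGrad update, s ← s + g², θ ← θ − η · g / (√s + ε). The running sum s is kept element-wise, which is what gives each parameter its own effective learning rate (the toy loss and values are illustrative):

```python
import numpy as np

def grad(theta):
    return 2 * theta  # gradient of the toy loss L(theta) = theta**2

eta, eps = 0.3, 1e-8
theta = np.array([5.0, 1.0])                # one steep, one shallow direction
s = np.zeros_like(theta)
for _ in range(100):
    g = grad(theta)
    s += g ** 2                             # accumulated squared gradients
    theta -= eta * g / (np.sqrt(s) + eps)   # per-parameter scaled step
print(theta)  # both move toward 0, each with its own effective step size
```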

11

How does RMSProp improve upon AdaGrad?

By exponentially scaling down old gradients before summing them, so that only recent gradients matter and the learning rate does not decay all the way to zero.
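
A minimal sketch; the only change from AdaGrad is the exponentially decaying average s ← ρs + (1 − ρ)g² (the decay rate ρ = 0.9 is an illustrative, commonly used choice):

```python
import numpy as np

def grad(theta):
    return 2 * theta  # gradient of the toy loss L(theta) = theta**2

eta, rho, eps = 0.01, 0.9, 1e-8
theta, s = 5.0, 0.0
for _ in range(1000):
    g = grad(theta)
    s = rho * s + (1 - rho) * g ** 2        # old gradients decay exponentially
    theta -= eta * g / (np.sqrt(s) + eps)
print(theta)  # near 0; the effective step size no longer shrinks to zero
```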

12

What two optimization methods are combined in Adam?

Momentum and RMSProp.
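
A minimal sketch with the usual default decay rates β₁ = 0.9 and β₂ = 0.999, including Adam's bias-correction terms:

```python
import numpy as np

def grad(theta):
    return 2 * theta  # gradient of the toy loss L(theta) = theta**2

eta, beta1, beta2, eps = 0.1, 0.9, 0.999, 1e-8
theta, m, s = 5.0, 0.0, 0.0
for t in range(1, 501):
    g = grad(theta)
    m = beta1 * m + (1 - beta1) * g        # momentum: decaying average of gradients
    s = beta2 * s + (1 - beta2) * g ** 2   # RMSProp: decaying average of squared gradients
    m_hat = m / (1 - beta1 ** t)           # bias correction for the zero initialization
    s_hat = s / (1 - beta2 ** t)
    theta -= eta * m_hat / (np.sqrt(s_hat) + eps)
print(theta)  # close to the minimum at 0
```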

13

What is a common issue that must be addressed for all optimization methods?

Choosing an appropriate learning rate.

14

Name one method of learning rate scheduling.

Reduce η when learning stalls; gradually reduce η at each step; or change η according to some other rule.
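
A minimal sketch of the second rule, shrinking η by a fixed factor at every step (exponential decay; the factor 0.99 is an illustrative choice):

```python
def grad(theta):
    return 2 * theta  # gradient of the toy loss L(theta) = theta**2

eta0, decay = 0.1, 0.99
theta = 5.0
for step in range(300):
    eta = eta0 * decay ** step    # gradually reduce the learning rate
    theta -= eta * grad(theta)
print(theta)  # very close to the minimum at 0
```

In practice, frameworks ship ready-made schedulers for these rules, e.g., PyTorch's torch.optim.lr_scheduler.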