Flashcards about optimizers and learning rates in deep learning, based on lecture notes.
What are the two conditions a loss function must satisfy?
Differentiable and bounded below (L >= 0).
For regression tasks, which loss function is primarily used in this course?
Mean squared error (MSE) loss.
For classification tasks, which loss function is primarily used?
Cross-entropy loss (or log loss).
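For concreteness, here is a minimal NumPy sketch of the two losses above; the function names and the small `eps` used to avoid log(0) are illustrative choices, not from the lecture notes.

```python
import numpy as np

def mse_loss(y_true, y_pred):
    # Mean squared error: average squared difference between targets and predictions.
    return np.mean((y_true - y_pred) ** 2)

def cross_entropy_loss(y_true, y_pred, eps=1e-12):
    # Cross-entropy (log loss) for one-hot targets and predicted class probabilities.
    y_pred = np.clip(y_pred, eps, 1.0)  # avoid log(0)
    return -np.mean(np.sum(y_true * np.log(y_pred), axis=-1))
```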
In gradient descent, what does η (eta) represent?
The learning rate.
What is the primary strategy used in gradient descent to find the optimal solution?
Take steps downward (along the negative gradient).
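Putting the two cards above together, a minimal sketch of one gradient descent update, assuming the gradient is already available (e.g., from autodiff); `params`, `grads`, and `lr` are placeholder names.

```python
import numpy as np

def gradient_descent_step(params, grads, lr=0.01):
    # theta <- theta - eta * dL/dtheta: step along the negative gradient.
    return params - lr * grads

# Example on L(theta) = theta^2, whose gradient is 2*theta:
theta = np.array([1.0])
theta = gradient_descent_step(theta, 2 * theta, lr=0.1)  # -> [0.8]
```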
What is a potential issue with local minima in the context of optimization?
Gradient descent can get stuck in a local minimum instead of reaching the global minimum, which leads to a worse model and bad predictions.
What is the purpose of momentum optimization?
To improve on regular gradient descent by keeping track of past gradients (a velocity term), which speeds up convergence.
What does the β hyperparameter represent in momentum optimization?
The momentum, which controls the amount of 'friction'.
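A sketch of the momentum update, assuming the convention in which β scales an accumulated velocity (the 'friction' knob) and the gradient is supplied externally; names and default values are illustrative.

```python
import numpy as np

def momentum_step(params, grads, velocity, lr=0.01, beta=0.9):
    # Keep a decaying accumulation of past gradients (the velocity),
    # then move the parameters along that velocity.
    velocity = beta * velocity - lr * grads
    return params + velocity, velocity
```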
What is the key feature of AdaGrad?
It introduces an adaptive learning rate, adjusted independently for different parameters.
How does AdaGrad adjust the learning rate for parameters with steep gradients?
It shrinks their learning rate quickly (faster than for parameters with gentler gradients).
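A sketch of an AdaGrad update under the usual formulation (accumulated squared gradients per parameter); `accum` and the default values are illustrative.

```python
import numpy as np

def adagrad_step(params, grads, accum, lr=0.01, eps=1e-8):
    # Accumulate squared gradients per parameter; parameters with steep
    # gradients have their effective learning rate shrunk the fastest.
    accum = accum + grads ** 2
    return params - lr * grads / (np.sqrt(accum) + eps), accum
```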
How does RMSProp improve upon AdaGrad?
By exponentially scaling down old gradients before summing them.
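The same sketch adapted to RMSProp, assuming the standard decaying-average formulation; `rho` is the decay rate applied to old squared gradients.

```python
import numpy as np

def rmsprop_step(params, grads, avg_sq, lr=0.001, rho=0.9, eps=1e-8):
    # Exponentially decaying average of squared gradients, so old gradients
    # fade out instead of accumulating forever as in AdaGrad.
    avg_sq = rho * avg_sq + (1 - rho) * grads ** 2
    return params - lr * grads / (np.sqrt(avg_sq) + eps), avg_sq
```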
What two optimization methods are combined in Adam?
Momentum and RMSProp.
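A sketch of an Adam update combining the two ideas, assuming the standard bias-corrected formulation; the hyperparameter defaults are the commonly used ones, not necessarily those in the notes.

```python
import numpy as np

def adam_step(params, grads, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    # Momentum-style first moment plus RMSProp-style second moment, each
    # bias-corrected for its zero initialization (t is the 1-based step count).
    m = beta1 * m + (1 - beta1) * grads
    v = beta2 * v + (1 - beta2) * grads ** 2
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    return params - lr * m_hat / (np.sqrt(v_hat) + eps), m, v
```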
What is a common issue that must be addressed for all optimization methods?
Choosing an appropriate learning rate.
Name one method of learning rate scheduling.
Any of: reduce η when learning plateaus; gradually reduce η at each step; or change η according to some other rule.
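As one concrete example of the second rule (gradually reduce η at each step), a minimal exponential-decay schedule; the function name and constants are illustrative.

```python
def exponential_decay(lr0, step, decay_rate=0.96, decay_steps=1000):
    # Gradually reduce eta: multiply the initial rate by decay_rate
    # once every decay_steps steps (fractionally in between).
    return lr0 * decay_rate ** (step / decay_steps)
```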