Artificial Neural Network
Backpropagation
An algorithm used to compute gradients of the loss function with respect to each weight in a neural network by applying the chain rule, enabling weight updates during training.
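For concreteness, a minimal sketch of backpropagation in a one-hidden-layer network with a sigmoid activation and mean squared error loss. The network shape, NumPy, the learning rate, and all variable names are illustrative assumptions, not taken from the card.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Illustrative tiny network: x -> hidden layer (sigmoid) -> linear output
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 3))           # 4 samples, 3 features
y = rng.normal(size=(4, 1))           # 4 targets
W1 = rng.normal(size=(3, 5)); b1 = np.zeros(5)
W2 = rng.normal(size=(5, 1)); b2 = np.zeros(1)

# Forward pass
z1 = x @ W1 + b1
a1 = sigmoid(z1)
y_hat = a1 @ W2 + b2
loss = np.mean((y_hat - y) ** 2)      # MSE loss

# Backward pass: apply the chain rule layer by layer
d_yhat = 2 * (y_hat - y) / y.shape[0]   # dL/dy_hat
dW2 = a1.T @ d_yhat                     # dL/dW2
db2 = d_yhat.sum(axis=0)
d_a1 = d_yhat @ W2.T                    # dL/da1
d_z1 = d_a1 * a1 * (1 - a1)             # sigmoid'(z1) = a1 * (1 - a1)
dW1 = x.T @ d_z1                        # dL/dW1
db1 = d_z1.sum(axis=0)

# Gradient descent update with an illustrative learning rate
eta = 0.1
W2 -= eta * dW2; b2 -= eta * db2
W1 -= eta * dW1; b1 -= eta * db1
```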
Gradient Descent
An optimization algorithm that updates model weights by moving in the direction of the negative gradient of the loss function to minimize error.
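A minimal sketch of the update rule w ← w − η∇L(w) on a one-dimensional quadratic loss; the loss function, starting point, and step size are illustrative assumptions.

```python
# Minimize L(w) = (w - 3)^2 with plain gradient descent.
def grad(w):
    return 2 * (w - 3)       # dL/dw

w, eta = 0.0, 0.1            # initial weight and learning rate (illustrative)
for _ in range(100):
    w -= eta * grad(w)       # step in the direction of the negative gradient
print(round(w, 4))           # converges toward the minimizer w = 3
```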
Loss Function
A mathematical function that quantifies the difference between predicted output and actual target values (e.g., MSE, cross-entropy).
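Minimal NumPy versions of the two losses named on the card; the function names and the small epsilon added for numerical stability are assumptions.

```python
import numpy as np

def mse(y_true, y_pred):
    # Mean squared error: average of (prediction - target)^2
    return np.mean((y_pred - y_true) ** 2)

def cross_entropy(y_true, y_pred, eps=1e-12):
    # Categorical cross-entropy for one-hot targets and predicted probabilities
    y_pred = np.clip(y_pred, eps, 1.0)
    return -np.mean(np.sum(y_true * np.log(y_pred), axis=1))
```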
Chain Rule
A calculus rule for computing the derivative of a function composed of other functions. It is essential in backpropagation to propagate error through layers.
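In symbols (a standard formulation, not quoted from the card): the derivative of a composition is the product of the layer-wise derivatives, which is exactly how backpropagation pushes the loss gradient back through each layer.

```latex
\frac{d}{dx} f\bigl(g(x)\bigr) = f'\bigl(g(x)\bigr)\, g'(x),
\qquad
\frac{\partial L}{\partial w^{(1)}}
  = \frac{\partial L}{\partial a^{(2)}}
    \cdot \frac{\partial a^{(2)}}{\partial a^{(1)}}
    \cdot \frac{\partial a^{(1)}}{\partial w^{(1)}}
```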
Learning Rate (η)
A hyperparameter that determines the step size for weight updates. Small values lead to slow learning; large values can cause overshooting or divergence.
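A small numerical illustration of the step-size claim, reusing the quadratic loss from the gradient descent sketch above; the specific η values are assumptions chosen to contrast slow convergence with divergence.

```python
def step(w, eta):
    return w - eta * 2 * (w - 3)    # one update on L(w) = (w - 3)^2

for eta in (0.01, 0.5, 1.1):        # small, moderate, too large
    w = 0.0
    for _ in range(20):
        w = step(w, eta)
    print(eta, w)                   # 0.01 creeps toward 3; 1.1 diverges
```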
Batch Gradient Descent
A version of gradient descent that uses the entire training dataset to compute gradients for each weight update.
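A minimal sketch of batch gradient descent on a linear model with MSE; the synthetic data, model, and learning rate are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))              # full training set
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=100)
w = np.zeros(3)
eta = 0.1

for _ in range(200):                       # every update sees ALL samples
    grad = 2 * X.T @ (X @ w - y) / len(y)  # gradient of MSE over the whole set
    w -= eta * grad
print(w)                                   # approaches the true coefficients
```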
Stochastic Gradient Descent (SGD)
A gradient descent variant that updates weights using a single data sample per iteration, making it faster but noisier.
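The same illustrative linear model, now updated one sample at a time; the data generation and hyperparameters are assumptions matching the batch example above.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=100)
w = np.zeros(3)
eta = 0.01

for epoch in range(20):
    for i in rng.permutation(len(y)):          # visit samples in random order
        grad_i = 2 * (X[i] @ w - y[i]) * X[i]  # gradient from ONE sample
        w -= eta * grad_i                      # noisy but frequent updates
print(w)
```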
Mini-Batch Gradient Descent
A compromise between batch gradient descent and SGD that updates weights using a small batch of data points (e.g., 32 or 64 samples) per iteration.
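Mini-batch gradient descent on the same illustrative data, with a batch size of 32 as mentioned on the card.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=100)
w = np.zeros(3)
eta, batch_size = 0.05, 32

for epoch in range(50):
    idx = rng.permutation(len(y))                  # shuffle once per epoch
    for start in range(0, len(y), batch_size):
        b = idx[start:start + batch_size]          # one mini-batch of indices
        grad = 2 * X[b].T @ (X[b] @ w - y[b]) / len(b)
        w -= eta * grad
print(w)
```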
Vanishing Gradient
A problem where gradients become very small during backpropagation, so weights in earlier layers barely update; common with sigmoid and tanh activations.
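A small numeric illustration of why deep sigmoid stacks shrink gradients: the sigmoid derivative is at most 0.25, so the chain-rule product across many layers decays geometrically. The depth and input value are arbitrary assumptions.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# sigmoid'(z) = sigmoid(z) * (1 - sigmoid(z)) <= 0.25 everywhere.
grad = 1.0
for layer in range(30):      # a depth of 30 layers, purely for illustration
    a = sigmoid(0.0)         # z = 0 gives the LARGEST possible derivative
    grad *= a * (1 - a)      # multiply by 0.25 at every layer
print(grad)                  # 0.25**30 ≈ 8.7e-19: effectively zero
```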
Exploding Gradient
A problem where gradients become excessively large during backpropagation, leading to unstable updates and possible overflow.
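The mirror-image illustration: when each layer contributes a factor larger than 1 to the chain-rule product, the gradient grows exponentially with depth. The per-layer factor and depth are arbitrary assumptions.

```python
# Each layer here contributes a factor of 1.5 (an arbitrary illustrative value).
grad = 1.0
for layer in range(30):
    grad *= 1.5
print(grad)      # 1.5**30 ≈ 1.9e5: updates become unstable and can overflow
```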