Artificial Neural Network
Backpropagation
An algorithm used to compute gradients of the loss function with respect to each weight in a neural network by applying the chain rule, enabling weight updates during training.
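For concreteness, a minimal sketch of backpropagation in a one-hidden-layer network with a sigmoid activation and mean squared error loss. The network shape, NumPy, the learning rate, and all variable names are illustrative assumptions, not taken from the card.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Illustrative tiny network: x -> hidden layer (sigmoid) -> linear output
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 3))           # 4 samples, 3 features
y = rng.normal(size=(4, 1))           # 4 targets
W1 = rng.normal(size=(3, 5)); b1 = np.zeros(5)
W2 = rng.normal(size=(5, 1)); b2 = np.zeros(1)

# Forward pass
z1 = x @ W1 + b1
a1 = sigmoid(z1)
y_hat = a1 @ W2 + b2
loss = np.mean((y_hat - y) ** 2)      # MSE loss

# Backward pass: apply the chain rule layer by layer
d_yhat = 2 * (y_hat - y) / y.shape[0]   # dL/dy_hat
dW2 = a1.T @ d_yhat                     # dL/dW2
db2 = d_yhat.sum(axis=0)
d_a1 = d_yhat @ W2.T                    # dL/da1
d_z1 = d_a1 * a1 * (1 - a1)             # sigmoid'(z1) = a1 * (1 - a1)
dW1 = x.T @ d_z1                        # dL/dW1
db1 = d_z1.sum(axis=0)

# Gradient descent update with an illustrative learning rate
eta = 0.1
W2 -= eta * dW2; b2 -= eta * db2
W1 -= eta * dW1; b1 -= eta * db1
```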
Gradient Descent
An optimization algorithm that updates model weights by moving in the direction of the negative gradient of the loss function to minimize error.
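A minimal sketch of the update rule w ← w − η∇L(w) on a one-dimensional quadratic loss; the loss function, starting point, and step size are illustrative assumptions.

```python
# Minimize L(w) = (w - 3)^2 with plain gradient descent.
def grad(w):
    return 2 * (w - 3)       # dL/dw

w, eta = 0.0, 0.1            # initial weight and learning rate (illustrative)
for _ in range(100):
    w -= eta * grad(w)       # step in the direction of the negative gradient
print(round(w, 4))           # converges toward the minimizer w = 3
```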
Loss Function
A mathematical function that quantifies the difference between predicted output and actual target values (e.g., MSE, cross-entropy).
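Minimal NumPy versions of the two losses named on the card; the function names and the small epsilon added for numerical stability are assumptions.

```python
import numpy as np

def mse(y_true, y_pred):
    # Mean squared error: average of (prediction - target)^2
    return np.mean((y_pred - y_true) ** 2)

def cross_entropy(y_true, y_pred, eps=1e-12):
    # Categorical cross-entropy for one-hot targets and predicted probabilities
    y_pred = np.clip(y_pred, eps, 1.0)
    return -np.mean(np.sum(y_true * np.log(y_pred), axis=1))
```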
Chain Rule
A calculus rule for computing the derivative of a function composed of other functions. It is essential in backpropagation to propagate error through layers.
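In symbols (a standard formulation, not quoted from the card): the derivative of a composition is the product of the layer-wise derivatives, which is exactly how backpropagation pushes the loss gradient back through each layer.

```latex
\frac{d}{dx} f\bigl(g(x)\bigr) = f'\bigl(g(x)\bigr)\, g'(x),
\qquad
\frac{\partial L}{\partial w^{(1)}}
  = \frac{\partial L}{\partial a^{(2)}}
    \cdot \frac{\partial a^{(2)}}{\partial a^{(1)}}
    \cdot \frac{\partial a^{(1)}}{\partial w^{(1)}}
```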
Learning Rate (η)
A hyperparameter that determines the step size for weight updates. Small values lead to slow learning; large values can cause overshooting or divergence.
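A small numerical illustration of the step-size claim, reusing the quadratic loss from the gradient descent sketch above; the specific η values are assumptions chosen to contrast slow convergence with divergence.

```python
def step(w, eta):
    return w - eta * 2 * (w - 3)    # one update on L(w) = (w - 3)^2

for eta in (0.01, 0.5, 1.1):        # small, moderate, too large
    w = 0.0
    for _ in range(20):
        w = step(w, eta)
    print(eta, w)                   # 0.01 creeps toward 3; 1.1 diverges
```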
Batch Gradient Descent
A version of gradient descent that uses the entire training dataset to compute gradients for each weight update.
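A minimal sketch of batch gradient descent on a linear model with MSE; the synthetic data, model, and learning rate are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))              # full training set
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=100)
w = np.zeros(3)
eta = 0.1

for _ in range(200):                       # every update sees ALL samples
    grad = 2 * X.T @ (X @ w - y) / len(y)  # gradient of MSE over the whole set
    w -= eta * grad
print(w)                                   # approaches the true coefficients
```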
Stochastic Gradient Descent (SGD)
A gradient descent variant that updates weights using a single data sample per iteration, making it faster but noisier.
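The same illustrative linear model, now updated one sample at a time; the data generation and hyperparameters are assumptions matching the batch example above.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=100)
w = np.zeros(3)
eta = 0.01

for epoch in range(20):
    for i in rng.permutation(len(y)):          # visit samples in random order
        grad_i = 2 * (X[i] @ w - y[i]) * X[i]  # gradient from ONE sample
        w -= eta * grad_i                      # noisy but frequent updates
print(w)
```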
Mini-Batch Gradient Descent
A compromise between batch gradient descent and SGD that updates weights using a small batch of data points (e.g., 32 or 64 samples) per iteration.
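Mini-batch gradient descent on the same illustrative data, with a batch size of 32 as mentioned on the card.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=100)
w = np.zeros(3)
eta, batch_size = 0.05, 32

for epoch in range(50):
    idx = rng.permutation(len(y))                  # shuffle once per epoch
    for start in range(0, len(y), batch_size):
        b = idx[start:start + batch_size]          # one mini-batch of indices
        grad = 2 * X[b].T @ (X[b] @ w - y[b]) / len(b)
        w -= eta * grad
print(w)
```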
Vanishing Gradient
A problem where gradients become very small during backpropagation, so weights in earlier layers barely update; common with sigmoid and tanh activations.
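A small numeric illustration of why deep sigmoid stacks shrink gradients: the sigmoid derivative is at most 0.25, so the chain-rule product across many layers decays geometrically. The depth and input value are arbitrary assumptions.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# sigmoid'(z) = sigmoid(z) * (1 - sigmoid(z)) <= 0.25 everywhere.
grad = 1.0
for layer in range(30):      # a depth of 30 layers, purely for illustration
    a = sigmoid(0.0)         # z = 0 gives the LARGEST possible derivative
    grad *= a * (1 - a)      # multiply by 0.25 at every layer
print(grad)                  # 0.25**30 ≈ 8.7e-19: effectively zero
```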
Exploding Gradient
A problem where gradients become excessively large during backpropagation, leading to unstable updates and possible overflow.
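The mirror-image illustration: when each layer contributes a factor larger than 1 to the chain-rule product, the gradient grows exponentially with depth. The per-layer factor and depth are arbitrary assumptions.

```python
# Each layer here contributes a factor of 1.5 (an arbitrary illustrative value).
grad = 1.0
for layer in range(30):
    grad *= 1.5
print(grad)      # 1.5**30 ≈ 1.9e5: updates become unstable and can overflow
```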