Gradient Descent
An iterative optimization algorithm that minimizes a loss function by repeatedly updating parameters in the direction of the negative gradient.
Key hyperparameters in gradient descent
Initial parameter values, the learning rate (𝜂), the number of epochs, and the choice of error/loss function.
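A minimal sketch of gradient descent on a one-dimensional quadratic loss; the loss function, learning rate, and epoch count below are illustrative choices, not part of the cards.

```python
# Minimal gradient descent sketch: minimize L(w) = (w - 3)^2.
# All hyperparameter values here are illustrative.

def grad(w):
    return 2 * (w - 3)  # dL/dw for L(w) = (w - 3)^2

w = 0.0       # initial value
eta = 0.1     # learning rate
epochs = 50   # number of epochs

for _ in range(epochs):
    w -= eta * grad(w)  # step in the direction of the negative gradient

print(w)  # converges toward the minimizer w = 3
```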
Learning rate
The size of the steps taken towards the minimum during each iteration.
Learning rate too small
The algorithm converges very slowly.
Learning rate too large
The algorithm may overshoot the minimum or fail to converge.
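A quick sketch of the two failure modes above, reusing the same illustrative quadratic loss: a tiny learning rate barely moves, a moderate one converges, and a too-large one diverges.

```python
def step(w, eta):
    # One gradient-descent update for L(w) = (w - 3)^2
    return w - eta * 2 * (w - 3)

for eta in (0.01, 0.1, 1.1):  # too small / reasonable / too large
    w = 0.0
    for _ in range(20):
        w = step(w, eta)
    print(eta, w)  # eta=0.01 crawls, eta=0.1 nears 3, eta=1.1 diverges
```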
Plotting training and validation error
To detect overfitting or underfitting and decide when to stop training.
Identifying overfitting
When training error decreases, but validation error increases.
Identifying underfitting
When both training and validation errors remain high and fail to decrease, indicating the model is not learning the underlying pattern.
Elbow in an error plot
The point where the validation error levels off, suggesting the optimal number of epochs.
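A short matplotlib sketch of the diagnostic plot these cards describe; the error curves are synthetic stand-ins chosen to show the overfitting pattern and the elbow, not real training results.

```python
import matplotlib.pyplot as plt

epochs = list(range(1, 31))
# Synthetic curves: training error keeps falling, while validation
# error levels off (the elbow) and then rises again (overfitting).
train_err = [1.0 / e for e in epochs]
val_err = [1.0 / e + 0.001 * (e - 10) ** 2 for e in epochs]

plt.plot(epochs, train_err, label="training error")
plt.plot(epochs, val_err, label="validation error")
plt.xlabel("epoch")
plt.ylabel("error")
plt.legend()
plt.show()
```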
Mean Squared Error (MSE)
The average of squared differences between predicted and actual values: MSE = (1/n) ∑(ŷ − y)².
Mean Absolute Error (MAE)
The average of absolute differences between predicted and actual values: MAE = (1/n) ∑|ŷ − y|.
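Both metrics computed directly from the formulas above in plain Python; the sample predictions are made up for illustration.

```python
# Sample values are illustrative.
y_true = [3.0, -0.5, 2.0, 7.0]
y_pred = [2.5, 0.0, 2.0, 8.0]

n = len(y_true)
mse = sum((yp - yt) ** 2 for yp, yt in zip(y_pred, y_true)) / n  # (1/n) Σ(ŷ − y)²
mae = sum(abs(yp - yt) for yp, yt in zip(y_pred, y_true)) / n    # (1/n) Σ|ŷ − y|

print(mse)  # 0.375 (the largest error dominates after squaring)
print(mae)  # 0.5
```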
MSE vs MAE
MSE penalizes large errors more heavily by squaring them, while MAE weights every error in direct proportion to its size.
Preference for MAE over MSE
MAE is more robust to outliers and easier to interpret, since it is expressed in the same units as the target variable.
Purpose of a validation set
To tune hyperparameters and monitor performance during training.
Purpose of a test set
To evaluate the final model's performance on unseen data.
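One common way to carve out validation and test sets, assuming scikit-learn is available; the split ratios and toy data are illustrative.

```python
from sklearn.model_selection import train_test_split

# Toy data; in practice X is a feature matrix and y the targets.
X = [[v] for v in range(100)]
y = [2 * v for v in range(100)]

# Hold out 20% as the final test set (touched only once, at the end),
# then split the remainder into training and validation sets.
X_rest, X_test, y_rest, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X_rest, y_rest, test_size=0.25, random_state=0)
```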
Secant Method
An iterative root-finding method that approximates a root of a function using secant lines rather than derivatives.
Secant method formula
x_{n+1} = x_n - f(x_n) * (x_n - x_{n-1}) / (f(x_n) - f(x_{n-1})).
Initial requirements for the secant method
Two initial guesses (x_0 and x_1) close to the root.
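A minimal secant-method sketch built from the update formula above, with the tolerance-based stopping rule described later in these cards; the test function and tolerances are illustrative.

```python
def secant(f, x0, x1, tol=1e-10, max_iter=50):
    # x_{n+1} = x_n - f(x_n) * (x_n - x_{n-1}) / (f(x_n) - f(x_{n-1}))
    for _ in range(max_iter):
        f0, f1 = f(x0), f(x1)
        if f1 == f0:              # flat secant line: cannot divide
            break
        x2 = x1 - f1 * (x1 - x0) / (f1 - f0)
        if abs(x2 - x1) < tol:    # consecutive iterates very close: stop
            return x2
        x0, x1 = x1, x2
    return x1

# Example: root of f(x) = x^2 - 2, i.e. sqrt(2), from two nearby guesses.
print(secant(lambda x: x * x - 2, 1.0, 2.0))  # ≈ 1.4142135623730951
```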
Secant method vs Newton's method
The secant method doesn't require derivatives, while Newton's method does.
Key advantage of the secant method
It works even when derivatives are difficult to compute.
Secant method approximating derivatives
It approximates the derivative with the slope of the secant line through the two most recent points: (f(x_n) − f(x_{n−1})) / (x_n − x_{n−1}).
Stopping criterion for the secant method
When the error is below a threshold or consecutive iterations are very close.
Convergence rate of the secant method
Superlinear (order ≈ 1.618, the golden ratio): faster than bisection but slower than Newton's method.
Why the secant method might fail
If initial guesses are too far from the root or the function is discontinuous near the guesses.