In gradient descent, what is the purpose of the learning rate alpha?
It controls the size of each update step: too large and the algorithm can overshoot or diverge, too small and convergence is slow
What is typically used as the cost function in Linear Regression with Gradient Descent?
Mean Squared Error
What does the gradient represent in gradient descent?
Rate of change of the cost with respect to the parameters
Why do we normalize data before applying gradient descent?
To put features on comparable scales, which improves convergence speed
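As a sketch, z-score normalization with NumPy; the array X is made-up example data with two features on very different scales:

```python
import numpy as np

# Hypothetical features on very different scales
X = np.array([[1.0, 2000.0],
              [2.0, 3000.0],
              [3.0, 4000.0]])

# Z-score normalization: zero mean, unit variance per column
X_norm = (X - X.mean(axis=0)) / X.std(axis=0)

print(X_norm.mean(axis=0))  # ~[0, 0]
print(X_norm.std(axis=0))   # [1, 1]
```

After scaling, the cost surface is less elongated, so a single learning rate works reasonably well in every direction.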
If the Gradient Descent method does not converge, what could be the problem?
The learning rate is too high, causing oscillations or divergence
The learning rate is too low, so convergence is extremely slow and the algorithm appears stuck
Unscaled data leading to slow convergence
A complex non-convex surface containing local minima or saddle points where optimizer gets stuck
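A minimal sketch of the first failure mode on the convex function F(x) = x^2 (gradient 2x): a small alpha converges toward the minimum at 0, while a too-large alpha makes each step overshoot and the iterates grow without bound. The function and values here are illustrative choices, not from the card:

```python
import numpy as np

def gradient_descent(df, x0, alpha, steps=20):
    # Plain gradient descent: repeatedly step against the gradient
    x = x0
    for _ in range(steps):
        x = x - alpha * df(x)
    return x

df = lambda x: 2 * x  # gradient of F(x) = x^2, minimum at x = 0

print(gradient_descent(df, 3.0, alpha=0.1))  # shrinks toward 0
print(gradient_descent(df, 3.0, alpha=1.1))  # oscillates and blows up
```

With alpha = 0.1 each step multiplies x by (1 - 2·0.1) = 0.8, so it converges; with alpha = 1.1 the factor is -1.2, so the iterate flips sign and grows every step.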
Consider the function F(x, y) = (x-1)^2 + (y-1)^2 with initial guess (3, 3). Compute the first step of the gradient descent method, using exact line search to find the best possible alpha.
Gradient F = [(2(x-1)), (2(y-1))]
[(2(3-1)), (2(3-1))] = [4, 4]
x1 = x0 - a (gradient) → 3 - 4a
y1 = y0 - a (gradient) → 3 - 4a
F(a) = ((3 - 4a)-1)² + ((3-4a)-1)²
= (2-4a)² + (2-4a)²
= 2(2-4a)²
F’(a) = 2 · 2(2-4a) · (-4) = -16(2-4a)
-16(2-4a) = 0
-32 + 64a = 0
64a = 32
a = 0.5
x1 = 3-4(0.5) = 1
y1 = 3-4(0.5) = 1
1st step = (1,1)
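The worked step can be checked numerically; a quick sketch using the function and gradient from the card:

```python
import numpy as np

F = lambda p: (p[0] - 1) ** 2 + (p[1] - 1) ** 2
DF = lambda p: np.array([2 * (p[0] - 1), 2 * (p[1] - 1)])

p0 = np.array([3.0, 3.0])
g = DF(p0)       # [4, 4], the gradient at the initial guess
alpha = 0.5      # optimal step size found by the exact line search
p1 = p0 - alpha * g

print(p1)        # [1. 1.]
print(F(p1))     # 0.0 -- the minimum, reached in a single step
```

Because F is a perfectly round quadratic bowl, the exact line search lands on the minimum in one step; for general functions each step only moves partway.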
Write the formula for exact line search for x and y
x1 = x0 - a (gradient x)
y1 = y0 - a (gradient y)
where a is chosen exactly at each step: a = argmin over a of F(x0 - a (gradient x), y0 - a (gradient y))
What are the 3 ways to find a regression line?
Analytical Method: Solving mathematically using matrices
Optimization method: Iteratively updating slope (m) and intercept (b) to minimize MSE
Library/Built-in method: Using tools like np.polyfit or scipy.stats.linregress
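A brief sketch of the library route using np.polyfit on made-up points that lie exactly on y = 2x + 1 (scipy.stats.linregress(x, y) would recover the same slope and intercept, plus regression statistics):

```python
import numpy as np

# Hypothetical points lying exactly on y = 2x + 1
x = np.array([0.0, 1.0, 2.0, 3.0])
y = 2 * x + 1

# np.polyfit with degree 1 fits a line and returns [slope, intercept]
m, b = np.polyfit(x, y, 1)
print(m, b)  # ~2.0, ~1.0
```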
You are tasked with completing the LinearRegression class from your project. Assuming the fit method has already calculated self.m (slope) and self.b (intercept), write the Python code for the predict and mse (Mean Squared Error) methods.
def predict(self, X: np.array) -> np.array:
    y_pred = self.m * X + self.b
    return y_pred

def mse(self, y_true: np.array, y_pred: np.array) -> float:
    return np.mean((y_true - y_pred) ** 2)

Below is an incomplete Gradient Descent function. Fill in the missing code to accomplish two things:
Implement the step update.
Add an adaptive step-size mechanism where the learning rate alpha is cut in half (alpha / 2) whenever the absolute value of the current function F(x) exceeds 100.
def GradientDescentMod(DF, F, x0, maxit=100, alpha=0.1):
    x = np.copy(x0)
    for i in range(maxit):
        # 1. Fill in the update rule for x
        x = ______________________________
        # 2. Modify alpha if |F(x)| > 100
        if ______________________________:
            alpha = ___________________
    return x

Answer:
    x = x - alpha * DF(x)
    if abs(F(x)) > 100:
        alpha = alpha / 2

What is Stochastic Gradient Descent?
Instead of looking at the whole dataset, SGD picks a single data point at each iteration, calculates the gradient for just that point, and takes a step.
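A minimal SGD sketch for a one-variable linear fit, using hypothetical noisy data drawn around y = 3x; the step size, iteration count, and seed are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: y = 3x plus a little noise
X = rng.uniform(-1, 1, size=100)
y = 3 * X + rng.normal(0, 0.1, size=100)

m, b, alpha = 0.0, 0.0, 0.1
for _ in range(1000):
    i = rng.integers(len(X))        # pick ONE random data point
    err = (m * X[i] + b) - y[i]     # prediction error on that point only
    m -= alpha * 2 * err * X[i]     # gradient of that point's squared error
    b -= alpha * 2 * err

print(m, b)  # m near 3, b near 0
```

Each update is cheap and noisy; averaged over many iterations the parameters still drift toward the full-dataset minimum.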
How does gradient descent know when to stop taking steps?
It stops taking steps when the step size (alpha times the gradient magnitude) falls below a small tolerance, i.e. when it is very close to 0.
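As a sketch, a stopping rule based on step size; the tolerance tol and the test function F(x) = (x-1)^2 are illustrative choices:

```python
import numpy as np

def gradient_descent(df, x0, alpha=0.1, tol=1e-8, maxit=10000):
    x = x0
    for i in range(maxit):
        step = alpha * df(x)
        x = x - step
        if np.abs(step) < tol:   # stop once the step is nearly zero
            return x, i + 1
    return x, maxit

df = lambda x: 2 * (x - 1)       # gradient of F(x) = (x - 1)^2
x_min, n_steps = gradient_descent(df, 5.0)
print(x_min, n_steps)            # x_min ~ 1, stopping well before maxit
```

Near the minimum the gradient shrinks toward zero, so the step size does too, and the loop exits long before exhausting maxit.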