Math 240 Gradient Descent and Linear Regression

Last updated 11:41 PM on 4/19/26

12 Terms

1

In gradient descent, what is the purpose of the learning rate alpha?

It controls the size of each update step: the parameters change by alpha times the gradient, so a larger alpha means faster but less stable learning.

2

What is typically used as the cost function in Linear Regression with Gradient Descent?

Mean Squared Error

3

What does the gradient represent in gradient descent?

Rate of change of the cost with respect to the parameters

4

Why do we normalize data before applying gradient descent?

To improve convergence speed

5

If the Gradient Descent method does not converge, what could be the problem?

  1. The learning rate is too high, causing oscillations or divergence

  2. The learning rate is too low, so progress is so slow the algorithm appears stuck

  3. Unscaled data leading to slow convergence

  4. A complex non-convex surface containing local minima or saddle points where optimizer gets stuck

6

Consider the function F(x, y) = (x-1)^2 + (y-1)^2 with initial guess (3, 3). Compute the first step of the gradient descent method, using exact line search to find the best possible alpha.

Gradient F = [2(x-1), 2(y-1)]
At (3, 3): [2(3-1), 2(3-1)] = [4, 4]

x1 = x0 - a (gradient) → 3 - 4a

y1 = y0 - a (gradient) → 3 - 4a

F(a) = ((3-4a)-1)² + ((3-4a)-1)²
= (2-4a)² + (2-4a)²
= 2(2-4a)²

F’(a) = 2 · 2(2-4a) · (-4) = -16(2-4a)
-16(2-4a) = 0
-32 + 64a = 0
64a = 32
a = 0.5

x1 = 3-4(0.5) = 1

y1 = 3-4(0.5) = 1

1st step = (1,1)
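The step above can be checked numerically. A minimal sketch (the function names `F` and `grad_F` are illustrative):

```python
import numpy as np

def F(p):
    # F(x, y) = (x - 1)^2 + (y - 1)^2
    return (p[0] - 1) ** 2 + (p[1] - 1) ** 2

def grad_F(p):
    # gradient of F: [2(x - 1), 2(y - 1)]
    return np.array([2 * (p[0] - 1), 2 * (p[1] - 1)])

p0 = np.array([3.0, 3.0])
g = grad_F(p0)          # [4, 4], matching the hand computation
alpha = 0.5             # optimal step size from the line search above
p1 = p0 - alpha * g     # first step lands on (1, 1)
```

Because F is a perfectly symmetric quadratic, the exact line search lands on the true minimum (1, 1) in a single step, where F = 0.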

7

Write the formula for exact line search for x and y

x1 = x0 - a (gradient x)

y1 = y0 - a (gradient y)

where a is chosen at each step to minimize F(x1, y1) exactly.

8

What are the 3 ways to find a regression line?

  1. Analytical Method: Solving mathematically using matrices

  2. Optimization method: Iteratively updating slope (m) and intercept (b) to minimize MSE

  3. Library/Built in method: Using tools like np.polyfit or scipy.stats.linregress

9

You are tasked with completing the LinearRegression class from your project. Assuming the fit method has already calculated self.m (slope) and self.b (intercept), write the Python code for the predict and mse (Mean Squared Error) methods.

def predict(self, X: np.array) -> np.array:
    y_pred = self.m * X + self.b
    return y_pred

def mse(self, y_true: np.array, y_pred: np.array) -> float:
    return np.mean((y_true - y_pred) ** 2)
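The two methods can be exercised with a minimal standalone version of the class. Since the card assumes `fit` already produced `self.m` and `self.b`, the `fit` below is a stand-in that uses `np.polyfit` (the library method from card 8):

```python
import numpy as np

class LinearRegression:
    def fit(self, X: np.ndarray, y: np.ndarray) -> None:
        # stand-in fit: least-squares slope and intercept via np.polyfit
        self.m, self.b = np.polyfit(X, y, 1)

    def predict(self, X: np.ndarray) -> np.ndarray:
        return self.m * X + self.b

    def mse(self, y_true: np.ndarray, y_pred: np.ndarray) -> float:
        return float(np.mean((y_true - y_pred) ** 2))

# perfectly linear data: y = 2x + 1, so the MSE should be ~0
X = np.array([0.0, 1.0, 2.0, 3.0])
y = 2.0 * X + 1.0
model = LinearRegression()
model.fit(X, y)
error = model.mse(y, model.predict(X))
```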

10

Below is an incomplete Gradient Descent function. Fill in the missing code to accomplish two things:

  1. Implement the step update.

  2. Add a learning-rate decay mechanic where alpha is cut in half (alpha / 2) whenever the absolute value of the current function value, |F(x)|, is strictly greater than 100.

def GradientDescentMod(DF, F, x0, maxit=100, alpha=0.1):
    x = np.copy(x0)
    for i in range(maxit):
        # 1. Fill in the update rule for x
        x = ______________________________
        
        # 2. Modify alpha if F(x) > 100
        if ______________________________:
            alpha = ___________________
            
    return x

x = x - alpha * DF(x)

if abs(F(x)) > 100:
    alpha = alpha / 2
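Assembled into one runnable piece, with the quadratic from card 6 used as an illustrative test function:

```python
import numpy as np

def GradientDescentMod(DF, F, x0, maxit=100, alpha=0.1):
    x = np.copy(x0)
    for i in range(maxit):
        # 1. step update: move against the gradient
        x = x - alpha * DF(x)
        # 2. halve alpha whenever |F(x)| exceeds 100
        if abs(F(x)) > 100:
            alpha = alpha / 2
    return x

# test function from card 6: F(x, y) = (x - 1)^2 + (y - 1)^2
F = lambda p: (p[0] - 1) ** 2 + (p[1] - 1) ** 2
DF = lambda p: np.array([2 * (p[0] - 1), 2 * (p[1] - 1)])

x_min = GradientDescentMod(DF, F, np.array([3.0, 3.0]))
# converges toward the minimum at (1, 1)
```

With alpha = 0.1 each coordinate obeys x ← 0.8x + 0.2, a contraction toward 1, so 100 iterations land essentially on (1, 1); |F(x)| never exceeds 100 here, so the decay branch stays idle.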

11

What is Stochastic Gradient Descent?

Instead of looking at the whole dataset, SGD picks a single data point at each iteration, calculates the gradient for just that point, and takes a step.
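For the linear-regression setting of the earlier cards, one SGD update looks like the sketch below (variable names and hyperparameters are illustrative):

```python
import numpy as np

def sgd_step(m, b, x_i, y_i, alpha):
    # gradient of the single-point squared error (m*x_i + b - y_i)^2
    err = (m * x_i + b) - y_i
    return m - alpha * 2 * err * x_i, b - alpha * 2 * err

rng = np.random.default_rng(0)
X = np.linspace(0.0, 1.0, 50)
y = 3.0 * X + 0.5          # noiseless line: true m = 3, b = 0.5

m, b = 0.0, 0.0
for epoch in range(200):
    # visit the data points one at a time, in random order
    for i in rng.permutation(len(X)):
        m, b = sgd_step(m, b, X[i], y[i], alpha=0.1)
```

Each update touches only one data point, so an SGD step is far cheaper than a full-gradient step, at the cost of noisier progress.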

12

How does gradient descent know when to stop taking steps?

It stops when the step size (alpha times the gradient) becomes very close to 0, i.e., falls below a chosen tolerance, or when a maximum number of iterations (maxit) is reached.