linear regression


1

definition

supervised learning problem: given input variables X + outputs Y, goal of LR = learn a function that can predict an output given an input

  • find the best line y = f(X) to explain the data

simplest LR

  • x = input feature

  • y = value we’re trying to predict

  • regression model → y = w1x + w0

  • 2 parameters to estimate: the slope of the line w1 and the y-intercept w0

basically want to find {w0, w1} that minimise deviations from the predictor line

  • min w0,w1 ∑ (yi - w1xi - w0)²

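a minimal sketch of this fit (assuming NumPy and made-up 1-D data; np.polyfit returns the least-squares {w1, w0} directly):

```python
import numpy as np

# Hypothetical 1-D data (made up for illustration): y ≈ 2x + 1 plus noise.
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=50)
y = 2.0 * x + 1.0 + rng.normal(0.0, 1.0, size=50)

# Fit y = w1*x + w0 by minimising the sum of squared deviations.
w1, w0 = np.polyfit(x, y, deg=1)
print(f"w1 ≈ {w1:.2f}, w0 ≈ {w0:.2f}")
```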
2

LR function model

function f:  X → Y = linear combo of input components

f(x) = w0 + w1x1 + w2x2 + … + wdxd = w0 + ∑ wjxj

  • w0, w1, …, wd = parameters (weights)

Input vector: 𝐱 = [1, 𝑥1, 𝑥2, ... 𝑥𝑑]

f(x) = w0 + w1x1 + w2x2 + … + wdxd = wTx
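for example (hypothetical weights and input; the leading 1 in x absorbs the bias w0):

```python
import numpy as np

w = np.array([0.5, 2.0, -1.0, 0.3])  # [w0, w1, w2, w3]
x = np.array([1.0, 4.0, 2.0, 7.0])   # [1, x1, x2, x3]

# f(x) = w0 + w1*x1 + w2*x2 + w3*x3 = w^T x
print(w @ x)  # 8.6
```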

3

loss func

measures how much our predictions deviate from the desired answers

Mean square loss:

1/2n ∑ (yi - f(xi))² = 1/2n ∑ (yi - wTxi)²

learning - find weights minimising the loss
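a sketch of this loss in code (assuming NumPy and that X carries a leading column of ones for the bias):

```python
import numpy as np

def mean_square_loss(w, X, y):
    """Mean square loss: (1/2n) * sum((y_i - w^T x_i)^2)."""
    residuals = y - X @ w
    return (residuals ** 2).sum() / (2 * len(y))
```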

4

solving LR

using gradient descent:

w ← w − η · ∇Jn(w)

w ← w − η · (−1/n ∑ (yi − wTxi) xi)

w ← w − η · 1/n ∑ (wTxi − yi) xi
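a minimal batch gradient-descent sketch of this update rule (function name and defaults are my own):

```python
import numpy as np

def gd_linear_regression(X, y, eta=0.01, iters=1000):
    """Repeat: w <- w - eta * (1/n) * sum (w^T x_i - y_i) x_i."""
    n, d = X.shape          # X: (n, d) with a leading column of ones
    w = np.zeros(d)
    for _ in range(iters):
        grad = X.T @ (X @ w - y) / n
        w -= eta * grad
    return w
```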

5

online LR

the loss function defined for the whole dataset for LR =

  • 1/2n ∑ (yi - f(xi))²

online GD - use most recent sample at each iteration

  • instead of mean squared loss, use squared loss for individual sample

Lossi(w) = ½ (yi - f(xi))²

w ← w − η · ∇w Lossi(w)

w ← w − η · (f(xi) − yi) · xi

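a per-sample (online) update sketch, following the rule above:

```python
import numpy as np

def online_update(w, x_i, y_i, eta=0.01):
    """One online gradient step: w <- w - eta * (f(x_i) - y_i) * x_i."""
    error = w @ x_i - y_i   # f(x_i) - y_i
    return w - eta * error * x_i
```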
6

input normalisation

makes data vary roughly on the same scale

  • can make a huge diff in online learning

w ← w − η · (f(xi) − yi) · xi

for inputs with large magnitude, the change in weight = huge

  • solution: make all inputs vary in the same range

x̄j = 1/n ∑ xij

σj² = 1/(n−1) ∑ (xij − x̄j)²

new (normalised) input:

x̂ij = (xij − x̄j) / σj

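a standardisation sketch matching these formulas (ddof=1 gives the 1/(n−1) variance used above):

```python
import numpy as np

def standardise(X):
    """Per-feature z-score: (x_ij - mean_j) / std_j."""
    mean = X.mean(axis=0)
    std = X.std(axis=0, ddof=1)   # ddof=1 -> 1/(n-1), as above
    return (X - mean) / std
```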
7

L1/L2 regularisation

using L1/L2 regularisation, can rewrite the loss function as:

Llasso = 1/2n ∑ (yi - wTxi)² + λ ||w||1

Lridge = 1/2n ∑ (yi - wTxi)² + λ ||w||2²
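these two losses in code (a sketch; in practice the bias weight w0 is often left out of the penalty):

```python
import numpy as np

def lasso_loss(w, X, y, lam):
    """L1-regularised (lasso) loss."""
    r = y - X @ w
    return (r ** 2).sum() / (2 * len(y)) + lam * np.abs(w).sum()

def ridge_loss(w, X, y, lam):
    """L2-regularised (ridge) loss."""
    r = y - X @ w
    return (r ** 2).sum() / (2 * len(y)) + lam * (w ** 2).sum()
```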

8

fitting the data

R² metric determines how well the learned model fits the data

R² captures the fraction of the total variance explained by the model

let ŷi = predicted value and ȳ = sample mean; R² =

  • 1 − residual variance / total variance = 1 − ∑ (yi − ŷi)² / ∑ (yi − ȳ)²

lower residual variance = predicted values closer to actual values = good model

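R² as a function (a minimal sketch):

```python
import numpy as np

def r_squared(y, y_hat):
    """R² = 1 - residual variance / total variance."""
    ss_res = ((y - y_hat) ** 2).sum()
    ss_tot = ((y - y.mean()) ** 2).sum()
    return 1.0 - ss_res / ss_tot
```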
9

bias variance tradeoff

bias captures inherent error present in model 

  • bias error comes from erroneous assumptions in learning algo

bias = contrast between the mean prediction of our model and the correct prediction

variance captures how much model changes if trained on a diff training set

variance = variation / spread of model prediction values across diff data sampling

bias-variance tradeoff = conflict in trying to minimise both at the same time; these are the two sources of error that prevent supervised learning algorithms from generalising beyond their training set

10

cont

underfitting = model unable to capture underlying pattern of data = high bias + low variance

  • happens when there is very little data to build an accurate model, or a linear model is used to learn non-linear data

overfitting = model captures noise along w the underlying pattern in data

  • model is trained a lot over a noisy dataset → low bias + high variance

11

example- predicting price

In this dataset, we have 3 input features: Sq ft, Bedrooms, Bathrooms and the output is Price.

  • Let us apply linear regression on the price prediction problem, assuming η = 0.1 and starting with w = [0.0, 0.1, −0.1, 0.2]

w ← w − η · 1/n ∑ (wTxi − yi) xi

wTxi = w0xi0 + w1xi1 + w2xi2 + w3xi3

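one batch update under these settings, sketched in code; the card's actual dataset values aren't reproduced here, so the rows in X and the prices in y below are hypothetical:

```python
import numpy as np

# Hypothetical rows: [1, sq_ft, bedrooms, bathrooms] (normalised, made up).
X = np.array([[1.0, 0.5, -0.3, 0.2],
              [1.0, -0.1, 0.4, -0.5]])
y = np.array([0.8, 0.3])                # hypothetical prices

w = np.array([0.0, 0.1, -0.1, 0.2])     # starting weights from the card
eta = 0.1

# One batch step: w <- w - eta * (1/n) sum (w^T x_i - y_i) x_i
grad = X.T @ (X @ w - y) / len(y)
w = w - eta * grad
print(w)
```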
12

cont - testing + MSE

13

conclusion

• Strengths:

  • Simple to implement and easy to interpret.

  • Computationally efficient and can handle large datasets effectively.

  • Serves as a good baseline model to compare against more complex regression models.

• Weaknesses:

  • Unsuitable for non-linear relationships since it assumes a linear relationship between variables.

  • Can be susceptible to outliers.