9. Linear Regression


43 Terms

1

Other Names for Response Variable 

Dependent Variable

Target Variable

Output Variable

2

Other Names for Explanatory Variable

Regressor

Independent Variable

Predictor Variable

Input Variable

Covariate

3

Simple Linear Regression Relationship

  • Y = β₀ + β₁X + ε, where β₀ is the Y-intercept and β₁ is the slope of the line.

  • ε is the error, a random variable with constant variance (for the population).

4

Assumptions for Linear Regression

  • Data were obtained by randomization

  • Relationship between X and Y is linear (Scatter Plot)

  • ε must be normally distributed with a mean of 0 and constant variance

5

Error of the linear regression

Its variance stays constant for all values of X

6

Ordinary Least Squares Estimation

Square the differences between the actual and predicted values of the response variable and sum them. The estimates of β₀ and β₁ are the values that minimise this sum.
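
The least-squares idea can be sketched in a few lines. This is a minimal Python illustration (not the R code the course uses), assuming the standard closed-form solution for a single regressor; the data and the function names `ols_fit` and `sum_squared_errors` are made up for the example.

```python
def sum_squared_errors(x, y, b0, b1):
    """Sum of squared differences between actual and predicted responses."""
    return sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))


def ols_fit(x, y):
    """Closed-form least-squares estimates (b0, b1) for y = b0 + b1 * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1


# Made-up data, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

b0, b1 = ols_fit(x, y)
sse_best = sum_squared_errors(x, y, b0, b1)
# Nudging the intercept away from the estimate makes the sum of squares larger.
sse_other = sum_squared_errors(x, y, b0 + 0.1, b1)
print(b0, b1, sse_best, sse_other)
```

Any other choice of intercept or slope gives a sum of squares at least as large as `sse_best`.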

7

R Linear Model

lm = linear model

Response Variable: Selling_Price

Predictor Variable: Present_Price

Estimates: β₀ = 0.72, β₁ = 0.52

ŷ = 0.72 + 0.52x

8

Point Estimation in R 

knowt flashcard image
9

Interpolation

Estimating the mean response for an X value that has not been observed but lies within the range of observed values

10

Extrapolation

Estimating the mean response for an X value that lies outside the range of observed values

We do not know the form of the relationship outside our sample, so we should avoid extrapolation
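
A small Python sketch of the difference, using the fitted line ŷ = 0.72 + 0.52x from the R Linear Model card. The observed range of Present_Price (`OBS_MIN`, `OBS_MAX`) is hypothetical, since these notes do not give it, and `predict_selling_price` is a made-up helper name.

```python
B0, B1 = 0.72, 0.52  # fitted coefficients from the R Linear Model card


def predict_selling_price(present_price, observed_min, observed_max):
    """Return (estimate, kind); kind says whether x is inside the observed range."""
    estimate = B0 + B1 * present_price
    inside = observed_min <= present_price <= observed_max
    return estimate, ("within range" if inside else "extrapolation")


# Hypothetical observed range of Present_Price (not taken from the notes).
OBS_MIN, OBS_MAX = 0.3, 90.0

est_in, kind_in = predict_selling_price(10.0, OBS_MIN, OBS_MAX)
est_out, kind_out = predict_selling_price(150.0, OBS_MIN, OBS_MAX)
print(est_in, kind_in)
print(est_out, kind_out)  # the number is computed, but it should not be trusted
```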

11

Point Estimate of Variance

knowt flashcard image
12

Interval Estimates

Point estimate ± margin of error, where margin of error = quantile × SE of the point estimate

The quantile comes from the t distribution with df = n - 2
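
As a rough Python sketch of the recipe above, here is an interval for the slope of a simple model. It assumes the usual formulas s² = SSE / (n - 2) and SE(b1) = sqrt(s² / Sxx); the data are made up, and the t quantile is passed in by hand (3.182 is the 0.975 quantile for df = 3 from a t table) because the Python standard library has no t distribution.

```python
import math


def slope_confidence_interval(x, y, t_quantile):
    """Interval b1 +/- t_quantile * SE(b1) for a simple linear regression."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    s2 = sse / (n - 2)             # point estimate of the error variance
    se_b1 = math.sqrt(s2 / sxx)    # standard error of the slope
    margin = t_quantile * se_b1    # margin of error
    return b1 - margin, b1 + margin


# Made-up data, for illustration only (n = 5, so df = n - 2 = 3).
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
T_975_DF3 = 3.182  # from a t table; in R this would be qt(0.975, 3)

lower, upper = slope_confidence_interval(x, y, T_975_DF3)
print(lower, upper)
```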

13

Interval Estimates in R for Coefficient

knowt flashcard image
14

Interval Estimates in R

knowt flashcard image
15

T Test

Tests the significance of one regressor (its coefficient). It can also be read as a test of independence between that regressor and the response variable.

16

R Output T-Test

For a simple linear regression model, the t-test is equivalent to the F-test

17

F Test

Test significance of the whole model

18

F Test Hypothesis

Null Hypothesis: Model is not significant
Alternative Hypothesis: Model is significant

19

R Output F-Test

R code for the p-value from the F distribution: pf(1016, 1, 299, lower.tail = FALSE)

Df1 = number of regressors = 1

Df2 = n - number of regressors - 1 (here 299)

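
To see where the F statistic and its degrees of freedom come from, here is a minimal Python sketch on made-up data. It assumes the usual decomposition F = (SSR / Df1) / (SSE / Df2) and also checks that, with one regressor, F equals the square of the slope's t statistic. The p-value itself is left to R's `pf`, since the Python standard library has no F distribution.

```python
import math


def f_and_t_statistics(x, y):
    """Return (F, t, df1, df2) for a simple linear regression."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    fitted = [b0 + b1 * xi for xi in x]
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))  # unexplained
    ssr = sum((fi - y_bar) ** 2 for fi in fitted)           # explained
    df1 = 1            # number of regressors
    df2 = n - df1 - 1  # n - number of regressors - 1
    f_stat = (ssr / df1) / (sse / df2)
    t_stat = b1 / math.sqrt((sse / df2) / sxx)
    return f_stat, t_stat, df1, df2


# Made-up data, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

f_stat, t_stat, df1, df2 = f_and_t_statistics(x, y)
print(f_stat, t_stat ** 2, df1, df2)
```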
20

When F-Test is not significant

None of the regressors is significant, so we should use an intercept-only model

21

R code for intercept model

knowt flashcard image
22

Regression Diagnostics

knowt flashcard image
23

Checking Linearity and Variance

Use a scatter plot and draw lines along the top and bottom of the point cloud

  1. Fix non-linearity in X by adding higher-order terms

  2. Fix non-constant variance by transforming y, e.g. log(y) or 1/y

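
As a rough illustration of the log(y) transform, the Python sketch below fits a line to made-up, exponentially growing data before and after taking logs and compares R². The data are deliberately noise-free, so the transformed fit is essentially perfect; real data would not behave this cleanly.

```python
import math


def r_squared(x, y):
    """R² of the least-squares line fitted to (x, y)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    sst = sum((yi - y_bar) ** 2 for yi in y)
    return 1 - sse / sst


# Made-up curved data: y grows exponentially in x.
x = [1, 2, 3, 4, 5, 6]
y = [math.exp(0.5 * xi) for xi in x]

r2_raw = r_squared(x, y)                          # straight line on curved data
r2_log = r_squared(x, [math.log(yi) for yi in y])  # straight line on log(y)
print(r2_raw, r2_log)
```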
24

Residual Plots

  • Check the normality assumption

  • Check for non-constant variance and the need to transform Y

  • Check for the need to add higher-order terms in X

25

Standardised Residuals

The standardised residuals should approximately follow a standard normal distribution, since the errors are assumed to be normally distributed.

26

R output Residuals

knowt flashcard image
27

Checking Normality of Residuals

Create a histogram or QQ plot of the standardised residuals and check whether they follow a standard normal distribution

28

Analysing Residual Plots

  • Plot SR against the fitted values Ŷ and against X: points should scatter randomly around 0 within (-3, 3)

  • Histogram and QQ plot of SR should look normally distributed

  • SR from a fitted model are not independent, but when the sample size is large enough, randomness should still be seen

29

Common Issues in Residual Plots

Funnel shape in the scatter plots (non-constant variance)

Curved band in the plot against X (non-linearity)

Non-normality in the QQ plot

30

R Output

Model does not satisfy the constant-variance assumption or the normality assumption

31

Outliers 

  • Identified by the residuals

  • Points whose standardised residuals are greater than 3 or less than -3

  • Investigate outliers
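
The outlier rule can be sketched in Python as follows. This assumes the same definition of standardised residuals as R's `rstandard()`, namely e_i / (s · sqrt(1 - h_i)) with leverage h_i = 1/n + (x_i - x̄)² / Sxx; the data are made up, with one response deliberately shifted upwards.

```python
import math


def standardised_residuals(x, y):
    """Residuals divided by their estimated standard deviation s * sqrt(1 - h_i)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    s = math.sqrt(sum(e ** 2 for e in residuals) / (n - 2))
    leverages = [1 / n + (xi - x_bar) ** 2 / sxx for xi in x]
    return [e / (s * math.sqrt(1 - h)) for e, h in zip(residuals, leverages)]


# Made-up data: roughly y = 2x + 1 with small alternating noise.
x = list(range(1, 21))
y = [2 * xi + 1 + (0.5 if i % 2 == 0 else -0.5) for i, xi in enumerate(x)]
y[9] += 15  # plant one unusual response at index 9

sr = standardised_residuals(x, y)
outliers = [i for i, r in enumerate(sr) if abs(r) > 3]
print(outliers)
```

A flagged index is only a prompt to investigate the observation, not a reason to delete it automatically.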

32

Influential Point

A point that greatly affects the parameter estimates

Points with Cook's distance > 1 are considered influential

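
A minimal Python sketch of the Cook's distance rule for a simple model, assuming the standard formula D_i = (r_i² / p) · h_i / (1 - h_i), where r_i is the standardised residual, h_i the leverage, and p = 2 the number of estimated coefficients. The data are made up so that the last point sits far from the others in X and off their trend.

```python
def cooks_distances(x, y):
    """Cook's distance for each point in a simple linear regression (p = 2)."""
    n = len(x)
    p = 2  # intercept and slope
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    s2 = sum(e ** 2 for e in residuals) / (n - p)
    distances = []
    for xi, e in zip(x, residuals):
        h = 1 / n + (xi - x_bar) ** 2 / sxx  # leverage of this point
        r2 = e ** 2 / (s2 * (1 - h))         # squared standardised residual
        distances.append(r2 / p * h / (1 - h))
    return distances


# Made-up data: five points on y = x, plus one far-away point off that line.
x = [1, 2, 3, 4, 5, 20]
y = [1, 2, 3, 4, 5, 5]

cooks = cooks_distances(x, y)
influential = [i for i, d in enumerate(cooks) if d > 1]
print(influential)
```

Note that the far-away point pulls the line towards itself, so its raw residual is small even though its Cook's distance is large.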
33

Coefficient of Determination R²

Measures the goodness of fit of the model (the proportion of variation in the response that the model explains); lies between 0 and 1

34

Simple Model Correlation

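In a simple model, the absolute value of the correlation between X and Y equals the square root of the coefficient of determination.

The correlation takes the sign of the slope: when the slope is negative, the correlation is also negative.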
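
This relationship can be checked numerically. The Python sketch below uses made-up data with a downward trend and compares the sample correlation with the square root of R², signed by the slope.

```python
import math


def slope_correlation_r2(x, y):
    """Return (slope, correlation, R²) for a simple linear regression."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    corr = sxy / math.sqrt(sxx * syy)
    r2 = 1 - sse / syy
    return b1, corr, r2


# Made-up data with a negative trend.
x = [1, 2, 3, 4, 5]
y = [9.8, 8.1, 6.2, 3.9, 2.0]

b1, corr, r2 = slope_correlation_r2(x, y)
signed_root = math.copysign(math.sqrt(r2), b1)  # sqrt(R²) with the slope's sign
print(corr, signed_root)
```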

35

R² weakness

R² does not take the complexity of the model into account: adding regressors never lowers R², even when they do not help explain the response.

36

MLR vs SLR

  1. Significance tests are done differently for a categorical variable with more than 2 categories

  2. Use adjusted R² to compare models

37

R Code for MLR

Add each extra regressor to the model formula with + <Variable>

38

Adjusted R²

Takes into account the number of regressors included

K: number of regressors

n: number of observations (sample size)

This enables us to compare the fit of 2 models with different numbers of regressors

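
A short Python sketch of the comparison, assuming the standard definition adjusted R² = 1 - (1 - R²)(n - 1) / (n - K - 1). The R² values, sample size and regressor counts below are hypothetical, chosen so that the larger model has the higher R² but the lower adjusted R².

```python
def adjusted_r_squared(r_squared, n, k):
    """Adjusted R² for a model with k regressors fitted to n observations."""
    return 1 - (1 - r_squared) * (n - 1) / (n - k - 1)


# Hypothetical models fitted to the same response with n = 50 observations.
adj_small = adjusted_r_squared(0.800, n=50, k=1)  # one regressor
adj_large = adjusted_r_squared(0.805, n=50, k=5)  # five regressors

print(adj_small, adj_large)
```

Here the four extra regressors raise R² only slightly, so the penalty for the added terms outweighs the gain.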
39

Indicator Variable

Encoding a categorical variable as 0/1 variables: one indicator per category, leaving out one reference category
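
As a minimal Python sketch of indicator coding (the category names and the helper `indicator_columns` are hypothetical), each non-reference category gets its own 0/1 column, so a variable with 3 categories becomes 2 regressors.

```python
def indicator_columns(values, reference):
    """Build one 0/1 column per category, skipping the reference category."""
    levels = sorted(set(values) - {reference})
    return {level: [1 if v == level else 0 for v in values] for level in levels}


# Hypothetical categorical variable with 3 categories.
fuel = ["Petrol", "Diesel", "CNG", "Petrol"]

cols = indicator_columns(fuel, reference="CNG")
print(cols)
```

A row with 0 in every indicator column belongs to the reference category.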

40

Variable vs Regressor

knowt flashcard image
41

R output for Indicator Variables

X2 cannot be removed because the interaction term of X1 and X2 is highly significant. To keep the interaction term, both main terms must be kept.

42

Checking Assumptions

The fitted model does not meet the assumptions

We can try transforming the variables or refitting the model without the influential points

43

Need to Know

  • Test for significance of regressor

  • Fit a model in R and write down the fitted regression equation

  • Check the assumptions of a regression analysis using residual plots

  • Identify outliers and influential points

  • Interpret coefficients and R²

  • Compare the fit of models for the same response using R² (adjusted R² when the number of regressors differs)