9. Linear Regression


Other Names for Response Variable

Dependent Variable

Target Variable

Output Variable


Other Names for Explanatory Variable

Regressor

Independent Variable

Predictor Variable

Input Variable

Covariate


Simple Linear Regression Relationship

Y = \beta_0 + \beta_1 X + \varepsilon (population model)
\beta_0 is the Y-intercept and \beta_1 is the slope of the line.
\varepsilon is the error, a random variable with mean 0 and constant variance \sigma^2.

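
A small simulation can make the population model concrete. This is a sketch under assumed values: beta0 = 2, beta1 = 0.5 and sigma = 1.5 are arbitrary choices, not taken from the card. Data are generated from Y = \beta_0 + \beta_1 X + \varepsilon, and lm() is then used to estimate the coefficients from the sample.

```r
# Simulate from the population model Y = beta0 + beta1 * X + epsilon.
# beta0, beta1 and sigma are arbitrary illustrative values.
set.seed(1)
n <- 200
beta0 <- 2
beta1 <- 0.5
sigma <- 1.5

x <- runif(n, min = 0, max = 20)
epsilon <- rnorm(n, mean = 0, sd = sigma)  # error: mean 0, constant variance sigma^2
y <- beta0 + beta1 * x + epsilon

fit <- lm(y ~ x)
coef(fit)  # sample estimates: close to, but not exactly, 2 and 0.5
```

The estimates differ from the true parameters because they are computed from one random sample of the population.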


Assumptions for Linear Regression

Data were obtained by randomization
Relationship between X and Y is linear (Scatter Plot)
\varepsilon must be normally distributed with mean of 0 and constant variance



Error of the linear regression

Its variance stays constant for all values of X (constant variance).


Ordinary Least Squares Estimation

Choose \beta_0 and \beta_1 to minimise the sum of the squared differences between the actual and predicted values of the response variable: SSE = \sum (y_i - \hat{y}_i)^2.
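
For simple linear regression the least squares minimisers have a closed form: \hat\beta_1 = S_{xy} / S_{xx} and \hat\beta_0 = \bar{y} - \hat\beta_1 \bar{x}. The sketch below computes them by hand and compares them with lm(). The built-in `cars` data (dist as response, speed as predictor) is an assumed stand-in, since the deck's own data is not included.

```r
# Stand-in data: built-in cars data set (dist = response, speed = predictor)
x <- cars$speed
y <- cars$dist

# Closed-form b0, b1 that minimise SSE = sum((y - (b0 + b1 * x))^2)
b1 <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
b0 <- mean(y) - b1 * mean(x)

# The minimised sum of squared errors
sse <- sum((y - (b0 + b1 * x))^2)

# lm() fits by ordinary least squares, so it should agree
fit <- lm(dist ~ speed, data = cars)
all.equal(unname(coef(fit)), c(b0, b1))
```

Any other choice of slope or intercept gives a larger SSE than `sse`.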


R Linear Model

lm = linear model

Response Variable: Selling_Price

Predictor Variable: Present_Price

\hat{\beta}_0 = 0.72, \hat{\beta}_1 = 0.52

\hat{y} = 0.72 + 0.52x

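
A sketch of fitting the model with lm(). With the card's data the call would have the form lm(Selling_Price ~ Present_Price, data = ...), but the data frame is not given, so the built-in `cars` data stands in here and its coefficients will differ from 0.72 and 0.52.

```r
# Stand-in data: built-in cars data set (dist = stopping distance, speed)
fit <- lm(dist ~ speed, data = cars)  # formula: response ~ predictor

coef(fit)     # "(Intercept)" is beta0-hat, "speed" is beta1-hat
summary(fit)  # estimates, standard errors, t tests, R-squared and the F test
```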


Point Estimation in R
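
The card has no body text. A sketch of point estimation with predict(), assuming the built-in `cars` data as a stand-in; `new_x` is a hypothetical data frame of X values. Its column name must match the predictor used in the formula.

```r
fit <- lm(dist ~ speed, data = cars)

# Hypothetical new X values; the column name must match the predictor
new_x <- data.frame(speed = c(10, 15, 20))

# Point estimates of the mean response: b0 + b1 * x
y_hat <- predict(fit, newdata = new_x)

# The same values computed directly from the fitted coefficients
y_hat_manual <- coef(fit)[[1]] + coef(fit)[[2]] * new_x$speed
y_hat
```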


Interpolation

Estimating the mean response for an X value that has not been observed but is within the range of observed values


Extrapolation

Estimating the mean response for an X value that is not within the range of observed values

We do not know the form of the relationship outside our sample, so extrapolation should be avoided
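
predict() does not warn when asked to extrapolate, so it can help to check the new X values against the observed range first. `is_interpolation` is a hypothetical helper written for this sketch, and the built-in `cars` data (speed ranges from 4 to 25) stands in for the deck's data.

```r
# Hypothetical helper: TRUE where x_new lies within the observed range of X
is_interpolation <- function(x_new, x_obs) {
  x_new >= min(x_obs) & x_new <= max(x_obs)
}

range(cars$speed)                         # observed range of the predictor
is_interpolation(c(12, 30), cars$speed)   # TRUE (interpolation), FALSE (extrapolation)
```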


Point Estimate of Variance
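
The card has no body text. In simple linear regression the usual point estimate of the error variance is \hat\sigma^2 = SSE / (n - 2), which matches the df = n - 2 used for the interval estimates. A sketch, assuming the built-in `cars` data as a stand-in:

```r
fit <- lm(dist ~ speed, data = cars)
n <- nrow(cars)

sse <- sum(residuals(fit)^2)   # sum of squared residuals
sigma2_hat <- sse / (n - 2)    # divide by n - 2: two coefficients were estimated
sigma_hat <- sqrt(sigma2_hat)  # reported as "Residual standard error" by summary()

all.equal(sigma_hat, summary(fit)$sigma)
```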


Interval Estimates

Point estimate ± margin of error, where margin of error = quantile × SE of the point estimate

The quantile comes from the t distribution with df = n - 2
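
The formula can be applied by hand to the slope and checked against confint(). A sketch for a 95% interval, assuming the built-in `cars` data as a stand-in:

```r
fit <- lm(dist ~ speed, data = cars)
n <- nrow(cars)

tab <- coef(summary(fit))         # estimates and standard errors
est <- tab["speed", "Estimate"]   # point estimate of beta1
se  <- tab["speed", "Std. Error"] # SE of the point estimate
q   <- qt(0.975, df = n - 2)      # t quantile for a 95% interval

ci_manual <- c(lower = est - q * se, upper = est + q * se)
ci_manual
```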


Interval Estimates in R for Coefficient
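
The card has no body text. In base R, confint() gives interval estimates for the coefficients of an lm fit. A sketch, assuming the built-in `cars` data as a stand-in:

```r
fit <- lm(dist ~ speed, data = cars)

ci95 <- confint(fit)                # default level = 0.95; rows (Intercept) and speed
ci99 <- confint(fit, level = 0.99)  # higher confidence level gives wider intervals
ci_slope <- confint(fit, parm = "speed", level = 0.90)  # a single coefficient
ci95
```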


Interval Estimates in R
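
The card has no body text. Assuming it refers to intervals for the response (coefficient intervals come from confint()), predict() can return a confidence interval for the mean response or a prediction interval for a new observation. A sketch with the built-in `cars` data as a stand-in; `new_x` is a hypothetical input.

```r
fit <- lm(dist ~ speed, data = cars)
new_x <- data.frame(speed = 15)  # hypothetical X value

# Confidence interval for the mean response at speed = 15
conf_int <- predict(fit, newdata = new_x, interval = "confidence", level = 0.95)

# Prediction interval for a single new observation at speed = 15
pred_int <- predict(fit, newdata = new_x, interval = "prediction", level = 0.95)

conf_int  # columns: fit, lwr, upr
pred_int
```

Both intervals share the same centre, but the prediction interval is wider because it also accounts for the error of an individual observation.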


T Test

Tests the significance of one regressor (coefficient), H0: \beta_1 = 0. It can also serve as a test of independence between that regressor and the response variable
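
The t statistic is the estimate divided by its standard error, compared with a t distribution on n - 2 degrees of freedom. A sketch that reproduces the t value and two-sided p-value reported by summary(), assuming the built-in `cars` data as a stand-in:

```r
fit <- lm(dist ~ speed, data = cars)
n <- nrow(cars)
tab <- coef(summary(fit))

# H0: beta1 = 0
t_stat <- tab["speed", "Estimate"] / tab["speed", "Std. Error"]
p_value <- 2 * pt(abs(t_stat), df = n - 2, lower.tail = FALSE)  # two-sided

c(t = t_stat, p = p_value)
```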


R Output T-Test

For a simple linear regression model, the t test for \beta_1 is equivalent to the F test (F = t^2, with the same p-value)
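
The equivalence can be checked numerically: with one regressor, the F statistic equals the square of the slope's t statistic. A sketch, assuming the built-in `cars` data as a stand-in:

```r
fit <- lm(dist ~ speed, data = cars)
s <- summary(fit)

t_stat <- coef(s)["speed", "t value"]  # t test for beta1
f_stat <- s$fstatistic[["value"]]      # overall F test

all.equal(t_stat^2, f_stat)
```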


F Test

Tests the significance of the whole model


F Test Hypothesis

Null Hypothesis: the model is not significant (all slope coefficients are 0)
Alternative Hypothesis: the model is significant (at least one slope coefficient is not 0)


R Output F-Test

R code for the p-value from the F distribution: pf(1016, 1, 299, lower.tail = FALSE)

Df1 = number of regressors (coefficients excluding the intercept) = 1

Df2 = n - number of regressors - 1 = 299 here (so n = 301)
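
A sketch of the same p-value calculation: first the card's numbers, then the values taken from a fitted model, with the built-in `cars` data (n = 50, so df2 = 48) as an assumed stand-in.

```r
# The card's calculation: upper-tail probability of F = 1016 on (1, 299) df
pf(1016, df1 = 1, df2 = 299, lower.tail = FALSE)

fit <- lm(dist ~ speed, data = cars)
fs <- summary(fit)$fstatistic  # named vector: value, numdf, dendf

# numdf = number of regressors (1); dendf = n - 1 - 1
p_value <- pf(fs[["value"]], df1 = fs[["numdf"]], df2 = fs[["dendf"]],
              lower.tail = FALSE)
p_value
```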


When F-Test is not significant

None of the regressors is significant, so we should use the intercept-only model, which predicts \bar{y} for every observation


R code for intercept model
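
The card has no body text. In R's formula syntax an intercept-only model is written `y ~ 1`. A sketch, assuming the built-in `cars` data as a stand-in, that fits it and compares it with the model containing the regressor:

```r
fit_null <- lm(dist ~ 1, data = cars)      # intercept-only model
fit_full <- lm(dist ~ speed, data = cars)  # model with the regressor

coef(fit_null)             # the only estimate is mean(cars$dist)
anova(fit_null, fit_full)  # F test: does adding speed improve on the intercept model?
```

For one regressor, the F statistic in this comparison is the same as the one reported by summary(fit_full).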


Regression Diagnostics
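
The card has no body text. Base R's plot() method for lm objects draws a standard set of diagnostic plots. A sketch, assuming the built-in `cars` data as a stand-in:

```r
fit <- lm(dist ~ speed, data = cars)

par(mfrow = c(2, 2))  # 2 x 2 grid of panels
plot(fit)             # residuals vs fitted, normal QQ, scale-location, residuals vs leverage
```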


Checking Linearity and Variance

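Use a scatter plot and draw lines along the top and bottom of the point cloud: curved lines suggest non-linearity, and lines that spread apart (a funnel) suggest non-constant variance

Non-linearity: fix X by adding a higher-order term (e.g. X^2)
Non-constant variance: transform Y, e.g. log(Y) or 1/Y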
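
A sketch of how these remedies are written as lm() formulas, assuming the built-in `cars` data as a stand-in. Whether any of them actually improves the fit for `cars` is not claimed here; that would be judged from the diagnostic plots.

```r
# Non-linearity: add a higher-order term in X (I() makes ^ mean "power" in a formula)
fit_quad <- lm(dist ~ speed + I(speed^2), data = cars)

# Non-constant variance: transform Y (log needs y > 0; 1/y needs y != 0)
fit_log <- lm(log(dist) ~ speed, data = cars)
fit_inv <- lm(I(1 / dist) ~ speed, data = cars)

coef(fit_quad)
```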


Residual Plots

Check the normality assumption
Check for non-constant variance and the need to transform Y
Check for the need to add higher-order terms in X


Standardised Residuals

Standardised residuals should follow a standard normal distribution, N(0, 1), since the errors are assumed to be normally distributed.
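
R's rstandard() divides each residual by \hat\sigma \sqrt{1 - h_i}, where h_i is the leverage of observation i. Some courses define standardised residuals as e_i / \hat\sigma without the leverage term; the two are close when leverages are small. A sketch reproducing rstandard() by hand, assuming the built-in `cars` data as a stand-in:

```r
fit <- lm(dist ~ speed, data = cars)

h <- hatvalues(fit)              # leverage of each observation
sigma_hat <- summary(fit)$sigma  # residual standard error
sr_manual <- residuals(fit) / (sigma_hat * sqrt(1 - h))

all.equal(sr_manual, rstandard(fit))
```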


R output Residuals


Checking Normality of Residuals

Create a histogram or QQ plot of the standardised residuals and check whether they follow a standard normal distribution
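
A sketch of both plots in base R graphics, assuming the built-in `cars` data as a stand-in:

```r
fit <- lm(dist ~ speed, data = cars)
sr <- rstandard(fit)  # standardised residuals

par(mfrow = c(1, 2))
hist(sr, main = "Standardised residuals", xlab = "Standardised residual")
qqnorm(sr, main = "Normal QQ plot of standardised residuals")
qqline(sr)  # points close to this line suggest approximate normality
```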


Analysing Residual Plots

Plot SR against fitted Y (\hat{y}) and against X: points should scatter randomly around 0, within (-3, 3)
Histogram and QQ plot of SR: should look normally distributed
SR from a fitted model are not exactly independent, but when the sample size is large enough they should look random
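
A sketch of the two scatter plots with reference lines at 0 and ±3, plus a check for points outside (-3, 3), assuming the built-in `cars` data as a stand-in:

```r
fit <- lm(dist ~ speed, data = cars)
sr <- rstandard(fit)

par(mfrow = c(1, 2))
# SR against fitted values, with reference lines at 0 and +/-3
plot(fitted(fit), sr, xlab = "Fitted values", ylab = "Standardised residuals")
abline(h = c(-3, 0, 3), lty = c(2, 1, 2))
# SR against the predictor X
plot(cars$speed, sr, xlab = "speed", ylab = "Standardised residuals")
abline(h = c(-3, 0, 3), lty = c(2, 1, 2))

# Indices of observations outside (-3, 3), if any, are worth inspecting
outside <- which(abs(sr) > 3)
outside
```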


Common Issues in Residual Plots

Funnel shape in the scatter plots: non-constant variance

Curved band in the plot against X: non-linearity

Departure from the straight line in the QQ plot: non-normality