The Regression process
Check if the dependent variable is continuous
Estimate the model
Analyse residuals
Check if the assumptions are verified
Examine the goodness of fit of the model
Is the overall fit significant?
Is this model the best one?
5 assumptions of multiple linear regression
Linearity
Homoskedasticity
Independence of errors
Normality of residuals
No collinearity
Scatterplot matrix
Shows pairwise scatterplots of the variables and helps to better understand the model
Scatter plot of residuals against the dependent variable
Detect outliers
Scatter plot of residuals against the dependent variable
Confirm outliers and look for violated assumptions
Normal Q-Q plot
Compare the distribution of the residuals to a normal distribution
Partial regression coefficient
Coefficient that describes the effect of a one-unit change in the independent variable on the dependent variable, holding all other independent variables constant.
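For reference, a sketch of the general multiple regression model in standard notation (not part of the original card), where each slope b_j is a partial regression coefficient:
\[ Y_i = b_0 + b_1 X_{1i} + b_2 X_{2i} + \dots + b_k X_{ki} + \varepsilon_i \]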
If a regression model is estimated using all five independent variables
any prediction of the dependent variable must also include all five variables
R²
Sum of squares regression / Sum of squares total (SSR / SST)
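Written out in the usual notation (not part of the original card):
\[ R^2 = \frac{SSR}{SST} = 1 - \frac{SSE}{SST} \]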
Evolution of R² When you add variables
R² cannot decrease
Problems with R²
R² does not say whether the coefficients are statistically significant
R² does not say anything about biases
R² does not say whether the model is a good fit
Overfitting
model is too complex, meaning there may be too many independent variables relative to the number of observations in the sample
Adjusted R²
What does it mean if adjusted R² increases or decreases when a variable is added?
The new coefficient's t-statistic is greater than / less than 1 in absolute value
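For reference, the adjusted R² formula in standard notation (not from the original card), with n observations and k independent variables:
\[ \bar{R}^2 = 1 - \left( \frac{n-1}{n-k-1} \right) \left( 1 - R^2 \right) \]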
Which is bigger, R² or adjusted R²?
R²
AIC
BIC
BIC imposes a greater penalty on more complex models
When do you use AIC or BIC
AIC is preferred when the model is used for prediction
BIC is preferred when evaluating goodness of fit
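As a reference for the AIC and BIC cards above, the usual regression forms (standard notation, not from the original deck), with n observations and k independent variables:
\[ AIC = n \ln\!\left( \frac{SSE}{n} \right) + 2(k+1), \qquad BIC = n \ln\!\left( \frac{SSE}{n} \right) + \ln(n)\,(k+1) \]
Lower values indicate a better model; since ln(n) > 2 once n > 7, BIC penalizes additional variables more heavily.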
Nested model
Models in which one regression model has a subset of the independent variables of another regression model.
F stat for restricted model
Summary Model Fit
general linear F test
R² and adjusted R² are not generally suitable for testing the significance of the model's fit; for this, we explore the ANOVA further, calculating the F-statistic and other goodness-of-fit metrics.
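A sketch of the two F-statistics involved, in standard notation (not part of the original cards): the overall ANOVA F-test, and the general linear F-test comparing a restricted (nested) model with q fewer variables to the unrestricted model:
\[ F = \frac{MSR}{MSE}, \qquad F = \frac{\left( SSE_{restricted} - SSE_{unrestricted} \right) / q}{SSE_{unrestricted} / (n - k - 1)} \]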
Principles of model specification
Grounded in economic reasoning
Parsimonious
Performs well on other samples (out of sample)
The model should account for nonlinearity
Model should satisfy regression assumptions
Failures in regressions
Omitted variables
Inappropriate form of variables
Inappropriate variable scaling
Inappropriate data pooling
Heteroskedasticity
Variance of the residuals differs across observations
Interpretation of the Breusch-Pagan test
The null hypothesis is that there is no heteroskedasticity. This is a one-tailed, right-side test.
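For reference, the test statistic in its usual form (not part of the original card), where R²_resid comes from regressing the squared residuals on the k independent variables:
\[ BP = n \times R^2_{resid} \sim \chi^2_k \]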
What should you use if there is heteroskedasticity
Robust standard errors
Serial Correlation
Often found in time series. Residuals are correlated across observations
Impact of Serial Correlation on Multiple regression model
First order serial correlation
The correlation is between adjacent (consecutive) residuals
VIF interpretation
VIF > 5: investigate further
VIF > 10: serious multicollinearity
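For reference, the variance inflation factor for variable j in standard notation (not from the original card), where R²_j comes from regressing X_j on the remaining independent variables:
\[ VIF_j = \frac{1}{1 - R_j^2} \]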
How you correct multicollinearity
Exclude variables
Use a different proxy
Increase the sample size
high leverage point
An observation of an independent variable that has an extreme value and is potentially influential.
An outlier
An observation that has an extreme value of the dependent variable and is potentially influential.
How to identify outliers and high leverage points
Scatterplot
Leverage
A value between 0 and 1 that measures how far an observation's independent-variable value lies from the others (1 being the highest leverage)
How to detect a high leverage point
Leverage higher than 3 × (k + 1)/n
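In standard notation (not part of the original card), with k independent variables and n observations, observation i is flagged as a potential high leverage point when its leverage h_ii satisfies:
\[ h_{ii} > 3 \times \frac{k+1}{n} \]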
Method to identify outliers
Studentised residuals
degree of freedom for studentised residuals
n-k-2
Conclusion of studentised residuals
If the absolute value of a studentised residual exceeds the critical t-value, the observation is an outlier
Number of dummy variable for N categories
N-1
Qualitative dependent variables
logistic regression
The log of the probability of an occurrence of an event or characteristic divided by the probability of the event or characteristic not occurring.
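This is the definition of the log odds; for reference, the logistic regression model in its usual form (standard notation, not from the original card):
\[ \ln\!\left( \frac{p}{1-p} \right) = b_0 + b_1 X_1 + b_2 X_2 + \dots + b_k X_k \]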
Method used to estimate coefficients in logistic regression
Maximum likelihood estimation
error distribution for logistic regression
the distribution’s shape is similar to the normal distribution but with fatter tails.
how to assess logistic model fit
Likelihood ratio test
LR = −2 × (Log-likelihood restricted model − Log-likelihood unrestricted model)
Interpreting the log-likelihood test
The log-likelihood metric is always negative, so higher values (closer to 0) indicate a better-fitting model.
General linear F test