Multiple Linear Regression

64 Terms

1
New cards

The Regression process

  • Check that the dependent variable is continuous

  • Estimate the model

  • Analyse the residuals

  • Check whether the assumptions are satisfied

  • Examine the goodness of fit of the model

  • Is the overall fit significant?

  • Is the model the best available?

2
New cards

5 assumptions of multiple linear regression

  • Linearity

  • Homoskedasticity

  • Independence of errors

  • Normality of residuals

  • No multicollinearity

3
New cards

Scatterplot matrix

Helps to visualise the pairwise relationships between the variables before estimating the model

4
New cards

Scatter plot of residuals against the dependent variable

Helps detect outliers

5
New cards

Scatter plot of residuals against the dependent variable

Helps confirm outliers and look for violated assumptions

6
New cards

Normal Q-Q plot

Compares the distribution of the residuals to a normal distribution

7
New cards

Partial regression coefficient

Coefficient that describes the effect of a one-unit change in the independent variable on the dependent variable, holding all other independent variables constant.

8
New cards

If a regression model is estimated using all five independent variables

any prediction of the dependent variable must also include all five variables
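A minimal sketch of what this means in practice, assuming a fitted model with an intercept and one slope per independent variable. The coefficient and input values are hypothetical, chosen only for illustration.

```python
def predict(intercept, coefficients, x_values):
    """Predict the dependent variable from a fitted linear model.

    Every independent variable used in estimation must be supplied,
    so the two lists must have the same length.
    """
    if len(coefficients) != len(x_values):
        raise ValueError("need one value per independent variable in the model")
    return intercept + sum(b * x for b, x in zip(coefficients, x_values))


# Hypothetical five-variable model and one hypothetical observation.
coefs = [0.5, -2.0, 0.1, 3.0, 1.5]
y_hat = predict(1.0, coefs, [2, 1, 10, 0, 4])
```

Passing only four values raises an error instead of silently dropping a variable.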

9
New cards

R² = Sum of squares regression / Sum of squares total
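A short sketch of this ratio, assuming the fitted values come from a least-squares regression with an intercept. The data are hypothetical; the fitted values correspond to the line y = 1.9x.

```python
def r_squared(y, y_hat):
    """Return regression sum of squares divided by total sum of squares."""
    mean_y = sum(y) / len(y)
    sst = sum((yi - mean_y) ** 2 for yi in y)        # total variation
    ssr = sum((fi - mean_y) ** 2 for fi in y_hat)    # explained variation
    return ssr / sst


# Hypothetical observations and their least-squares fitted values.
y = [2.0, 4.0, 5.0, 8.0]
y_hat = [1.9, 3.8, 5.7, 7.6]
r2 = r_squared(y, y_hat)
```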

10
New cards

Evolution of R² when you add variables

R² cannot decrease

11
New cards

Problems with R²

  • R² does not say whether the coefficients are statistically significant

  • R² does not say anything about biases in the coefficients or predictions

  • R² does not say whether the model is a good fit

12
New cards

Overfitting

The model is too complex, meaning there may be too many independent variables relative to the number of observations in the sample

13
New cards

Adjusted R²
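A minimal sketch of the standard adjusted R² formula, assuming n observations and k independent variables. The input values are hypothetical.

```python
def adjusted_r_squared(r2, n, k):
    """Adjusted R² = 1 - [(n - 1) / (n - k - 1)] * (1 - R²)."""
    return 1.0 - (n - 1) / (n - k - 1) * (1.0 - r2)


# Hypothetical model: R² of 0.5 with 21 observations and 5 variables.
adj = adjusted_r_squared(0.5, n=21, k=5)
```

The factor (n - 1) / (n - k - 1) grows with k, so an added variable has to explain enough variation to offset the lost degree of freedom.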

14
New cards

What does it mean if adjusted R² increases or decreases

Adjusted R² increases if the new coefficient's t-statistic is greater than 1 in absolute value, and decreases if it is less than 1

15
New cards

Which is larger: R² or adjusted R²?

16
New cards

AIC

17
New cards

BIC

BIC imposes a greater penalty on more complex models

18
New cards

When do you use AIC or BIC

  • AIC is preferred when the model is used for prediction

  • BIC is preferred when the goal is the best goodness of fit
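The two criteria can be sketched as follows, assuming one common least-squares form based on the sum of squared errors (SSE). Other texts define them from the log-likelihood with different constants, so only comparisons between models fitted to the same data are meaningful. The inputs are hypothetical.

```python
import math


def aic(sse, n, k):
    """AIC = n * ln(SSE / n) + 2 * (k + 1)."""
    return n * math.log(sse / n) + 2 * (k + 1)


def bic(sse, n, k):
    """BIC = n * ln(SSE / n) + ln(n) * (k + 1)."""
    return n * math.log(sse / n) + math.log(n) * (k + 1)


# Hypothetical model: SSE of 50 with 100 observations and 3 variables.
aic_value = aic(50.0, n=100, k=3)
bic_value = bic(50.0, n=100, k=3)
```

Lower values are better for both. Because ln(n) exceeds 2 once n is 8 or more, the BIC penalty per added variable is the larger one in most samples.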

19
New cards

Nested model

Models in which one regression model has a subset of the independent variables of another regression model.

20
New cards

F-statistic for restricted model
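A sketch of the usual F-statistic for comparing nested models, assuming q is the number of variables excluded from the restricted model and k is the number of independent variables in the unrestricted model. The sums of squared errors are hypothetical.

```python
def nested_f_statistic(sse_restricted, sse_unrestricted, q, n, k):
    """F = [(SSE_R - SSE_U) / q] / [SSE_U / (n - k - 1)]."""
    numerator = (sse_restricted - sse_unrestricted) / q
    denominator = sse_unrestricted / (n - k - 1)
    return numerator / denominator


# Hypothetical: dropping 2 of 5 variables raises SSE from 100 to 120.
f_stat = nested_f_statistic(120.0, 100.0, q=2, n=50, k=5)
```

The statistic is compared with an F critical value with q and n - k - 1 degrees of freedom; a large value suggests the excluded variables add explanatory power.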

21
New cards

Summary Model Fit

22
New cards

general linear F test

23
New cards

R² and adjusted R² are not generally suitable for testing the significance of the model's fit; for this, we explore the ANOVA further, calculating the F-statistic and other goodness-of-fit metrics.

ok

24
New cards

Principles of model specification

  • Grounded in economic reasoning

  • Parsimonious

  • Performs well on other samples

  • Functional form should accommodate non-linearity

  • Model should satisfy regression assumptions

25
New cards

Failures in regressions

  • Omitted variables

  • Inappropriate form of variables

  • Inappropriate variable scaling

  • Inappropriate data pooling

26
New cards

Heteroskedasticity

Variance of the residuals differs across observations

30
New cards

Interpretation of the Breusch-Pagan test

The null hypothesis is that there is no heteroskedasticity. This is a one-tailed, right-side test.
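A sketch of the test decision, assuming the common form of the statistic: n times the R² from an auxiliary regression of the squared residuals on the k independent variables, compared with a chi-square critical value with k degrees of freedom. The sample size and auxiliary R² are hypothetical; the critical values are standard 5% table values.

```python
# 5% right-tail chi-square critical values for 1 to 3 degrees of freedom.
CHI2_CRIT_5PCT = {1: 3.841, 2: 5.991, 3: 7.815}


def breusch_pagan(n, r2_aux, k):
    """Return the BP statistic and whether the null is rejected at 5%."""
    bp = n * r2_aux
    reject = bp > CHI2_CRIT_5PCT[k]   # right-side test: reject for large values
    return bp, reject


# Hypothetical: 100 observations, auxiliary R² of 0.08, 2 variables.
bp_stat, reject_null = breusch_pagan(100, 0.08, k=2)
```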

31
New cards

What should you use if there is heteroskedasticity

Robust standard errors

32
New cards

Serial Correlation

Often found in time series. The residuals are correlated across observations.

33
New cards

Impact of Serial Correlation on Multiple regression model

34
New cards

First-order serial correlation

Correlation between adjacent residuals
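A minimal sketch of measuring this, assuming a simple lag-1 estimator: the sum of products of adjacent residuals divided by the sum of squared residuals. The residual series are hypothetical.

```python
def lag1_autocorrelation(residuals):
    """Estimate the correlation between each residual and the previous one."""
    numerator = sum(residuals[t] * residuals[t - 1]
                    for t in range(1, len(residuals)))
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator


# Hypothetical residuals: runs of the same sign versus alternating signs.
positive_case = lag1_autocorrelation([1.0, 1.0, -1.0, -1.0])
negative_case = lag1_autocorrelation([1.0, -1.0, 1.0, -1.0])
```

Residuals that keep their sign from one period to the next give a positive value; residuals that flip sign give a negative one.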

40
New cards

VIF interpretation

VIF > 5: investigate further
VIF > 10: serious multicollinearity
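These thresholds can be sketched as follows, assuming the standard definition VIF = 1 / (1 - R²), where R² comes from regressing one independent variable on all the others. The R² inputs are hypothetical.

```python
def vif(r2_j):
    """Variance inflation factor for one independent variable."""
    return 1.0 / (1.0 - r2_j)


def interpret_vif(value):
    """Apply the rule-of-thumb thresholds of 5 and 10."""
    if value > 10:
        return "serious multicollinearity"
    if value > 5:
        return "investigate"
    return "no concern"


# Hypothetical auxiliary R² values for three independent variables.
labels = [interpret_vif(vif(r2)) for r2 in (0.50, 0.85, 0.95)]
```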

41
New cards

How to correct multicollinearity

  • Excluding one or more variables

  • Using a different proxy

  • Increasing the sample size

42
New cards

high leverage point

An observation of an independent variable that has an extreme value and is potentially influential.

43
New cards

An outlier

An observation that has an extreme value of the dependent variable and is potentially influential.

44
New cards

How to identify outliers and high leverage points

Scatterplot

45
New cards

Leverage

A value between 0 and 1 that measures how far an observation's independent variable values are from their means (the closer to 1, the higher the leverage)

46
New cards

How to detect a high leverage point

Leverage higher than 3 × (k + 1) / n, where k is the number of independent variables and n is the number of observations
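A sketch of this rule for the special case of one independent variable (k = 1) and a model with an intercept, where leverage has the closed form 1/n + (x - mean)² / Σ(x - mean)². With several variables the leverages come from the hat matrix instead. The data are hypothetical.

```python
def leverages(x):
    """Leverage of each observation in a one-variable regression."""
    n = len(x)
    mean_x = sum(x) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return [1 / n + (xi - mean_x) ** 2 / sxx for xi in x]


def high_leverage_points(x, k=1):
    """Indices of observations whose leverage exceeds 3 * (k + 1) / n."""
    threshold = 3 * (k + 1) / len(x)
    return [i for i, h in enumerate(leverages(x)) if h > threshold]


# Hypothetical data: the last observation lies far from the others.
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 30]
flagged = high_leverage_points(x)
```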

47
New cards

Method to identify outliers

Studentised residuals

48
New cards

Degrees of freedom for studentised residuals

n-k-2

49
New cards

Conclusion of studentised residuals

If the absolute value of a studentised residual is higher than the critical t-value, the observation is an outlier

50
New cards

Number of dummy variables for N categories

N-1
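A minimal sketch of building the N − 1 dummies, assuming the first category in sorted order is used as the reference (any category could be chosen). The quarter labels are hypothetical.

```python
def dummy_encode(values):
    """Encode a categorical variable with one dummy per non-reference level."""
    levels = sorted(set(values))
    reference, others = levels[0], levels[1:]
    # Each row has a 1 in the column of its own level; the reference is all zeros.
    rows = [[1 if v == level else 0 for level in others] for v in values]
    return reference, others, rows


# Hypothetical variable with 4 categories, giving 3 dummy columns.
reference, columns, rows = dummy_encode(["Q1", "Q2", "Q3", "Q4", "Q1"])
```

Using all N dummies together with an intercept would make the columns perfectly collinear, which is why one category is left out.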

51
New cards

Qualitative dependent variables

52
New cards

Logistic regression

Regression in which the dependent variable is the log odds: the natural log of the probability of an event or characteristic occurring divided by the probability of it not occurring.
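The transformation and its inverse can be sketched as follows. The probability used is hypothetical.

```python
import math


def log_odds(p):
    """Natural log of p / (1 - p), for a probability strictly between 0 and 1."""
    return math.log(p / (1.0 - p))


def probability(z):
    """Inverse of log_odds: convert a log-odds value back to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical event probability of 0.8, i.e. odds of 4 to 1.
z = log_odds(0.8)
p_back = probability(z)
```

A probability of 0.5 maps to log odds of 0, and the inverse keeps every prediction between 0 and 1.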

53
New cards

Method used to estimate coefficients in logistic regression

Maximum likelihood estimation

54
New cards

Error distribution for logistic regression

The distribution's shape is similar to the normal distribution but with fatter tails.

55
New cards

How to assess logistic model fit

Likelihood ratio test

LR = −2 × (Log-likelihood restricted model − Log-likelihood unrestricted model)
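A short sketch of the statistic, with hypothetical log-likelihood values for the two models.

```python
def likelihood_ratio(ll_restricted, ll_unrestricted):
    """LR = -2 * (log-likelihood restricted - log-likelihood unrestricted)."""
    return -2.0 * (ll_restricted - ll_unrestricted)


# Hypothetical fits: the unrestricted model has the higher log-likelihood.
lr = likelihood_ratio(-120.5, -112.0)
```

The unrestricted model can never fit worse, so LR is non-negative. It is compared with a chi-square critical value whose degrees of freedom equal the number of restrictions.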

56
New cards

Interpreting the log-likelihood

The log-likelihood metric is always negative, so higher values (closer to 0) indicate a better-fitting model.

57
New cards

General linear F test
