EGM 101 - Week 6, Lecture 2: The Coefficient of Determination, Outliers, and Interpolation

Correlation

Assessing linear relationships between variables
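
Here is a minimal sketch of Pearson's correlation coefficient r in plain Python. It assumes paired samples of equal length. The function name `pearson_r` is hypothetical and does not come from the lecture. Values of r near +1 or −1 indicate a strong linear relationship, and values near 0 indicate little linear relationship.

```python
import math


def pearson_r(x, y):
    """Pearson's correlation coefficient r for paired samples x and y."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Cross-products and squared deviations from the means
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_yy = sum((yi - y_bar) ** 2 for yi in y)
    return s_xy / math.sqrt(s_xx * s_yy)


# Made-up data on a perfectly increasing line: prints 1.0
print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))
```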

Regression

Fitting linear models to observations
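
Here is a minimal sketch of fitting a straight line ŷ = b₀ + b₁x by ordinary least squares, which is the usual criterion in simple linear regression. The helper `fit_line` and the example values are made up for illustration.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = b0 + b1 * x; returns (b0, b1)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = s_xy / s_xx          # slope
    b0 = y_bar - b1 * x_bar   # intercept
    return b0, b1


# Made-up observations that lie roughly on a line
b0, b1 = fit_line([1, 2, 3, 4, 5], [2.1, 3.9, 6.2, 7.8, 10.1])
print(f"y_hat = {b0:.2f} + {b1:.2f} x")
```

Because of the intercept formula, the least-squares line always passes through the point (x̅, y̅).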

Variability of each individual y-value (yᵢ)

Difference between the y-value and the mean of y: yᵢ − y̅
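
Take a least-squares line with predictions ŷᵢ = b₀ + b₁xᵢ. Each deviation yᵢ − y̅ splits into a part explained by the line and a residual. Squaring these deviations and summing over all n points gives the total variability that R² is built on. The split of the sums holds for a least-squares fit with an intercept, because there the cross term sums to zero.

```latex
y_i - \bar{y} = (\hat{y}_i - \bar{y}) + (y_i - \hat{y}_i)

\underbrace{\sum_{i=1}^{n} (y_i - \bar{y})^2}_{SS_{\text{tot}}}
  = \underbrace{\sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2}_{SS_{\text{reg}}}
  + \underbrace{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}_{SS_{\text{res}}},
\qquad R^2 = 1 - \frac{SS_{\text{res}}}{SS_{\text{tot}}}
```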

Coefficient of Determination (R²)

Proportion of the total variability accounted for by the model

  • Often expressed as a percent

  • Best case (perfect fit): R² = 1 (100%)

  • It is possible to have R² < 0 (when the model fits worse than simply predicting the mean, y̅)
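
Here is a minimal sketch that computes R² as 1 − SS_res/SS_tot, the usual definition. The function name `r_squared` and the data are made up for illustration. The third case shows how R² < 0 arises: the predictions fit worse than the horizontal line at y̅.

```python
def r_squared(y, y_hat):
    """Coefficient of determination: R^2 = 1 - SS_res / SS_tot."""
    y_bar = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))  # unexplained variability
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)                # total variability
    return 1 - ss_res / ss_tot


y = [2, 4, 6, 8]                     # made-up observations (mean 5)
print(r_squared(y, [2, 4, 6, 8]))    # perfect fit: 1.0
print(r_squared(y, [5, 5, 5, 5]))    # always predicting the mean: 0.0
print(r_squared(y, [8, 6, 4, 2]))    # worse than the mean: -3.0
```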

Coefficient of Determination (Simple Linear Regression Case)

R² is the square of Pearson’s correlation coefficient: R² = r²
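
Here is a quick numerical check of R² = r² for one made-up data set. It assumes a least-squares line with an intercept, the simple linear regression case this card describes. The helper name `r2_and_r_squared` is hypothetical.

```python
import math


def r2_and_r_squared(x, y):
    """Return (R^2 of the least-squares line, square of Pearson's r)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_yy = sum((yi - y_bar) ** 2 for yi in y)  # this is also SS_tot
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    r = s_xy / math.sqrt(s_xx * s_yy)          # Pearson's correlation
    b1 = s_xy / s_xx                           # least-squares slope
    b0 = y_bar - b1 * x_bar                    # least-squares intercept
    ss_res = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    return 1 - ss_res / s_yy, r ** 2


r2, r_sq = r2_and_r_squared([1, 2, 3, 4, 5], [2, 1, 4, 3, 5])
print(f"R^2 = {r2:.4f}, r^2 = {r_sq:.4f}")   # both 0.6400
```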

What R² Says

  • What R² tells us

    • Scatter of data points around the best-fit line

    • Proportion of the variability in the dependent variable explained by the independent variable

  • What R² does not tell us

    • Whether the model is the right one for the data

True or false: a high R² does not mean it’s the right model

True
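
This made-up example shows a high R² coming from the wrong model. The data follow y = x² exactly, but a straight line is fitted to them by least squares anyway.

```python
# Made-up data that follow a curve exactly: y = x^2 for x = 1..10
x = list(range(1, 11))
y = [xi ** 2 for xi in x]

# Fit a straight line by least squares anyway
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
s_xx = sum((xi - x_bar) ** 2 for xi in x)
b1 = s_xy / s_xx
b0 = y_bar - b1 * x_bar
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]

ss_res = sum(e ** 2 for e in residuals)
ss_tot = sum((yi - y_bar) ** 2 for yi in y)
r2 = 1 - ss_res / ss_tot

print(f"R^2 = {r2:.3f}")                                  # 0.950
print("".join("+" if e > 0 else "-" for e in residuals))  # ++------++
```

The line explains about 95% of the variability. Even so, the residuals are positive at both ends and negative in the middle. The curvature the line misses shows up as a pattern rather than as random scatter.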

Random Errors

No apparent pattern in the errors

Systematic Bias

Non-random errors (a pattern in the residuals); indicates variability not accounted for by the model (bad fit)
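
One informal way to look for a pattern is to count runs of same-sign residuals, taken in order of x. This heuristic is an assumption of this sketch and is not a method from the lecture; a residual plot is the usual check. Random errors tend to switch sign often, while a few long runs hint at systematic bias. The helper `count_sign_runs` is hypothetical.

```python
def count_sign_runs(residuals):
    """Count runs of consecutive same-sign residuals (zeros are skipped)."""
    signs = [e > 0 for e in residuals if e != 0]
    if not signs:
        return 0
    runs = 1
    for prev, cur in zip(signs, signs[1:]):
        if cur != prev:
            runs += 1
    return runs


# Residuals of a straight line fitted to y = x^2 for x = 1..10 (systematic)
curved = [12, 4, -2, -6, -8, -8, -6, -2, 4, 12]
# Made-up residuals with no obvious pattern
scattered = [0.5, -0.3, -0.8, 0.6, 0.1, -0.4, 0.9, -0.2, 0.3, -0.5]

print(count_sign_runs(curved))     # 3
print(count_sign_runs(scattered))  # 8
```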

What’s a Good R²?

  • Depends on the context and the goal

    • Understanding the relationship between variables

    • Predicting unknown values

  • Other questions

    • How much of the variability can be explained?

    • Is the relationship statistically significant?

    • How precise are the predictions of the model?

Outlier

Values that lie far away from the rest of the data

Large Outliers

Tend to “pull” the regression line toward themselves
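
This made-up example shows a single large outlier pulling the least-squares line toward itself. The helper `fit_line` is hypothetical.

```python
def fit_line(x, y):
    """Ordinary least-squares line; returns (intercept, slope)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    slope = s_xy / s_xx
    return y_bar - slope * x_bar, slope


# Made-up points lying exactly on y = 2x
x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]
print(fit_line(x, y))  # (0.0, 2.0)

# One large outlier far below the trend at x = 6
intercept, slope = fit_line(x + [6], y + [-10])
print(f"slope with outlier: {slope:.2f}")  # -1.14
```

Here one point out of six is enough to flip the sign of the slope.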

Basic Rule of Thumb for Identifying Outliers

Points more than two standard deviations from the regression line are (probably) outliers

  • Use the standard deviation of the residuals, not of the observations!

  • When estimating this standard deviation, divide by n − 2 instead of n − 1 (two parameters, the slope and the intercept, are estimated from the data)
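
Here is a minimal sketch of this rule of thumb. It fits a least-squares line, estimates the residual standard deviation with n − 2 in the denominator, and flags points whose residual exceeds 2 of those standard deviations. The function name `flag_outliers` and the parameter `k` are hypothetical. Note that the outlier also inflates the residual standard deviation, so in small samples the rule can miss outliers.

```python
import math


def flag_outliers(x, y, k=2.0):
    """Return indices of points more than k residual standard deviations
    from the least-squares line (rule of thumb: k = 2)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    # Standard deviation of the residuals, dividing by n - 2 (not n - 1)
    s = math.sqrt(sum(e ** 2 for e in residuals) / (n - 2))
    return [i for i, e in enumerate(residuals) if abs(e) > k * s]


# Made-up data on y = 2x, except the point at x = 5 is shifted up by 10
x = list(range(1, 11))
y = [2 * xi for xi in x]
y[4] = 20
print(flag_outliers(x, y))  # [4], i.e. the point at x = 5
```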

Handling Outliers

First understand why it’s an outlier

  • Measurement/data entry error?

  • An actual difference?

    • Not part of the population

    • Just an extreme value

Interpolation

Estimating unknown values of the response variable within the range of the observed x-values

Extrapolation

Estimating unknown values of the response variable outside the range of the observed x-values
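
Here is a minimal sketch of telling the two cases apart. It assumes a single predictor, so "the range of observed x-values" means the interval from the smallest to the largest observed x. The function name `classify_prediction` and the data are hypothetical.

```python
def classify_prediction(x_observed, x_new):
    """Label a prediction at x_new as interpolation or extrapolation."""
    lo, hi = min(x_observed), max(x_observed)
    return "interpolation" if lo <= x_new <= hi else "extrapolation"


x_observed = [1.0, 2.5, 4.0, 7.5, 10.0]        # made-up x-values
print(classify_prediction(x_observed, 5.0))   # interpolation
print(classify_prediction(x_observed, 12.0))  # extrapolation
```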

Dangers of Extrapolation

  • Interpolation: new observations are less likely to completely contradict the regression

  • We can only be sure about the “shape” of the relationship within the range of our observations

  • Outside of this range, we don’t know what we don’t know

    • The relationship could be non-linear

    • Extrapolating could lead to ridiculous conclusions
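
This made-up example shows why a non-linear relationship makes extrapolation risky. The "true" relationship is exponential (y = 2ˣ), but only x = 0 to 5 is observed and a straight line is fitted to that range.

```python
# Made-up data that actually grow exponentially: y = 2**x, observed for x = 0..5
x = [0, 1, 2, 3, 4, 5]
y = [2 ** xi for xi in x]

# Least-squares straight line fitted to the observed range only
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
s_xx = sum((xi - x_bar) ** 2 for xi in x)
b1 = s_xy / s_xx
b0 = y_bar - b1 * x_bar


def predict(x_new):
    """Prediction from the fitted straight line."""
    return b0 + b1 * x_new


# Extrapolating to x = 10, well outside the observed range 0..5
print(f"line predicts {predict(10):.1f}, true value is {2 ** 10}")
# line predicts 53.6, true value is 1024
```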

When is it Okay to Extrapolate?

  • It depends on what you are trying to model

    • Reasonable example: the sun will come out tomorrow

    • Unreasonable example: extrapolating the number of husbands over time

  • In general, it’s good to have some kind of theoretical basis for your model first.