EGM 101 - Week 6, Lecture 2 : The Coefficient of Determination, Outliers, and Interpolation

18 Terms

1

Correlation

Assessing linear relationships between variables

2

Regression

Fitting linear models to observations

3

Variability of each individual y-value (yi)

Difference between an observed y-value and the mean of y (yi - y̅)

4

Coefficient of Determination (R²)

Portion of total variability accounted for by the model

  • Often expressed as a percent

  • Best case (perfect fit) : R² = 1 (100%)

  • It is possible to have R² < 0 (the model fits worse than simply predicting the mean of y)
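
The definition above can be sketched in a few lines, assuming the usual formula R² = 1 - SS_res / SS_tot. The function name `r_squared` and the sample numbers are made up for illustration; the third case shows how predictions worse than the mean give a negative value.

```python
def r_squared(y_obs, y_pred):
    """R² = 1 - SS_res / SS_tot (share of total variability the model explains)."""
    y_mean = sum(y_obs) / len(y_obs)
    # Total variability: squared deviations of each yi from the mean of y.
    ss_tot = sum((y - y_mean) ** 2 for y in y_obs)
    # Unexplained variability: squared residuals (observed minus predicted).
    ss_res = sum((y - p) ** 2 for y, p in zip(y_obs, y_pred))
    return 1 - ss_res / ss_tot

y_obs = [1.0, 2.0, 3.0, 4.0]                    # hypothetical observations

perfect = r_squared(y_obs, [1.0, 2.0, 3.0, 4.0])    # predictions match exactly
mean_only = r_squared(y_obs, [2.5, 2.5, 2.5, 2.5])  # always predict the mean
bad = r_squared(y_obs, [4.0, 3.0, 2.0, 1.0])        # predictions run the wrong way

print(perfect, mean_only, bad)
```

A perfect fit gives 1, predicting the mean everywhere gives 0, and anything worse than the mean drops below 0.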

5

Coefficient of Determination (Simple Linear Regression Case)

R² is the square of Pearson’s correlation coefficient: R² = r²
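
A quick numerical check of this identity, as a sketch only: the helpers `fit_line` and `pearson_r` and the data are hypothetical, and the line is fitted by ordinary least squares with an intercept (the case in which the identity holds).

```python
import math

def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for y = slope * x + intercept."""
    n = len(xs)
    x_mean, y_mean = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, y_mean - slope * x_mean

def pearson_r(xs, ys):
    """Pearson's correlation coefficient r."""
    n = len(xs)
    x_mean, y_mean = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    syy = sum((y - y_mean) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

xs = [1.0, 2.0, 3.0, 4.0, 5.0]      # made-up data
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

slope, intercept = fit_line(xs, ys)
y_mean = sum(ys) / len(ys)
ss_tot = sum((y - y_mean) ** 2 for y in ys)
ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))

r2 = 1 - ss_res / ss_tot            # coefficient of determination
r = pearson_r(xs, ys)               # correlation coefficient

print(r2, r ** 2)                   # the two values agree up to rounding
```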

6

What R² Says

  • What R² tells us

    • Scatter of data points around the best-fit line

    • Proportion of variability of dependent variable explained by the independent variable

  • What R² does not tell us

    • How good the model is

7

A high R² does not mean the model is the right one. (true/false)

True

8

Random Errors

No apparent pattern in the errors

9

Systematic Bias

Non-random errors (pattern); indicates variability not accounted for in the model (bad fit)
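
To illustrate what a pattern in the errors can look like, here is a small sketch that fits a straight line to made-up data that actually follow y = x². The helper `fit_line` is hypothetical. The residuals come out positive at both ends and negative in the middle, a U shape rather than random scatter.

```python
def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for y = slope * x + intercept."""
    n = len(xs)
    x_mean, y_mean = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, y_mean - slope * x_mean

xs = [0, 1, 2, 3, 4]
ys = [x ** 2 for x in xs]           # curved relationship, deliberately misfit by a line

slope, intercept = fit_line(xs, ys)

# Residual = observed value minus value predicted by the line.
residuals = [y - (slope * x + intercept) for x, y in zip(xs, ys)]

print(residuals)
```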

10

What’s a Good R²?

  • Depends on context, goal

    • Understanding relationship between variables

    • Predicting unknown values

  • Other questions

    • How much of variability can be explained?

    • Is the relationship statistically significant?

    • How precise are the predictions of the model?

11

Outlier

Values that lie far away from the rest of the data

12

Large Outliers

Tend to “pull” the regression line toward themselves
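
A minimal sketch of this "pull", using made-up data and a hypothetical helper `fit_slope`: five points lie exactly on y = x, then one large outlier is appended and the line is refitted.

```python
def fit_slope(xs, ys):
    """Ordinary least-squares slope of the best-fit line."""
    n = len(xs)
    x_mean, y_mean = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    return sxy / sxx

xs = [1, 2, 3, 4, 5]
ys = [1, 2, 3, 4, 5]                # points exactly on y = x

slope_without = fit_slope(xs, ys)

# Append one point far above the trend (illustrative outlier).
slope_with = fit_slope(xs + [6], ys + [20])

print(slope_without, slope_with)    # the slope steepens toward the outlier
```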

13

Basic Rule of Thumb for Identifying Outliers

Points farther than two standard deviations from the regression line are (probably) outliers

  • standard deviation of residuals, not the observations!

  • instead of n – 1, we divide by n – 2 when estimating this standard deviation
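
The rule of thumb above might be sketched as follows. The data are made up (ten points on y = 2x + 1, with the value at x = 5 shifted upward), and `fit_line` is a hypothetical helper; note the division by n - 2 when estimating the standard deviation of the residuals.

```python
import math

def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for y = slope * x + intercept."""
    n = len(xs)
    x_mean, y_mean = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, y_mean - slope * x_mean

xs = list(range(1, 11))
ys = [2.0 * x + 1.0 for x in xs]
ys[4] += 8.0                        # illustrative contamination at x = 5

slope, intercept = fit_line(xs, ys)
residuals = [y - (slope * x + intercept) for x, y in zip(xs, ys)]

# Standard deviation of the residuals (not of the observations),
# dividing by n - 2 because two parameters were estimated.
n = len(xs)
s = math.sqrt(sum(e ** 2 for e in residuals) / (n - 2))

# Flag points lying more than two residual standard deviations from the line.
flagged = [x for x, e in zip(xs, residuals) if abs(e) > 2 * s]

print(flagged)
```

Because the outlier itself inflates the residual standard deviation, this check is only a screening step; a flagged point still needs the follow-up questions about why it is unusual.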

14

Handling Outliers

First understand why it’s an outlier

  • Measurement/data entry error?

  • An actual difference?

    • Not part of the population

    • Just an extreme value

15

Interpolation

Estimating unknown values of the response variable within the range of observed (x) values

16

Extrapolation

Estimating unknown values of the response variable outside the range of observed (x) values

17

Dangers of Extrapolation

  • With interpolation, it is less likely that new observations will completely contradict the regression

  • Can only be sure about “shape” of the relationship within the range of our observations

  • Outside of this range, we don’t know what we don’t know

    • Relationship could be non-linear

    • Could lead to ridiculous conclusions
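
The points above can be illustrated with a small sketch. It assumes, purely for illustration, that the true relationship is y = √x while a straight line is fitted to four observations; `fit_line` and `is_interpolation` are hypothetical helpers. The prediction error inside the observed range is small, while far outside it the line is badly wrong.

```python
import math

def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for y = slope * x + intercept."""
    n = len(xs)
    x_mean, y_mean = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, y_mean - slope * x_mean

def is_interpolation(x_new, xs):
    """True if x_new lies within the range of observed x values."""
    return min(xs) <= x_new <= max(xs)

xs = [1.0, 4.0, 9.0, 16.0]          # observed x values (made up)
ys = [math.sqrt(x) for x in xs]     # assumed true relationship: y = sqrt(x)

slope, intercept = fit_line(xs, ys)

def predict(x):
    return slope * x + intercept

x_inside, x_outside = 6.25, 100.0

# Compare the line's prediction with the assumed true value at each point.
interp_error = abs(predict(x_inside) - math.sqrt(x_inside))
extrap_error = abs(predict(x_outside) - math.sqrt(x_outside))

print(is_interpolation(x_inside, xs), interp_error)
print(is_interpolation(x_outside, xs), extrap_error)
```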

18

When is it Okay to Extrapolate?

  • It depends on what you are trying to model

    • Reasonable example: the sun will come out tomorrow

    • Unreasonable example: number of husbands over time.

  • In general, it’s good to have some kind of theoretical basis for your model first.
