Regression Models - Predictive Analytics Final Exam


54 Terms

1
New cards

What is the difference between predictive analytics and statistics?

Predictive analytics predicts future outcomes to aid in decision making, whereas statistics describes, explains, and infers relationships within data

2
New cards

What is overfitting?

The model learns the training data too well - including its noise - and it fails to generalize to new data

3
New cards

What happens if a model is overfitting?

The model will perform great on training data but poorly on test or unseen data (it tries to memorize instead of understanding the patterns)

4
New cards

T/F: Linear Regression = “Good Fit”

True

5
New cards

T/F: Polynomial Regression = “Over fit”

True

6
New cards

Do overfitted models use too many predictors?

Yes, and this causes poor model performance

7
New cards

How do we avoid overfitting initially when creating a model?

We split our data into training, validation, and test sets
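The split from this card can be sketched in plain Python. The function name, fractions, and seed below are illustrative choices, not part of the cards:

```python
import random

def train_val_test_split(rows, val_frac=0.2, test_frac=0.2, seed=42):
    """Shuffle the rows, then carve off validation and test sets."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)        # reproducible shuffle
    n = len(rows)
    n_test = int(n * test_frac)
    n_val = int(n * val_frac)
    test = rows[:n_test]
    val = rows[n_test:n_test + n_val]
    train = rows[n_test + n_val:]            # everything left over
    return train, val, test

train, val, test = train_val_test_split(range(100))  # 60 / 20 / 20 split
```

The model is fit on `train`, tuned on `val`, and judged once on `test`.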

8
New cards

What is simple linear regression?

A model used to predict one variable (y) using another variable (x)

9
New cards

What is the simple linear regression equation?

Y = B0 + B1(X) + e

10
New cards

What does Beta 0 represent in the simple linear regression equation?

It is the intercept, which is the value of y when x = 0

11
New cards

What does Beta 1 represent in the simple linear regression equation?

It is the slope describing how much y changes when x increases by 1 unit

12
New cards

What is the "e" in the simple linear regression equation?

(error) The leftover part that cannot be explained by x. Can be random noise or other factors
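The pieces of the equation in cards 9-12 can be estimated with the standard least-squares formulas. A minimal pure-Python sketch (the helper name and sample data are illustrative):

```python
def fit_simple_ols(xs, ys):
    """Least-squares estimates of B0 (intercept) and B1 (slope)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx              # slope: change in y per 1-unit change in x
    b0 = y_bar - b1 * x_bar     # intercept: value of y when x = 0
    return b0, b1

# These points lie exactly on y = 1 + 2x, so the fit recovers B0=1, B1=2
b0, b1 = fit_simple_ols([1, 2, 3, 4], [3, 5, 7, 9])
```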

13
New cards

What does it mean to "fit a line"?

Finding the straight line that best represents the relationship between two variables

14
New cards

When fitting a line, what should you minimize?

The total vertical distance between each data point and the line

15
New cards

Why do we choose the MSE when fitting a regression line?

It makes the prediction errors as small as possible; squaring the errors removes negatives and penalizes large errors more heavily
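The MSE in this card is just the average of the squared residuals; a minimal sketch with made-up numbers:

```python
def mse(actual, predicted):
    """Mean squared error: average of squared errors (actual - predicted)."""
    errors = [a - p for a, p in zip(actual, predicted)]
    return sum(e * e for e in errors) / len(errors)

result = mse([3, 5, 7], [2, 5, 9])  # errors 1, 0, -2 -> (1 + 0 + 4) / 3
```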

16
New cards

What does it mean to measure predictive error?

Focuses on how well the model predicts new data, not just how well it fits the training data

17
New cards

In a good regression model, how should the residuals (errors) look?

They should be symmetrically distributed around zero, meaning the model doesn't under/over fit

18
New cards

(True/False) If residuals (errors) are skewed, then the model is considered biased.

True

19
New cards

Can we use regression to help fill in missing values in a data set?

Yes, the variable with the missing values will become the dependent variable while using complete cases to build the regression equation
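The imputation this card describes can be sketched for one predictor: fit a line on the complete cases, then predict the missing values. The function name and data are illustrative:

```python
def impute_with_regression(pairs):
    """Fill missing y values using a line fit on the complete (x, y) cases."""
    complete = [(x, y) for x, y in pairs if y is not None]
    n = len(complete)
    x_bar = sum(x for x, _ in complete) / n
    y_bar = sum(y for _, y in complete) / n
    sxx = sum((x - x_bar) ** 2 for x, _ in complete)
    b1 = sum((x - x_bar) * (y - y_bar) for x, y in complete) / sxx
    b0 = y_bar - b1 * x_bar
    # Replace each missing y with its predicted value b0 + b1*x
    return [(x, y if y is not None else b0 + b1 * x) for x, y in pairs]

# Complete cases lie on y = 1 + 2x, so the missing y at x=3 is filled as 7
filled = impute_with_regression([(1, 3), (2, 5), (3, None), (4, 9)])
```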

20
New cards

What happens when an interaction effect is present?

The impact of one factor depends on the level of the other factor (synergy)

21
New cards

What is the formula of a model before adding an interaction term?

Y = B0 + B1X1 + B2X2 + error

- where x1 and x2 each affect y independently
- x2 doesn't change the effect of x1 yet

22
New cards

What is the formula of a model after adding an interaction term?

Y = B0 + B1X1 + B2X2 + B3X1X2 + error

- x1 and x2 interact; their effects on y depend on each other
- changing x2 changes how x1 affects y
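The point of the interaction term can be shown numerically: with B3 in the model, the effect of a one-unit increase in x1 shifts as x2 changes. The coefficient values here are made up for illustration:

```python
def predict(x1, x2, b0=1.0, b1=2.0, b2=0.5, b3=3.0):
    """Model with an interaction term: Y = B0 + B1*X1 + B2*X2 + B3*X1*X2."""
    return b0 + b1 * x1 + b2 * x2 + b3 * x1 * x2

# Effect of a one-unit increase in x1 depends on the level of x2:
effect_at_x2_0 = predict(1, 0) - predict(0, 0)  # b1 alone = 2.0
effect_at_x2_2 = predict(1, 2) - predict(0, 2)  # b1 + b3*2 = 8.0
```

Without the B3 term, both differences would equal b1 no matter what x2 is.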

23
New cards

What type of regression is used to model a non-linear relationship?

Polynomial Regression

24
New cards

Degree 0 (polynomial regression) is a…

constant line

25
New cards

Degree 1 (polynomial regression) is a…

straight line

26
New cards

Degree 3 (polynomial regression) is…

more complex curve (S-shaped)
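Cards 24-26 correspond to the columns of a polynomial feature expansion: polynomial regression is still a linear model, just fit on powers of x. A small sketch (the helper name is illustrative):

```python
def poly_features(x, degree):
    """Expand a single x into [1, x, x**2, ..., x**degree]."""
    return [x ** d for d in range(degree + 1)]

# Degree 0 -> constant term only; degree 1 adds the line; 2 the curve; 3 the S-shape
feats = poly_features(2, 3)
```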

27
New cards

(True/False) Dummy variables are also called "indicator variables."

True

28
New cards

Can regression models handle more than just numbers?

No; regression works with numbers, so we turn non-numerical (categorical) variables into dummy variables

29
New cards

How many fewer dummy variables do we need than the number of categories?

One

30
New cards

(True/False) One group is always the reference group, where all dummy variables equal zero.

True
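Cards 27-30 together describe k-1 dummy encoding with a reference group; a pure-Python sketch (function name and category values are illustrative):

```python
def dummy_encode(values, reference):
    """Encode a categorical column as k-1 dummy (indicator) columns.

    The reference category gets all zeros; every other category gets
    its own 0/1 indicator column.
    """
    levels = sorted(set(values) - {reference})   # one dummy per non-reference level
    return [[1 if v == level else 0 for level in levels] for v in values]

# 3 categories -> 2 dummy columns ("green", "red"); "blue" is the reference
rows = dummy_encode(["red", "blue", "green", "blue"], reference="blue")
```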

31
New cards

What are the four assumptions of linear regression models?

1. Linearity of Residuals
2. Normal Distribution of Residuals
3. Equal Variance of Residuals
4. Independence of Residuals

32
New cards

What does high variance equal?

Overfitting

33
New cards

What is bias?

The error from making the model too simple (underfitting)

34
New cards

What is an example of bias?

Using a straight line when the real relationship is curved

35
New cards

What is variance?

Error from making the model too complex (overfitting)

36
New cards

What is an example of variance?

Using a wiggly line that fits every training point perfectly but fails on new data

37
New cards

The Tradeoff: If you have a simple model, there is...

high bias, low variance

38
New cards

The Tradeoff: If you have a complex model, there is...

high variance, low bias

39
New cards

What is the "sweet spot" of a model?

A model that is complex enough to capture the real structure, but simple enough to generalize to new data.

40
New cards

What is validity in a regression model?

A model that measures what it's supposed to (ex: a job performance model uses skills or experience and not favorite color)

41
New cards

What are the traits of linearity of residuals?

1. You plot residuals vs predicted values, and they look like a random cloud around zero
2. A U-Shape indicates non-linearity

42
New cards

What are the traits of the Normal Distribution of residuals?

1. Residuals centered around zero (bell-shaped)
2. If the points curve, the residuals are not normal
3. The data itself does not have to be normal, only the residuals

43
New cards

What are the traits of Equal Variance of residuals?

The spread of errors stays the same no matter what the prediction is.

44
New cards

What is homoscedasticity?

The residuals (errors) are evenly scattered across all levels of the predicted values, not getting wider or narrower as predictions change
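One crude way to eyeball the "not getting wider or narrower" idea in code: compare the residual spread in the low-prediction half against the high-prediction half. This is an illustrative diagnostic sketch, not a formal test (a formal version would be something like Breusch-Pagan):

```python
def spread_ratio(predicted, residuals):
    """Compare residual spread (std dev) in the low- vs high-prediction half.

    A ratio near 1.0 suggests homoscedasticity; a ratio far from 1.0
    suggests the spread changes with the prediction (heteroscedasticity).
    """
    pairs = sorted(zip(predicted, residuals))    # order points by prediction
    half = len(pairs) // 2

    def sd(rs):
        m = sum(rs) / len(rs)
        return (sum((r - m) ** 2 for r in rs) / len(rs)) ** 0.5

    low = sd([r for _, r in pairs[:half]])
    high = sd([r for _, r in pairs[half:]])
    return high / low

# Residuals that double in spread as predictions grow -> ratio of 2.0
ratio = spread_ratio([1, 2, 3, 4], [1, -1, 2, -2])
```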

45
New cards

If residuals are in a flat and random pattern, that is...

homoscedasticity (good)

46
New cards

If residuals are in a curved pattern, that is...

heteroscedasticity (bad)

47
New cards

What are the traits of Independence of residuals?

The spread of residuals should be independent, where one observation's error doesn't affect another

48
New cards

Where is independence of residuals most likely to be violated?

In time series or repeated measures data, where values are collected from the same source over time

49
New cards

What is Cook's Distance?

Measures how much a single data point influences the overall regression model (e.g., large values mean that the point has a strong effect on the fitted line)
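For simple linear regression, Cook's distance has a closed form built from each point's residual and leverage: D_i = (e_i² / (p·s²)) · h_ii / (1 - h_ii)². A pure-Python sketch (the sample data, with one outlier at x=5, is made up):

```python
def cooks_distance(xs, ys):
    """Cook's distance for each point in a simple linear regression
    (p = 2 fitted parameters: intercept and slope)."""
    n, p = len(xs), 2
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    b0 = y_bar - b1 * x_bar
    resid = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]
    s2 = sum(e * e for e in resid) / (n - p)       # residual variance estimate
    dist = []
    for x, e in zip(xs, resid):
        h = 1 / n + (x - x_bar) ** 2 / sxx         # leverage of this point
        dist.append((e * e / (p * s2)) * (h / (1 - h) ** 2))
    return dist

# Four points near y = 2x plus one outlier; the outlier dominates the fit
d = cooks_distance([1, 2, 3, 4, 5], [2, 4, 6, 8, 20])
```

The largest value of `d` flags the point with the strongest pull on the fitted line.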

50
New cards

What is cross-validation?

1. Tests how well a model performs on unseen data
2. The dataset is split into K parts, and the model is trained on K-1 parts
3. Process repeats K times
4. helps avoid overfitting
5. useful with small datasets
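The K-part splitting in steps 2-3 can be sketched with index lists. A minimal pure-Python version (the round-robin fold assignment is one illustrative choice; shuffled folds are also common):

```python
def kfold_indices(n, k):
    """Split indices 0..n-1 into k folds. Each fold is held out as the
    test set once while the other k-1 folds form the training set."""
    folds = [list(range(i, n, k)) for i in range(k)]   # round-robin assignment
    splits = []
    for i in range(k):
        test = folds[i]
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        splits.append((train, test))
    return splits

splits = kfold_indices(6, 3)   # 3 train/test splits, each holding out 2 points
```

The model is refit on each `train` list and scored on the matching `test` list; averaging the k scores estimates performance on unseen data.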

51
New cards

How do you calculate error?

Error (e) = actual value - predicted value

52
New cards

Degree 2 (polynomial regression)

curve (u-shaped or inverted U)

53
New cards

What does high bias mean in R?

Your model is underfitting because it is not flexible enough to capture the true pattern

54
New cards

Why do we need to check the assumptions?

To get unbiased estimates, to make sure the model will perform well in prediction, and to ensure inferences (hypothesis tests, confidence intervals, etc.) are accurate