What is the difference between predictive analytics and statistics?
Predictive analytics predicts future outcomes to aid in decision making, whereas statistics describes, explains, and infers relationships within data
What is overfitting?
The model learns the training data too well - including its noise - and it fails to generalize to new data
What happens if a model is overfitting?
The model will perform great on training data but poorly on test or unseen data (it tries to memorize instead of understanding the patterns)
T/F: Linear Regression = “Good Fit”
True
T/F: Polynomial Regression = "Overfit"
True
Do overfitted models use too many predictors?
Yes, and this causes poor model performance on new data
How do we avoid overfitting initially when creating a model?
We split our data into training, validation, and test sets
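A minimal sketch of that split in Python using scikit-learn's train_test_split; the synthetic data and the 60/20/20 ratio are just illustrative assumptions:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Illustrative synthetic data: 200 rows, 3 predictors, one numeric target.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + rng.normal(scale=0.3, size=200)

# Carve off 40% as a temporary holdout, then split it in half:
# 60% train / 20% validation / 20% test.
X_train, X_temp, y_train, y_temp = train_test_split(X, y, test_size=0.4, random_state=42)
X_val, X_test, y_val, y_test = train_test_split(X_temp, y_temp, test_size=0.5, random_state=42)
```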
What is simple linear regression?
A model used to predict one variable (y) using another variable (x)
What is the simple linear regression equation?
Y = B0 + B1(X) + e
What does Beta 0 represent in the simple linear regression equation?
It is the intercept, which is the value of y when x = 0
What does Beta 1 represent in the simple linear regression equation?
It is the slope describing how much y changes when x increases by 1 unit
What is the "e" in the simple linear regression equation?
(error) The leftover part that cannot be explained by x. Can be random noise or other factors
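A minimal sketch of estimating B0 and B1 from data with plain NumPy, using the closed-form least-squares formulas; the example x and y values are made up:

```python
import numpy as np

# Made-up example data.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# Least-squares slope and intercept for Y = B0 + B1(X) + e.
b1 = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)  # slope: change in y per 1-unit increase in x
b0 = y.mean() - b1 * x.mean()                         # intercept: value of y when x = 0

residuals = y - (b0 + b1 * x)                         # e: the leftover part x cannot explain
print(b0, b1)
```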
What does it mean to "fit a line"?
Finding the straight line that best represents the relationship between two variables
When fitting a line, what should you minimize?
The total (squared) vertical distance between each data point and the line
Why do we choose the MSE when fitting a regression line?
Minimizing it makes the prediction errors as small as possible, and squaring the errors keeps positive and negative errors from canceling out
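For reference, a tiny sketch of the MSE calculation; the arrays here are placeholder values:

```python
import numpy as np

y_actual = np.array([3.0, 5.0, 7.0, 9.0])     # observed values
y_predicted = np.array([2.8, 5.3, 6.9, 9.4])  # values the fitted line predicts

# Mean squared error: average of the squared residuals.
# Squaring keeps positive and negative errors from canceling out.
mse = np.mean((y_actual - y_predicted) ** 2)
print(mse)
```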
What does it mean to measure predictive error?
Focuses on how well the model predicts new data, not just how well it fits the training data
In a good regression model, how should the residuals (errors) look?
They should be symmetrically distributed around zero, meaning the model doesn't systematically under- or over-predict
(True/False) If residuals (errors) are skewed, then the model is considered biased.
True
Can we use regression to help fill in missing values in a data set?
Yes; the variable with the missing values becomes the dependent variable, and the regression equation is built from complete cases and then used to predict the missing entries
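A hedged sketch of that idea (regression imputation) with pandas and scikit-learn; the column names `income` and `age` and the values are hypothetical:

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Hypothetical data where 'income' has missing values.
df = pd.DataFrame({
    "age":    [25, 32, 47, 51, 38, 29],
    "income": [38000, 52000, np.nan, 88000, np.nan, 45000],
})

complete = df.dropna(subset=["income"])              # complete cases build the equation
model = LinearRegression().fit(complete[["age"]], complete["income"])

missing = df["income"].isna()                        # rows to fill in
df.loc[missing, "income"] = model.predict(df.loc[missing, ["age"]])
```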
What happens when an interaction effect is present?
The impact of one factor depends on the level of the other factor (synergy)
What is the formula of a model before adding an interaction term?
Y = B0 + B1X1 + B2X2 + error
- where x1 and x2 each affect y independently
- x2 doesn't change the effect of x1 yet
What is the formula of a model after adding an interaction term?
Y = B0 + B1X1 + B2X2 + B3X1X2 + error
- x1 and x2 interact; their effects on y depend on each other
- changing x2 changes how x1 affects y
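A minimal sketch of fitting both versions with statsmodels' formula interface; the names x1, x2, y and the simulated data are assumptions, and `x1 * x2` in the formula expands to x1 + x2 + x1:x2:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulated data where the effect of x1 on y depends on x2.
rng = np.random.default_rng(1)
df = pd.DataFrame({"x1": rng.normal(size=300), "x2": rng.normal(size=300)})
df["y"] = 2 + 1.0 * df["x1"] + 0.5 * df["x2"] + 1.5 * df["x1"] * df["x2"] + rng.normal(size=300)

no_interaction = smf.ols("y ~ x1 + x2", data=df).fit()    # Y = B0 + B1X1 + B2X2 + error
with_interaction = smf.ols("y ~ x1 * x2", data=df).fit()  # adds the B3*X1*X2 term
print(with_interaction.params)                            # includes an 'x1:x2' coefficient
```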
What kind of regression models a non-linear relationship?
Polynomial Regression
Degree 0 (polynomial regression) is a…
constant line
Degree 1 (polynomial regression) is a…
straight line
Degree 3 (polynomial regression) is…
more complex curve (S-shaped)
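A small sketch comparing polynomial fits of different degrees with NumPy; the data here are simulated purely for illustration:

```python
import numpy as np

# Simulated curved relationship.
rng = np.random.default_rng(2)
x = np.linspace(-3, 3, 50)
y = 1 - 0.5 * x + 0.3 * x**3 + rng.normal(scale=1.0, size=x.size)

# np.polyfit returns the coefficients of the best-fitting polynomial of a given degree.
for degree in (0, 1, 2, 3):
    coefs = np.polyfit(x, y, deg=degree)
    y_hat = np.polyval(coefs, x)
    print(degree, np.mean((y - y_hat) ** 2))  # training MSE shrinks as the degree grows
```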
(True/False) Dummy variables are also called "indicator variables."
True
Can regression models handle more than just numbers?
No; regression needs numeric inputs, so we turn non-numerical (categorical) variables into dummy variables
How many fewer dummy variables do we need than the number of categories?
One
(True/False) One group is always the reference group, where all dummy variables equal zero.
True
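A minimal sketch of dummy coding with pandas; the `color` column and its categories are made up, and `drop_first=True` keeps k-1 dummies so the dropped category becomes the reference group:

```python
import pandas as pd

# Hypothetical categorical predictor with three categories.
df = pd.DataFrame({"color": ["red", "blue", "green", "blue", "red"]})

# Three categories -> two dummy variables; 'blue' (alphabetically first, so it is
# the one dropped) becomes the reference group where both dummies equal zero.
dummies = pd.get_dummies(df["color"], drop_first=True)
print(dummies)  # columns: 'green' and 'red'
```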
What are the four assumptions of linear regression models?
1. Linearity of Residuals
2. Normal Distribution of Residuals
3. Equal Variance of Residuals
4. Independence of Residuals
What does high variance equal?
Overfitting
What is bias?
The error from making the model too simple (underfitting)
What is an example of bias?
Using a straight line when the real relationship is curved
What is variance?
Error from making the model too complex (overfitting)
What is an example of variance?
Using a wiggly line that fits every training point perfectly but fails on new data
The Tradeoff: If you have a simple model, there is...
high bias, low variance
The Tradeoff: If you have a complex model, there is...
high variance, low bias
What is the "sweet spot" of a model?
A model that is complex enough to capture the real structure, but simple enough to generalize to new data.
What is validity in a regression model?
A model that measures what it's supposed to (ex: a job performance model uses skills or experience and not favorite color)
What are the traits of linearity of residuals?
1. You plot residuals vs predicted values, and they look like a random cloud around zero
2. A U-Shape indicates non-linearity
What are the traits of the Normal Distribution of residuals?
1. Residuals centered around zero (bell-shaped)
2. If the points on a Q-Q plot curve away from the straight line, the residuals are not normal
3. The data itself does not have to be normal, only the residuals
What are the traits of Equal Variance of residuals?
The spread of errors stays the same no matter what the prediction is.
What is homoscedasticity?
The residuals (errors) are evenly scattered across all levels of the predicted values, not getting wider or narrower as predictions change
If residuals form a flat, random band, that is...
homoscedasticity (good)
If residuals fan out or funnel (their spread changes as the predictions change), that is...
heteroscedasticity (bad)
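A quick sketch of the usual visual check, a residuals-vs-fitted plot with matplotlib; the fitted values and residuals are simulated stand-ins for what a real model would produce:

```python
import numpy as np
import matplotlib.pyplot as plt

# Simulated output from some regression: fitted values and residuals.
rng = np.random.default_rng(3)
fitted = np.linspace(0, 10, 200)
residuals = rng.normal(scale=1.0, size=fitted.size)  # flat, random band -> homoscedastic

plt.scatter(fitted, residuals, s=10)
plt.axhline(0, color="black", linewidth=1)
plt.xlabel("Fitted values")
plt.ylabel("Residuals")
plt.title("Residuals vs fitted: a flat random band suggests equal variance")
plt.show()
```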
What are the traits of Independence of residuals?
The residuals should be independent of each other: one observation's error doesn't affect another's
Where is the independence of residuals most likely to be violated?
In time series or repeated-measures data, where values collected from the same source over time tend to be correlated
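One common numeric check for this (not from the cards above, just a hedged sketch) is the Durbin-Watson statistic in statsmodels; values near 2 suggest little autocorrelation, and the residual series here is simulated:

```python
import numpy as np
from statsmodels.stats.stattools import durbin_watson

# Simulated residual series with autocorrelation, as might appear in time-series data.
rng = np.random.default_rng(4)
noise = rng.normal(size=200)
residuals = np.empty(200)
residuals[0] = noise[0]
for t in range(1, 200):
    residuals[t] = 0.6 * residuals[t - 1] + noise[t]  # each error depends on the previous one

print(durbin_watson(residuals))  # well below 2 here, signaling positive autocorrelation
```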
What is Cook's Distance?
Measures how much a single data point influences the overall regression model (e.g., large values mean that the point has a strong effect on the fitted line)
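A minimal sketch of computing Cook's distance with statsmodels; the data are simulated, with one deliberately influential point added:

```python
import numpy as np
import statsmodels.api as sm

# Simulated data plus one influential outlier.
rng = np.random.default_rng(5)
x = rng.normal(size=50)
y = 2 + 3 * x + rng.normal(scale=0.5, size=50)
x[-1], y[-1] = 4.0, -10.0                     # a point far from the overall pattern

X = sm.add_constant(x)                        # adds the intercept column
results = sm.OLS(y, X).fit()

cooks_d = results.get_influence().cooks_distance[0]  # one distance per observation
print(cooks_d.argmax(), cooks_d.max())               # the outlier dominates
```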
What is cross-validation?
1. Tests how well a model performs on unseen data
2. The dataset is split into K parts; the model is trained on K-1 parts and tested on the held-out part
3. The process repeats K times, so every part serves as the test set once
4. Helps avoid overfitting
5. Useful with small datasets
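A minimal sketch of K-fold cross-validation with scikit-learn; K=5 and the simulated data are just example choices:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

# Simulated data.
rng = np.random.default_rng(6)
X = rng.normal(size=(100, 2))
y = X @ np.array([1.0, -2.0]) + rng.normal(scale=0.5, size=100)

# 5-fold CV: train on 4 folds, score on the held-out fold, repeat 5 times.
scores = cross_val_score(LinearRegression(), X, y, cv=5, scoring="neg_mean_squared_error")
print(-scores)         # per-fold MSE
print(-scores.mean())  # average MSE across folds
```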
How do you calculate error?
E = actual - predicted value
Degree 2 (polynomial regression) is a…
curve (U-shaped or inverted U)
What does high bias mean in R?
Your model is underfitting because it is not flexible enough to capture the true pattern
Why do we need to check the assumptions?
To get unbiased estimates, to make sure the model will perform well in prediction, and to keep inferences accurate (hypothesis tests, confidence intervals, etc.)