When is linear regression appropriate?
When:
The target variable is continuous (numeric)
You assume a linear relationship between predictors and response
You want interpretable relationships (slope effects)
Example:
Predicting house price from size
Predicting exam score from study hours
What is the difference between simple and multivariable linear regression?
Simple linear regression: 1 predictor
→ y = β0 + β1x
where
β0 is the vertical intercept of the line
β1 is the slope of the line
Multivariable linear regression: multiple predictors
→ y = β0 + β1x1 + β2x2 + ...
where:
β0 is the vertical intercept of the hyperplane
β1 is the slope for the first predictor x1 (holding the other predictors fixed)
β2 is the slope for the second predictor x2 (holding the other predictors fixed)
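To make the difference concrete, here is a minimal sketch using base R's lm(), the function behind the "lm" engine. It assumes the built-in mtcars data as a stand-in; the column names mpg, wt and hp come from that data set, not from these cards.

```r
# Simple linear regression: one predictor (car weight, wt)
simple_fit <- lm(mpg ~ wt, data = mtcars)

# Multivariable linear regression: two predictors (weight and horsepower)
multi_fit <- lm(mpg ~ wt + hp, data = mtcars)

# Each model has one intercept (β0) plus one slope per predictor
coef(simple_fit)
coef(multi_fit)
```

The simple model returns two coefficients (intercept and one slope); the multivariable model returns three (intercept and one slope per predictor).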
How do you fit a linear regression model in R (tidymodels)?
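library(tidymodels)
# Model specification: ordinary least squares via the "lm" engine
lm_spec <- linear_reg() |>
  set_engine("lm") |>
  set_mode("regression")
# Recipe: declare the outcome and predictor columns
lm_recipe <- recipe(outcome ~ predictor, data = training_data)
# Workflow: bundle the recipe and the model, then fit on the training data
lm_fit <- workflow() |>
  add_recipe(lm_recipe) |>
  add_model(lm_spec) |>
  fit(data = training_data)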
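Once a workflow like this is fitted, the estimated intercept and slope can be inspected. The following is a sketch under assumptions: mtcars stands in for the real data, with mpg as the outcome and wt as the predictor, and the split proportion and seed are arbitrary choices for illustration.

```r
library(tidymodels)

# Assumption: mtcars is a stand-in data set (mpg = outcome, wt = predictor)
set.seed(1)
car_split <- initial_split(mtcars, prop = 0.75)
training_data <- training(car_split)

lm_spec <- linear_reg() |>
  set_engine("lm") |>
  set_mode("regression")

lm_recipe <- recipe(mpg ~ wt, data = training_data)

lm_fit <- workflow() |>
  add_recipe(lm_recipe) |>
  add_model(lm_spec) |>
  fit(data = training_data)

# Pull out the underlying parsnip fit and tabulate the intercept and slope
coefs <- lm_fit |>
  extract_fit_parsnip() |>
  tidy()
coefs
```

The resulting table has one row per term: the intercept (β0) and the slope for wt (β1).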
How do you evaluate linear regression performance?
Use:
RMSPE (Root Mean Squared Prediction Error)
Compute on test data only
Steps:
Predict on test set
Compare predictions to actual values
Compute RMSPE
Lower RMSPE = better predictions on unseen data
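The steps above can be sketched end to end as follows. This is an illustration under assumptions, not a prescribed procedure: mtcars stands in for the real data (mpg as outcome, wt as predictor), and the seed and split proportion are arbitrary. It computes RMSPE by hand and also with yardstick's rmse(), which applies the same formula to whatever data it is given.

```r
library(tidymodels)

# Assumption: mtcars is a stand-in data set (mpg = outcome, wt = predictor)
set.seed(2024)
car_split <- initial_split(mtcars, prop = 0.75)
training_data <- training(car_split)
test_data <- testing(car_split)

lm_fit <- workflow() |>
  add_recipe(recipe(mpg ~ wt, data = training_data)) |>
  add_model(linear_reg() |> set_engine("lm") |> set_mode("regression")) |>
  fit(data = training_data)

# Steps 1 and 2: predict on the test set and line predictions up with actual values
predictions <- predict(lm_fit, new_data = test_data) |>
  bind_cols(test_data)

# Step 3: compute RMSPE by hand and with yardstick's rmse()
rmspe_manual <- sqrt(mean((predictions$.pred - predictions$mpg)^2))
rmspe_yardstick <- predictions |>
  rmse(truth = mpg, estimate = .pred) |>
  pull(.estimate)

c(manual = rmspe_manual, yardstick = rmspe_yardstick)
```

The two values agree because the metric is the same calculation; it is called RMSPE here only because it is evaluated on test data the model did not see during fitting.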
How do K-NN and linear regression differ?
| Feature | K-NN | Linear Regression |
|---|---|---|
| Type | Non-parametric | Parametric |
| Shape | Flexible, local | Linear global trend |
| Interpretability | Low | High |
| Sensitivity | Local noise | Outliers affect slope |
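The "local" versus "global trend" contrast shows up clearly when predicting outside the range of the training data. Below is a small base R sketch on made-up data; knn_predict is a hypothetical helper written for this illustration, not a function from any package.

```r
# Toy data lying exactly on the line y = 2x (made up for illustration)
toy <- data.frame(x = 1:10, y = 2 * (1:10))

# Hypothetical helper: K-NN regression for a single new x value,
# averaging the responses of the k closest training points
knn_predict <- function(x_new, data, k = 3) {
  nearest <- order(abs(data$x - x_new))[seq_len(k)]
  mean(data$y[nearest])
}

lm_toy <- lm(y ~ x, data = toy)

# Predict far outside the observed range of x (1 to 10)
lm_pred <- unname(predict(lm_toy, newdata = data.frame(x = 20)))
knn_pred <- knn_predict(20, toy, k = 3)

lm_pred   # follows the global line: 2 * 20 = 40
knn_pred  # mean of y at x = 8, 9, 10: (16 + 18 + 20) / 3 = 18
```

Linear regression extends its fitted line, while K-NN can only average responses it has already seen nearby.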
How do outliers affect linear regression?
Strongly influence the slope
Pull the regression line toward extreme values
Distort predictions
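A quick way to see this effect is to fit the same model with and without one extreme point. The data below are made up for illustration.

```r
# Made-up data lying exactly on y = x, so the true slope is 1
clean <- data.frame(x = 1:10, y = 1:10)

# Same data, but the last response is replaced by an extreme value
with_outlier <- clean
with_outlier$y[10] <- 50

slope_clean <- unname(coef(lm(y ~ x, data = clean))["x"])
slope_outlier <- unname(coef(lm(y ~ x, data = with_outlier))["x"])

slope_clean    # 1 (up to floating-point error): the line passes through every point
slope_outlier  # well above 1: a single point has pulled the line upward
```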
What is multicollinearity and why is it a problem?
Multicollinearity occurs when:
Predictors are highly correlated with each other
Problems:
Coefficients become unstable: small changes in the data can change them substantially
Hard to interpret individual predictor effects
Model becomes less reliable
For example:
house_size_sqft
number_of_rooms
These are highly correlated because bigger houses usually have more rooms → this creates multicollinearity.
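The house example can be simulated to see both the correlation and the instability. This is a sketch on made-up data: the sizes, prices and noise levels are arbitrary assumptions, and the standard error of a coefficient is used here as one way to quantify how unstable the estimate is.

```r
set.seed(42)
n <- 100

# Simulated (made-up) houses: room count is driven almost entirely by size
house_size_sqft <- runif(n, min = 800, max = 3500)
number_of_rooms <- house_size_sqft / 400 + rnorm(n, sd = 0.3)
price <- 150 * house_size_sqft + rnorm(n, sd = 20000)
houses <- data.frame(price, house_size_sqft, number_of_rooms)

# How strongly are the two predictors correlated?
predictor_cor <- cor(houses$house_size_sqft, houses$number_of_rooms)

# Standard error of the size coefficient, without and with the correlated predictor
size_only <- lm(price ~ house_size_sqft, data = houses)
size_and_rooms <- lm(price ~ house_size_sqft + number_of_rooms, data = houses)
se_size_only <- summary(size_only)$coefficients["house_size_sqft", "Std. Error"]
se_size_and_rooms <- summary(size_and_rooms)$coefficients["house_size_sqft", "Std. Error"]

predictor_cor
c(size_only = se_size_only, size_and_rooms = se_size_and_rooms)
```

With the two predictors this strongly correlated, the standard error of the size coefficient grows once number_of_rooms is added, which is the instability described above.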
Fill in the blanks: Linear regression specification
lm_spec <- linear_reg() |>
  set_engine("___") |>
  set_mode("__________")
"lm", "regression"
Fill in the blanks: Workflow for linear regression
lm_fit <- workflow() |>
  add_recipe(__________) |>
  add_model(__________) |>
  fit(data = __________)
lm_recipe, lm_spec, training_data
Fill in the blanks: Making predictions
predictions <- predict(lm_fit, new_data = __________) |>
  bind_cols(__________)
test_data, test_data
Fill in the blanks: RMSPE calculation
rmspe <- sqrt(mean((__________ - __________)^2))
predicted, actual
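Written as a formula, the filled-in expression is the following. The symbols are notation introduced here rather than taken from the card: n is the number of test observations, \hat{y}_i is the predicted value and y_i is the actual value for observation i.

```latex
\mathrm{RMSPE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(\hat{y}_i - y_i\right)^2}
```

Because the differences are squared, the order of predicted and actual inside the parentheses does not change the result.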