Flashcards covering key concepts in regression analysis, including modeling, R-squared, model selection, and diagnostics.
Modeling
Explaining the behavior of an outcome variable using predictor variables; represented mathematically as ŷ = β₀ + β₁x₁ + β₂x₂ + β₃x₃.
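As a worked instance of the model equation, the sketch below evaluates ŷ for a single observation. The coefficient and predictor values are made up for illustration, and `predict` is a hypothetical helper name, not part of any library.

```python
def predict(coefs, x):
    """Return y-hat = b0 + b1*x1 + ... + bk*xk for one observation."""
    intercept, *slopes = coefs
    if len(slopes) != len(x):
        raise ValueError("need one slope per predictor")
    return intercept + sum(b * xi for b, xi in zip(slopes, x))


# Hypothetical coefficients (b0, b1, b2, b3) and predictor values (x1, x2, x3).
y_hat = predict([1.0, 2.0, -0.5, 3.0], [1.0, 4.0, 2.0])
print(y_hat)  # 1 + 2*1 - 0.5*4 + 3*2 = 7.0
```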
Model Imperfection
All models are wrong, but some are useful; models are tools to understand phenomena better.
R-squared Value
The proportion of variation in the outcome that is explained by the predictors; R² = 1 - (variability in residuals / variability in outcome) = 1 - Var(e) / Var(y).
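The R² formula can be checked by hand on a tiny example. This is a minimal sketch using only the Python standard library; the observed and fitted values are made up, and `r_squared` is a hypothetical helper name.

```python
from statistics import variance


def r_squared(y, y_hat):
    """R^2 = 1 - Var(e) / Var(y), where e = y - y_hat are the residuals."""
    residuals = [yi - fi for yi, fi in zip(y, y_hat)]
    return 1 - variance(residuals) / variance(y)


# Made-up observed outcomes and fitted values, for illustration only.
y = [2.0, 4.0, 6.0, 8.0]
y_hat = [2.5, 3.5, 6.5, 7.5]
r2 = r_squared(y, y_hat)
print(round(r2, 4))  # Var(e) = 1/3, Var(y) = 20/3, so R^2 = 0.95
```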
Adjusted R-Squared
R² adjusted for model complexity; adjusted R² = 1 - (Var(e) / Var(y)) × ((n - 1) / (n - k - 1)), where n is the sample size and k is the number of predictors and interactions.
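Because Var(e) / Var(y) equals 1 - R², the adjustment can be applied directly to a plain R² value. The sketch below assumes made-up inputs (R² = 0.95, n = 20, k = 3); `adjusted_r_squared` is a hypothetical helper name.

```python
def adjusted_r_squared(r2, n, k):
    """Adjusted R^2 from plain R^2, sample size n, and k predictor terms."""
    if n - k - 1 <= 0:
        raise ValueError("need n > k + 1")
    # (1 - r2) is Var(e) / Var(y); the second factor penalizes extra terms.
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


# Made-up inputs, for illustration only.
adj = adjusted_r_squared(0.95, n=20, k=3)
print(round(adj, 6))  # 1 - 0.05 * 19 / 16 = 0.940625
```

The penalty grows with k, so adding a term raises adjusted R² only when it reduces the residual variability enough to offset the lost degree of freedom.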
Backwards Elimination
Start with the full model and eliminate terms that don't improve the model, repeating until the final model is reached.
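One possible sketch of backward elimination follows. The card does not say which criterion decides whether a term "improves the model"; this sketch assumes adjusted R² and drops a term whenever doing so does not lower it. The data are made up (y is exactly 1 + 2·x1 + 3·x2, and x3 is unrelated), all function names are hypothetical, and the small pure-Python least-squares solver only keeps the example self-contained.

```python
def fit_sse(columns, y):
    """Least-squares fit with an intercept; return the residual sum of squares."""
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    p = len(X[0])
    # Normal equations (X'X) b = X'y, stored as an augmented matrix.
    A = [[sum(X[i][r] * X[i][c] for i in range(n)) for c in range(p)]
         + [sum(X[i][r] * y[i] for i in range(n))] for r in range(p)]
    # Gauss-Jordan elimination with partial pivoting (assumes full rank).
    for j in range(p):
        pivot = max(range(j, p), key=lambda r: abs(A[r][j]))
        A[j], A[pivot] = A[pivot], A[j]
        for r in range(p):
            if r != j:
                f = A[r][j] / A[j][j]
                A[r] = [a - f * b for a, b in zip(A[r], A[j])]
    beta = [A[j][p] / A[j][j] for j in range(p)]
    return sum((y[i] - sum(b * x for b, x in zip(beta, X[i]))) ** 2
               for i in range(n))


def adjusted_r2(columns, y):
    """Adjusted R^2 = 1 - (SSE / SST) * (n - 1) / (n - k - 1)."""
    n, k = len(y), len(columns)
    mean_y = sum(y) / n
    sst = sum((v - mean_y) ** 2 for v in y)
    return 1 - (fit_sse(columns, y) / sst) * (n - 1) / (n - k - 1)


def backward_elimination(data, y, tol=1e-9):
    """Start with every term; drop terms while adjusted R^2 does not fall."""
    selected = list(data)
    best = adjusted_r2([data[c] for c in selected], y)
    while selected:
        # Score each model that leaves out exactly one current term.
        scores = {name: adjusted_r2([data[c] for c in selected if c != name], y)
                  for name in selected}
        candidate = max(scores, key=scores.get)
        if scores[candidate] < best - tol:
            break  # every removal makes the model worse
        selected.remove(candidate)
        best = scores[candidate]
    return selected


# Made-up data: y = 1 + 2*x1 + 3*x2 exactly; x3 is unrelated to y.
data = {
    "x1": [1, 2, 3, 4, 5, 6],
    "x2": [1, 0, 1, 0, 1, 0],
    "x3": [1, -1, -1, 1, 1, -1],
}
y = [6, 5, 10, 9, 14, 13]
kept = backward_elimination(data, y)
print(kept)  # x3 is dropped; x1 and x2 remain
```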
Forward Selection
Start with an empty model and add terms that improve the model, repeating until the final model is reached.
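Forward selection can be sketched the same way. As an assumption not stated on the card, this sketch uses adjusted R² as the criterion and adds the term that raises it most at each step. The data are made up (y is exactly 1 + 2·x1 + 3·x2, and x3 is unrelated), all function names are hypothetical, and the pure-Python least-squares solver only keeps the example self-contained.

```python
def fit_sse(columns, y):
    """Least-squares fit with an intercept; return the residual sum of squares."""
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    p = len(X[0])
    # Normal equations (X'X) b = X'y, stored as an augmented matrix.
    A = [[sum(X[i][r] * X[i][c] for i in range(n)) for c in range(p)]
         + [sum(X[i][r] * y[i] for i in range(n))] for r in range(p)]
    # Gauss-Jordan elimination with partial pivoting (assumes full rank).
    for j in range(p):
        pivot = max(range(j, p), key=lambda r: abs(A[r][j]))
        A[j], A[pivot] = A[pivot], A[j]
        for r in range(p):
            if r != j:
                f = A[r][j] / A[j][j]
                A[r] = [a - f * b for a, b in zip(A[r], A[j])]
    beta = [A[j][p] / A[j][j] for j in range(p)]
    return sum((y[i] - sum(b * x for b, x in zip(beta, X[i]))) ** 2
               for i in range(n))


def adjusted_r2(columns, y):
    """Adjusted R^2 = 1 - (SSE / SST) * (n - 1) / (n - k - 1)."""
    n, k = len(y), len(columns)
    mean_y = sum(y) / n
    sst = sum((v - mean_y) ** 2 for v in y)
    return 1 - (fit_sse(columns, y) / sst) * (n - 1) / (n - k - 1)


def forward_selection(data, y, tol=1e-9):
    """Start with no terms; add the best term while adjusted R^2 rises."""
    selected = []
    remaining = list(data)
    best = adjusted_r2([], y)  # intercept-only model
    while remaining:
        # Score each model that adds exactly one remaining term.
        scores = {name: adjusted_r2([data[c] for c in selected + [name]], y)
                  for name in remaining}
        candidate = max(scores, key=scores.get)
        if scores[candidate] <= best + tol:
            break  # no remaining term improves the model
        selected.append(candidate)
        remaining.remove(candidate)
        best = scores[candidate]
    return selected


# Made-up data: y = 1 + 2*x1 + 3*x2 exactly; x3 is unrelated to y.
data = {
    "x1": [1, 2, 3, 4, 5, 6],
    "x2": [1, 0, 1, 0, 1, 0],
    "x3": [1, -1, -1, 1, 1, -1],
}
y = [6, 5, 10, 9, 14, 13]
chosen = forward_selection(data, y)
print(chosen)  # x1 is added first, then x2; x3 never improves the fit
```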
Interaction Terms in Model Selection
If an interaction term is in the model, every variable in that term MUST also be included on its own, even if it is not significant.
Model Selection Focus
If the focus is on prediction, don’t worry too much about model complexity. If the focus is on a particular predictor, make sure it’s interpretable in your final model.
Simple Linear Regression Conditions
Linear regression must meet these conditions: linearity, approximately normal residuals, constant variability, and independence.
Multiple Regression Diagnostics
Multiple regression diagnostics include plots of residuals vs. fitted values, residuals vs. each predictor, residuals in order of observation (if available), and a normal QQ-plot of residuals.
Residuals vs. Fitted Values
In a residuals vs. fitted values plot, look for random scatter of points, no clear pattern (either linear or nonlinear), no clear outliers, and constant variation.
Residuals with Numerical Predictor
Rather than looking at a single residual plot, plot the residuals against each numerical predictor in turn.
Residuals with Categorical Predictor
There should be no relationship between categories and residuals, no clear outliers, and no big differences in variability across categories.
Independence Assumption
Check the independence assumption by confirming that the residuals show no pattern in the order in which the data were collected.
Normality of Residuals
The normality of residuals can be checked using the QQ-plot and histogram.
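The points behind a normal QQ-plot can be computed without a plotting library. This is a minimal sketch: the residuals are made up, `qq_points` is a hypothetical helper name, and the plotting positions (i - 0.5) / n are one common convention among several.

```python
from statistics import NormalDist, mean, stdev


def qq_points(residuals):
    """Pair each sorted, standardized residual with a normal quantile."""
    n = len(residuals)
    m, s = mean(residuals), stdev(residuals)
    sample = sorted((r - m) / s for r in residuals)
    # Assumed plotting positions: (i - 0.5) / n for i = 1..n.
    theoretical = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, sample))


# Made-up residuals, for illustration only.
residuals = [-1.2, 0.3, 0.1, -0.4, 1.2]
points = qq_points(residuals)
for t, s in points:
    print(f"{t:6.3f}  {s:6.3f}")
```

If the residuals are approximately normal, the pairs fall close to the line y = x when plotted.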