When are Reduced F-tests used?
Used to compare a full model to a reduced model with fewer predictors.
What variables are being tested in a Reduced F-test?
The variables that are left out of the reduced model.
What should you look for in the output of a Reduced F-test?
F-statistic, degrees of freedom, and p-value.
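As a rough illustration of where the F-statistic comes from, the sketch below computes it from the residual sums of squares (RSS) and residual degrees of freedom of the two models. The function name and the numbers are hypothetical, and the p-value step (which needs the F distribution) is left out.

```python
def reduced_f_statistic(rss_reduced, rss_full, df_reduced, df_full):
    """F statistic comparing a reduced model to the full model.

    rss_*: residual sum of squares of each model.
    df_*:  residual degrees of freedom of each model.
    """
    # Extra unexplained variation per dropped predictor
    numerator = (rss_reduced - rss_full) / (df_reduced - df_full)
    # Residual variance estimate from the full model
    denominator = rss_full / df_full
    return numerator / denominator


# Hypothetical numbers: dropping 2 predictors raises RSS from 80 to 100.
f_stat = reduced_f_statistic(rss_reduced=100.0, rss_full=80.0,
                             df_reduced=42, df_full=40)
print(f_stat)
```

A large F-statistic (small p-value) suggests that at least one of the dropped variables matters.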
How are binary predictors coded using dummy variables?
One dummy variable coded 0/1.
How are predictors with 3+ factor levels coded using dummy variables?
Choose a reference category and set up k-1 dummy variables, where k is the number of factor levels.
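A minimal sketch of k-1 dummy coding, assuming a hypothetical helper `dummy_code` and made-up data. Statistical software normally does this automatically; the point here is only to show what the columns look like.

```python
def dummy_code(values, reference):
    """Return k-1 dummy columns (0/1), one per non-reference level."""
    levels = sorted(set(values))
    non_reference = [level for level in levels if level != reference]
    return {
        level: [1 if value == level else 0 for value in values]
        for level in non_reference
    }


# Hypothetical factor with k = 3 levels; "blue" is the reference category.
colors = ["red", "blue", "green", "blue"]
dummies = dummy_code(colors, reference="blue")
print(dummies)
```

Rows in the reference category have 0 in every dummy column.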
How should linear model coefficients for categorical predictors be interpreted?
Average difference in the response variable between the given group and the reference category, holding the other predictors constant.
In ANCOVA, what is tested, and what decision is based on the p-value?
Test for significant interactions. If p-value is not less than 0.05, leave interaction out of the model.
In ANCOVA, define main effects and when you can interpret them.
Main effects are the coefficients for the non-interaction terms, which cannot be interpreted if there is a significant interaction.
Write fitted models for both levels of a binary predictor X with a numerical predictor Z and interactions.
X=0: beta0 + beta1Z; X=1: (beta0 + beta2) + (beta1 + beta3)Z
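The two fitted lines above come from plugging X=0 and X=1 into beta0 + beta1*Z + beta2*X + beta3*X*Z. The sketch below does that substitution; the function name and coefficient values are hypothetical.

```python
def fitted_line(beta0, beta1, beta2, beta3, x):
    """Intercept and slope (in Z) of beta0 + beta1*Z + beta2*X + beta3*X*Z
    for a fixed value x of the binary predictor X."""
    intercept = beta0 + beta2 * x
    slope = beta1 + beta3 * x
    return intercept, slope


# Hypothetical coefficients, for illustration only
line_x0 = fitted_line(1.0, 2.0, 0.5, -1.0, x=0)
line_x1 = fitted_line(1.0, 2.0, 0.5, -1.0, x=1)
print(line_x0, line_x1)
```

Here beta2 shifts the intercept and beta3 shifts the slope for the X=1 group.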
In Box-Cox transformation, what do the peak and zero represent, and when is no transformation necessary?
Peak represents the optimal power transformation (lambda); lambda = 0 corresponds to the log transformation; if lambda = 1 is in the range of acceptable values, no transformation is necessary.
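A small sketch of the Box-Cox power transformation itself, assuming the usual form (y^lambda - 1)/lambda with the log as the lambda = 0 case. The function name is hypothetical.

```python
import math


def box_cox(y, lam):
    """Box-Cox transform of a single positive value y with power lam."""
    if y <= 0:
        raise ValueError("Box-Cox requires a positive response value")
    if lam == 0:
        return math.log(y)  # lambda = 0 is the log transformation
    return (y ** lam - 1) / lam


shifted = box_cox(5.0, 1)        # lambda = 1 only shifts y by 1
logged = box_cox(math.e, 0)      # lambda = 0 gives log(y)
rooted = box_cox(4.0, 0.5)       # lambda = 0.5 is a square-root-type transform
print(shifted, logged, rooted)
```

With lambda = 1 the response is only shifted by a constant, which is why no transformation is needed in that case.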
What plot is used to identify unusual observations?
Residuals vs. leverage plot.
How are outliers and high-leverage points identified?
Outliers: standardized residuals beyond +/- 3; High-leverage: look for natural gaps in leverage values.
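To see what a "natural gap" in leverage looks like, the sketch below computes leverage values for a simple linear regression (one predictor plus an intercept), where the leverage is 1/n + (x - mean)^2 / Sxx. The data are made up, and this formula only covers the one-predictor case.

```python
def leverages(xs):
    """Leverage of each observation in a simple linear regression on xs."""
    n = len(xs)
    x_bar = sum(xs) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return [1 / n + (x - x_bar) ** 2 / sxx for x in xs]


# Hypothetical predictor values; the last one sits far from the rest.
xs = [1, 2, 3, 4, 20]
h = leverages(xs)
print([round(value, 3) for value in h])
```

The last observation has a leverage far above the others, which is the kind of gap to look for.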
What should be done about unusual observations?
Verify that they are legitimate data entries. If so, do not remove them.
What indicates a multicollinearity issue using VIFs?
A VIF greater than 10 indicates a multicollinearity issue.
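For context on the cutoff, a VIF is 1 / (1 - R^2), where R^2 comes from regressing one predictor on all the other predictors. The sketch below, with a hypothetical function name, shows how that R^2 maps to a VIF.

```python
def vif_from_r_squared(r_squared):
    """Variance inflation factor for a predictor, given the R^2 from
    regressing that predictor on the other predictors."""
    return 1 / (1 - r_squared)


vif_uncorrelated = vif_from_r_squared(0.0)   # no relationship with the others
vif_at_cutoff = vif_from_r_squared(0.9)      # 90% explained by the others
print(vif_uncorrelated, vif_at_cutoff)
```

A VIF of 10 therefore corresponds to a predictor that is 90% explained by the other predictors.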
How can multicollinearity be addressed?
Remove or scale predictors.
What does scaling predictors mean?
Standardizing them by centering (subtract the mean) and scaling (divide by the standard deviation), so every predictor is represented by Z-scores instead.
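A minimal sketch of centering and scaling one predictor, assuming the sample standard deviation (n-1 denominator) is used. The data are made up.

```python
import statistics


def z_scores(values):
    """Center by the mean and scale by the sample standard deviation."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample SD (n - 1 in the denominator)
    return [(value - mean) / sd for value in values]


# Hypothetical predictor values
scaled = z_scores([2, 4, 6])
print(scaled)
```

After scaling, the predictor has mean 0 and standard deviation 1.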
How do you decide the best model based on AIC, BIC, and R2?
AIC: lower is better; BIC: lower is better; R2: higher is better.
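To make "lower is better" concrete, the sketch below uses one common form of AIC and BIC for linear models, n*log(RSS/n) plus a penalty, which drops an additive constant shared by all models fit to the same data. The function names and numbers are hypothetical, and software may report values that differ by that constant.

```python
import math


def aic_linear(rss, n, k):
    """AIC for a linear model, up to a constant: n*log(RSS/n) + 2k."""
    return n * math.log(rss / n) + 2 * k


def bic_linear(rss, n, k):
    """BIC for a linear model, up to a constant: n*log(RSS/n) + k*log(n)."""
    return n * math.log(rss / n) + k * math.log(n)


# Hypothetical fits on the same n = 50 observations:
# model A has 3 coefficients, model B has 5 and a slightly smaller RSS.
aic_a, aic_b = aic_linear(120.0, 50, 3), aic_linear(118.0, 50, 5)
bic_a, bic_b = bic_linear(120.0, 50, 3), bic_linear(118.0, 50, 5)
print(aic_a, aic_b, bic_a, bic_b)
```

In this made-up case the small drop in RSS does not pay for the two extra coefficients, so both criteria prefer model A.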
What are the starting models for forward and backward stepwise selection?
Forward: start with the empty model; the scope must also be specified. Backward: start with the full model.
What does the model output from stepwise selection show?
Shows each iteration with AIC values as well as which variables were removed or added at each step.
How do you interpret the model output from the chosen model in stepwise selection?
Review linear model output: F-test, T-tests, coefficients.
How is best subsets selection different from stepwise selection?
Checks every combination of predictors. Stepwise selection only checks some of the models.
What is the main limitation for best subsets selection?
Computation is slow.
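A quick way to see why it is slow: with p predictors there are 2^p possible models. The sketch below enumerates them for a small, made-up predictor list.

```python
from itertools import combinations


def all_subsets(predictors):
    """Every combination of predictors, from the empty model to the full one."""
    subsets = []
    for size in range(len(predictors) + 1):
        subsets.extend(combinations(predictors, size))
    return subsets


# Hypothetical predictor names
subsets = all_subsets(["x1", "x2", "x3"])
print(len(subsets))      # number of candidate models for 3 predictors
models_for_20 = 2 ** 20  # same count for 20 predictors, without enumerating
print(models_for_20)
```

The count doubles with every added predictor, so the search quickly becomes impractical.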
What are the benefits of model validation compared to other model selection criteria?
Eliminates the bias that comes from using the same data for both fitting and for evaluation.
What are the main concepts behind model validation?
Divide data into 2 parts: Training data (fit model), Test data (evaluate model).
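A minimal sketch of a random training/test split, assuming a hypothetical helper and a 70/30 split. The fraction and seed are arbitrary choices.

```python
import random


def train_test_split(rows, test_fraction=0.3, seed=0):
    """Shuffle the rows and hold out a fraction of them as test data."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


rows = list(range(10))  # stand-in for 10 observations
train_rows, test_rows = train_test_split(rows, test_fraction=0.3, seed=1)
print(len(train_rows), len(test_rows))
```

The model is fit on `train_rows` only and then evaluated on `test_rows`, which it has never seen.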
What does the number of folds control in cross-validation?
Controls how many groups the data is split into; each group takes a turn as the testing set.
How do you choose models based on cross-validation output?
Compare RMSE values; lower is better.
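The sketch below walks through k-fold cross-validation by hand for two candidate models, an intercept-only model and a simple linear regression, and compares their RMSE. All names and data are hypothetical; folds are assigned without shuffling and errors are pooled across folds, which is one of several reasonable conventions.

```python
import math


def k_fold_indices(n, k):
    """Split indices 0..n-1 into k folds (no shuffling, for simplicity)."""
    return [list(range(start, n, k)) for start in range(k)]


def fit_line(xs, ys):
    """Least-squares intercept and slope for one predictor."""
    x_bar, y_bar = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def cv_rmse(xs, ys, k, use_slope):
    """Cross-validated RMSE, pooling squared errors over all test folds."""
    squared_errors = []
    for test_idx in k_fold_indices(len(xs), k):
        held_out = set(test_idx)
        train_x = [x for i, x in enumerate(xs) if i not in held_out]
        train_y = [y for i, y in enumerate(ys) if i not in held_out]
        if use_slope:
            b0, b1 = fit_line(train_x, train_y)
        else:
            b0, b1 = sum(train_y) / len(train_y), 0.0  # intercept-only model
        squared_errors.extend((ys[i] - (b0 + b1 * xs[i])) ** 2 for i in test_idx)
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Made-up data with a clear linear trend plus a small wiggle
xs = list(range(10))
ys = [2 * x + 1 + (0.5 if x % 2 else -0.5) for x in xs]
rmse_mean_only = cv_rmse(xs, ys, k=5, use_slope=False)
rmse_line = cv_rmse(xs, ys, k=5, use_slope=True)
print(rmse_mean_only, rmse_line)
```

The model with the lower cross-validated RMSE, here the one with the slope, would be preferred.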
What is the model form for Logistic Regression?
logit(p) = beta0 + beta1*X + …
What is the relationship between p and odds?
Odds = p/(1-p) = P(Success)/P(Failure).
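A short sketch of converting between probability and odds in both directions. The function names are hypothetical.

```python
def odds_from_p(p):
    """Odds of success given the probability of success."""
    return p / (1 - p)


def p_from_odds(odds):
    """Probability of success given the odds of success."""
    return odds / (1 + odds)


odds = odds_from_p(0.75)   # P(Success) = 0.75, P(Failure) = 0.25
p_back = p_from_odds(odds)
print(odds, p_back)
```

A probability of 0.75 corresponds to odds of 3 to 1.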
How do you interpret model coefficients (intercept and other coefficients) in Logistic Regression?
Exponentiated intercept is the odds of success when all predictors are equal to 0. Other coefficients (exponentiated) are odds ratios for a one-unit increase in that predictor, holding the others constant.
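The sketch below checks this interpretation numerically for a one-predictor model logit(p) = beta0 + beta1*X, using made-up coefficients.

```python
import math


def model_odds(beta0, beta1, x):
    """Odds of success implied by logit(p) = beta0 + beta1 * x."""
    return math.exp(beta0 + beta1 * x)


# Hypothetical coefficients, for illustration only
beta0, beta1 = -1.0, 0.7

odds_at_zero = model_odds(beta0, beta1, 0)
odds_ratio = model_odds(beta0, beta1, 3) / model_odds(beta0, beta1, 2)
print(odds_at_zero, math.exp(beta0))  # these two match
print(odds_ratio, math.exp(beta1))    # and so do these two
```

The ratio is the same whichever pair of X values one unit apart is used.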
How can deviance be used to describe variability in Logistic Regression?
If the model is a good fit, null deviance should be large compared to residual deviance.
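As an illustration, for individual 0/1 responses the deviance is -2 times the log-likelihood. The null deviance uses the overall success proportion for every observation, and the residual deviance uses the model's fitted probabilities. The responses and fitted probabilities below are made up.

```python
import math


def deviance(ys, probs):
    """Deviance for 0/1 responses: -2 times the Bernoulli log-likelihood."""
    log_lik = sum(
        y * math.log(p) + (1 - y) * math.log(1 - p)
        for y, p in zip(ys, probs)
    )
    return -2 * log_lik


# Hypothetical responses and fitted probabilities from some model
ys = [0, 0, 1, 1, 1, 0]
fitted = [0.1, 0.2, 0.8, 0.9, 0.7, 0.3]

p_bar = sum(ys) / len(ys)  # intercept-only model predicts this for everyone
null_dev = deviance(ys, [p_bar] * len(ys))
resid_dev = deviance(ys, fitted)
print(null_dev, resid_dev)
```

Here the residual deviance is much smaller than the null deviance, meaning the predictors explain a good share of the variability.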
List the different models discussed over the semester.
ANOVA (One-way, Two-way, Blocked, Repeated Measures), Linear Regression (Simple and Multiple), Generalized Linear Models (Logistic Regression)