STA 363 Final Study Guide

Flashcards for STA 363 Final Exam Review

31 Terms

When are Reduced F-tests used?

Used to compare a full model with a nested reduced model that drops one or more predictors.

What variables are being tested in a Reduced F-test?

The variables that are left out of the reduced model. The null hypothesis is that all of their coefficients equal zero.

What should you look for in the output of a Reduced F-test?

F-statistic, degrees of freedom, and p-value.
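
The F-statistic and its two degrees of freedom can be reproduced from the residual sums of squares (RSS) of the two fits. This is a minimal sketch assuming the usual nested-model formula; the function name `reduced_f_statistic` and the numbers in the usage line are hypothetical, not taken from the course.

```python
def reduced_f_statistic(rss_reduced, rss_full, df_reduced, df_full):
    """F statistic for comparing a reduced model to a full model.

    rss_*: residual sum of squares of each fit.
    df_*:  residual degrees of freedom of each fit
           (number of observations minus number of coefficients).
    """
    numerator_df = df_reduced - df_full  # number of coefficients being tested
    if numerator_df <= 0:
        raise ValueError("the reduced model must have fewer coefficients")
    numerator = (rss_reduced - rss_full) / numerator_df
    denominator = rss_full / df_full
    return numerator / denominator, numerator_df, df_full


# Hypothetical numbers: dropping 2 predictors raises the RSS from 100 to 120.
f_stat, df1, df2 = reduced_f_statistic(120.0, 100.0, df_reduced=47, df_full=45)
```

A large F-statistic (small p-value) suggests that at least one of the dropped predictors is needed.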

How are binary predictors coded using dummy variables?

One dummy variable coded 0/1.

How are predictors with 3+ factor levels coded using dummy variables?

Choose a reference category and set up k-1 dummy variables, where k is the number of factor levels.
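
The k-1 coding can be sketched in a few lines of Python. The function name `dummy_code` and the factor levels are hypothetical; statistical software normally builds these columns automatically.

```python
def dummy_code(values, reference):
    """Return (columns, rows): one 0/1 column per non-reference level.

    Observations in the reference category get 0 in every column.
    """
    levels = sorted(set(values))
    if reference not in levels:
        raise ValueError("reference must be one of the observed levels")
    columns = [level for level in levels if level != reference]
    rows = [[1 if value == level else 0 for level in columns] for value in values]
    return columns, rows


# Hypothetical 3-level factor with "control" as the reference category.
columns, rows = dummy_code(["control", "low", "high", "low"], reference="control")
```

With k = 3 levels this produces 2 columns, and a binary predictor (k = 2) reduces to a single 0/1 column.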

How should linear model coefficients for categorical predictors be interpreted?

Estimated average difference in the response variable between that group and the reference group, holding the other predictors fixed.

In ANCOVA, what is tested, and what decision is based on the p-value?

Test whether the interaction is significant. If the p-value is not less than 0.05, leave the interaction out of the model.

In ANCOVA, define main effects and when you can interpret them.

Main effects are the coefficients for the non-interaction terms. They cannot be interpreted on their own when there is a significant interaction.

Write fitted models for both levels of a binary predictor X with a numerical predictor Z and interactions.

Model: beta0 + beta1*Z + beta2*X + beta3*X*Z. X=0: beta0 + beta1*Z; X=1: (beta0 + beta2) + (beta1 + beta3)*Z
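
The two fitted lines follow from plugging X = 0 and X = 1 into the interaction model. The sketch below assumes the model is beta0 + beta1*Z + beta2*X + beta3*X*Z; the function name `fitted_lines` and the coefficient values are hypothetical.

```python
def fitted_lines(beta0, beta1, beta2, beta3):
    """Return {level of X: (intercept, slope in Z)}.

    Assumed model: y = beta0 + beta1*Z + beta2*X + beta3*X*Z, with X coded 0/1.
    """
    return {
        0: (beta0, beta1),                  # X = 0: the beta2 and beta3 terms vanish
        1: (beta0 + beta2, beta1 + beta3),  # X = 1: both intercept and slope shift
    }


# Hypothetical coefficients for illustration only.
lines = fitted_lines(beta0=2.0, beta1=0.5, beta2=1.0, beta3=-0.25)
```

Here beta2 is the difference in intercepts and beta3 is the difference in slopes between the two groups.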

In Box-Cox transformation, what do the peak and zero represent, and when is no transformation necessary?

The peak represents the optimal power transformation (lambda); lambda = 0 corresponds to the log transformation; if lambda = 1 is within the range of acceptable values, no transformation is necessary.
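
To see why a power of 0 means "log" and a power of 1 means "no transformation", here is a minimal sketch of the standard Box-Cox formula. The function name `box_cox` is hypothetical, and the sketch transforms a single positive value rather than estimating the optimal power.

```python
import math


def box_cox(y, lam):
    """Box-Cox transform of one positive value y with power lam."""
    if y <= 0:
        raise ValueError("Box-Cox requires a positive response")
    if lam == 0:
        return math.log(y)       # the limiting case is the log transformation
    return (y ** lam - 1) / lam  # lam = 1 gives y - 1, a shift that leaves the shape unchanged


unchanged = box_cox(4.0, 1)    # power 1: only shifts the response by 1
square_root = box_cox(4.0, 0.5)
logged = box_cox(math.e, 0)
```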

What plot is used to identify unusual observations?

Residuals vs. leverage plot.

How are outliers and high-leverage points identified?

Outliers: standardized residuals beyond +/- 3 (absolute value greater than 3); High-leverage: look for natural gaps in leverage values.

What should be done about unusual observations?

Verify that they are legitimate data entries. If so, they should not be removed.

What indicates a multicollinearity issue using VIFs?

A VIF greater than 10 indicates a multicollinearity issue.
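
The cutoff of 10 is easier to remember through the definition of the variance inflation factor: VIF = 1 / (1 - R²), where R² comes from regressing one predictor on all the others. A minimal sketch, with the hypothetical function name `vif`:

```python
def vif(r_squared):
    """Variance inflation factor for one predictor.

    r_squared: R² from regressing that predictor on the other predictors.
    """
    if not 0 <= r_squared < 1:
        raise ValueError("r_squared must be in [0, 1)")
    return 1.0 / (1.0 - r_squared)


no_collinearity = vif(0.0)  # predictor unrelated to the others
at_cutoff = vif(0.9)        # 90% of its variance explained by the others
```

A VIF above 10 therefore means more than 90% of the predictor's variance is explained by the other predictors.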

How can multicollinearity be addressed?

Remove or scale predictors.

What does scaling predictors mean?

Scaling predictors means standardizing them: subtract the mean (centering) and divide by the standard deviation (scaling), so every predictor is represented by Z-scores instead.
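
A minimal sketch of the Z-score calculation using only Python's standard library. The function name `standardize` and the data are hypothetical, and the sample standard deviation (n - 1 denominator) is assumed.

```python
from statistics import mean, stdev


def standardize(values):
    """Center by the mean and scale by the sample standard deviation."""
    center = mean(values)
    spread = stdev(values)
    if spread == 0:
        raise ValueError("cannot scale a constant predictor")
    return [(value - center) / spread for value in values]


# Hypothetical predictor values: mean 4, standard deviation 2.
z_scores = standardize([2.0, 4.0, 6.0])
```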

How do you decide the best model based on AIC, BIC, and R²?

AIC: lower is better; BIC: lower is better; R²: higher is better.
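
Both criteria trade fit against model size, which is why lower is better. The sketch below uses the standard definitions AIC = 2k - 2 log L and BIC = k log(n) - 2 log L; the candidate models and their log-likelihoods are hypothetical.

```python
import math


def aic(log_likelihood, n_params):
    """Akaike information criterion: penalty of 2 per parameter."""
    return 2 * n_params - 2 * log_likelihood


def bic(log_likelihood, n_params, n_obs):
    """Bayesian information criterion: penalty of log(n) per parameter."""
    return n_params * math.log(n_obs) - 2 * log_likelihood


# Hypothetical candidates: name -> (log-likelihood, number of parameters).
candidates = {"small": (-52.0, 2), "large": (-50.0, 5)}
n_obs = 100

best_by_aic = min(candidates, key=lambda name: aic(*candidates[name]))
best_by_bic = min(candidates, key=lambda name: bic(*candidates[name], n_obs))
```

In this made-up comparison the larger model fits slightly better, but not by enough to pay for its three extra parameters, so both criteria prefer the small model.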

What are the starting models for forward and backward stepwise selection?

Forward: start with the empty (intercept-only) model. MUST ALSO SPECIFY SCOPE (the largest model to consider). Backward: start with the full model.

What does the model output from stepwise selection show?

Shows each iteration with AIC values as well as which variables were removed or added at each step.

How do you interpret the model output from the chosen model in stepwise selection?

Review linear model output: F-test, t-tests, coefficients.

How is best subsets selection different from stepwise selection?

Best subsets checks every combination of predictors. Stepwise selection only checks some of the models.

What is the main limitation for best subsets selection?

Computation is slow: with p predictors there are 2^p candidate models to fit.
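
The slowdown comes from the number of candidate models doubling with every added predictor. A minimal sketch that enumerates the subsets; the function name `all_subsets` and the predictor names are hypothetical.

```python
from itertools import combinations


def all_subsets(predictors):
    """Every combination of predictors, from the empty model to the full model."""
    subsets = []
    for size in range(len(predictors) + 1):
        subsets.extend(combinations(predictors, size))
    return subsets


models = all_subsets(["x1", "x2", "x3"])  # 2**3 = 8 candidate models
models_with_20_predictors = 2 ** 20       # over a million fits
```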

What are the benefits of model validation compared to other model selection criteria?

Eliminates the bias that comes from using the same data for both fitting and for evaluation.

What are the main concepts behind model validation?

Divide data into 2 parts: Training data (fit model), Test data (evaluate model).

What does the number of folds control in cross-validation?

Controls how many groups (folds) the data are split into. Each fold serves once as the test set while the model is fit on the remaining folds.
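
A minimal sketch of how k folds turn into k train/test splits. The function name `k_fold_splits` is hypothetical, and the observations are assigned to folds in order for simplicity; in practice the rows are usually shuffled first.

```python
def k_fold_splits(n, k):
    """Split indices 0..n-1 into k folds and return a list of (train, test) pairs."""
    if not 2 <= k <= n:
        raise ValueError("k must be between 2 and n")
    indices = list(range(n))
    # Deal indices round-robin so fold sizes differ by at most one.
    folds = [indices[start::k] for start in range(k)]
    splits = []
    for test in folds:
        held_out = set(test)
        train = [i for i in indices if i not in held_out]
        splits.append((train, test))
    return splits


# 10 observations, 5 folds: each test set holds 2 observations.
splits = k_fold_splits(n=10, k=5)
```

Every observation lands in exactly one test set, so each one is predicted by a model that never saw it.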

How do you choose models based on cross-validation output?

Compare RMSE values: the model with the lowest RMSE predicts best.
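
RMSE (root mean squared error) is the square root of the average squared prediction error on the held-out data. A minimal sketch, with the hypothetical function name `rmse` and made-up values:

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between observed and predicted values."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and the same length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(actual))


# Hypothetical held-out values: every prediction is off by exactly 2.
error = rmse([1.0, 2.0, 3.0, 4.0], [3.0, 4.0, 1.0, 2.0])
```

RMSE is in the units of the response, so it reads as a typical size of a prediction error.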

What is the model form for Logistic Regression?

logit(p) = log(p/(1-p)) = beta0 + beta1*X + …

What is the relationship between p and odds?

Odds = p/(1-p) = P(Success)/P(Failure).
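
The conversions between probability, odds, and the logit scale can be sketched as below. The function names are hypothetical; the formulas are the standard ones.

```python
import math


def odds(p):
    """Odds of success: P(Success) / P(Failure)."""
    if not 0 <= p < 1:
        raise ValueError("p must be in [0, 1)")
    return p / (1 - p)


def logit(p):
    """Log-odds, the scale on which logistic regression is linear."""
    return math.log(odds(p))


def inverse_logit(log_odds):
    """Convert log-odds back to a probability."""
    return 1 / (1 + math.exp(-log_odds))


three_to_one = odds(0.75)  # p = 0.75 means success is 3 times as likely as failure
even_log_odds = logit(0.5)
```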

How do you interpret model coefficients (intercept and other coefficients) in Logistic Regression?

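Exponentiated intercept is the odds of success when all predictors are equal to 0. Other coefficients (exponentiated) are odds ratios: the multiplicative change in the odds for a one-unit increase in that predictor, holding the others fixed.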
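
A minimal sketch of the exponentiation step for a model with one predictor. The function name `interpret_logistic` and the coefficient values are hypothetical.

```python
import math


def interpret_logistic(beta0, beta1):
    """Exponentiate logistic regression coefficients for interpretation."""
    return {
        "baseline_odds": math.exp(beta0),  # odds of success when the predictor is 0
        "odds_ratio": math.exp(beta1),     # factor applied to the odds per one-unit increase
    }


# Hypothetical fit: intercept 0 and slope log(2).
summary = interpret_logistic(beta0=0.0, beta1=math.log(2))
```

In this made-up fit the baseline odds are 1 (success and failure equally likely) and each one-unit increase in the predictor doubles the odds.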

How can deviance be used to describe variability in Logistic Regression?

If the model is a good fit, null deviance should be large compared to residual deviance.

List the different models discussed over the semester.

ANOVA (One-way, Two-way, Blocked, Repeated Measures), Linear Regression (Simple and Multiple), Generalized Linear Models (Logistic Regression)