Lecture Video W8D2 - Regression diagnostics - part 3
Announcements
No Class on Friday: Cancelled due to a conference in the staff department; no practice problem session will take place.
A short video with problem solutions may be added alongside the Wednesday video.
Upcoming Quiz: The face-to-face section will have at least one quiz next week; the online section's quiz will be available for one week starting Friday.
Midterm Grading: Midterm grades will be processed within the next week; the solution key is already posted.
Regression Diagnostics Overview
Focus of today's class: regression diagnostics using R.
Previously discussed types of plots to check assumptions:
Predictor versus residual plots.
Fitted value versus residual plots.
Normal QQ plot.
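As a reminder of what these three plots look like in R, here is a minimal sketch. The data are simulated purely for illustration, and the names x1, x2, and y are hypothetical, not from the course datasets.

```r
# Simulated data for illustration only; x1, x2, and y are hypothetical names.
set.seed(1)
n  <- 100
x1 <- runif(n, 0, 10)
x2 <- rnorm(n)
y  <- 2 + 1.5 * x1 - 0.8 * x2 + rnorm(n)

fit <- lm(y ~ x1 + x2)
res <- resid(fit)

# 1. Predictor versus residual: look for curvature or changing spread.
plot(x1, res, xlab = "x1", ylab = "Residual")
abline(h = 0, lty = 2)

# 2. Fitted value versus residual: look for funnels or trends.
plot(fitted(fit), res, xlab = "Fitted value", ylab = "Residual")
abline(h = 0, lty = 2)

# 3. Normal QQ plot: points should fall close to the reference line.
qqnorm(res)
qqline(res)
```

Because this simulated model satisfies the assumptions, all three plots should look unremarkable: no pattern around zero and a roughly straight QQ plot.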
R Coding for Diagnostics
Library Needed:
library(MASS) for the studres function.
Dataset: 'Grocery data' with variables:
Response variable: total labor hours.
Predictors: number of boxes packed, indirect cost percentage, holiday status.
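A sketch of the setup, assuming a data frame shaped like the grocery data. The real file is not reproduced here, so the data frame below is simulated, and the column names (boxes, indirect, holiday, hours), the sample size, and the coefficients are all hypothetical. Only the count of 6 holiday observations follows the lecture.

```r
library(MASS)  # provides studres()

# Simulated stand-in for the grocery data; column names and values are hypothetical.
set.seed(2)
n <- 52
grocery <- data.frame(
  boxes    = round(runif(n, 200000, 450000)),
  indirect = runif(n, 5, 9),
  holiday  = rep(c(1, 0), times = c(6, n - 6))  # 6 holiday observations
)
grocery$hours <- 4000 + 0.001 * grocery$boxes - 10 * grocery$indirect +
  600 * grocery$holiday + rnorm(n, sd = 150)

fit <- lm(hours ~ boxes + indirect + holiday, data = grocery)

# Studentized residuals, one per observation.
sr <- studres(fit)
head(sr)
```

The studentized residuals can be used in place of raw residuals in any of the diagnostic plots.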
Fitted Value vs Residual Plot
Created after fitting the model; it showed a reverse funnel shape, indicating possible heteroscedasticity.
Reminder: Avoid conclusions based on a small number of data points (only 6 for holidays).
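To see what a funnel looks like and how to check it roughly by numbers, here is a small sketch on simulated data. The direction of the funnel (spread narrowing as fitted values grow) is an assumption made for the example, and all names are hypothetical.

```r
set.seed(3)
n <- 200
x <- runif(n, 1, 10)
# Error standard deviation shrinks as x grows (assumed direction of the funnel).
y <- 5 + 2 * x + rnorm(n, sd = 6 / x)

fit <- lm(y ~ x)
plot(fitted(fit), resid(fit), xlab = "Fitted value", ylab = "Residual")
abline(h = 0, lty = 2)

# Rough numeric check: residual spread in the lower vs upper half of fitted values.
lower    <- fitted(fit) <= median(fitted(fit))
sd_lower <- sd(resid(fit)[lower])
sd_upper <- sd(resid(fit)[!lower])
c(lower = sd_lower, upper = sd_upper)
```

A clear difference between the two standard deviations backs up what the plot shows. With only a handful of points in a group, such as the 6 holiday observations, a comparison like this would not be reliable.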
Example with PSA Data
Predictors:
Volume of prostate, weight of prostate, age, BPH, CP.
Response variable: Prostate-specific antigen (PSA).
Some issues identified in the fitted value vs residual plot:
Heteroscedasticity apparent; the variability of the residuals increases as the fitted values rise.
Outliers present.
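One possible way to flag outliers numerically is with studentized residuals. The sketch below uses simulated data with one planted outlier; the cutoff of 3 is a common rule of thumb and an assumption here, not something fixed by the lecture.

```r
set.seed(4)
n <- 100
x <- runif(n, 0, 10)
y <- 1 + 0.5 * x + rnorm(n)
y[10] <- y[10] + 15   # planted outlier for the demonstration

fit <- lm(y ~ x)
r <- rstudent(fit)            # externally studentized residuals (base R)
flagged <- which(abs(r) > 3)  # cutoff of 3 is a rule of thumb, not a fixed rule
flagged

# Mark the flagged points on the fitted value vs residual plot.
plot(fitted(fit), r, xlab = "Fitted value", ylab = "Studentized residual")
points(fitted(fit)[flagged], r[flagged], col = "red", pch = 19)
abline(h = c(-3, 0, 3), lty = 2)
```

Flagged points deserve a closer look; they are not automatically removed.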
Transformations for Issues
If nonlinearity is evident, transform the specific predictor variable.
Pairs Plot: Used to visualize relationships between predictors and response for nonlinearity detection.
Log Transformation: Recommended for addressing nonnormality and heteroscedasticity issues.
Examples of transformations discussed include logarithmic, square root, and polynomial methods.
A log transformation of the response was applied; the residual plots and QQ plots improved, indicating a better fit.
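The pairs plot and predictor transformation can be sketched as follows. The data are simulated so that y depends on log(x1), and the variable names are hypothetical. Comparing R-squared is only a rough summary added for this example; the plots remain the main tool.

```r
set.seed(5)
n <- 100
dat <- data.frame(x1 = runif(n, 1, 10), x2 = rnorm(n))
# y is built to be curved in x1 (linear in log(x1)) for the demonstration.
dat$y <- 3 + 2 * log(dat$x1) + 0.5 * dat$x2 + rnorm(n, sd = 0.3)

# Scatterplot matrix: the y vs x1 panel should show curvature.
pairs(dat)

# Transform only the predictor that looks nonlinear.
fit_raw <- lm(y ~ x1 + x2, data = dat)
fit_log <- lm(y ~ log(x1) + x2, data = dat)
r2 <- c(raw = summary(fit_raw)$r.squared, log = summary(fit_log)$r.squared)
r2
```

Note that this transforms a predictor to address nonlinearity; transforming the response is the usual route for nonnormality and heteroscedasticity.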
Summary of Model Fitting Process
New Fit: Created with the transformed response variable (log PSA).
Diagnostics for the revised model indicate that the assumptions are better satisfied.
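A before-and-after comparison of this kind might look like the sketch below. The data are simulated with multiplicative errors, so a log response is the right fix by construction; names are hypothetical. The helper qq_cor, which measures how straight the QQ plot is, is an addition for this example and was not used in the lecture.

```r
set.seed(6)
n <- 150
x <- runif(n, 0, 5)
# Multiplicative errors: y is right-skewed with spread growing with its mean.
y <- exp(0.5 + 0.6 * x + rnorm(n, sd = 0.5))

fit_raw <- lm(y ~ x)        # original fit
fit_log <- lm(log(y) ~ x)   # new fit with the log-transformed response

op <- par(mfrow = c(2, 2))
plot(fitted(fit_raw), resid(fit_raw), main = "Raw: fitted vs residual")
qqnorm(resid(fit_raw), main = "Raw: normal QQ"); qqline(resid(fit_raw))
plot(fitted(fit_log), resid(fit_log), main = "Log: fitted vs residual")
qqnorm(resid(fit_log), main = "Log: normal QQ"); qqline(resid(fit_log))
par(op)

# Hypothetical helper: correlation of the QQ plot points (closer to 1 = straighter).
qq_cor <- function(fit) {
  qq <- qqnorm(resid(fit), plot.it = FALSE)
  cor(qq$x, qq$y)
}
c(raw = qq_cor(fit_raw), log = qq_cor(fit_log))
```

In this simulated case the funnel disappears and the QQ plot straightens after the log transformation.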
Interpretation of Results
Careful interpretation required post-transformation:
Interpret relationships in terms of the transformed variables only.
Avoid back-transforming to interpret coefficients; describe each relationship on the transformed scale (for example, a change in log PSA, not in PSA).
Example relationships analyzed include:
Significant relationship between age and log PSA, with attention to what the result means statistically.
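A sketch of how such a statement can be read off the R output. The data are simulated, the columns age and psa are hypothetical stand-ins, and the numbers say nothing about the real PSA data. The point is only the wording of the interpretation on the log scale.

```r
set.seed(7)
n <- 120
dat <- data.frame(age = round(runif(n, 40, 80)))
# Hypothetical data-generating slope of 0.03 on the log scale.
dat$psa <- exp(0.2 + 0.03 * dat$age + rnorm(n, sd = 0.6))

fit <- lm(log(psa) ~ age, data = dat)
tab <- coef(summary(fit))   # estimates, standard errors, t values, p-values
tab

b_age <- tab["age", "Estimate"]
p_age <- tab["age", "Pr(>|t|)"]

# State the relationship on the transformed scale, without back-transforming.
cat(sprintf(
  "Each additional year of age is associated with a change of %.3f in mean log(psa) (p = %.4f).\n",
  b_age, p_age
))
```

The sentence talks about mean log(psa), not psa itself, which is what interpreting on the transformed scale means in practice.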
Final Considerations
Keep practicing interpretations to strengthen understanding.
Explore topics discussed further in the Wednesday video, including QQ plots and upcoming practice problems.