Lecture Video W8D2 - Regression diagnostics - part 3

Announcements

  • No Class on Friday: Class is canceled because of a conference in the staff department, and no practice problem session will take place.

    • A short video with problem solutions may be posted alongside the Wednesday video.

  • Upcoming Quiz: At least one quiz will be given next week for the face-to-face section; the online section's quiz will be available for one week starting Friday.

  • Midterm Grading: Midterm grades will be processed within the next week; a solution key is already posted.

Regression Diagnostics Overview

  • Focus of today's class: regression diagnostics using R.

  • Previously discussed types of plots to check assumptions:

    • Predictor versus residual plots.

    • Fitted value versus residual plots.

    • Normal QQ plot.
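The three plot types listed above can be sketched in base R as follows. This is a minimal illustration on simulated data, not the lecture's own code; the variable names x1, x2, and y and all coefficient values are assumptions made for the example.

```r
# Simulated data (hypothetical; not one of the lecture's datasets)
set.seed(1)
n  <- 100
x1 <- runif(n, 0, 10)
x2 <- rnorm(n)
y  <- 2 + 1.5 * x1 - 0.8 * x2 + rnorm(n)

# Fit the linear model and extract residuals and fitted values
fit <- lm(y ~ x1 + x2)
res <- resid(fit)
fv  <- fitted(fit)

# Draw the diagnostic plots in a 2 x 2 grid
op <- par(mfrow = c(2, 2))
plot(x1, res, xlab = "x1", ylab = "Residual", main = "Predictor vs residual (x1)")
abline(h = 0, lty = 2)
plot(x2, res, xlab = "x2", ylab = "Residual", main = "Predictor vs residual (x2)")
abline(h = 0, lty = 2)
plot(fv, res, xlab = "Fitted value", ylab = "Residual", main = "Fitted vs residual")
abline(h = 0, lty = 2)
qqnorm(res, main = "Normal QQ plot")
qqline(res)
par(op)  # restore the previous graphics settings
```

With data simulated to satisfy the model assumptions, the residual plots should show no clear pattern around zero and the QQ points should fall close to the reference line.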

R Coding for Diagnostics

  • Library Needed: library(MASS) for the studres function (studentized residuals).

  • Dataset: 'Grocery data' with variables:

    • Response: total labor hours.

    • Predictors: number of boxes packed, indirect cost percentage, holiday status.
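As a rough sketch of how the studres function from MASS might be used, the example below fits a simple model to simulated data and extracts studentized residuals. The data, the planted unusual observation, and the cutoff of 3 are assumptions for illustration only; the lecture's Grocery data is not reproduced here.

```r
library(MASS)  # provides studres(); MASS ships with standard R installations

# Simulated data (hypothetical) with one deliberately unusual observation
set.seed(2)
x <- 1:30
y <- 3 + 0.5 * x + rnorm(30)
y[30] <- y[30] + 10   # planted for illustration only

fit <- lm(y ~ x)

# Studentized residuals, one per observation
sr <- studres(fit)

# |value| > 3 is a common rule of thumb, assumed here; it is not a cutoff stated in the lecture
flagged <- which(abs(sr) > 3)
print(flagged)
```

Studentized residuals put every observation on a comparable scale, which makes unusual points easier to spot than with raw residuals.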

Fitted Value vs Residual Plot

  • Created after fitting the model; it showed a reverse funnel shape, indicating possible heteroscedasticity.

  • Reminder: Avoid conclusions based on a small number of data points (only 6 observations for holidays).
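One way to act on this reminder is to count the observations in each group before reading anything into a group-specific residual pattern. The sketch below uses simulated stand-in data; the names hours, boxes, and holiday, the sample size, and all coefficient values are hypothetical, with only the count of 6 holiday observations taken from the notes.

```r
# Simulated stand-in data (hypothetical values; not the actual Grocery data)
set.seed(3)
n       <- 52
holiday <- factor(rep(c("yes", "no"), times = c(6, 46)))
boxes   <- runif(n, 100, 500)
hours   <- 50 + 0.1 * boxes + 10 * (holiday == "yes") + rnorm(n, sd = 5)

fit <- lm(hours ~ boxes + holiday)

# How many observations fall in each group?
group_n <- table(holiday)
print(group_n)

# Residual spread by group; the estimate for the small group is unstable
group_sd <- tapply(resid(fit), holiday, sd)
print(group_sd)
```

A spread estimate based on only 6 residuals can change a lot from sample to sample, so an apparent difference between the groups is weak evidence on its own.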

Example with PSA Data

  • Predictors:

    • Volume of prostate, weight of prostate, age, BPH, CP.

  • Response variable: Prostate-specific antigen (PSA).

  • Some issues identified in the fitted value vs residual plot:

    • Heteroscedasticity apparent; residual variability increases as fitted values rise.

    • Outliers present.
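The funnel pattern described above can be reproduced on simulated data whose error is multiplicative, so that spread grows with the mean. This is an illustrative sketch under that assumption, not the PSA data; comparing the residual standard deviation in the lower and upper halves of the fitted values is an informal check chosen for this example.

```r
# Simulated data with multiplicative error (hypothetical)
set.seed(4)
n <- 200
x <- runif(n, 1, 10)
y <- exp(0.5 + 0.3 * x + rnorm(n, sd = 0.4))

fit <- lm(y ~ x)
fv  <- fitted(fit)
res <- resid(fit)

# Fitted value vs residual plot: the spread widens to the right
plot(fv, res, xlab = "Fitted value", ylab = "Residual")
abline(h = 0, lty = 2)

# Informal numeric check: residual spread in the lower vs upper half of fitted values
upper   <- fv > median(fv)
sd_low  <- sd(res[!upper])
sd_high <- sd(res[upper])
cat("Residual SD, lower half:", sd_low, "\n")
cat("Residual SD, upper half:", sd_high, "\n")
```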

Transformations for Issues

  • If nonlinearity is evident, transform the specific predictor variable that shows it.

  • Pairs Plot: Used to visualize relationships between predictors and response for nonlinearity detection.

  • Log Transformation: Recommended for the response variable to address nonnormality and heteroscedasticity.

  • Examples of transformations discussed include logarithmic, square root, and polynomial methods.

    • The log transformation was applied, and the residual and QQ plots improved, suggesting a better fit.
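The workflow in this section (pairs plot, then a log-transformed response, then fresh diagnostics) might look like the sketch below. The data are simulated so that a log transformation is appropriate by construction; the names y, x1, and x2 and all numeric values are assumptions for the example.

```r
# Simulated data where log(y) is linear in the predictors (hypothetical)
set.seed(5)
n  <- 150
x1 <- runif(n, 1, 10)
x2 <- rnorm(n, mean = 50, sd = 8)
y  <- exp(0.2 + 0.25 * x1 + 0.01 * x2 + rnorm(n, sd = 0.4))
dat <- data.frame(y, x1, x2)

# Pairs plot: scatterplots of every variable against every other
pairs(dat)

# Fit on the original scale and on the log scale
fit_raw <- lm(y ~ x1 + x2, data = dat)
fit_log <- lm(log(y) ~ x1 + x2, data = dat)

# Compare diagnostics side by side: top row raw, bottom row log
op <- par(mfrow = c(2, 2))
plot(fitted(fit_raw), resid(fit_raw), xlab = "Fitted", ylab = "Residual", main = "Raw response")
abline(h = 0, lty = 2)
qqnorm(resid(fit_raw), main = "QQ plot: raw response")
qqline(resid(fit_raw))
plot(fitted(fit_log), resid(fit_log), xlab = "Fitted", ylab = "Residual", main = "Log response")
abline(h = 0, lty = 2)
qqnorm(resid(fit_log), main = "QQ plot: log response")
qqline(resid(fit_log))
par(op)
```

Square root or polynomial terms can be tried the same way by changing the model formula, for example sqrt(y) on the left-hand side or I(x1^2) on the right.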

Summary of Model Fitting Process

  • New Fit: Created with the transformed response variable (log PSA).

  • Diagnostics for the revised model indicate better agreement with the model assumptions.

Interpretation of Results

  • Careful interpretation required post-transformation:

    • Interpret relationships based on transformed variables only.

    • Avoid back-transforming to interpret coefficients; discuss relationships on the transformed scale.

    • Example relationships analyzed include:

      • Significant relationship between age and log PSA, with consideration of its statistical meaning.
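To illustrate reading a coefficient on the transformed scale, the sketch below fits a model with a log response to simulated data and words the age coefficient in terms of log(psa). The variables psa, age, and vol and every numeric value are hypothetical stand-ins, not the lecture's PSA data or results.

```r
# Simulated stand-in data (hypothetical; not the actual PSA dataset)
set.seed(6)
n   <- 100
age <- round(runif(n, 40, 80))
vol <- rexp(n, rate = 1 / 7)
psa <- exp(0.5 + 0.03 * age + 0.08 * vol + rnorm(n, sd = 0.7))

# Model with the log-transformed response
fit <- lm(log(psa) ~ age + vol)

# Pull the estimate and p-value for age from the coefficient table
coefs <- summary(fit)$coefficients
b_age <- coefs["age", "Estimate"]
p_age <- coefs["age", "Pr(>|t|)"]

# State the relationship on the log scale, without back-transforming
cat(sprintf(
  "Each additional year of age is associated with a change of %.3f in mean log(psa), holding vol fixed (p = %.3g).\n",
  b_age, p_age
))
```

The statement is about the mean of log(psa), not about psa itself, which keeps the interpretation on the scale where the model was fitted.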

Final Considerations

  • Keep practicing interpretations to strengthen understanding.

  • Explore topics discussed further in the Wednesday video, including QQ plots and upcoming practice problems.