Linear Regression II

Course: Biostatistics 521: Applied Biostatistics
Instructor: Mousumi Banerjee
Topics Covered:
- Various Regression Topics
- Example: Poverty vs. Murder Rate
- Confidence Intervals for Regression
- Extrapolation
- Centering Numerical Covariates
- Transformations
- Outliers Analysis
- Linear Regression with Categorical Covariates

Objective: Predict annual murders per million from the percentage of individuals living in poverty in a random sample of 20 metropolitan areas.
Steps for Performing Simple Linear Regression:
1. Assess whether the data points are independent.
2. Create a scatterplot to evaluate if the relationship appears approximately linear.
3. Fit the regression line to the data.
4. Create residual plots and QQ plots.
5. Confirm the linearity of the relationship.
6. Check for constant variance of residuals.
7. Check for normality of residuals.
8. Perform inference on the fitted line.
9. Compute predicted values for annual murders based on poverty levels.
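
Steps 3 and 4 above can be sketched in a few lines of plain Python. This is a minimal illustration rather than the course's own code: the function names `fit_line` and `residuals` are hypothetical, and the six data points are invented for the example, because the 20-city data set is not reproduced in these notes.

```python
def fit_line(x, y):
    """Least-squares intercept and slope for simple linear regression."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


def residuals(x, y, intercept, slope):
    """Observed minus fitted values, the input to residual and QQ plots."""
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]


# Hypothetical data (NOT the course data): % in poverty, murders per million
poverty = [14.0, 17.5, 19.0, 21.5, 24.0, 26.5]
murders = [5.0, 14.2, 19.9, 24.1, 31.0, 38.3]

b0, b1 = fit_line(poverty, murders)
resid = residuals(poverty, murders, b0, b1)
print(f"intercept = {b0:.2f}, slope = {b1:.3f}")
print("residuals:", [round(r, 2) for r in resid])
```

The residuals would then be plotted against the fitted values and on a QQ plot to check linearity, constant variance, and normality (steps 5 to 7).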

Assessing Assumptions:
- The absence of clear patterns in the residual plot indicates that the linear model fits the data well.
- There may be a slight departure from normality, visible when the observed residuals are compared with the theoretical normal quantiles in the QQ plot.

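Fitted regression line: $M = -29.9 + 2.559 \times P$, where $M$ is the annual number of murders per million and $P$ is the percentage of individuals living in poverty.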
Hypothesis Testing:
- Null Hypothesis ($H_0$): $\beta_P = 0$ (no relationship between poverty and murder rate)
- Alternative Hypothesis ($H_1$): $\beta_P \neq 0$
Test Statistic Result:
- $t = 2.559 / 0.390 \approx 6.56$ on $n - 2 = 18$ degrees of freedom
- $p = 3.64 \times 10^{-6}$
- Conclusion: Reject the null hypothesis, indicating a significant relationship.
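
The test can be reproduced from quantities that appear in these notes: the slope estimate 2.559, its standard error 0.390 (the value used in the confidence interval for the slope), and $n - 2 = 18$ degrees of freedom. The sketch below is one way to do this with the standard library only. The helper names `t_pdf` and `two_sided_p` are hypothetical, and numerically integrating the t density with Simpson's rule is a stand-in for the t-distribution routine a statistics package would provide.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def two_sided_p(t_value, df, steps=20000):
    """Two-sided p-value, integrating the density on [0, |t|] with Simpson's rule.

    steps must be even for Simpson's rule.
    """
    upper = abs(t_value)
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central = total * h / 3          # P(0 < T < |t|)
    return 2 * (0.5 - central)       # both tails beyond |t|


slope, se_slope, n = 2.559, 0.390, 20   # values taken from these notes
t_stat = slope / se_slope
p_value = two_sided_p(t_stat, df=n - 2)
print(f"t = {t_stat:.2f}, two-sided p = {p_value:.2e}")
```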

For each 1-percentage-point increase in the share of individuals living in poverty, the expected annual number of murders per million increases by 2.559.
This relationship can be scaled up: a 10-percentage-point increase in poverty is associated with an expected increase of $10 \times 2.559 \approx 25.6$ additional murders per million.

Point Estimates:
- The estimates $\hat{\beta}_0$ and $\hat{\beta}_1$ are the best guesses at the true population parameters $\beta_0$ and $\beta_1$.
Confidence Interval Formula:
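- The $100(1-\alpha)\%$ confidence interval for the population slope parameter $\beta_1$ is: $\hat{\beta}_1 \pm t_{\alpha/2,\,n-2} \times S(\hat{\beta}_1)$
- Where $n$ is the sample size and $S(\hat{\beta}_1)$ is the standard error of the slope estimate.
When $n$ is large, $t$ can be replaced with $z$.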

95% Confidence Interval for the effect of % in poverty on the annual murder rate:
$2.559 \pm (1.96 \times 0.390) = (1.79, 3.32)$
Conclusion: We are 95% confident that the true slope, the expected change in annual murders per million for each 1-percentage-point increase in poverty, is between 1.79 and 3.32.
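
The interval arithmetic can be checked with a short script. This is a sketch under stated assumptions: `slope_ci` is a hypothetical helper name, and the t critical value 2.101 (0.975 quantile, 18 degrees of freedom) is typed in from a standard t-table rather than computed. The z-based interval reproduces the calculation above. The t-based interval is included because the sample has only 20 cities, so the t critical value is noticeably larger than 1.96.

```python
from statistics import NormalDist


def slope_ci(estimate, std_error, critical_value):
    """Return (lower, upper) for estimate +/- critical_value * std_error."""
    margin = critical_value * std_error
    return estimate - margin, estimate + margin


slope, se_slope = 2.559, 0.390          # values from these notes

z_crit = NormalDist().inv_cdf(0.975)    # about 1.96
t_crit = 2.101                          # assumed: t-table value for 18 df

z_ci = slope_ci(slope, se_slope, z_crit)
t_ci = slope_ci(slope, se_slope, t_crit)

print(f"z-based 95% CI: ({z_ci[0]:.2f}, {z_ci[1]:.2f})")
print(f"t-based 95% CI: ({t_ci[0]:.2f}, {t_ci[1]:.2f})")
```

Both intervals exclude 0, which agrees with rejecting the null hypothesis of no relationship.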

Question: What is the predicted annual murder rate for a city where 22% of the population lives in poverty?
Calculation:
- $M = -29.9 + 2.559 \times 22$
- $M = 26.398$
- Conclusion: The predicted murder rate for this city is approximately 26.4 per million.
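
The same calculation as a small function, using the fitted coefficients from these notes. The name `predict_murders` is hypothetical and chosen only for illustration.

```python
INTERCEPT = -29.9   # fitted intercept from the notes
SLOPE = 2.559       # fitted slope: murders per million per percentage point


def predict_murders(poverty_pct):
    """Predicted annual murders per million for a given % living in poverty."""
    return INTERCEPT + SLOPE * poverty_pct


pred_22 = predict_murders(22)
change_per_10_points = SLOPE * 10   # expected change for a 10-point increase

print(f"Predicted rate at 22% poverty: {pred_22:.1f} per million")
print(f"Expected change per 10 percentage points: {change_per_10_points:.2f}")
```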

Formula for Confidence Intervals at a Fixed Value

Confidence Interval Construction:
- Predicted Value $\pm\ t_{n-2} \times s_r$
- Where $s_r$ is derived from the residuals and provides a measure of precision around the predicted value.
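
The notes do not spell out how $s_r$ is computed. A common textbook choice, assumed here, is the standard error of the estimated mean response at $x_0$: $s \sqrt{1/n + (x_0 - \bar{x})^2 / S_{xx}}$, where $s$ is the residual standard deviation on $n - 2$ degrees of freedom. The sketch below applies that formula to a small invented data set. The function name `mean_response_ci`, the data, and the t-table value 3.182 (0.975 quantile, 3 degrees of freedom) are all assumptions made for illustration.

```python
import math


def mean_response_ci(x, y, x0, t_crit):
    """Fitted value at x0 and a confidence interval for the mean response."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar

    # Residual standard deviation on n - 2 degrees of freedom
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(sse / (n - 2))

    # Assumed form of s_r: standard error of the estimated mean at x0
    s_r = s * math.sqrt(1 / n + (x0 - x_bar) ** 2 / sxx)

    fitted = intercept + slope * x0
    return fitted, fitted - t_crit * s_r, fitted + t_crit * s_r


# Hypothetical data, not from the course
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
T_CRIT_3DF = 3.182   # assumed: t-table value for n - 2 = 3 df

center = mean_response_ci(x, y, 3, T_CRIT_3DF)   # at the mean of x
edge = mean_response_ci(x, y, 5, T_CRIT_3DF)     # at the edge of the data

print("at x0 = 3:", tuple(round(v, 2) for v in center))
print("at x0 = 5:", tuple(round(v, 2) for v in edge))
```

Under this formula the interval is narrowest at the mean of the covariate and widens as $x_0$ moves away from it.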

Example Calculation

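For a prediction of a car's mean fuel efficiency (mpg) when weight is fixed at 3,100 lbs:
$20.71 \pm 1.96 \times s_r$
- Result: The 95% confidence interval for the predicted mean mpg is (19.65, 21.77).
- The interval's half-width of 1.06 implies $s_r \approx 0.54$. The figure 3.046 quoted for this example is presumably the residual standard deviation from which $s_r$ is derived, not $s_r$ itself; using it directly would give a half-width of about 5.97.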

Definition: Extrapolating refers to predicting outcomes for exposure levels outside the range of data that the regression model was fit on.
Caution: Extrapolation is discouraged as it can result in poor predictions due to potential nonlinear relationships outside the observed range.

Case Study: A one-year-old's shoulder girth of 56 cm is outside the model's data range (85 to 135 cm).
Is it appropriate to use the model for this prediction?
- No. We have no data near 56 cm, so assuming that the fitted linear relationship still holds there is speculative and could lead to inaccurate predictions.
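
A simple guard against accidental extrapolation is to compare each new covariate value with the range the model was fit on before predicting. The sketch below uses the shoulder-girth range from the case study. The function name `is_extrapolation` is hypothetical.

```python
def is_extrapolation(x0, observed_min, observed_max):
    """True when x0 lies outside the range of covariate values used to fit the model."""
    return x0 < observed_min or x0 > observed_max


GIRTH_RANGE_CM = (85, 135)   # range of shoulder girths in the fitted data

for girth in (56, 100):
    if is_extrapolation(girth, *GIRTH_RANGE_CM):
        print(f"{girth} cm: outside the observed range, prediction is an extrapolation")
    else:
        print(f"{girth} cm: within the observed range")
```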

To keep predictions reliable, restrict them to the range of the observed data whenever possible, and always check the assumptions underlying the regression model.