Inferences for Regression Notes
Chapter 27: Inferences for Regression
Correlation & Regression Review
Group work for 10-15 minutes to review correlation and regression concepts.
Example: Body Fat and Waist Size
Examines the relationship between % body fat and waist size (inches).
A scatterplot displays these data.
Understanding Regression
Regression models the relationship between two quantitative variables:
Predictor variable (independent)
Response variable (dependent)
We imagine an idealized regression line: at each value of the predictor, the mean of the responses falls on this line.
Inference in Regression
Aim to go beyond individual data points by creating confidence intervals and testing hypotheses regarding the slope of the regression line.
Population vs. Sample
In finding confidence intervals for means, we assumed a true underlying mean.
For regression, we likewise imagine a population of % body fat values at each waist size; the distribution of the response can differ from one waist size to another, with its mean falling on the idealized line.
Idealized Regression Model
Parameters are described using Greek letters:
Intercept: ( \beta_0 )
Slope: ( \beta_1 )
The idealized population regression line is written as:
( \mu_y = \beta_0 + \beta_1 x )
Errors must be included: ( y = \beta_0 + \beta_1 x + \epsilon )
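The model above can be simulated to build intuition. A minimal sketch using hypothetical parameter values (the numbers below are illustrative, not from the body fat data):

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical parameters for illustration only (not from the chapter's data)
beta0, beta1, sigma = -42.0, 1.7, 4.0        # intercept, slope, error SD

waist = rng.uniform(30, 45, size=100)        # predictor x (waist, inches)
eps = rng.normal(0.0, sigma, size=100)       # errors epsilon
body_fat = beta0 + beta1 * waist + eps       # y = beta0 + beta1 * x + epsilon

# The mean response at any x lies on the idealized line mu_y = beta0 + beta1 * x
print(beta0 + beta1 * 38.0)                  # mean % body fat at a 38-inch waist
```

Individual y values scatter around the line because of ( \epsilon ); only their mean falls exactly on it.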
Assumptions & Conditions
In regression inference, several assumptions must be checked:
Linearity Assumption:
Straight Enough Condition: The scatterplot must show a linear relationship.
Independence Assumption:
Observations must be independent, and the sample must be representative of the population.
Equal Variance Assumption:
Spread of residuals should remain uniform (Checked via residual plot).
Normal Population Assumption:
Residuals should form a nearly normal distribution – unimodal and symmetric.
Condition Check Order
Scatterplot: Check for linearity
Fit Regression Model: Calculate residuals ( e = y - \hat{y} )
Residuals vs. Predicted Values Plot: Check scatter for patterns
Histogram and Normal Probability Plot: To confirm normality of residuals
Proceed with inference if assumptions hold.
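The check sequence above can be sketched in code. A minimal example on simulated (hypothetical) data, using a Shapiro-Wilk test as one stand-in for the normal probability plot:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
x = rng.uniform(30, 45, size=60)                  # hypothetical waist sizes
y = -42 + 1.7 * x + rng.normal(0, 4, size=60)     # hypothetical % body fat

# Steps 1-2: fit the regression and compute residuals e = y - y_hat
res = stats.linregress(x, y)
fitted = res.intercept + res.slope * x
residuals = y - fitted

# Step 3: residuals vs. fitted values -- look for patterns or changing spread
# Step 4: normality of residuals (here via a formal test instead of a plot)
stat, p = stats.shapiro(residuals)
print(f"residual mean: {residuals.mean():.2e}, Shapiro-Wilk p-value: {p:.3f}")
```

In practice the plots themselves (scatterplot, residual plot, histogram, normal probability plot) are the primary diagnostic; numerical tests only supplement them.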
Intuition of Regression Inference
Sampling variability affects the estimated regression slope and intercept: different samples yield different fitted lines.
Three factors influence the standard error of the slope, ( SE(b_1) ):
Scatter around the regression line (measured by the residual standard deviation ( s_e ))
Spread of the x values (measured by the standard deviation ( s_x ))
Sample size ( n )
Standard error formula:
( SE(b_1) = \frac{s_e}{s_x \sqrt{n-1}} )
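The slope's standard error can be computed directly from ( s_e ), ( s_x ), and ( n ) and checked against the value a regression routine reports. A sketch on simulated (hypothetical) data:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
x = rng.uniform(30, 45, size=50)                  # hypothetical predictor values
y = -42 + 1.7 * x + rng.normal(0, 4, size=50)

res = stats.linregress(x, y)
residuals = y - (res.intercept + res.slope * x)

n = len(x)
s_e = np.sqrt(np.sum(residuals**2) / (n - 2))     # residual standard deviation
s_x = np.std(x, ddof=1)                           # sample SD of x
se_b1 = s_e / (s_x * np.sqrt(n - 1))              # SE(b1) = s_e / (s_x * sqrt(n-1))

print(se_b1, res.stderr)                          # the two values agree
```

Note how the formula behaves: more scatter (larger ( s_e )) inflates the standard error, while more spread in x or a larger sample shrinks it.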
Sampling Distribution
When conditions are satisfied, the estimated regression slope follows a Student's t-distribution:
( \frac{b_1 - \beta_1}{SE(b_1)} \sim t_{n-2} )
Regression Inference
Null Hypothesis: ( H_0: \beta_1 = 0 ) (No linear relationship)
Test Statistic:
( t = \frac{b_1 - 0}{SE(b_1)} )
Confidence Interval Formula:
( b_1 \pm t^*_{df} \times SE(b_1) )
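Both the t-test for the slope and its confidence interval can be computed from the fitted slope and its standard error. A minimal sketch on simulated (hypothetical) data, with df = n - 2:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
x = rng.uniform(30, 45, size=40)                  # hypothetical data
y = -42 + 1.7 * x + rng.normal(0, 4, size=40)

res = stats.linregress(x, y)
n = len(x)
df = n - 2

# Test statistic for H0: beta1 = 0
t_stat = (res.slope - 0) / res.stderr
p_value = 2 * stats.t.sf(abs(t_stat), df)         # two-sided P-value

# 95% confidence interval: b1 +/- t* x SE(b1)
t_star = stats.t.ppf(0.975, df)
lo, hi = res.slope - t_star * res.stderr, res.slope + t_star * res.stderr
print(f"t = {t_stat:.2f}, p = {p_value:.4g}, 95% CI = ({lo:.3f}, {hi:.3f})")
```

The hand-computed P-value matches the one `linregress` reports, since both use the same t-distribution with n - 2 degrees of freedom.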
Example Problem: Math and Anxiety Test Data
Null Hypothesis: ( H_0: \beta_1 = 0 )
Alternative Hypothesis: ( H_A: \beta_1 \neq 0 )
Conditions to Verify:
Straight Enough Condition: Check scatterplot.
Independence: No influence from one student to another.
Equal Variance: The residual plot shows consistent spread.
Normality: Histogram of residuals is unimodal and symmetric.
Conclusion: The low P-value (0.0084) provides strong evidence against the null hypothesis, so we conclude there is a linear relationship between anxiety and math scores.
Confidence Interval Example:
( b_1 \pm t^* \times SE(b_1) = -4.486 \pm 2.074(1.551) = (-7.70, -1.27) )
Meaning: We are 95% confident that each one-point increase in anxiety score is associated with a decrease in mean math score of between 1.27 and 7.70 points.
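The interval arithmetic above is easy to verify directly from the quoted estimate, critical value, and standard error:

```python
# Values quoted in the example: b1 = -4.486, SE(b1) = 1.551, t* = 2.074
b1, se_b1, t_star = -4.486, 1.551, 2.074

lower = b1 - t_star * se_b1
upper = b1 + t_star * se_b1
print(f"({lower:.2f}, {upper:.2f})")   # prints "(-7.70, -1.27)"
```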
Conclusion
Understanding of regression concepts and rigorous checking of assumptions is crucial to valid statistical inference.
The process allows for practical interpretations relating data attributes and trends through regression analysis.