Inferences for Regression Notes

Chapter 27: Inferences for Regression

Correlation & Regression Review
  • Group work for 10-15 minutes to review correlation and regression concepts.

Example: Body Fat and Waist Size
  • Examines relationship between % body fat and waist size (inches).

  • Scatterplot represents this data.

Understanding Regression
  • Regression models the relationship between two quantitative variables:

    • Predictor variable (independent)

    • Response variable (dependent)

  • We imagine an idealized regression line on which the mean of the response at each value of the predictor falls.

Inference in Regression
  • Aim to go beyond individual data points by creating confidence intervals and testing hypotheses regarding the slope of the regression line.

Population vs. Sample
  • In finding confidence intervals for means, we assumed a true underlying mean.

  • For regression, each waist size has its own distribution of % body fat values; the mean of that distribution changes with waist size.

Idealized Regression Model
  • Parameters are described using Greek letters:

    • Intercept: ( \beta_0 )

    • Slope: ( \beta_1 )

  • The idealized line gives the mean of y at each value of x:

    • ( \mu_y = \beta_0 + \beta_1 x )

  • Individual values do not fall exactly on the line, so errors must be included: ( y = \beta_0 + \beta_1 x + \epsilon )
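
The model above can be sketched as a small simulation. This is only an illustration: the parameter values are hypothetical (not estimates from the body fat data), and the errors are assumed to be Normal with a constant standard deviation `sigma`, which is a name introduced here.

```python
import random

def simulate_responses(beta0, beta1, xs, sigma, seed=0):
    """Draw y = beta0 + beta1 * x + error for each x, with error ~ Normal(0, sigma)."""
    rng = random.Random(seed)
    ys = []
    for x in xs:
        mean_y = beta0 + beta1 * x      # point on the idealized line (mu_y)
        error = rng.gauss(0.0, sigma)   # random deviation from the line (epsilon)
        ys.append(mean_y + error)
    return ys

# Hypothetical parameter values and x values, chosen only for illustration.
xs = [30, 32, 34, 36, 38, 40]
ys = simulate_responses(beta0=-40.0, beta1=1.7, xs=xs, sigma=4.0)
for x, y in zip(xs, ys):
    print(x, round(y, 1))
```

Setting `sigma=0.0` removes the errors, so every simulated value lands exactly on the line.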

Assumptions & Conditions
  • In regression inference, several assumptions must be checked:

    1. Linearity Assumption:

    • Straight Enough Condition: The scatterplot must show a linear relationship.

    2. Independence Assumption:

    • Individuals in the sample must be representative of the population, and the observations must be independent of one another.

    3. Equal Variance Assumption:

    • Spread of residuals should remain uniform across the range of x (checked via a residual plot).

    4. Normal Population Assumption:

    • Residuals should form a nearly normal distribution: unimodal and symmetric.

Condition Check Order
  • Scatterplot: Check for linearity

  • Fit Regression Model: Calculate residuals ( e = y - \hat{y} )

  • Residuals vs. Predicted Values Plot: Check the scatter for patterns, bends, or changing spread

  • Histogram and Normal Probability Plot: To confirm normality of residuals

  • Proceed with inference if assumptions hold.
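
The fit-then-inspect steps above might look like the following sketch. The data are made-up toy numbers used only to show the mechanics, and `fit_line` and `residuals` are helper names introduced here. Plotting is left out; the printed pairs are what a residuals vs. predicted values plot would display.

```python
def fit_line(xs, ys):
    """Least-squares intercept b0 and slope b1 for the line y-hat = b0 + b1 * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1

def residuals(xs, ys, b0, b1):
    """Observed minus predicted value, e = y - y-hat, for each point."""
    return [y - (b0 + b1 * x) for x, y in zip(xs, ys)]

# Toy data (made up for illustration only).
xs = [1, 2, 3, 4, 5, 6]
ys = [2.1, 3.9, 6.2, 7.8, 10.1, 11.9]

b0, b1 = fit_line(xs, ys)
res = residuals(xs, ys, b0, b1)

# Pairs of (predicted value, residual): look for bends or changing spread.
for x, e in zip(xs, res):
    print(round(b0 + b1 * x, 2), round(e, 3))
```

Least-squares residuals always sum to zero (up to rounding), so it is the pattern and spread of the residuals, not their average, that carries the diagnostic information.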

Intuition of Regression Inference
  • Different samples give different estimates of the regression slope and intercept; this sample-to-sample variability is what inference must account for.

  • Three factors that influence the standard error of the slope ( SE(b_1) ):

    • Scatter around the regression line (measured by the residual standard deviation ( s_e )): more scatter, larger standard error

    • Spread of x values (standard deviation ( s_x )): more spread, smaller standard error

    • Sample size ( n ): larger sample, smaller standard error

  • Standard error formula:

    • ( SE(b_1) = \frac{s_e}{\sqrt{n-1} \, s_x} )
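
A short sketch of how the three factors enter the standard error of the slope, assuming the usual formula in which the residual standard deviation is divided by the square root of n - 1 times the standard deviation of x. The function name `se_slope` and the input values are hypothetical.

```python
import math

def se_slope(s_e, s_x, n):
    """Standard error of the slope: s_e / (sqrt(n - 1) * s_x)."""
    return s_e / (math.sqrt(n - 1) * s_x)

# Hypothetical baseline values, chosen only for illustration.
base = se_slope(s_e=2.0, s_x=1.0, n=5)

print(base)                               # baseline
print(se_slope(s_e=4.0, s_x=1.0, n=5))    # more scatter -> larger SE
print(se_slope(s_e=2.0, s_x=2.0, n=5))    # wider spread in x -> smaller SE
print(se_slope(s_e=2.0, s_x=1.0, n=17))   # larger sample -> smaller SE
```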

Sampling Distribution
  • When conditions are satisfied, the standardized estimated slope follows a Student's t-distribution with n - 2 degrees of freedom:

    • ( \frac{b_1 - \beta_1}{SE(b_1)} \sim t_{n-2} )
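
One way to build intuition for this sampling distribution is to simulate it. The sketch below repeatedly draws samples from a model with a known slope and refits the line each time. The parameter values are hypothetical, and Normal errors with constant spread are assumed.

```python
import random
import statistics

def fitted_slope(xs, ys):
    """Least-squares slope b1 for one sample."""
    x_bar = statistics.mean(xs)
    y_bar = statistics.mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return sxy / sxx

def simulate_slopes(beta0, beta1, xs, sigma, n_samples, seed=0):
    """Fit a line to n_samples simulated data sets and collect the slopes."""
    rng = random.Random(seed)
    slopes = []
    for _ in range(n_samples):
        ys = [beta0 + beta1 * x + rng.gauss(0.0, sigma) for x in xs]
        slopes.append(fitted_slope(xs, ys))
    return slopes

# Hypothetical true model: beta0 = 1.0, beta1 = 0.5, error SD = 1.0.
xs = list(range(30))
slopes = simulate_slopes(beta0=1.0, beta1=0.5, xs=xs, sigma=1.0, n_samples=2000)

print("mean of fitted slopes:", round(statistics.mean(slopes), 3))
print("SD of fitted slopes:  ", round(statistics.stdev(slopes), 4))
```

The fitted slopes should cluster around the true slope, and their standard deviation is what ( SE(b_1) ) estimates from a single sample.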

Regression Inference
  • Null Hypothesis: ( H_0: \beta_1 = 0 ) (No linear relationship)

  • Test Statistic:

    • ( t = \frac{b_1 - 0}{SE(b_1)} )

  • Confidence Interval Formula:

    • ( b_1 \pm t^*_{n-2} \times SE(b_1) )

Example Problem: Math and Anxiety Test Data
  • Null Hypothesis: ( H_0: \beta_1 = 0 )

  • Alternative Hypothesis: ( H_A: \beta_1 \neq 0 )

  • Conditions to Verify:

    • Straight Enough Condition: Check scatterplot.

    • Independence: No influence from one student to another.

    • Equal Variance: Residual plot shows consistent spread with no pattern.

    • Normality: Histogram of residuals is unimodal and symmetric.

  • Conclusion: The low P-value (0.0084) gives strong evidence against the null hypothesis, so we reject it and conclude that math score and anxiety are linearly related.

  • Confidence Interval Example:

    • ( b_1 \pm t^*_{n-2} \times SE(b_1) = -4.486 \pm 2.074(1.551) = (-7.70, -1.27) )

    • Meaning: We are 95% confident that each one-unit increase in anxiety score is associated with a decrease in math score of between 1.27 and 7.70 points, on average.
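
The arithmetic in this example can be checked with a short sketch. It takes the slope (-4.486), standard error (1.551), and critical value (2.074) from the interval above. The function name is hypothetical, and the t statistic is computed here rather than quoted from the notes.

```python
def slope_t_and_interval(b1, se_b1, t_star):
    """Return the t statistic for H0: beta1 = 0 and the confidence interval for the slope."""
    t_stat = (b1 - 0) / se_b1
    margin = t_star * se_b1          # margin of error
    return t_stat, (b1 - margin, b1 + margin)

t_stat, (lower, upper) = slope_t_and_interval(b1=-4.486, se_b1=1.551, t_star=2.074)

print(f"t = {t_stat:.2f}")                      # t = -2.89
print(f"95% CI: ({lower:.2f}, {upper:.2f})")    # 95% CI: (-7.70, -1.27)
```

The whole interval lies below zero, which agrees with the test rejecting a slope of zero.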

Conclusion
  • Understanding of regression concepts and rigorous checking of assumptions is crucial to valid statistical inference.

  • Once the conditions are met, the hypothesis test and confidence interval for the slope let us draw conclusions about the relationship in the population, not just in the sample.