Study Notes on Hypothesis Testing and Linear Regression

12.1 Hypothesis Testing: Linear Regression Edition

Learning Targets

  • I can check the conditions for inference, construct a confidence interval, and run a hypothesis test for the slope (𝛽) of the true regression line.

Hypothesis Testing Notes

  • This section may appear in multiple-choice questions (MC) but has rarely been seen in free-response questions (FRQ).

Notation and Provided Information

  • Sampling distributions for simple linear regression:

    • Random Variable for Slope:

    • Notation: $b$

    • Parameters of Sampling Distribution:

    • Mean: $\text{mean}(b) = \beta$

    • Variance: $\text{var}(b)$

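    • Standard Error of Sample Statistic: $SE_b = \frac{s}{s_x \sqrt{n - 1}}$, with $s_x^2 = \frac{\sum (x - \bar x)^2}{n - 1}$

      • Where $\bar x$ refers to the mean of the $x$ values, $s_x$ is their sample standard deviation, $s$ is the standard deviation of the residuals, and $n$ is the sample size.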
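As an illustration of how the standard error of the slope is assembled from these pieces, here is a minimal Python sketch. It assumes the usual formula $SE_b = s / (s_x \sqrt{n - 1})$, where $s$ is the standard deviation of the residuals (computed with $n - 2$ in the denominator). The function name and the five data points are made up for this example and do not come from the notes.

```python
from math import sqrt
from statistics import mean, stdev


def slope_standard_error(xs, ys):
    """Standard error of the least-squares slope: s / (s_x * sqrt(n - 1))."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    # Least-squares slope and intercept.
    b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    b0 = y_bar - b1 * x_bar
    # s: standard deviation of the residuals, using n - 2 degrees of freedom.
    residuals = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]
    s = sqrt(sum(r ** 2 for r in residuals) / (n - 2))
    # s_x: sample standard deviation of the x values (divides by n - 1).
    s_x = stdev(xs)
    return s / (s_x * sqrt(n - 1))


# Hypothetical data, invented for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
se_b = slope_standard_error(xs, ys)
print(f"SE of slope: {se_b:.4f}")
```

Note that a larger sample or more spread in the $x$ values shrinks the standard error, while noisier residuals inflate it.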

Population vs. Sample

  • Definitions:

    • Population Regression Line:

    • A regression line calculated from every value in the population. It is known as the true regression line.

    • Equation: $\bar y = \beta_0 + \beta_1 x$ where:

      • $\bar y$ = mean $y$ for a given $x$

      • $\beta_0$ = population y-intercept

      • $\beta_1$ = population slope

    • Sample Regression Line:

    • A regression line calculated from a sample. It is known as the estimated regression line.

    • Equation: $\tilde y = b_0 + b_1 x$ where:

      • $\tilde y$ = estimated mean $y$ for a given $x$

      • $b_0$ = sample y-intercept

      • $b_1$ = sample slope
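To make the sample regression line concrete, the following Python sketch computes $b_0$ and $b_1$ with the standard least-squares formulas. The helper name `fit_line` and the data values are assumptions made up for this example; they are not taken from the notes.

```python
from statistics import mean


def fit_line(xs, ys):
    """Return (b0, b1) for the least-squares line y = b0 + b1 * x."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = sxy / sxx            # sample slope
    b0 = y_bar - b1 * x_bar   # sample y-intercept
    return b0, b1


# Hypothetical sample, invented for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
b0, b1 = fit_line(xs, ys)
print(f"estimated line: y = {b0:.2f} + {b1:.2f}x")
```

A different sample from the same population would give different values of $b_0$ and $b_1$, which is why they are treated as estimates of the population intercept and slope.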

Old Faithful Example

  • Observational Data:

    • Data Points: Duration and interval between eruptions of Old Faithful (263 eruptions in a month).

    • Population Least-Squares Line: Shown in blue on the scatterplot representing all eruption data.

Slope Sampling Distribution

  • Sampling Distribution of the Slope:

    • Select a Simple Random Sample (SRS) of $n$ observations $(x, y)$ from a population.

    • Least-squares regression model: $\bar y = \beta_0 + \beta_1 x$

    • Mean of the sampling distribution of $b_1$: $\text{mean}(b_1) = \beta_1$

    • Standard Deviation of Sampling Distribution:

    • The usual formula applies when the 10% condition is satisfied: $n < 0.10N$

    • Interpretation: The slopes of sample regression lines typically differ from the slope of the population regression line by about one standard deviation, roughly 1.42 here.
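The idea of a sampling distribution of the slope can be explored by simulation. The sketch below builds a synthetic population (it is not the Old Faithful data, and every number in it is an assumption chosen for illustration), draws many simple random samples, and compares the mean and spread of the sample slopes with the population slope.

```python
import random
from statistics import mean, stdev


def slope(points):
    """Least-squares slope for a list of (x, y) pairs."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sum((x - x_bar) * (y - y_bar) for x, y in points) / sxx


random.seed(1)  # fixed seed so repeated runs give the same output

# Synthetic population of N = 5000 (x, y) pairs; all values are made up.
population = []
for _ in range(5000):
    x = random.uniform(1.5, 5.0)
    y = 30.0 + 10.0 * x + random.gauss(0, 6)
    population.append((x, y))

population_slope = slope(population)

# n = 20 satisfies the 10% condition: 20 < 0.10 * 5000.
n = 20
sample_slopes = [slope(random.sample(population, n)) for _ in range(1000)]

print(f"population slope:        {population_slope:.3f}")
print(f"mean of sample slopes:   {mean(sample_slopes):.3f}")
print(f"SD of sample slopes:     {stdev(sample_slopes):.3f}")
```

The mean of the simulated slopes should land close to the population slope, and their standard deviation describes how far a typical sample slope falls from it.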

Experiments and Observational Studies

  • Study Inquiry: Does seat location impact student scores?

    • 30 students randomly assigned to seats and their exam scores recorded.

    • Questions address why the random assignment matters for the conclusions that can be drawn.

Regression Analysis and Interpretations

  • Slope of the Least-Squares Regression Line (LSRL) Interpretation:

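    • A negative slope means the response tends to decrease as the explanatory variable increases; because seats were randomly assigned, a statistically significant slope would support a causal link.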

    • Conditions checked: Linear relationship, Independence, Normality, Equal standard deviation, Random (the LINER conditions).

  • Formula:

    • General form of a confidence interval: $\text{statistic} \pm (\text{critical value}) \times (\text{standard deviation of statistic})$

  • A 95% confidence interval for the unknown slope $\beta_1$ is formulated as:

    • $b_1 \pm t^* \times SE_{b_1}$
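A short Python sketch of the interval calculation follows. It assumes the critical value $t^*$ is looked up separately (from a $t$ table or calculator, with $n - 2$ degrees of freedom) and passed in as an argument. The function name and the data are hypothetical; the value 3.182 is the table value of $t^*$ for 95% confidence with 3 degrees of freedom.

```python
from math import sqrt
from statistics import mean


def slope_confidence_interval(xs, ys, t_star):
    """Return (low, high) for b1 +/- t_star * SE_b1."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    b0 = y_bar - b1 * x_bar
    # Residual standard deviation with n - 2 degrees of freedom.
    s = sqrt(sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys)) / (n - 2))
    # Equivalent to s / (s_x * sqrt(n - 1)), since s_x^2 = sxx / (n - 1).
    se_b1 = s / sqrt(sxx)
    margin = t_star * se_b1
    return b1 - margin, b1 + margin


# Hypothetical sample of n = 5 points, so df = n - 2 = 3.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
T_STAR = 3.182  # t* for 95% confidence with df = 3, taken from a t table
low, high = slope_confidence_interval(xs, ys, T_STAR)
print(f"95% CI for the slope: ({low:.3f}, {high:.3f})")
```

If the resulting interval contains 0, the data do not give convincing evidence of a linear relationship at that confidence level.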

Significance Tests for the Slope

  • Null Hypothesis: $H_0: \beta_1 = \text{hypothesized slope}$

  • Formulation of the t-test statistic:

    • $t = \frac{b_1 - \text{hypothesized slope}}{SE_{b_1}}$

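    • P-value: the probability, assuming $H_0$ is true, of getting a test statistic at least as extreme as the one observed, found from a $t$ distribution with $n - 2$ degrees of freedom.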
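The test statistic and a two-sided P-value can be sketched in Python using only the standard library. Because the standard library has no built-in $t$ distribution, this example approximates the tail area by integrating the $t$ density with Simpson's rule; in practice a calculator or statistics package would supply the P-value. The function names and the numbers plugged in ($b_1 = 0.6$, $SE_{b_1} = 0.283$, $df = 3$) are hypothetical.

```python
from math import gamma, pi, sqrt


def t_statistic(b1, hypothesized_slope, se_b1):
    """t = (b1 - hypothesized slope) / SE_b1."""
    return (b1 - hypothesized_slope) / se_b1


def t_pdf(t, df):
    """Density of the t distribution with df degrees of freedom."""
    c = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    return c * (1 + t * t / df) ** (-(df + 1) / 2)


def two_sided_p_value(t_stat, df, steps=10_000):
    """Approximate P(|T| >= |t_stat|) with Simpson's rule (steps must be even)."""
    a = abs(t_stat)
    if a == 0:
        return 1.0
    h = a / steps
    total = t_pdf(0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_half = total * h / 3  # area between 0 and |t_stat|
    return 1 - 2 * central_half


# Hypothetical values: sample slope 0.6, H0 slope 0, SE 0.283, df = n - 2 = 3.
t = t_statistic(0.6, 0.0, 0.283)
p = two_sided_p_value(t, df=3)
print(f"t = {t:.3f}, two-sided P-value = {p:.3f}")
```

For a one-sided alternative, the P-value would be half of the two-sided value when the sample slope falls on the side named in the alternative hypothesis.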

Various Examples for Contextual Application

GPA and ACT Score Analysis
  • A small sample is used to investigate the relationship between GPA and ACT scores through regression analysis.

Balloon Box Volume Study
  • This study examines the correlation between the volume of balloons and various factors such as inflation method and material composition using linear regression techniques.

  • Investigates how the height of a balloon box relates to the number of balloons needed to fill it, using a power model for prediction.

Exponential Progression: Moore's Law
  • Models growth in the number of transistors; a linear pattern in the transformed (logarithm of count) data points to an exponential relationship.
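The transformation idea can be sketched in Python: if $y$ grows exponentially in $x$, then $\ln y$ is linear in $x$, so a least-squares line fitted to $(x, \ln y)$ recovers the growth rate. The data below are synthetic (a count that doubles every two years starting from 1000) and are not real transistor counts; `fit_line` is a helper defined only for this example.

```python
from math import exp, log
from statistics import mean


def fit_line(xs, ys):
    """Return (b0, b1) for the least-squares line y = b0 + b1 * x."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    return y_bar - b1 * x_bar, b1


# Synthetic data: years since the start, and a count that doubles every 2 years.
years = [0, 2, 4, 6, 8, 10]
counts = [1000 * 2 ** (t // 2) for t in years]

# Fit a line to (year, ln(count)); exponential growth looks linear on this scale.
log_counts = [log(c) for c in counts]
b0, b1 = fit_line(years, log_counts)

# Undo the transformation: count is approximately initial * yearly_factor ** year.
initial = exp(b0)
yearly_factor = exp(b1)
print(f"initial count ~ {initial:.1f}, growth factor per year ~ {yearly_factor:.4f}")
```

With real data the points would not fall exactly on a line, so a residual plot of the transformed data is the usual check that the exponential model is appropriate.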

Conclusion/Reflections on Statistical Practices

  • Importance of clearly stating hypotheses, defining parameters, recognizing patterns in data through visual displays such as residual plots, interpreting results clearly, and tying statistical findings back to their real-world context.