Study Notes on Hypothesis Testing and Linear Regression
12.1 Hypothesis Testing: Linear Regression Edition
Learning Targets
I can check the conditions for inference, construct a confidence interval, and run a hypothesis test for the slope ($\beta_1$) of the true regression line.
Hypothesis Testing Notes
This section may appear in multiple-choice questions (MC) but has rarely been seen in free-response questions (FRQ).
Notation and Provided Information
Sampling distributions for simple linear regression:
Random Variable for Slope:
Notation: $b_1$
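Parameters of Sampling Distribution:
Mean: $\mu_{b_1} = \beta_1$
Variance: $\sigma_{b_1}^2 = \frac{\sigma^2}{n \sigma_x^2}$, so the standard deviation is $\sigma_{b_1} = \frac{\sigma}{\sigma_x \sqrt{n}}$
Standard Error of Sample Statistic: $SE_{b_1} = \frac{s}{s_x \sqrt{n - 1}}$
Where $\sigma$ is the standard deviation of $y$ about the true regression line, $\sigma_x$ is the standard deviation of the $x$ values, $s = \sqrt{\frac{\sum (y - \hat y)^2}{n - 2}}$ is the standard deviation of the residuals (it estimates $\sigma$), $s_x = \sqrt{\frac{\sum (x - \bar x)^2}{n - 1}}$ is the sample standard deviation of the $x$ values, $\bar x$ refers to the mean of the $x$ values, and $n$ is the sample size.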
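The standard error formula $SE_{b_1} = \frac{s}{s_x \sqrt{n - 1}}$ can be checked by computing each piece directly. This is a minimal Python sketch, assuming a small made-up data set; the function name `slope_standard_error` and the `xs`/`ys` values are hypothetical and not from these notes. Note that $s_x \sqrt{n - 1} = \sqrt{\sum (x - \bar x)^2}$, which is a handy way to double-check the arithmetic.

```python
import math
from statistics import mean, stdev

def slope_standard_error(xs, ys):
    """Return SE_b1 = s / (s_x * sqrt(n - 1)) for the least-squares slope."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    b0 = y_bar - b1 * x_bar
    # s: standard deviation of the residuals, using divisor n - 2
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    s = math.sqrt(sse / (n - 2))
    # s_x: sample standard deviation of x, using divisor n - 1
    s_x = stdev(xs)
    return s / (s_x * math.sqrt(n - 1))

# Hypothetical toy data (not from any example in these notes)
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.0]
print(f"SE_b1 = {slope_standard_error(xs, ys):.4f}")
```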
Population vs. Sample
Definitions:
Population Regression Line:
A regression line calculated from every value in the population. It is known as the true regression line.
Equation: $\mu_y = \beta_0 + \beta_1 x$, where:
$\mu_y$ = mean $y$ for a given $x$
$\beta_0$ = population y-intercept
$\beta_1$ = population slope
Sample Regression Line:
A regression line calculated from a sample. It is known as the estimated regression line.
Equation: $\hat y = b_0 + b_1 x$, where:
$\hat y$ = estimated mean $y$ for a given $x$
$b_0$ = sample y-intercept
$b_1$ = sample slope
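The sample coefficients come from the least-squares formulas $b_1 = \frac{\sum (x - \bar x)(y - \bar y)}{\sum (x - \bar x)^2}$ and $b_0 = \bar y - b_1 \bar x$ (the line always passes through $(\bar x, \bar y)$). A minimal Python sketch, assuming a small made-up data set; the function name `least_squares_line` and the `xs`/`ys` values are hypothetical:

```python
from statistics import mean

def least_squares_line(xs, ys):
    """Return (b0, b1) for the sample regression line y-hat = b0 + b1 * x."""
    x_bar, y_bar = mean(xs), mean(ys)
    # b1 = sum((x - x_bar)(y - y_bar)) / sum((x - x_bar)^2)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx
    # The LSRL passes through (x_bar, y_bar), so b0 = y_bar - b1 * x_bar
    b0 = y_bar - b1 * x_bar
    return b0, b1

# Hypothetical toy data (not from any example in these notes)
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.0]
b0, b1 = least_squares_line(xs, ys)
print(f"y-hat = {b0:.3f} + {b1:.3f}x")  # prints: y-hat = 0.090 + 1.970x
```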
Old Faithful Example
Observational Data:
Data Points: Duration and interval between eruptions of Old Faithful (263 eruptions in a month).
Population Least-Squares Line: Shown in blue on the scatterplot representing all eruption data.
Slope Sampling Distribution
Sampling Distribution of the Slope:
Select a Simple Random Sample (SRS) of $n$ observations $(x, y)$ from a population.
Least-squares regression model: $\mu_y = \beta_0 + \beta_1 x$
Mean of the sampling distribution of $b_1$: $\mu_{b_1} = \beta_1$
Standard Deviation of Sampling Distribution: $\sigma_{b_1} = \frac{\sigma}{\sigma_x \sqrt{n}}$
This formula holds when the 10% condition is satisfied: $n \le 0.10N$
Interpretation: In this example, the standard deviation of the sampling distribution of $b_1$ is about 1.42, so the slopes of sample regression lines typically differ from the population slope $\beta_1$ by about 1.42.
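A simulation makes the sampling distribution of $b_1$ concrete: draw many samples, fit a line to each, and look at the spread of the slopes. This Python sketch assumes a hypothetical population model ($\mu_y = 2 + 0.5x$, $\sigma = 3$, 20 fixed $x$ values, normal errors), not the Old Faithful data, and treats the population as infinite so the 10% condition holds automatically. The mean of the simulated slopes should be close to $\beta_1$, and their standard deviation close to $\frac{\sigma}{\sigma_x \sqrt{n}}$.

```python
import math
import random
from statistics import mean, pstdev, stdev

# Hypothetical population model (NOT the Old Faithful data): mu_y = 2 + 0.5x, sigma = 3
BETA0, BETA1, SIGMA = 2.0, 0.5, 3.0
xs = [float(x) for x in range(1, 21)]  # n = 20 fixed x values
x_bar = sum(xs) / len(xs)
sxx = sum((x - x_bar) ** 2 for x in xs)

def sample_slope(rng):
    """Draw one sample of y values from the model and return its least-squares slope b1."""
    ys = [BETA0 + BETA1 * x + rng.gauss(0, SIGMA) for x in xs]
    y_bar = sum(ys) / len(ys)
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx

rng = random.Random(1)  # fixed seed so the run is reproducible
slopes = [sample_slope(rng) for _ in range(5000)]

# Theory: sigma_b1 = sigma / (sigma_x * sqrt(n)); pstdev uses divisor n, matching sigma_x
theory_sd = SIGMA / (pstdev(xs) * math.sqrt(len(xs)))
print(f"mean of b1: {mean(slopes):.3f}  (beta1 = {BETA1})")
print(f"SD of b1:   {stdev(slopes):.3f}  (theory = {theory_sd:.3f})")
```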
Experiments and Observational Studies
Study Inquiry: Does seat location impact student scores?
30 students were randomly assigned to seats, and their exam scores were recorded.
Key question: what does random assignment allow us to conclude about cause and effect?
Regression Analysis and Interpretations
Slope of the Least-Squares Regression Line (LSRL) Interpretation:
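Because the seats were randomly assigned, a statistically significant negative slope can be taken as evidence that seat location affects scores; in an observational study, the same slope would show only an association.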
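Conditions checked (LINER): Linear relationship, Independence (10% condition when sampling without replacement), Normality of the residuals, Equal standard deviation of the residuals for all $x$, and Random sample or random assignment.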
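Formula:
A 95% confidence interval for the unknown slope $\beta_1$ is $b_1 \pm t^* \cdot SE_{b_1}$, where $t^*$ is the critical value for 95% confidence from the $t$ distribution with $df = n - 2$.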
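The confidence interval for the slope, $b_1 \pm t^* \cdot SE_{b_1}$ with $df = n - 2$, is a one-line calculation once $t^*$ is known. This minimal Python sketch assumes $t^*$ is read from a $t$ table; the function name `slope_confidence_interval` and the input numbers ($b_1 = 1.2$, $SE_{b_1} = 0.4$, $n = 30$, so $df = 28$ and $t^* = 2.048$) are hypothetical.

```python
def slope_confidence_interval(b1, se_b1, t_star):
    """Return (lower, upper) for b1 +/- t* * SE_b1.

    t_star is assumed to come from a t table (or software) with df = n - 2.
    """
    margin = t_star * se_b1  # margin of error
    return b1 - margin, b1 + margin

# Hypothetical numbers: n = 30, so df = 28 and t* = 2.048 for 95% confidence
lower, upper = slope_confidence_interval(b1=1.2, se_b1=0.4, t_star=2.048)
print(f"95% CI for beta1: ({lower:.3f}, {upper:.3f})")  # prints: (0.381, 2.019)
```

If the interval does not contain 0, a two-sided test of $H_0: \beta_1 = 0$ at the 5% level would reject $H_0$.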
Significance Tests for the Slope
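Null Hypothesis: usually $H_0: \beta_1 = 0$ (no linear relationship), tested against $H_a: \beta_1 \neq 0$, $H_a: \beta_1 > 0$, or $H_a: \beta_1 < 0$.
Formulation of the t-test statistic: $t = \frac{b_1 - \beta_1}{SE_{b_1}}$, where $\beta_1$ is the hypothesized value from $H_0$ (usually 0), with $df = n - 2$.
P-value: the probability, assuming $H_0$ is true, of getting a test statistic at least as extreme as the observed $t$ in the direction of $H_a$.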
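The test statistic $t = \frac{b_1 - \beta_1}{SE_{b_1}}$ and its P-value ($df = n - 2$) can be computed without a table. This Python sketch uses only the standard library, so it integrates the Student $t$ density numerically with Simpson's rule instead of calling a statistics package; the function names and the example numbers are hypothetical. In practice a calculator or a library routine gives the same P-value more directly.

```python
import math

def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    return math.exp(log_c) * (1 + t * t / df) ** (-(df + 1) / 2)

def t_cdf(t, df, steps=2000):
    """P(T <= t), using symmetry about 0 and Simpson's rule on [0, |t|] (steps must be even)."""
    a = abs(t)
    h = a / steps
    total = t_pdf(0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3  # P(0 <= T <= |t|)
    return 0.5 + area if t >= 0 else 0.5 - area

def slope_t_test(b1, se_b1, n, beta1_null=0.0, alternative="two-sided"):
    """Return (t, P-value) for H0: beta1 = beta1_null, with df = n - 2."""
    df = n - 2
    t = (b1 - beta1_null) / se_b1
    if alternative == "greater":      # Ha: beta1 > beta1_null
        p = 1 - t_cdf(t, df)
    elif alternative == "less":       # Ha: beta1 < beta1_null
        p = t_cdf(t, df)
    else:                             # Ha: beta1 != beta1_null
        p = 2 * (1 - t_cdf(abs(t), df))
    return t, p

# Hypothetical numbers: b1 = 1.2, SE_b1 = 0.4, n = 30
t_stat, p = slope_t_test(b1=1.2, se_b1=0.4, n=30)
print(f"t = {t_stat:.3f}, P-value = {p:.4f}")
```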
Various Examples for Contextual Application
GPA and ACT Score Analysis
A small sample is used to investigate the relationship between GPA and ACT scores through regression analysis.
Balloon Box Volume Study
This study examines the correlation between the volume of balloons and various factors such as inflation method and material composition using linear regression techniques.
This study investigates how the height of a balloon box relates to the number of balloons needed to fill it, using a power model (which can be fit by linear regression on the logarithms of both variables) for prediction.
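A power model $\hat y = a x^b$ becomes linear after taking logs: $\log \hat y = \log a + b \log x$, so an ordinary least-squares fit on $(\log x, \log y)$ gives $b$ as the slope and $\log a$ as the intercept. This Python sketch assumes made-up data that follow $y = 3x^2$ exactly; the names `fit_power_model`, `heights`, and `balloons` are hypothetical and are not the actual balloon-box measurements.

```python
import math

def fit_power_model(xs, ys):
    """Fit y = a * x**b by least squares on (log10 x, log10 y); returns (a, b)."""
    lx = [math.log10(x) for x in xs]
    ly = [math.log10(y) for y in ys]
    n = len(lx)
    lx_bar, ly_bar = sum(lx) / n, sum(ly) / n
    # Slope of the log-log line is the exponent b
    b = sum((u - lx_bar) * (v - ly_bar) for u, v in zip(lx, ly)) / sum((u - lx_bar) ** 2 for u in lx)
    # Intercept of the log-log line is log10(a)
    log_a = ly_bar - b * lx_bar
    return 10 ** log_a, b

# Hypothetical data following y = 3x^2 exactly (not the balloon-box measurements)
heights = [1, 2, 3, 4, 5]
balloons = [3 * h ** 2 for h in heights]
a, b = fit_power_model(heights, balloons)
print(f"y-hat = {a:.3f} * x^{b:.3f}")  # prints: y-hat = 3.000 * x^2.000
```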
Exponential Progression: Moore's Law
Models growth in the number of transistors over time; when the log of the transistor count is plotted against year, a roughly linear pattern suggests an exponential relationship.
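An exponential model $\hat y = a \cdot g^x$ becomes linear after logging only $y$: $\log \hat y = \log a + x \log g$, so a least-squares fit on $(x, \log y)$ gives the growth factor per unit of $x$ as $g = 10^{\text{slope}}$. This Python sketch assumes made-up counts that double every 2 years (so $g = \sqrt 2$ per year); the names `fit_exponential_model`, `years`, and `counts` are hypothetical, and the numbers are illustrative, not real chip data.

```python
import math

def fit_exponential_model(xs, ys):
    """Fit y = a * g**x by least squares on (x, log10 y); returns (a, g)."""
    ly = [math.log10(y) for y in ys]
    n = len(xs)
    x_bar, ly_bar = sum(xs) / n, sum(ly) / n
    slope = sum((x - x_bar) * (v - ly_bar) for x, v in zip(xs, ly)) / sum((x - x_bar) ** 2 for x in xs)
    intercept = ly_bar - slope * x_bar
    # Back-transform: a = 10^intercept, growth factor g = 10^slope
    return 10 ** intercept, 10 ** slope

# Hypothetical counts that double every 2 years (illustrative, not real chip data)
years = [0, 2, 4, 6, 8, 10]
counts = [1000 * 2 ** (t / 2) for t in years]
a, g = fit_exponential_model(years, counts)
print(f"y-hat = {a:.1f} * {g:.4f}^x")  # prints: y-hat = 1000.0 * 1.4142^x
```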
Conclusion/Reflections on Statistical Practices
Good statistical practice includes clearly stating hypotheses, defining parameters, recognizing patterns in data through visual displays such as residual plots, interpreting results clearly, and connecting statistical findings to their real-world implications.