Lecture 3: Hypothesis Tests (Proportions)

Framework to evaluate hypotheses using data.
Null Hypothesis ( $H_0$ ): A simple default, assumed true until contradicted. Cannot be accepted or proven.
Alternative Hypothesis ( $H_1$ or $H_A$ ): States a difference or effect exists.
Logic: Assume $H_0$ is true. If data significantly contradicts $H_0$ , reject $H_0$ . Otherwise, fail to reject $H_0$ .
Four Parts: Null (and Alternative) Hypothesis, Test Statistic, P-Value, Conclusion.
Test Statistic ( $Z$ ): Measures how many standard errors the sample proportion is from the hypothesized population proportion.
P-Value: Probability of observing data at least as extreme as the sample data, assuming $H_0$ is true. A smaller P-value provides stronger evidence against $H_0$ .
Conclusion: Reject $H_0$ if $|Z| > Z^*$ (e.g., $Z^* = 1.96$ for $95\%$ confidence) OR P-value $\le \alpha$ (significance level, e.g., $0.05$ ). Fail to reject $H_0$ otherwise.
Statistical Significance: When $H_0$ is rejected, the results are statistically significant (likely a real difference, not just sampling noise).
Inconclusive: When $H_0$ is not rejected, the difference is indistinguishable from sampling noise.
Direction of Difference: If $H_0$ is rejected, the direction (e.g., greater or smaller) is indicated by the sample measurement's sign relative to the hypothesized value (e.g., $\hat{p} - p_0$ or $\hat{p}_1 - \hat{p}_2$ ).
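
Sketch (Python): the conclusion and direction rules above, written as one function. This is a minimal illustration, not part of the lecture: the name `decide` and its return format are made up here, and it uses only `statistics.NormalDist` from the standard library. It applies the P-value rule; the $|Z| > Z^*$ rule gives the same answer except exactly at the boundary.

```python
from statistics import NormalDist


def decide(z, alpha=0.05):
    """Two-sided decision for a Z statistic (hypothetical helper).

    Returns (reject, p_value, z_star, direction).
    """
    std_normal = NormalDist()  # standard normal: mean 0, sd 1

    # Critical value Z* for a two-sided test at significance level alpha.
    z_star = std_normal.inv_cdf(1 - alpha / 2)

    # Two-tailed P-value: probability of a Z at least this far from 0.
    p_value = 2 * std_normal.cdf(-abs(z))

    reject = p_value <= alpha
    if not reject:
        direction = "inconclusive"  # fail to reject H0
    else:
        # Sign of Z matches the sign of (p_hat - p0) or (p_hat1 - p_hat2).
        direction = "greater" if z > 0 else "smaller"
    return reject, p_value, z_star, direction


reject, p_value, z_star, direction = decide(2.5)
print(reject, round(p_value, 4), round(z_star, 2), direction)
```

With the default `alpha=0.05`, `z_star` comes out at about 1.96, matching the $95\%$ example above.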

Purpose: Test if a population proportion ( $p$ ) is equal to a hypothesized value ( $p_0$ ).
Null Hypothesis ( $H_0$ ): $p = p_0$
Alternative Hypothesis ( $H_A$ ): $p \neq p_0$
Test Statistic (Z-score): $Z = \frac{\hat{p} - p_0}{\sqrt{\frac{p_0(1-p_0)}{n}}}$ where $\hat{p}$ is the sample proportion and $n$ is the sample size.
P-Value Calculation: For a two-tailed test, $2 \times \text{NORM.S.DIST(-ABS(Z), TRUE)}$ in Excel, i.e., twice the standard normal CDF evaluated at $-|Z|$ .
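
Sketch (Python): the one-proportion Z statistic and two-tailed P-value above. The function name `one_proportion_z_test` and the example counts (60 successes out of 100, $p_0 = 0.5$) are invented for illustration. `NormalDist().cdf` stands in for the standard normal CDF.

```python
from math import sqrt
from statistics import NormalDist


def one_proportion_z_test(successes, n, p0):
    """Return (z, p_value) for H0: p = p0 vs HA: p != p0 (hypothetical helper)."""
    p_hat = successes / n             # sample proportion
    se = sqrt(p0 * (1 - p0) / n)      # standard error under H0 (uses p0, not p_hat)
    z = (p_hat - p0) / se             # standard errors between p_hat and p0
    p_value = 2 * NormalDist().cdf(-abs(z))  # two-tailed P-value
    return z, p_value


# Made-up data: 60 successes in a sample of 100, testing p0 = 0.5.
z, p_value = one_proportion_z_test(successes=60, n=100, p0=0.5)
print(round(z, 3), round(p_value, 4))
```

For these made-up numbers, Z is about 2.0 and the P-value is about 0.046, so $H_0$ would be rejected at $\alpha = 0.05$, with $\hat{p} > p_0$.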

Purpose: Test if two population proportions ( $p_1$ , $p_2$ ) are equal.
Null Hypothesis ( $H_0$ ): $p_1 = p_2$ (or $p_1 - p_2 = 0$ )
Alternative Hypothesis ( $H_A$ ): $p_1 \neq p_2$ (or $p_1 - p_2 \neq 0$ )
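Test Statistic (Z-score): $Z = \frac{(\hat{p}_1 - \hat{p}_2) - 0}{\sqrt{\hat{p}_{\text{pooled}}(1-\hat{p}_{\text{pooled}})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$ where $\hat{p}_{\text{pooled}} = \frac{x_1 + x_2}{n_1 + n_2}$ is the pooled sample proportion, $x_1$ and $x_2$ are the numbers of successes in each sample, and $n_1$ and $n_2$ are the sample sizes.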
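
Sketch (Python): the two-proportion Z statistic with the pooled proportion, plus the same two-tailed P-value used for the one-proportion test. The function name `two_proportion_z_test` and the example counts are invented for illustration.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(x1, n1, x2, n2):
    """Return (z, p_value) for H0: p1 = p2 vs HA: p1 != p2 (hypothetical helper)."""
    p1_hat = x1 / n1                      # sample proportion, group 1
    p2_hat = x2 / n2                      # sample proportion, group 2
    p_pooled = (x1 + x2) / (n1 + n2)      # pooled proportion under H0
    se = sqrt(p_pooled * (1 - p_pooled) * (1 / n1 + 1 / n2))
    z = ((p1_hat - p2_hat) - 0) / se      # hypothesized difference is 0
    p_value = 2 * NormalDist().cdf(-abs(z))  # two-tailed P-value
    return z, p_value


# Made-up data: 60/100 successes in group 1 vs 45/100 in group 2.
z, p_value = two_proportion_z_test(x1=60, n1=100, x2=45, n2=100)
print(round(z, 2), round(p_value, 3))
```

For these made-up counts, Z is about 2.12 and the P-value is about 0.034. The positive sign of Z reflects $\hat{p}_1 > \hat{p}_2$.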

Key Assumptions: Randomization, independence (between items AND samples), appropriate sample size.
Real-World Challenges: These assumptions are often violated.
- Sampling Bias: Non-random or unrepresentative samples lead to inaccurate conclusions (e.g., Big Data Paradox).
- Self-Reporting Bias: Subjects may provide dishonest or inaccurate information.
- Interference: Subjects can influence each other, violating independence.
Mitigation: Design studies carefully to limit bias; be critical of results from poorly designed experiments.