Chapter 4 Notes: Significance Tests — Five-Part Framework, Means, Proportions, Errors, Limitations, Binomial, Practice

4.0 Introduction

The goal of many studies: check whether data agree with predicted values (hypotheses about a population).
Such predictions (guess values) usually come from the theory behind the study; significance tests measure the strength of evidence that sample data provide for or against a specific hypothesis.
They compare point estimates of parameters to predicted values from the hypothesis.
Five-step significance tests can be applied to a mean and to a proportion; limitations and remarks follow.

4.1 The Five Parts of a Significance Test

Hypothesis and significance test
- A hypothesis is a statement about a population parameter (mean, proportion, etc.).
- Null hypothesis H0: parameter takes a particular value.
- Alternative hypothesis Ha (research hypothesis): the parameter falls in some alternative range.
- Usually, H0 corresponds to "no effect"; Ha represents an effect or difference.
- Examples to illustrate questions:
- Ex.1: A supermarket training program and whether females are disproportionately selected compared with males for training.
- Ex.2: Bob's claim that the USA proportion of adults who favor legalized drugs equals 0.50.

The five parts of a significance test:
1) ASSUMPTIONS
- Type of data: quantitative vs. categorical.
- Randomization.
- Population distribution (normality for means is assumed; large-sample normality for proportions).
- Sample size: larger samples improve validity for many tests.
2) HYPOTHESIS
- Null hypothesis H0 and alternative Ha.
3) TEST STATISTIC
- Summarizes how far the point estimate falls from the H0 value (number of standard errors between the estimate and the H0 value).
4) P-VALUE
- The probability, under H0, that the test statistic would be as extreme as or more extreme than the observed value, in the direction(s) specified by Ha.
- Smaller P-value → stronger evidence against H0.
5) CONCLUSION
- Interpret the P-value.
- Decision rule: reject H0 if P $\le \alpha$ (pre-specified level, e.g., $\alpha = 0.05$ or $\alpha = 0.01$); do not reject H0 if P > $\alpha$.

Remarks
- The significance level $\alpha$ is the probability of rejecting H0 when H0 is true; choose $\alpha$ before analyzing data.
- Hypotheses always refer to population parameters, not sample statistics: sample statistics are observed directly, whereas the parameter values are unknown.

4.2 Significance Test for a Mean

Significance test for a mean (five parts):
1) ASSUMPTIONS
- Type of data: quantitative data.
- Randomization.
- Normal population distribution.
- Sample size: the normality assumption matters most for small samples.
2) HYPOTHESIS
- H0: $\mu = \mu_0$
- Ha: $\mu \ne \mu_0$ (two-sided); one-sided alternatives are described below.
3) TEST STATISTIC
- If H0 is true, the center of the sampling distribution of $\bar{y}$ is $\mu_0$.
- Test statistic: $t = \frac{\bar{y} - \mu_0}{s/\sqrt{n}}$
- Here, $se = \frac{s}{\sqrt{n}}$ (note: the source text contains a typo showing $se = s \sqrt{n}$; the correct standard error is $s/\sqrt{n}$).
- Under H0, the t statistic has a t-distribution with degrees of freedom $df = n - 1$.
4) P-VALUE
- Under H0, compute the P-value as the two-tailed probability that the t-statistic is at least as large in absolute value as the observed value: $P = 2\cdot P\left(T_{df} \ge |t_{obs}|\right)$.
5) CONCLUSION
- Smaller P-value → stronger evidence against H0 and in favor of Ha.

Alternative hypothesis forms
- Two-sided: $Ha: \mu \ne \mu_0$ (P-value is the two-tail probability).
- One-sided: $Ha: \mu > \mu_0$ (P-value is the right-tail probability) or $Ha: \mu < \mu_0$ (P-value is the left-tail probability).
- Two-sided tests are more common; context may justify one-sided tests (e.g., mean has changed vs. mean has decreased).
Remark: If the population standard deviation $\sigma$ is known and $n \ge 30$ , the Central Limit Theorem allows using the z-score instead of t-score: $z = \frac{\bar{y} - \mu_0}{\sigma/\sqrt{n}}$ .
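The choice of tail for each form of Ha can be sketched in code. This is a minimal illustration for the z case in the remark above, using only the Python standard library; the function name `z_p_value`, the `alternative` labels, and the observed value 1.5 are hypothetical and chosen for illustration.

```python
from statistics import NormalDist

def z_p_value(z, alternative="two-sided"):
    """P-value for a z statistic under the standard normal distribution."""
    std_normal = NormalDist()  # mean 0, standard deviation 1
    if alternative == "greater":    # Ha: mu > mu0, right-tail probability
        return 1 - std_normal.cdf(z)
    if alternative == "less":       # Ha: mu < mu0, left-tail probability
        return std_normal.cdf(z)
    if alternative == "two-sided":  # Ha: mu != mu0, both tails
        return 2 * (1 - std_normal.cdf(abs(z)))
    raise ValueError("alternative must be 'two-sided', 'greater', or 'less'")

z_obs = 1.5  # hypothetical observed z statistic
for alt in ("two-sided", "greater", "less"):
    print(alt, round(z_p_value(z_obs, alt), 4))
```

For a positive observed statistic, the two-sided P-value is twice the right-tail value, which is why a one-sided test in the direction of the data gives a smaller P-value.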
Example Ex.3 (7-point scale data)
- Data: seven-point scale with counts: Extremely liberal (10), Liberal (21), Slightly liberal (22), Moderate (74), Slightly conservative (21), Conservative (27), Extremely conservative (11); total $n = 186$.
- Goal: test whether the population mean is moderate.
- Assumptions: quantitative (seven-point scale); randomization; normal population distribution; large sample.
- Hypotheses: $H0: \mu = 4$ (moderate on a 7-point scale) vs. $Ha: \mu \ne 4$ .
- Test procedures: compute sample mean, standard deviation, and perform t-test with appropriate $se = \frac{s}{\sqrt{n}}$ ; evaluate P-value and conclusion (specific numeric results not provided in the transcript).
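The test procedure for Ex.3 can be carried out directly from the listed counts. The sketch below assumes the categories are scored 1 through 7 in the order given, and it approximates the t tail probability with the standard normal, which is reasonable here only because df = 185 is large. These numbers are computed from the counts, not taken from the transcript.

```python
from math import sqrt
from statistics import NormalDist

# Counts from Ex.3, scored 1 (extremely liberal) to 7 (extremely conservative).
counts = {1: 10, 2: 21, 3: 22, 4: 74, 5: 21, 6: 27, 7: 11}
mu0 = 4  # H0 value: "moderate"

n = sum(counts.values())
mean = sum(score * c for score, c in counts.items()) / n
# Sample variance with the n - 1 divisor.
var = sum(c * (score - mean) ** 2 for score, c in counts.items()) / (n - 1)
s = sqrt(var)
se = s / sqrt(n)
t = (mean - mu0) / se

# Assumption: with df = n - 1 = 185 the t-distribution is close to the
# standard normal, so the normal tail approximates the two-sided P-value.
p_approx = 2 * (1 - NormalDist().cdf(abs(t)))

print(f"n = {n}, mean = {mean:.3f}, s = {s:.3f}, se = {se:.3f}")
print(f"t = {t:.2f}, df = {n - 1}, approximate two-sided P = {p_approx:.2f}")
```

Under these assumptions the sample mean is about 4.08 and t is about 0.68, giving a P-value near 0.5, so the data provide little evidence against H0.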
Figures and interpretation notes
- Figures illustrate two-sided vs. one-sided alternatives (conceptual).
- If $\sigma$ is known and n large, z-test is an alternative to t-test.

4.3 Significance Test for a Proportion

Five parts (for a proportion):
1) ASSUMPTIONS
- Type of data: categorical data.
- Randomization.
- Normal sampling distribution of $\hat{\pi}$ (e.g., for $H0: \pi = 0.50$, a sample size of at least 20 is often sufficient).
2) HYPOTHESIS
- H0: $\pi = \pi_0$
- Ha: $\pi \ne \pi_0$ (two-sided); one-sided alternatives are also possible.
3) TEST STATISTIC
- If H0 is true, the test statistic is the z-score:
  $z = \frac{\hat{\pi} - \pi_0}{\sqrt{\pi_0(1 - \pi_0)/n}}$, where $\hat{\pi}$ is the sample proportion and $se_0 = \sqrt{\pi_0(1 - \pi_0)/n}$.
4) P-VALUE
- The P-value is the two-tail probability under the standard normal distribution.
5) CONCLUSION
- Reject H0 if $P \le \alpha$ ; otherwise do not reject H0.
Ex.4 (Florida poll, 2006)
- Question: Should we conclude that those favoring raising taxes are the majority?
- Data: random sample of $n = 1200$ ; 52% favored raising taxes, 48% favored reducing services; goal: test whether the population proportion is above 0.50.
- Setup for a two-sided test with $H0: \pi = 0.50$ and $Ha: \pi \ne 0.50$; observed $\hat{\pi} = 0.52$ with $n = 1200$:
- Standard error under H0: $se_0 = \sqrt{\pi_0(1 - \pi_0)/n} = \sqrt{(0.5)(0.5)/1200} = 0.0144$
- Test statistic: $z = \frac{0.52 - 0.50}{0.0144} = 1.39$
- P-value: $P = 2 \cdot P(Z \ge 1.39) \approx 0.16$
- Conclusion: With $\alpha = 0.05$, cannot reject H0; the data do not establish that those favoring raising taxes are the majority.
- If the same $\hat{\pi} = 0.52$ had come from $n = 4800$: $se_0 = 0.0072$; $z \approx 2.77$; P-value $\approx 0.006$; H0 is rejected, indicating that a majority favors raising taxes.
- Confidence interval note: even when H0 is rejected (n = 4800), the 95% CI for $\pi$ is approximately $(0.506, 0.534)$, showing that the proportion is close to 0.50 despite the rejection.
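The Ex.4 arithmetic can be reproduced with a short sketch using only the Python standard library. The helper names `proportion_z_test` and `wald_ci` are hypothetical; the confidence interval uses the usual large-sample form with the estimated standard error $\sqrt{\hat{\pi}(1 - \hat{\pi})/n}$, which is an assumption about how the interval in the notes was obtained.

```python
from math import sqrt
from statistics import NormalDist

def proportion_z_test(p_hat, pi0, n):
    """Two-sided large-sample z-test for a proportion."""
    se0 = sqrt(pi0 * (1 - pi0) / n)  # standard error under H0
    z = (p_hat - pi0) / se0
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return se0, z, p_value

def wald_ci(p_hat, n, level=0.95):
    """Large-sample confidence interval using the estimated standard error."""
    z_crit = NormalDist().inv_cdf(1 - (1 - level) / 2)
    margin = z_crit * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

for n in (1200, 4800):
    se0, z, p = proportion_z_test(0.52, 0.50, n)
    lo, hi = wald_ci(0.52, n)
    print(f"n={n}: se0={se0:.4f}, z={z:.2f}, P={p:.3f}, 95% CI=({lo:.3f}, {hi:.3f})")
```

The same sample proportion gives a non-significant result at n = 1200 and a significant one at n = 4800, while the interval for n = 4800 still sits just above 0.50.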

4.4 Decisions and Types of Errors in Tests

Rejection region
- The collection of test statistic values for which H0 is rejected constitutes the rejection region.
Type I and Type II errors
- Type I error: reject H0 when H0 is true; probability equals the $\alpha$ -level of the test.
- Type II error: fail to reject H0 when H0 is false.
Ex.5 (criminal trial analogy)
- Let H0 represent innocence and Ha represent guilt.
- Probability of Type I error is the $\alpha$ -level (e.g., often $\le 0.05$ ; sometimes much smaller like 0.001 in high-stakes cases).
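- Trade-off: decreasing $\alpha$ reduces the probability of a Type I error but increases the probability of a Type II error; the right balance depends on context (e.g., in criminal justice, convicting an innocent person is usually considered the more serious error).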
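The trade-off can be made concrete with a small calculation. The sketch below is illustrative only: it uses a two-sided z-test for a proportion rather than the trial analogy, and the scenario (H0: $\pi = 0.50$, true $\pi = 0.55$, n = 400) and the function name `type2_error` are hypothetical. It relies on the normal approximation for the sampling distribution of $\hat{\pi}$.

```python
from math import sqrt
from statistics import NormalDist

def type2_error(pi_true, pi0, n, alpha):
    """P(fail to reject H0) for a two-sided z-test when the true proportion is pi_true."""
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    se0 = sqrt(pi0 * (1 - pi0) / n)
    # Values of p_hat between these bounds do not lead to rejection.
    lower, upper = pi0 - z_crit * se0, pi0 + z_crit * se0
    # Sampling distribution of p_hat under the true proportion (normal approximation).
    sampling = NormalDist(pi_true, sqrt(pi_true * (1 - pi_true) / n))
    return sampling.cdf(upper) - sampling.cdf(lower)

# Hypothetical scenario: H0: pi = 0.50, true pi = 0.55, n = 400.
for alpha in (0.05, 0.01, 0.001):
    beta = type2_error(0.55, 0.50, 400, alpha)
    print(f"alpha = {alpha}: P(Type II error) = {beta:.3f}")
```

As $\alpha$ shrinks, the non-rejection region widens, so the probability of missing a real difference grows.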

4.5 Limitations on Significance Tests

Statistically significant vs practically significant
- Statistical significance means the observed difference would be unlikely if H0 were true (a small P-value); it does not imply that the difference is large or practically important.
- Practical significance considers whether the difference is large enough to affect decisions (market strategies, policy, etc.).
Key points
- With large samples, very small differences can become statistically significant (example: >1000 observations).
- Confidence intervals provide a range of plausible parameter values and help assess practical significance; they complement significance tests.
- P-values can be misleading if reported alone:
- There is always a risk of Type I error at level $\alpha$ .
- When H0 is true, about $\alpha \times 100\%$ of repeated studies will still produce statistically significant results purely by chance.
- The P-value is not the probability that H0 is true; it is the probability of the observed data (or more extreme) given H0.
- Significant results can exaggerate the magnitude of the true effect due to selective reporting of extreme outcomes.

4.6 Small-Sample Test for a Proportion: The Binomial Distribution

Binomial distribution basics
- If each trial has two possible categories, the trials are independent, and the probability of each category is the same on every trial, then the number of successes follows a binomial distribution.
- If $\pi$ is the probability of the first category, then the probability of x successes in n trials is:
  $P(X = x) = \binom{n}{x} \pi^{x} (1 - \pi)^{n - x}, \quad x = 0,1,2,\dots,n$
Binomial distribution properties
- Mean: $\mu = n\pi$
- Standard deviation: $\sigma = \sqrt{n\pi(1 - \pi)}$
Ex.7 (two-host example)
- From a pool half male, half female; n = 10; $\pi = 0.5$ ; probability of choosing one or fewer females:
- $P(0) = \binom{10}{0} (0.5)^0 (0.5)^{10} \approx 0.001$
- $P(1) = \binom{10}{1} (0.5)^1 (0.5)^9 \approx 0.01$
- $P(X \le 1) = P(0) + P(1) \approx 0.011$
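The binomial formula and the Ex.7 probabilities can be checked with a few lines of Python. This is a minimal sketch using the standard library; the function name `binom_pmf` is chosen for illustration.

```python
from math import comb, sqrt

def binom_pmf(x, n, pi):
    """P(X = x) for a binomial distribution with n trials and success probability pi."""
    return comb(n, x) * pi**x * (1 - pi) ** (n - x)

n, pi = 10, 0.5
p0 = binom_pmf(0, n, pi)
p1 = binom_pmf(1, n, pi)
print(f"P(0) = {p0:.4f}, P(1) = {p1:.4f}, P(X <= 1) = {p0 + p1:.4f}")

# Mean and standard deviation from the formulas above.
print(f"mean = {n * pi}, sd = {sqrt(n * pi * (1 - pi)):.3f}")
```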
Ex.8 (Eurobonds familiarity, n = 10, observed 2 yes)
- H0: $\pi = 0.5$ ; Ha: $\pi \ne 0.5$ ; x = 2
- P(0) = 0.001; P(1) = 0.01; P(2) = 0.044; $P(X \le 2) = 0.055$
- Two-sided P-value: $P = 2 \cdot (P(0) + P(1) + P(2)) = 0.11$
- Conclusion: With $\alpha = 0.05$ , fail to reject H0; cannot determine if Italians are more familiar with Eurobonds.
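The Ex.8 calculation can be sketched as a small exact test. Doubling the smaller tail, as done in the example, is one common convention for a two-sided binomial P-value; it is most natural here because the distribution is symmetric when $\pi_0 = 0.5$. The helper name `binom_cdf` is hypothetical.

```python
from math import comb

def binom_cdf(x, n, pi):
    """P(X <= x) for a binomial distribution with n trials and success probability pi."""
    return sum(comb(n, k) * pi**k * (1 - pi) ** (n - k) for k in range(x + 1))

n, x, pi0 = 10, 2, 0.5
lower_tail = binom_cdf(x, n, pi0)          # P(X <= 2)
upper_tail = 1 - binom_cdf(x - 1, n, pi0)  # P(X >= 2)

# Two-sided P-value: double the smaller tail, capped at 1.
p_two_sided = min(1.0, 2 * min(lower_tail, upper_tail))
print(f"P(X <= {x}) = {lower_tail:.3f}, two-sided P = {p_two_sided:.2f}")
```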

4.8 Practice Problems (Brief Outlines)

Problem (1): Height of 7th graders
- Data: previous five-year mean 146 cm; sample: n = 200, $\bar{x} = 148$ cm, s = 20 cm; $\alpha = 0.05$.
- Approach: use a t-test for the mean with $t = \frac{\bar{x} - 146}{s/\sqrt{n}} = \frac{148 - 146}{20/\sqrt{200}} = \frac{2}{20/\sqrt{200}} = \frac{2}{1.414} \approx 1.414$ ; df = 199; two-tailed P-value $\approx 0.16$ ; fail to reject H0.
Problem (2): College enrollment proportion
- Observed: $\hat{\pi} = 0.55$ ; n = 200; test against $\pi_0 = 0.60$ ; Ha: not equal to 0.60.
- Approach: $se_0 = \sqrt{\pi_0(1 - \pi_0)/n} = \sqrt{0.60(0.40)/200} \approx 0.0346$; $z = (0.55 - 0.60)/0.0346 \approx -1.44$; two-tailed P-value $\approx 0.15$; fail to reject H0.
Problem (3): Normal mean test
- n = 157; $\bar{x} = 65.12$; s = 9; H0: $\mu = 65$; $\alpha = 0.01$.
- $t = \frac{\bar{x} - 65}{s/\sqrt{n}} = \frac{65.12 - 65}{9/\sqrt{157}} \approx \frac{0.12}{0.72} \approx 0.17$ ; df = 156; P-value $\approx 0.87$ ; fail to reject H0.
Problem (4): Die bias test (n = 10, observed 4 threes)
- Null: $p = 1/6$; two-sided test; approximate via z-test for proportions:
- $se_0 = \sqrt{p_0(1 - p_0)/n} = \sqrt{(1/6)(5/6)/10} \approx 0.118$
- $\hat{p} = 4/10 = 0.40$; $z = (0.40 - 1/6)/0.118 \approx 1.98$; two-tailed P-value $\approx 0.048$; reject H0 at $\alpha = 0.05$; conclude evidence suggesting bias toward 3. Caveat: with n = 10 the expected number of threes under H0 is only $n p_0 \approx 1.67$, so the normal approximation is unreliable and an exact binomial test is preferable.
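To see how much the caveat matters in Problem (4), the large-sample z result can be compared with an exact binomial tail probability. This is a sketch using only the standard library; the exact figure shown is the one-sided probability P(X ≥ 4) under H0, chosen as a simple point of comparison and not as the notes' prescribed method.

```python
from math import comb, sqrt
from statistics import NormalDist

n, x, p0 = 10, 4, 1 / 6

# Large-sample z-test, as in the outline above.
se0 = sqrt(p0 * (1 - p0) / n)
z = (x / n - p0) / se0
p_normal = 2 * (1 - NormalDist().cdf(abs(z)))

# Exact upper-tail probability P(X >= 4) under H0.
p_upper_exact = sum(comb(n, k) * p0**k * (1 - p0) ** (n - k) for k in range(x, n + 1))

print(f"z = {z:.2f}, normal two-sided P = {p_normal:.3f}")
print(f"exact P(X >= {x}) = {p_upper_exact:.3f}")
print(f"expected successes under H0: n*p0 = {n * p0:.2f}")
```

The exact upper tail alone is about 0.07, which already exceeds 0.05, so the rejection suggested by the normal approximation does not hold up under the exact calculation.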
End-of-chapter guidance
- Always consider practical significance in addition to statistical significance.
- Use confidence intervals to assess the magnitude and precision of estimates.
- Be aware of limitations and potential misinterpretations of P-values.

4.7 Chapter Summary

Chapter 4 complements Chapter 3: Chapter 3 focuses on confidence intervals; Chapter 4 focuses on significance tests.
Differences between methods:
- Confidence interval: provides a range of plausible values for the parameter.
- Significance test: assesses whether a specific guess value for the parameter is plausible.
Five-step framework (recap)
1) ASSUMPTIONS: means (quantitative) vs. proportions (categorical); randomization; distribution assumptions; large-sample considerations for proportions.
2) HYPOTHESIS: H0 vs Ha; one-sided or two-sided Ha.
3) TEST STATISTIC: number of standard errors between the estimate and the H0 value (z for large-sample proportions; t for means).
4) P-VALUE: probability, assuming H0 is true, of a test statistic at least as extreme as the observed one.
5) CONCLUSION: reject H0 if P $\le \alpha$ ; otherwise do not reject.
Additional notes
- Two-sample tests and Chi-squared tests (for associations) are covered in later chapters.