Chapter 4 Notes: Significance Tests — Five-Part Framework, Means, Proportions, Errors, Limitations, Binomial, Practice
4.0 Introduction
The goal of many studies: check whether data agree with predicted values (hypotheses about a population).
Predictions (hypothesized values of parameters) usually come from the theory that drives the research; significance tests measure the strength of evidence sample data provide for or against a specific hypothesis.
They compare point estimates of parameters to predicted values from the hypothesis.
This chapter applies the five-part significance test to a mean and to a proportion, then discusses its limitations.
4.1 The Five Parts of a Significance Test
Hypothesis and significance test
A hypothesis is a statement about a population parameter (mean, proportion, etc.).
Null hypothesis H0: parameter takes a particular value.
Alternative hypothesis Ha (research hypothesis): the parameter falls in some alternative range.
Usually, H0 corresponds to "no effect"; Ha represents an effect or difference.
Examples to illustrate questions:
Ex.1: A supermarket training program and whether females are disproportionately selected compared with males for training.
Ex.2: Bob's claim that the proportion of US adults who favor legalized drugs equals 0.50.
The five parts of a significance test:
1) ASSUMPTIONS
Type of data: quantitative vs. categorical.
Randomization.
Population distribution: for means, a normal population distribution is assumed; for proportions, the sample must be large enough for the sampling distribution to be approximately normal.
Sample size: larger samples improve validity for many tests.
2) HYPOTHESIS
Null hypothesis H0 and alternative Ha.
3) TEST STATISTIC
Summarizes how far the point estimate falls from the H0 value (number of standard errors between the estimate and the H0 value).
4) P-VALUE
The probability, under H0, that the test statistic would be as extreme as or more extreme than the observed value, in the direction(s) specified by Ha.
Smaller P-value → stronger evidence against H0.
5) CONCLUSION
Interpret the P-value.
Decision rule: reject H0 if P \le \alpha (a pre-specified significance level, e.g., \alpha = 0.05 or \alpha = 0.01); do not reject H0 if P > \alpha.
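The P-value and decision rule can be sketched in Python for a test statistic that is approximately standard normal under H0 (such as the z statistic for a proportion). The function names `p_value_from_z` and `decide` and the `alternative` labels are hypothetical; only the standard library's `statistics.NormalDist` is used.

```python
from statistics import NormalDist

_Z = NormalDist()  # standard normal distribution


def p_value_from_z(z: float, alternative: str = "two-sided") -> float:
    """P-value for an observed z statistic, computed assuming H0 is true.

    alternative: "two-sided" (Ha: parameter != H0 value),
                 "greater"   (Ha: parameter >  H0 value),
                 "less"      (Ha: parameter <  H0 value).
    """
    if alternative == "two-sided":
        return 2 * (1 - _Z.cdf(abs(z)))  # both tails beyond |z|
    if alternative == "greater":
        return 1 - _Z.cdf(z)  # right tail
    if alternative == "less":
        return _Z.cdf(z)  # left tail
    raise ValueError(f"unknown alternative: {alternative!r}")


def decide(p_value: float, alpha: float = 0.05) -> str:
    """Decision rule: reject H0 if P <= alpha, otherwise do not reject."""
    return "reject H0" if p_value <= alpha else "do not reject H0"


p = p_value_from_z(1.39)
print(round(p, 3), decide(p))
```

Choosing `alpha` as a default argument mirrors the remark that \alpha is fixed before looking at the data.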
Remarks
The significance level \alpha is the probability of rejecting H0 when H0 is true; choose \alpha before analyzing data.
Hypotheses always refer to population parameters, not sample statistics: the value of a sample statistic is known once the data are collected, so the uncertainty concerns only the population parameter.
4.2 Significance Test for a Mean
Significance test for a mean (five parts):
1) ASSUMPTIONS
Type of data: quantitative data.
Randomization.
Normal population distribution.
Sample size: the normality assumption matters most for small samples; for large n the t-test is robust to non-normality.
2) HYPOTHESIS
H0: \mu = \mu_0
Ha: \mu \ne \mu_0 (two-sided); one-sided alternatives are described below.
3) TEST STATISTIC
If H0 is true, the center of the sampling distribution of \bar{y} is \mu_0.
Test statistic: t = \frac{\bar{y} - \mu_0}{se}, where se = \frac{s}{\sqrt{n}} is the estimated standard error of \bar{y}.
If H0 is true (and the assumptions hold), the sampling distribution of the t statistic is the t-distribution with degrees of freedom df = n - 1.
4) P-VALUE
Under H0, compute the P-value as the two-tailed probability that the t statistic is at least as large in absolute value as the observed value: P = 2\cdot P\left(T_{df} \ge |t_{obs}|\right).
5) CONCLUSION
Smaller P-value → stronger evidence against H0 and in favor of Ha.
Alternative hypothesis forms
Two-sided: Ha: \mu \ne \mu_0 (P-value is the two-tail probability).
One-sided: Ha: \mu > \mu_0 (P-value is the right-tail probability) or Ha: \mu < \mu_0 (P-value is the left-tail probability).
Two-sided tests are more common; context may justify one-sided tests (e.g., mean has changed vs. mean has decreased).
Remark: If the population standard deviation \sigma is known and n \ge 30, the Central Limit Theorem allows using the z-score instead of t-score: z = \frac{\bar{y} - \mu_0}{\sigma/\sqrt{n}}.
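A minimal sketch of the t-test for a mean, computed from summary statistics. The Python standard library has no t-distribution, so the hypothetical helper `t_tail` approximates the tail probability by integrating the t density with Simpson's rule; in practice a statistics library routine would normally be used instead. The names `t_tail` and `one_sample_t_test` are illustrative only.

```python
import math


def t_tail(t: float, df: int, steps: int = 2000) -> float:
    """Approximate P(T >= t) for Student's t with df degrees of freedom.

    Simpson's rule on the t density over [0, |t|] (steps must be even),
    then the symmetry of the distribution about 0.
    """
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))

    def density(x: float) -> float:
        return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

    a = abs(t)
    h = a / steps
    total = density(0.0) + density(a)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * density(i * h)
    upper = 0.5 - total * h / 3  # P(T >= |t|)
    return upper if t >= 0 else 1 - upper


def one_sample_t_test(ybar: float, s: float, n: int, mu0: float,
                      alternative: str = "two-sided"):
    """t-test of H0: mu = mu0; returns (t, df, P-value)."""
    se = s / math.sqrt(n)  # estimated standard error of ybar
    t = (ybar - mu0) / se  # number of se's between ybar and mu0
    df = n - 1
    if alternative == "two-sided":
        p = 2 * t_tail(abs(t), df)
    elif alternative == "greater":  # Ha: mu > mu0, right tail
        p = t_tail(t, df)
    elif alternative == "less":  # Ha: mu < mu0, left tail P(T <= t) = P(T >= -t)
        p = t_tail(-t, df)
    else:
        raise ValueError(f"unknown alternative: {alternative!r}")
    return t, df, p


t, df, p = one_sample_t_test(148, 20, 200, 146)
print(f"t = {t:.3f}, df = {df}, P = {p:.3f}")
```

For Problem (1) in Section 4.8, `one_sample_t_test(148, 20, 200, 146)` gives t \approx 1.41 with df = 199 and a two-sided P-value of about 0.16.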
Example Ex.3 (7-point scale data)
Data: seven-point scale with counts: Extremely liberal (10), Liberal (21), Slightly liberal (22), Moderate (74), Slightly conservative (21), Conservative (27), Extremely conservative (11); total n = 186.
Goal: test whether the population mean is moderate.
Assumptions: quantitative (seven-point scale); randomization; normal population distribution; large sample.
Hypotheses: H0: \mu = 4 (moderate on a 7-point scale) vs. Ha: \mu \ne 4.
Test procedure: from the counts, compute the sample mean \bar{y} and standard deviation s, then t = \frac{\bar{y} - 4}{s/\sqrt{n}} with df = n - 1 = 185; find the two-sided P-value and state the conclusion.
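Assuming the categories are scored 1 (extremely liberal) to 7 (extremely conservative), so that 4 = moderate as in H0, the Ex.3 calculation can be sketched as follows. `t_tail` is a hypothetical helper that approximates the t-distribution tail with Simpson's rule, since the standard library has no t-distribution.

```python
import math
import statistics


def t_tail(t: float, df: int, steps: int = 2000) -> float:
    """Approximate P(T >= t) for Student's t (Simpson's rule + symmetry)."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))

    def density(x: float) -> float:
        return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

    a = abs(t)
    h = a / steps
    total = density(0.0) + density(a)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * density(i * h)
    upper = 0.5 - total * h / 3  # P(T >= |t|)
    return upper if t >= 0 else 1 - upper


# Assumed scoring: 1 = extremely liberal ... 4 = moderate ... 7 = extremely conservative
counts = {1: 10, 2: 21, 3: 22, 4: 74, 5: 21, 6: 27, 7: 11}
scores = [score for score, c in counts.items() for _ in range(c)]

n = len(scores)
ybar = statistics.mean(scores)
s = statistics.stdev(scores)  # sample standard deviation (divisor n - 1)
se = s / math.sqrt(n)
t = (ybar - 4) / se  # H0: mu = 4
df = n - 1
p = 2 * t_tail(abs(t), df)  # two-sided P-value for Ha: mu != 4

print(f"n={n} ybar={ybar:.3f} s={s:.3f} se={se:.3f} t={t:.2f} df={df} P={p:.2f}")
```

With this scoring, \bar{y} \approx 4.08, s \approx 1.51, se \approx 0.111, t \approx 0.68 (df = 185), and the two-sided P-value is about 0.50, so at \alpha = 0.05 H0: \mu = 4 is not rejected; a moderate population mean is plausible.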
Figures and interpretation notes
Figures illustrate two-sided vs. one-sided alternatives (conceptual).
If \sigma is known and n large, z-test is an alternative to t-test.
4.3 Significance Test for a Proportion
Five parts (for a proportion):
1) ASSUMPTIONS
Type of data: categorical data.
Randomization.
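Normal sampling distribution of \hat{\pi} when H0: \pi = \pi_0 is true, which requires a large enough sample; a common rule of thumb is n\pi_0 \ge 10 and n(1 - \pi_0) \ge 10 (e.g., n \ge 20 when \pi_0 = 0.5).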
2) HYPOTHESIS
H0: \pi = \pi_0
Ha: \pi \ne \pi_0 (two-sided); one-sided alternatives are also possible.
3) TEST STATISTIC
If H0 is true, the test statistic is the z-score:
z = \frac{\hat{\pi} - \pi_0}{\sqrt{\pi_0(1 - \pi_0)/n}}, where \hat{\pi} is the sample proportion and se_0 = \sqrt{\pi_0(1 - \pi_0)/n} is the standard error computed under H0.
4) P-VALUE
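For the two-sided Ha, the P-value is the two-tail probability beyond the observed z under the standard normal distribution; for a one-sided Ha, use the corresponding single tail.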
5) CONCLUSION
Reject H0 if P \le \alpha; otherwise do not reject H0.
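The five parts for a proportion can be sketched as one function. `proportion_z_test` and `normal_approx_ok` are hypothetical names, and the check n\pi_0 \ge 10 and n(1 - \pi_0) \ge 10 is one common rule of thumb (an assumed threshold; texts differ).

```python
import math
from statistics import NormalDist


def normal_approx_ok(n: int, pi0: float, minimum: float = 10) -> bool:
    """Rule-of-thumb check (assumed threshold): n*pi0 and n*(1 - pi0) both >= minimum."""
    return n * pi0 >= minimum and n * (1 - pi0) >= minimum


def proportion_z_test(pi_hat: float, n: int, pi0: float,
                      alternative: str = "two-sided"):
    """Large-sample z-test of H0: pi = pi0; returns (se0, z, P-value)."""
    se0 = math.sqrt(pi0 * (1 - pi0) / n)  # standard error computed assuming H0 is true
    z = (pi_hat - pi0) / se0  # number of se's between the estimate and pi0
    Z = NormalDist()
    if alternative == "two-sided":
        p = 2 * (1 - Z.cdf(abs(z)))
    elif alternative == "greater":
        p = 1 - Z.cdf(z)
    elif alternative == "less":
        p = Z.cdf(z)
    else:
        raise ValueError(f"unknown alternative: {alternative!r}")
    return se0, z, p


se0, z, p = proportion_z_test(0.55, 200, 0.60)  # Problem (2) in Section 4.8
print(f"se0 = {se0:.4f}, z = {z:.2f}, P = {p:.3f}, approx ok: {normal_approx_ok(200, 0.60)}")
```

Note that se_0 uses \pi_0, not \hat{\pi}: the test asks how unusual \hat{\pi} would be if H0 were true.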
Ex.4 (Florida poll, 2006)
Question: Should we conclude that those favoring raising taxes are the majority?
Data: random sample of n = 1200; 52% favored raising taxes and 48% favored reducing services; goal: judge whether the population proportion favoring raising taxes is above 0.50.
Setup for a two-sided test: H0: \pi = 0.50 vs. Ha: \pi \ne 0.50, with observed \hat{\pi} = 0.52 and n = 1200:
se under H0: se_0 = \sqrt{\pi_0(1 - \pi_0)/n} = \sqrt{(0.5)(0.5)/1200} = 0.0144
Test statistic: z = \frac{0.52 - 0.50}{0.0144} = 1.39
P-value: P = 2 \cdot P(Z \ge 1.39) \approx 0.16
Conclusion: with \alpha = 0.05, do not reject H0; the data do not provide strong evidence that those favoring raising taxes are a majority.
If instead n = 4800 (same \hat{\pi} = 0.52): se_0 = 0.0072; z \approx 2.77; P-value \approx 0.006; H0 is rejected, giving evidence that a majority of the population favors raising taxes.
Confidence interval note: with n = 4800, the 95% CI for \pi is approximately (0.506, 0.534). H0 is rejected, yet every plausible value of \pi is close to 0.50, so the majority is slim; statistical significance does not imply a large effect.
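A short sketch recomputing Ex.4 for both sample sizes. The test uses se_0 (based on \pi_0), while the 95% CI uses the estimated standard error \sqrt{\hat{\pi}(1 - \hat{\pi})/n} with the normal critical value, as in the usual large-sample interval for a proportion.

```python
import math
from statistics import NormalDist

Z = NormalDist()
pi_hat, pi0 = 0.52, 0.50
z_crit = Z.inv_cdf(0.975)  # about 1.96 for a 95% CI

results = {}
for n in (1200, 4800):
    se0 = math.sqrt(pi0 * (1 - pi0) / n)  # se under H0 (used by the test)
    z = (pi_hat - pi0) / se0
    p = 2 * (1 - Z.cdf(abs(z)))  # two-sided P-value
    se_hat = math.sqrt(pi_hat * (1 - pi_hat) / n)  # estimated se (used by the CI)
    ci = (pi_hat - z_crit * se_hat, pi_hat + z_crit * se_hat)
    results[n] = (se0, z, p, ci)
    print(f"n={n}: se0={se0:.4f} z={z:.2f} P={p:.3f} "
          f"95% CI=({ci[0]:.3f}, {ci[1]:.3f})")
```

For n = 1200 the interval is about (0.492, 0.548) and contains 0.50, matching the decision not to reject H0; for n = 4800 it excludes 0.50 but stays close to it.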
4.4 Decisions and Types of Errors in Tests
Rejection region
The collection of test statistic values for which H0 is rejected constitutes the rejection region.
Type I and Type II errors
Type I error: reject H0 when H0 is true; probability equals the \alpha-level of the test.
Type II error: fail to reject H0 when H0 is false.
Ex.5 (criminal trial analogy)
Let H0 represent innocence and Ha represent guilt.
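A Type I error convicts an innocent defendant; a Type II error acquits a guilty one. The probability of a Type I error is the \alpha-level (often 0.05 or less; much smaller, e.g., 0.001, when convicting an innocent person is very costly).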
Trade-off: for a fixed sample size, decreasing \alpha reduces the probability of a Type I error but increases the probability of a Type II error; the right balance depends on context (e.g., criminal justice).
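To make the trade-off concrete, the sketch below computes the probability of a Type II error for the two-sided z-test of H0: \pi = 0.50, using the normal approximation to the sampling distribution of \hat{\pi}. The true value \pi = 0.52 and the function name `type_ii_error` are hypothetical choices for illustration.

```python
import math
from statistics import NormalDist


def type_ii_error(pi0: float, pi_true: float, n: int, alpha: float) -> float:
    """P(do not reject H0: pi = pi0) when the true proportion is pi_true.

    Uses the normal approximation for pi_hat under both pi0 and pi_true.
    """
    Z = NormalDist()
    z_crit = Z.inv_cdf(1 - alpha / 2)
    se0 = math.sqrt(pi0 * (1 - pi0) / n)
    # H0 is not rejected when pi_hat falls inside this interval around pi0
    lower, upper = pi0 - z_crit * se0, pi0 + z_crit * se0
    se_true = math.sqrt(pi_true * (1 - pi_true) / n)
    return Z.cdf((upper - pi_true) / se_true) - Z.cdf((lower - pi_true) / se_true)


for n, alpha in [(1200, 0.05), (1200, 0.01), (4800, 0.05)]:
    beta = type_ii_error(0.50, 0.52, n, alpha)
    print(f"n={n} alpha={alpha}: P(Type II error) = {beta:.2f}")
```

Under these assumptions, the Type II error probability is about 0.72 at \alpha = 0.05 and about 0.88 at \alpha = 0.01 with n = 1200, but about 0.21 at \alpha = 0.05 with n = 4800: lowering \alpha raises it, while a larger sample lowers it.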
4.5 Limitations on Significance Tests
Statistically significant vs practically significant
Statistical significance means the observed difference would be unlikely (P \le \alpha) if H0 were true, so the data give evidence of a real difference in the population; it does not imply that the difference is large or practically important.
Practical significance considers whether the difference is large enough to affect decisions (market strategies, policy, etc.).
Key points
With large samples, very small differences can become statistically significant (example: >1000 observations).
Confidence intervals provide a range of plausible parameter values and help assess practical significance; they complement significance tests.
P-values can be misleading if reported alone:
There is always a risk of Type I error at level \alpha.
Some results are statistically significant purely by chance: when H0 is true, about \alpha \times 100\% of tests (e.g., 5 in 100 at \alpha = 0.05) reject H0.
The P-value is not the probability that H0 is true; it is the probability of the observed data (or more extreme) given H0.
Significant results can exaggerate the magnitude of the true effect due to selective reporting of extreme outcomes.
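A small simulation illustrates chance significance: samples are generated with H0 true, and the share of two-sided z-tests with P \le \alpha is recorded. The sample size, number of repetitions, seed, and the function name `false_positive_rate` are arbitrary illustrative choices; because the binomial count is discrete, the rate comes out close to, but not exactly, \alpha.

```python
import math
import random
from statistics import NormalDist


def false_positive_rate(n: int = 100, pi0: float = 0.5, alpha: float = 0.05,
                        reps: int = 4000, seed: int = 1) -> float:
    """Share of two-sided z-tests with P <= alpha when H0: pi = pi0 is true."""
    rng = random.Random(seed)  # seeded for reproducibility
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # |z| >= z_crit  <=>  P <= alpha
    se0 = math.sqrt(pi0 * (1 - pi0) / n)
    rejections = 0
    for _ in range(reps):
        x = sum(rng.random() < pi0 for _ in range(n))  # one sample generated under H0
        z = (x / n - pi0) / se0
        if abs(z) >= z_crit:
            rejections += 1
    return rejections / reps


print(f"rejection rate with H0 true: {false_positive_rate():.3f}")
```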
4.6 Small-Sample Test for a Proportion: The Binomial Distribution
Binomial distribution basics
If there are n independent trials, each with two possible outcomes, and the probability of each outcome is the same on every trial, then the number of outcomes in the first category (successes) follows a binomial distribution.
If \pi is the probability of the first category on each trial, then the probability of x successes in n trials is:
P(X = x) = \binom{n}{x} \pi^{x} (1 - \pi)^{n - x}, \quad x = 0,1,2,\dots,n
Binomial distribution properties
Mean: \mu = n\pi
Standard deviation: \sigma = \sqrt{n\pi(1 - \pi)}
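The formula and the two properties can be checked numerically in Python (`math.comb` requires Python 3.8+); `binomial_pmf` is a hypothetical helper name, and n = 10, \pi = 0.5 match the examples that follow.

```python
import math


def binomial_pmf(x: int, n: int, pi: float) -> float:
    """P(X = x) = C(n, x) * pi^x * (1 - pi)^(n - x)."""
    return math.comb(n, x) * pi**x * (1 - pi) ** (n - x)


n, pi = 10, 0.5
probs = [binomial_pmf(x, n, pi) for x in range(n + 1)]

# Mean and standard deviation computed directly from the distribution
mean = sum(x * p for x, p in enumerate(probs))
sd = math.sqrt(sum((x - mean) ** 2 * p for x, p in enumerate(probs)))

print(f"sum = {sum(probs):.6f}, mean = {mean:.4f} (n*pi = {n * pi}), "
      f"sd = {sd:.4f} (formula: {math.sqrt(n * pi * (1 - pi)):.4f})")
```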
Ex.7 (two-host example)
From a pool half male, half female; n = 10; \pi = 0.5; probability of choosing one or fewer females:
P(0) = \binom{10}{0} (0.5)^0 (0.5)^{10} \approx 0.001
P(1) = \binom{10}{1} (0.5)^1 (0.5)^9 \approx 0.01
P(X \le 1) = P(0) + P(1) \approx 0.011
Ex.8 (Eurobonds familiarity, n = 10, observed 2 yes)
H0: \pi = 0.5; Ha: \pi \ne 0.5; x = 2
P(0) = 0.001; P(1) = 0.01; P(2) = 0.044; P(X \le 2) = 0.055
Two-sided P-value: P = 2 \cdot (P(0) + P(1) + P(2)) = 0.11
Conclusion: with \alpha = 0.05, fail to reject H0; the data do not provide strong evidence that the proportion of Italians familiar with Eurobonds differs from 0.5.
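The probabilities in Ex.7 and Ex.8 can be reproduced exactly. As in Ex.8, the two-sided P-value doubles the lower tail, which is exact here because the binomial distribution with \pi = 0.5 is symmetric. The helper names are hypothetical.

```python
import math


def binomial_pmf(x: int, n: int, pi: float) -> float:
    return math.comb(n, x) * pi**x * (1 - pi) ** (n - x)


def binomial_lower_tail(x: int, n: int, pi: float) -> float:
    """P(X <= x) for a binomial(n, pi) count."""
    return sum(binomial_pmf(k, n, pi) for k in range(x + 1))


# Ex.7: probability of one or fewer females among n = 10 with pi = 0.5
p_ex7 = binomial_lower_tail(1, 10, 0.5)  # = 11/1024

# Ex.8: x = 2 "yes" out of 10, H0: pi = 0.5, two-sided by doubling the tail
p_ex8 = min(1.0, 2 * binomial_lower_tail(2, 10, 0.5))  # = 112/1024

print(f"Ex.7: P(X <= 1) = {p_ex7:.3f}; Ex.8: two-sided P = {p_ex8:.2f}")
```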
4.8 Practice Problems (Brief Outlines)
Problem (1): Height of 7th graders
Data: previous five-year mean 146 cm; sample: n = 200, \bar{x} = 148 cm, s = 20 cm; \alpha = 0.05.
Approach: use a t-test for the mean with t = \frac{\bar{x} - 146}{s/\sqrt{n}} = \frac{148 - 146}{20/\sqrt{200}} = \frac{2}{20/\sqrt{200}} = \frac{2}{1.414} \approx 1.414; df = 199; two-tailed P-value \approx 0.16; fail to reject H0.
Problem (2): College enrollment proportion
Observed: \hat{\pi} = 0.55; n = 200; test against \pi_0 = 0.60; Ha: not equal to 0.60.
Approach: se_0 = \sqrt{\pi_0(1 - \pi_0)/n} = \sqrt{0.60(0.40)/200} \approx 0.0346; z = (0.55 - 0.60)/0.0346 \approx -1.44; two-tailed P-value \approx 0.15; fail to reject H0.
Problem (3): Normal mean test
n = 157; \bar{x} = 65.12; s = 9; H0: \mu = 65; \alpha = 0.01.
t = \frac{\bar{x} - 65}{s/\sqrt{n}} = \frac{65.12 - 65}{9/\sqrt{157}} \approx \frac{0.12}{0.72} \approx 0.17; df = 156; P-value \approx 0.87; fail to reject H0.
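A sketch checking the arithmetic in Problems (1) to (3). The helper `t_tail` is a hypothetical Simpson's-rule approximation of the t-distribution tail (the standard library has none), and `statistics.NormalDist` handles the z-test for Problem (2).

```python
import math
from statistics import NormalDist


def t_tail(t: float, df: int, steps: int = 2000) -> float:
    """Approximate P(T >= t) for Student's t (Simpson's rule + symmetry)."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))

    def density(x: float) -> float:
        return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

    a = abs(t)
    h = a / steps
    total = density(0.0) + density(a)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * density(i * h)
    upper = 0.5 - total * h / 3  # P(T >= |t|)
    return upper if t >= 0 else 1 - upper


def t_two_sided(xbar: float, mu0: float, s: float, n: int):
    """Two-sided one-sample t-test; returns (t, P-value)."""
    t = (xbar - mu0) / (s / math.sqrt(n))
    return t, 2 * t_tail(abs(t), n - 1)


t1, p1 = t_two_sided(148, 146, 20, 200)  # Problem (1): H0: mu = 146
se0 = math.sqrt(0.60 * 0.40 / 200)  # Problem (2): H0: pi = 0.60
z2 = (0.55 - 0.60) / se0
p2 = 2 * NormalDist().cdf(-abs(z2))
t3, p3 = t_two_sided(65.12, 65, 9, 157)  # Problem (3): H0: mu = 65

for label, stat, p, alpha in [("(1)", t1, p1, 0.05), ("(2)", z2, p2, 0.05),
                              ("(3)", t3, p3, 0.01)]:
    decision = "reject H0" if p <= alpha else "do not reject H0"
    print(f"Problem {label}: statistic = {stat:.2f}, P = {p:.3f}, {decision}")
```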
Problem (4): Die bias test (n = 10, observed 4 threes)
Null: p = 1/6; two-sided test; approximate via z-test for proportions:
se_0 = \sqrt{p_0(1 - p_0)/n} = \sqrt{(1/6)(5/6)/10} \approx 0.118
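\hat{p} = 4/10 = 0.40; z = (0.40 - 1/6)/0.118 \approx 1.98; two-tailed P-value \approx 0.048; by this approximation, reject H0 at \alpha = 0.05. Caveat: the expected count n p_0 \approx 1.7 is far too small for the normal approximation; the exact binomial probability P(X \ge 4) \approx 0.07 already exceeds 0.05, so the exact test (Section 4.6) does not show significant evidence of bias toward 3.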
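The exact binomial calculation for Problem (4) can be sketched as below. The two-sided P-value here sums the probabilities of all outcomes no more likely than the observed one, which is one common convention when the null distribution is not symmetric (an assumption; doubling the tail, as in Ex.8, is another convention and gives a larger value).

```python
import math


def binomial_pmf(x: int, n: int, p: float) -> float:
    return math.comb(n, x) * p**x * (1 - p) ** (n - x)


n, p0, x_obs = 10, 1 / 6, 4
print(f"expected count under H0: {n * p0:.2f}")  # about 1.67, too small for z

# One-tail probability P(X >= 4) under H0
upper_tail = sum(binomial_pmf(k, n, p0) for k in range(x_obs, n + 1))

# Two-sided P-value: total probability of outcomes no more likely than x_obs
p_obs = binomial_pmf(x_obs, n, p0)
two_sided = sum(binomial_pmf(k, n, p0) for k in range(n + 1)
                if binomial_pmf(k, n, p0) <= p_obs * (1 + 1e-9))  # tolerance for rounding

print(f"P(X >= 4) = {upper_tail:.4f}, two-sided P = {two_sided:.4f}")
```

Here every count below 4 is more likely than 4 under H0, so the two-sided value equals P(X \ge 4) \approx 0.070; either way P > 0.05, unlike the normal approximation.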
End-of-chapter guidance
Always consider practical significance in addition to statistical significance.
Use confidence intervals to assess the magnitude and precision of estimates.
Be aware of limitations and potential misinterpretations of P-values.
4.7 Chapter Summary
Chapter 4 complements Chapter 3: Chapter 3 focuses on confidence intervals; Chapter 4 focuses on significance tests.
Differences between methods:
Confidence interval: provides a range of plausible values for the parameter.
Significance test: assesses whether a specific guess value for the parameter is plausible.
Five-step framework (recap)
1) ASSUMPTIONS: means (quantitative) vs. proportions (categorical); randomization; distribution assumptions; large-sample considerations for proportions.
2) HYPOTHESIS: H0 vs Ha; one-sided or two-sided Ha.
3) TEST STATISTIC: number of standard errors between the estimate and the H0 value (z for large-sample proportions; t for means).
4) P-VALUE: probability, under H0, that the test statistic is at least as extreme as the observed value, in the direction(s) specified by Ha.
5) CONCLUSION: reject H0 if P \le \alpha; otherwise do not reject.
Additional notes
Two-sample tests and Chi-squared tests (for associations) are covered in later chapters.