Chapter 4 Notes: Significance Tests — Five-Part Framework, Means, Proportions, Errors, Limitations, Binomial, Practice

4.0 Introduction

  • Many studies aim to check whether data agree with predicted values, stated as hypotheses about a population.

  • Theories generate predictions (guess values) for population parameters; a significance test measures the strength of evidence that sample data provide for or against a specific hypothesis.

  • It does so by comparing a point estimate of the parameter with the value predicted by the hypothesis.

  • This chapter applies the five-part significance test to a mean and to a proportion, then discusses decisions, errors, and limitations.

4.1 The Five Parts of a Significance Test

  • Hypothesis and significance test

    • A hypothesis is a statement about a population parameter (mean, proportion, etc.).

    • Null hypothesis H0: parameter takes a particular value.

    • Alternative hypothesis Ha (research hypothesis): the parameter falls in some alternative range.

    • Usually, H0 corresponds to "no effect"; Ha represents an effect or difference.

    • Examples to illustrate questions:

      • Ex.1: A supermarket training program, and whether females are disproportionately selected for training compared with males.

      • Ex.2: Bob's claim that the proportion of adults in the USA who favor legalized drugs equals 0.50.

  • The five parts of a significance test:

    1) ASSUMPTIONS

    • Type of data: quantitative vs. categorical.

    • Randomization.

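    • Population distribution: for means, a normal population distribution is assumed; for proportions, the sampling distribution must be approximately normal, which requires a large enough sample.

    • Sample size: larger samples make many tests less sensitive to violations of the distributional assumptions.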

    2) HYPOTHESIS

    • Null hypothesis H0 and alternative Ha.

    3) TEST STATISTIC

    • Summarizes how far the point estimate falls from the H0 value (number of standard errors between the estimate and the H0 value).

    4) P-VALUE

    • The probability, under H0, that the test statistic would be as extreme as or more extreme than the observed value, in the direction(s) specified by Ha.

    • Smaller P-value → stronger evidence against H0.

    5) CONCLUSION

    • Interpret the P-value.

    • Decision rule: reject H0 if P \le \alpha (pre-specified level, e.g., \alpha = 0.05 or \alpha = 0.01); do not reject H0 if P > \alpha.

  • Remarks

    • The significance level \alpha is the probability of rejecting H0 when H0 is true; choose \alpha before analyzing data.

    • Hypotheses always refer to population parameters, not sample statistics: a sample statistic is known once the data are collected, whereas the parameter is unknown.

4.2 Significance Test for a Mean

  • Significance test for a mean (five parts):

    1) ASSUMPTIONS

    • Type of data: quantitative data.

    • Randomization.

    • Normal population distribution.

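    • Sample size: the normality assumption matters mainly for small samples; for large n the t test is robust to non-normal populations (Central Limit Theorem).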

    2) HYPOTHESIS

    • H0: \mu = \mu_0

    • Ha: \mu \ne \mu_0 (two-sided); one-sided alternatives are covered below.

    3) TEST STATISTIC

    • If H0 is true, the center of the sampling distribution of \bar{y} is \mu_0.

    • Test statistic: t = \frac{\bar{y} - \mu_0}{se}, where se = \frac{s}{\sqrt{n}} is the estimated standard error of \bar{y}.

    • Under H0 (with a normal population), the t statistic has a t distribution with degrees of freedom df = n - 1.

    4) P-VALUE

    • For the two-sided Ha, the P-value is the two-tail probability, under H0, that the t statistic is at least as large in absolute value as the observed value: P = 2 \cdot P\left(T_{df} \ge |t_{obs}|\right).

    5) CONCLUSION

    • Smaller P-value → stronger evidence against H0 and in favor of Ha.

  • Alternative hypothesis forms

    • Two-sided: Ha: \mu \ne \mu_0 (P-value is the two-tail probability).

    • One-sided: Ha: \mu > \mu_0 (P-value is the right-tail probability) or Ha: \mu < \mu_0 (P-value is the left-tail probability).

    • Two-sided tests are more common; the research question may justify a one-sided test (e.g., a two-sided Ha for "the mean has changed" vs. a one-sided Ha for "the mean has decreased").

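  • Remark: If the population standard deviation \sigma is known, the z-score can be used instead of the t-score: z = \frac{\bar{y} - \mu_0}{\sigma/\sqrt{n}}. The z-test is exact for a normal population and, by the Central Limit Theorem, approximately valid for large samples (a common guideline is n \ge 30).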

  • Example Ex.3 (7-point scale data)

    • Data: seven-point scale scored 1 (extremely liberal) to 7 (extremely conservative), with counts: Extremely liberal (10), Liberal (21), Slightly liberal (22), Moderate (74), Slightly conservative (21), Conservative (27), Extremely conservative (11); total n = 186.

    • Goal: test whether the population mean is moderate.

    • Assumptions: quantitative (seven-point scale); randomization; normal population distribution; large sample.

    • Hypotheses: H0: \mu = 4 (moderate on a 7-point scale) vs. Ha: \mu \ne 4.

    • Test procedure: compute the sample mean \bar{y} and standard deviation s from the counts, then se = \frac{s}{\sqrt{n}}, the t statistic with df = n - 1 = 185, and the two-sided P-value; state the conclusion.

  • Figures and interpretation notes

    • Figures illustrate two-sided vs. one-sided alternatives (conceptual).

    • If \sigma is known and n large, z-test is an alternative to t-test.
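The t test of this section, including the one-sided alternatives, and the Ex.3 computation can be sketched in Python as follows. This is a minimal sketch that uses only the standard library: the t tail probability is obtained by numerically integrating the t density with Simpson's rule, a choice made here purely to avoid dependencies (in practice a library routine such as SciPy's `scipy.stats.t.sf` would normally be used). The function names are illustrative, and the Ex.3 scores assume the scale is coded 1 (extremely liberal) to 7 (extremely conservative), so that 4 = moderate as in H0.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)


def t_upper_tail(t, df, steps=2000):
    """P(T >= t) for t >= 0: 0.5 minus the area on [0, t] (Simpson's rule)."""
    h = t / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3  # density is symmetric, so P(T >= 0) = 0.5


def one_sample_t_test(ybar, s, n, mu0, alternative="two-sided"):
    """Return (t, df, P-value) for H0: mu = mu0."""
    se = s / math.sqrt(n)            # estimated standard error of ybar
    t = (ybar - mu0) / se            # standard errors between ybar and mu0
    df = n - 1
    tail = t_upper_tail(abs(t), df)  # P(T >= |t|)
    if alternative == "two-sided":   # Ha: mu != mu0
        p = 2 * tail
    elif alternative == "greater":   # Ha: mu > mu0 (right tail)
        p = tail if t >= 0 else 1 - tail
    elif alternative == "less":      # Ha: mu < mu0 (left tail)
        p = tail if t <= 0 else 1 - tail
    else:
        raise ValueError("alternative must be 'two-sided', 'greater', or 'less'")
    return t, df, p


# Ex.3: expand the frequency table (assumed scores 1..7, 4 = moderate).
counts = {1: 10, 2: 21, 3: 22, 4: 74, 5: 21, 6: 27, 7: 11}
scores = [score for score, c in counts.items() for _ in range(c)]
n = len(scores)
ybar = sum(scores) / n
s = math.sqrt(sum((y - ybar) ** 2 for y in scores) / (n - 1))  # divisor n - 1
t, df, p = one_sample_t_test(ybar, s, n, mu0=4)
print(f"n={n}, mean={ybar:.3f}, s={s:.3f}, t={t:.2f}, df={df}, P={p:.2f}")
```

Run as written, it reports a sample mean of about 4.075, s ≈ 1.512, t ≈ 0.68 with df = 185, and a two-sided P-value of about 0.50, so the Ex.3 data give no evidence that the population mean differs from moderate. The same function covers Practice Problem (1): `one_sample_t_test(148, 20, 200, 146)` returns t ≈ 1.41 with df = 199 and a two-sided P-value of about 0.16.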

4.3 Significance Test for a Proportion

  • Five parts (for a proportion):

    1) ASSUMPTIONS

    • Type of data: categorical data.

    • Randomization.

    • Normal sampling distribution: the sample must be large enough that n\pi_0 and n(1 - \pi_0) are not too small (a common rule of thumb is at least 10 each, so n \ge 20 suffices when \pi_0 = 0.50).

    2) HYPOTHESIS

    • H0: \pi = \pi_0

    • Ha: \pi \ne \pi_0 (two-sided); one-sided alternatives are also possible.

    3) TEST STATISTIC

    • The test statistic is the z-score, which has approximately a standard normal distribution when H0 is true:

      z = \frac{\hat{\pi} - \pi_0}{se_0}, where \hat{\pi} is the sample proportion and se_0 = \sqrt{\pi_0(1 - \pi_0)/n} is the standard error computed under H0.

    4) P-VALUE

    • For Ha: \pi \ne \pi_0, the P-value is the two-tail probability from the standard normal distribution; a one-sided Ha uses the corresponding single tail.

    5) CONCLUSION

    • Reject H0 if P \le \alpha; otherwise do not reject H0.

  • Ex.4 (Florida poll, 2006)

    • Question: Should we conclude that those favoring raising taxes are the majority?

    • Data: random sample of n = 1200; 52% favored raising taxes, 48% favored reducing services; goal: test whether the population proportion favoring raising taxes differs from 0.50.

    • Two-sided test: H0: \pi = 0.50 vs. Ha: \pi \ne 0.50, with observed \hat{\pi} = 0.52 and n = 1200:

    • Standard error under H0: se_0 = \sqrt{\pi_0(1 - \pi_0)/n} = \sqrt{(0.5)(0.5)/1200} = 0.0144

    • Test statistic: z = \frac{0.52 - 0.50}{0.0144} = 1.39

    • P-value: P = 2 \cdot P(Z \ge 1.39) \approx 0.16

    • Conclusion: with \alpha = 0.05, do not reject H0; the data do not establish that those favoring raising taxes are a majority.

    • If the same \hat{\pi} = 0.52 came from n = 4800: se_0 = 0.0072; z \approx 2.77; P-value \approx 0.006; reject H0 and conclude that a majority of the population favors raising taxes.

    • Confidence interval note: for n = 4800, the 95% CI for \pi is approximately (0.506, 0.534), so even though H0 is rejected, the population proportion may be only slightly above 0.50.
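The Ex.4 calculations can be reproduced with a short sketch using Python's `statistics.NormalDist` (standard library, Python 3.8+). The helper names are illustrative. The test uses se_0 computed from \pi_0, while the confidence interval uses the estimated standard error from \hat{\pi}, which is the usual large-sample interval.

```python
import math
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution


def proportion_z_test(pi_hat, n, pi0):
    """Two-sided large-sample z test of H0: pi = pi0."""
    se0 = math.sqrt(pi0 * (1 - pi0) / n)  # standard error computed under H0
    z = (pi_hat - pi0) / se0
    p = 2 * (1 - Z.cdf(abs(z)))            # two-tail probability
    return se0, z, p


def proportion_ci(pi_hat, n, level=0.95):
    """Large-sample confidence interval; uses the estimated se, not se0."""
    se = math.sqrt(pi_hat * (1 - pi_hat) / n)
    z_crit = Z.inv_cdf(1 - (1 - level) / 2)  # about 1.96 for 95%
    return pi_hat - z_crit * se, pi_hat + z_crit * se


for n in (1200, 4800):
    se0, z, p = proportion_z_test(0.52, n, 0.50)
    lo, hi = proportion_ci(0.52, n)
    print(f"n={n}: se0={se0:.4f}, z={z:.2f}, P={p:.3f}, 95% CI=({lo:.3f}, {hi:.3f})")
```

For n = 1200 the sketch gives z ≈ 1.386 and P ≈ 0.166 (the 0.16 above comes from rounding z to 1.39 before taking the tail area); for n = 4800 it gives z ≈ 2.77, P ≈ 0.006, and the interval (0.506, 0.534).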

4.4 Decisions and Types of Errors in Tests

  • Rejection region

    • The collection of test statistic values for which H0 is rejected constitutes the rejection region.

  • Type I and Type II errors

    • Type I error: reject H0 when H0 is true; probability equals the \alpha-level of the test.

    • Type II error: fail to reject H0 when H0 is false.

  • Ex.5 (criminal trial analogy)

    • Let H0 represent innocence and Ha represent guilt.

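    • A Type I error here means convicting an innocent defendant; its probability is the \alpha-level. The usual choice is \alpha \le 0.05, but in high-stakes cases a much smaller value such as 0.001 may be used.

    • Trade-off: for a fixed sample size, decreasing \alpha reduces the probability of a Type I error but increases the probability of a Type II error (here, acquitting a guilty defendant); the right balance depends on the context.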
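The trade-off can be made concrete for the large-sample proportion test of Section 4.3. The sketch below computes the Type II error probability of a two-sided test of H0: \pi = 0.50 with n = 1200 when the true proportion is 0.53; that true value is hypothetical, chosen only for illustration, and the calculation relies on the normal approximation to the sampling distribution of \hat{\pi}.

```python
import math
from statistics import NormalDist

Z = NormalDist()


def type2_error_prob(pi0, pi_true, n, alpha):
    """P(fail to reject H0: pi = pi0 | true proportion pi_true) for the
    two-sided large-sample z test, using the normal approximation to the
    sampling distribution of the sample proportion."""
    z_crit = Z.inv_cdf(1 - alpha / 2)
    se0 = math.sqrt(pi0 * (1 - pi0) / n)              # se used by the test
    se_true = math.sqrt(pi_true * (1 - pi_true) / n)  # actual spread of pi_hat
    # H0 is NOT rejected when pi_hat falls inside pi0 +/- z_crit * se0.
    lower = pi0 - z_crit * se0
    upper = pi0 + z_crit * se0
    sampling = NormalDist(pi_true, se_true)
    return sampling.cdf(upper) - sampling.cdf(lower)


# Hypothetical scenario: H0 says 0.50, but the true proportion is 0.53.
for alpha in (0.05, 0.01, 0.001):
    beta = type2_error_prob(0.50, 0.53, 1200, alpha)
    print(f"alpha={alpha}: P(Type II error) = {beta:.3f}")
```

With these hypothetical inputs the Type II error probability rises from roughly 0.45 at \alpha = 0.05 to roughly 0.69 at \alpha = 0.01, and higher still at \alpha = 0.001.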

4.5 Limitations on Significance Tests

  • Statistically significant vs practically significant

    • Statistical significance means the observed difference would be unlikely if H0 were true (P \le \alpha); it does not imply that the difference is large or practically important.

    • Practical significance considers whether the difference is large enough to affect decisions (market strategies, policy, etc.).

  • Key points

    • With large samples, very small differences can become statistically significant (e.g., in Ex.4, \hat{\pi} = 0.52 is not significant with n = 1200 but is significant with n = 4800).

    • Confidence intervals provide a range of plausible parameter values and help assess practical significance; they complement significance tests.

    • P-values can be misleading if reported alone:

      • There is always a risk of a Type I error at level \alpha.

      • When H0 is true, about \alpha \times 100\% of repeated tests will be statistically significant purely by chance.

      • The P-value is not the probability that H0 is true; it is the probability, assuming H0 is true, of a test statistic at least as extreme as the one observed.

      • Significant results can exaggerate the magnitude of the true effect, because extreme outcomes are more likely to be selected for reporting.
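The statement that about \alpha \times 100\% of tests are significant by chance when H0 is true can be checked by simulation. The sketch below uses the large-sample proportion test; the settings (n = 400, \pi_0 = 0.5, \alpha = 0.05, 2000 repetitions, a fixed random seed) are hypothetical choices, not values from the notes.

```python
import math
import random
from statistics import NormalDist

Z = NormalDist()


def two_sided_p(x, n, pi0):
    """Large-sample two-sided P-value for H0: pi = pi0 given x successes."""
    se0 = math.sqrt(pi0 * (1 - pi0) / n)
    z = (x / n - pi0) / se0
    return 2 * (1 - Z.cdf(abs(z)))


def simulate_type1_rate(n=400, pi0=0.5, alpha=0.05, reps=2000, seed=1):
    """Fraction of samples generated with H0 TRUE in which the test rejects H0."""
    rng = random.Random(seed)  # fixed seed for reproducibility
    rejections = 0
    for _ in range(reps):
        x = sum(rng.random() < pi0 for _ in range(n))  # one binomial(n, pi0) draw
        if two_sided_p(x, n, pi0) <= alpha:
            rejections += 1
    return rejections / reps


rate = simulate_type1_rate()
print(f"Estimated Type I error rate: {rate:.3f}")
```

The printed rejection rate should be close to 0.05; it will not match exactly because of simulation noise and because the binomial count is discrete.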

4.6 Small-Sample Test for a Proportion: The Binomial Distribution

  • Binomial distribution basics

    • If n independent trials each have two possible outcomes, and the probability of each outcome is the same on every trial, then the number of outcomes in the first category (successes) follows a binomial distribution.

    • If \pi is the probability of the first category, then the probability of x successes in n trials is:

      P(X = x) = \binom{n}{x} \pi^{x} (1 - \pi)^{n - x}, \quad x = 0,1,2,\dots,n

  • Binomial distribution properties

    • Mean: \mu = n\pi

    • Standard deviation: \sigma = \sqrt{n\pi(1 - \pi)}

  • Ex.7 (two-host example)

    • From a pool half male, half female; n = 10; \pi = 0.5; probability of choosing one or fewer females:

    • P(0) = \binom{10}{0} (0.5)^0 (0.5)^{10} \approx 0.001

    • P(1) = \binom{10}{1} (0.5)^1 (0.5)^9 \approx 0.01

    • P(X \le 1) = P(0) + P(1) \approx 0.011

  • Ex.8 (Eurobonds familiarity, n = 10, observed 2 yes)

    • H0: \pi = 0.5; Ha: \pi \ne 0.5; x = 2

    • P(0) = 0.001; P(1) = 0.01; P(2) = 0.044; P(X \le 2) = 0.055

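    • Two-sided P-value: because \pi_0 = 0.5 makes the distribution symmetric, the upper tail P(X \ge 8) equals the lower tail, so P = 2 \cdot (P(0) + P(1) + P(2)) \approx 0.11.

    • Conclusion: with \alpha = 0.05, fail to reject H0; the data do not show that the proportion of Italians familiar with Eurobonds differs from 0.5.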
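The binomial calculations in Ex.7 and Ex.8 can be checked exactly with Python's `math.comb` (Python 3.8+). A minimal sketch; `binom_pmf` and `binom_cdf` are helper names introduced here. The two-sided P-value doubles the lower tail, as in Ex.8; some software instead sums the probabilities of all outcomes no more likely than the observed one, and for \pi_0 = 0.5, where the distribution is symmetric, the two approaches agree.

```python
from math import comb


def binom_pmf(x, n, pi):
    """P(X = x) for X ~ binomial(n, pi)."""
    return comb(n, x) * pi**x * (1 - pi) ** (n - x)


def binom_cdf(x, n, pi):
    """P(X <= x) for X ~ binomial(n, pi)."""
    return sum(binom_pmf(k, n, pi) for k in range(x + 1))


# Binomial properties: mean n*pi and sd sqrt(n*pi*(1 - pi)), checked numerically.
n, pi = 10, 0.5
mean = sum(x * binom_pmf(x, n, pi) for x in range(n + 1))
sd = sum((x - mean) ** 2 * binom_pmf(x, n, pi) for x in range(n + 1)) ** 0.5

# Ex.7: probability of one or fewer females among 10 selections.
p_le_1 = binom_cdf(1, n, pi)

# Ex.8: x = 2 "yes" out of 10 under H0: pi = 0.5; since pi0 = 0.5 the
# distribution is symmetric, so the two-sided P-value doubles the lower tail.
p_le_2 = binom_cdf(2, n, pi)
p_two_sided = min(1.0, 2 * p_le_2)

print(f"mean={mean:.2f}, sd={sd:.3f}")
print(f"Ex.7: P(X <= 1) = {p_le_1:.4f}")
print(f"Ex.8: P(X <= 2) = {p_le_2:.4f}, two-sided P = {p_two_sided:.3f}")
```

The sketch confirms the mean n\pi = 5 and standard deviation \sqrt{2.5} \approx 1.581, gives P(X \le 1) = 11/1024 \approx 0.0107 for Ex.7, and gives P(X \le 2) = 56/1024 \approx 0.0547 with a two-sided P-value of about 0.109 for Ex.8.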

4.8 Practice Problems (Brief Outlines)

  • Problem (1): Height of 7th graders

    • Data: previous five-year mean 146 cm; sample: n = 200, \bar{x} = 148 cm, s = 20 cm; \alpha = 0.05.

    • Approach: use a t-test for the mean with t = \frac{\bar{x} - 146}{s/\sqrt{n}} = \frac{148 - 146}{20/\sqrt{200}} = \frac{2}{20/\sqrt{200}} = \frac{2}{1.414} \approx 1.414; df = 199; two-tailed P-value \approx 0.16; fail to reject H0.

  • Problem (2): College enrollment proportion

    • Observed: \hat{\pi} = 0.55; n = 200; test against \pi_0 = 0.60; Ha: not equal to 0.60.

    • Approach: se_0 = \sqrt{\pi_0(1 - \pi_0)/n} = \sqrt{0.60(0.40)/200} \approx 0.0346; z = (0.55 - 0.60)/0.0346 \approx -1.44; two-tailed P-value \approx 0.15; fail to reject H0.

  • Problem (3): Normal mean test

    • n = 157; \bar{x} = 65.12; s = 9; H0: \mu = 65; \alpha = 0.01.

    • t = \frac{\bar{x} - 65}{s/\sqrt{n}} = \frac{65.12 - 65}{9/\sqrt{157}} \approx \frac{0.12}{0.72} \approx 0.17; df = 156; P-value \approx 0.87; fail to reject H0.

  • Problem (4): Die bias test (n = 10, observed 4 threes)

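    • H0: \pi = 1/6 (probability of rolling a three); two-sided test; approximation via the large-sample z-test for a proportion:

    • se_0 = \sqrt{\pi_0(1 - \pi_0)/n} = \sqrt{(1/6)(5/6)/10} \approx 0.118

    • \hat{\pi} = 4/10 = 0.40; z = (0.40 - 1/6)/0.118 \approx 1.98; two-tailed P-value \approx 0.048, which taken at face value would reject H0 at \alpha = 0.05.

    • Caveat: n\pi_0 = 10/6 \approx 1.7, far too small for the normal approximation. The exact binomial test of Section 4.6 gives P(X \ge 4) \approx 0.070, or P \approx 0.14 after doubling for a two-sided test, so H0 is not rejected at \alpha = 0.05 and the data do not establish that the die is biased toward 3.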

  • End-of-chapter guidance

    • Always consider practical significance in addition to statistical significance.

    • Use confidence intervals to assess the magnitude and precision of estimates.

    • Be aware of limitations and potential misinterpretations of P-values.
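For the die in Problem (4), the large-sample z-test and the exact binomial tail from Section 4.6 can be compared directly. A minimal sketch: doubling the exact upper tail for the two-sided P-value follows the convention used in Ex.8 and is an assumption here, since other conventions exist for skewed null distributions such as \pi_0 = 1/6.

```python
import math
from statistics import NormalDist

n, x, pi0 = 10, 4, 1 / 6  # ten rolls, four threes, H0: pi = 1/6

# Large-sample z test (its assumption that n*pi0 is reasonably large fails here).
se0 = math.sqrt(pi0 * (1 - pi0) / n)
z = (x / n - pi0) / se0
p_normal = 2 * (1 - NormalDist().cdf(abs(z)))

# Exact binomial upper tail P(X >= 4) under H0, doubled for a two-sided test.
upper = sum(math.comb(n, k) * pi0**k * (1 - pi0) ** (n - k) for k in range(x, n + 1))
p_exact = min(1.0, 2 * upper)

print(f"n*pi0 = {n * pi0:.2f}")
print(f"normal approx: z = {z:.2f}, P = {p_normal:.3f}")
print(f"exact: P(X >= 4) = {upper:.3f}, two-sided P = {p_exact:.3f}")
```

Here n\pi_0 \approx 1.67, the normal approximation gives z \approx 1.98 and P \approx 0.048, while the exact upper tail is P(X \ge 4) \approx 0.070 (about 0.139 when doubled), so the two methods fall on opposite sides of \alpha = 0.05.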

4.7 Chapter Summary

  • Chapter 4 complements Chapter 3: Chapter 3 focuses on confidence intervals; Chapter 4 focuses on significance tests.

  • Differences between methods:

    • Confidence interval: provides a range of plausible values for the parameter.

    • Significance test: assesses whether a specific guess value for the parameter is plausible.

  • Five-step framework (recap)

    1) ASSUMPTIONS: means (quantitative) vs. proportions (categorical); randomization; distribution assumptions; large-sample considerations for proportions.

    2) HYPOTHESIS: H0 vs Ha; one-sided or two-sided Ha.

    3) TEST STATISTIC: number of standard errors between the estimate and the H0 value (z for large-sample proportions; t for means).

    4) P-VALUE: probability, assuming H0 is true, that the test statistic is as extreme as or more extreme than the observed value (in the direction(s) specified by Ha).

    5) CONCLUSION: reject H0 if P \le \alpha; otherwise do not reject.

  • Additional notes

    • Two-sample tests and Chi-squared tests (for associations) are covered in later chapters.