WK10: Statistical inference: population proportion: Test of significance for a proportion

Significance Test for a Population Proportion

Overview

The significance test for a population proportion involves a structured process to determine if there is enough statistical evidence to reject a null hypothesis about the population proportion p. The procedure follows these general steps:

  1. State the null and alternative hypotheses.
  2. Check the necessary conditions.
  3. Calculate the test statistic.
  4. Determine the P-value.
  5. Make a statistical decision based on the P-value.
  6. State the conclusion in the context of the original question.

Step 1: Hypotheses

  • Parameter of Interest: The population proportion, denoted as p.
  • Null Hypothesis (H_0): The null hypothesis states that the population proportion equals a specific value P_0, which represents no change or a baseline proportion. Mathematically, it is expressed as H_0: p = P_0.
  • Alternative Hypothesis (H_1): The alternative hypothesis can take one of three forms, depending on the research question:
    • Right-Sided Test: H_1: p > P_0 (The population proportion has increased.)
    • Left-Sided Test: H_1: p < P_0 (The population proportion has decreased.)
    • Two-Sided Test: H_1: p \neq P_0 (The population proportion is different from the null value.)

Examples of Setting Up Hypotheses

  1. Parliamentarian Example:
    • A parliamentarian will vote for a proposal if there's strong evidence that a majority of constituents support it (i.e., more than 50%).
      • H_0: p = 0.5
      • H_1: p > 0.5 (Right-sided test)
  2. Pharmaceutical Company Example:
    • A company aims to prove the proportion of patients experiencing side effects with a new drug is less than 20%.
      • H_0: p = 0.2
      • H_1: p < 0.2 (Left-sided test)
  3. Children Raised by Grandparents Example:
    • In a particular country, 5% of children have historically been raised by grandparents. The question is whether this proportion has changed.
      • H_0: p = 0.05
      • H_1: p \neq 0.05 (Two-sided test)

Step 2: Conditions for Significance Test

Before conducting a significance test for proportions, two conditions must be met:

  1. Random Sample: The sample should be a random sample from the population of interest to ensure it is representative.
  2. Large Sample Size: The sample size should be large enough to ensure the sampling distribution of sample proportions is approximately normal. This is assessed by checking that n \times P_0 \geq 10 and n \times (1 - P_0) \geq 10, where n is the sample size and P_0 is the null hypothesis proportion.
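
The large-sample condition can be checked mechanically. The following is a minimal sketch; the function name large_sample_ok and its threshold parameter are illustrative choices, not part of the course material.

```python
def large_sample_ok(n: int, p0: float, threshold: float = 10.0) -> bool:
    """Return True if n*p0 and n*(1 - p0) are both at least `threshold`."""
    expected_successes = n * p0        # expected successes if H_0 is true
    expected_failures = n * (1 - p0)   # expected failures if H_0 is true
    return expected_successes >= threshold and expected_failures >= threshold


# n = 60, P_0 = 0.5 gives 30 expected successes and 30 expected failures.
print(large_sample_ok(60, 0.5))
# n = 100, P_0 = 0.05 gives only about 5 expected successes.
print(large_sample_ok(100, 0.05))
```

The random-sample condition cannot be verified by a calculation like this; it depends on how the data were collected.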

Step 3: Test Statistic

The test statistic for a one-sample proportion test is calculated using the formula:

Z = \frac{\hat{p} - P_0}{\sqrt{\frac{P_0(1 - P_0)}{n}}}

Where:

  • \hat{p} is the sample proportion.
  • P_0 is the null hypothesis proportion.
  • n is the sample size.

When the null hypothesis is true and the conditions in Step 2 hold, this Z statistic approximately follows a standard normal distribution.
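
As a small illustration of the formula, the sketch below computes Z for a sample proportion of 0.55 with n = 60 and a null value of 0.5. The function name z_statistic is an assumption made for this example.

```python
import math


def z_statistic(p_hat: float, p0: float, n: int) -> float:
    """One-sample Z statistic for a proportion, using the null value in the standard error."""
    standard_error = math.sqrt(p0 * (1 - p0) / n)
    return (p_hat - p0) / standard_error


z = z_statistic(0.55, 0.5, 60)
print(round(z, 2))  # 0.77
```

Note that the standard error uses the null value, not the sample proportion, because the statistic is computed under the assumption that the null hypothesis is true.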

Step 4: P-value

The P-value is determined based on the alternative hypothesis:

  • Left-Sided Test: The P-value is the area to the left of the Z statistic on the standard normal curve.
  • Right-Sided Test: The P-value is the area to the right of the Z statistic on the standard normal curve.
  • Two-Sided Test: The P-value is twice the area in the tail beyond the Z statistic. If Z is negative, it’s twice the area to the left; if Z is positive, it’s twice the area to the right.
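
The three rules above can be sketched with the standard normal distribution from Python's standard library (statistics.NormalDist, available from Python 3.8). The function name p_value and the labels "left", "right" and "two-sided" are illustrative assumptions.

```python
from statistics import NormalDist


def p_value(z: float, alternative: str) -> float:
    """P-value of a Z statistic for a left-, right- or two-sided alternative."""
    std_normal = NormalDist()  # mean 0, standard deviation 1
    if alternative == "left":
        return std_normal.cdf(z)                 # area to the left of z
    if alternative == "right":
        return 1 - std_normal.cdf(z)             # area to the right of z
    if alternative == "two-sided":
        return 2 * (1 - std_normal.cdf(abs(z)))  # twice the tail beyond |z|
    raise ValueError("alternative must be 'left', 'right' or 'two-sided'")


print(round(p_value(-0.51, "left"), 3))
print(round(p_value(-0.51, "two-sided"), 2))
```

Using abs(z) in the two-sided case covers both the negative and the positive situation with a single expression.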

Step 5: Statistical Decision

  • If the P-value is less than or equal to the significance level (\alpha), reject the null hypothesis. This indicates statistically significant evidence supporting the alternative hypothesis.
  • If the P-value is greater than the significance level (\alpha), fail to reject the null hypothesis. This indicates insufficient evidence to support the alternative hypothesis.
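
The decision rule is a single comparison. A minimal sketch follows; the function name decide and the default \alpha = 0.05 are assumptions for illustration.

```python
def decide(p_value: float, alpha: float = 0.05) -> str:
    """Apply the decision rule: reject H_0 when the P-value is at most alpha."""
    if p_value <= alpha:
        return "reject H0"
    return "fail to reject H0"


print(decide(0.61))    # fail to reject H0
print(decide(0.0018))  # reject H0
```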

Step 6: Conclusion

State the conclusion in the context of the original research question, explaining whether there is enough evidence to support the alternative hypothesis.

Example: Purebred Peas

  • Scenario: Smooth and wrinkled peas from the second generation (F2) are counted to test, at a significance level of 0.05, whether the data agree with the conclusion that 75% of peas show the dominant (smooth) trait.
  • Data: 5474 smooth peas and 1850 wrinkled peas, totaling 7324 peas.
  • Calculations:
    • Sample proportion (\hat{p}) = 5474 / 7324 ≈ 0.7474.
    • Null hypothesis: H_0: p = 0.75.
    • Alternative hypothesis: H_1: p \neq 0.75.
  • Test Statistic: Z = \frac{0.7474 - 0.75}{\sqrt{\frac{0.75(1 - 0.75)}{7324}}} \approx -0.51
  • P-value: For a two-sided test, the P-value is twice the area to the left of Z = -0.51, that is, 2 \times 0.305 = 0.61.
  • Conclusion: Since the P-value (0.61) is greater than \alpha (0.05), we fail to reject the null hypothesis. There is no significant evidence at the 5% level that the proportion of the dominant trait differs from 75%.
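
The pea calculation can be reproduced end to end. This sketch uses the figures that appear in the calculations above (5474 smooth peas out of 7324) and only the Python standard library; the variable names are illustrative.

```python
import math
from statistics import NormalDist

smooth, n = 5474, 7324   # counts used in the calculations above
p0, alpha = 0.75, 0.05   # null value and significance level

# Step 2: large-sample condition under H_0
assert n * p0 >= 10 and n * (1 - p0) >= 10

# Step 3: test statistic
p_hat = smooth / n
z = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)

# Step 4: two-sided P-value
p_val = 2 * (1 - NormalDist().cdf(abs(z)))

# Step 5: decision
decision = "reject H0" if p_val <= alpha else "fail to reject H0"

print(f"p_hat = {p_hat:.4f}, z = {z:.2f}, P-value = {p_val:.2f}")
print(decision)
```

Run as written, this should report a sample proportion of about 0.7474, Z of about -0.51 and a P-value of about 0.61, matching the hand calculation.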

Effect of Sample Size on Statistical Significance

  • Increasing the sample size (n) reduces the standard error in the denominator of the test statistic formula.
  • For the same difference between \hat{p} and P_0, this results in a test statistic that is larger in absolute value and a smaller P-value, making it easier to reject the null hypothesis.
  • However, statistical significance does not always imply practical importance. A very large sample size might lead to statistically significant results for small, unimportant differences.
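
The dependence on n can be made explicit by rearranging the test statistic from Step 3. This is only an algebraic restatement of that formula:

```latex
Z = \frac{\hat{p} - P_0}{\sqrt{\frac{P_0(1 - P_0)}{n}}}
  = \sqrt{n} \cdot \frac{\hat{p} - P_0}{\sqrt{P_0(1 - P_0)}}
```

For fixed \hat{p} and P_0, Z is proportional to \sqrt{n}. Multiplying the sample size by 16, for instance, multiplies Z by 4.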

Example: Drink Preference

  • Scenario: Determining whether people have a preference between drink A and drink B, where p is the proportion who prefer drink A.
  • Hypotheses:
    • H_0: p = 0.5
    • H_1: p \neq 0.5
  • Sample 1: n = 60, \hat{p} = 0.55. This produces a P-value of 0.4412, so we fail to reject the null hypothesis.
  • Sample 2: n = 960, \hat{p} = 0.55. This produces a P-value of 0.0018, so we reject the null hypothesis.
  • Conclusion: A larger sample size can lead to statistically significant results even if the sample proportion is not substantially different from the null hypothesis proportion. Consider if the change in proportion is practically important.
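
The two samples can be compared with a short sketch. The function name two_sided_p_value is an assumption for this example. Because the code does not round Z before looking up the tail area, the P-values it prints may differ slightly in the last digits from the figures quoted above.

```python
import math
from statistics import NormalDist


def two_sided_p_value(p_hat: float, p0: float, n: int) -> float:
    """Two-sided P-value of the one-sample Z test for a proportion."""
    z = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Same sample proportion, two different sample sizes
p_small = two_sided_p_value(0.55, 0.5, 60)
p_large = two_sided_p_value(0.55, 0.5, 960)

print(f"n = 60:  P-value = {p_small:.4f}")
print(f"n = 960: P-value = {p_large:.4f}")
```

The observed difference of 0.05 from the null value is identical in both samples; only the sample size changes the verdict.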

Summary of Conditions

  1. Random sample from the population.
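  2. The standard error is computed with the null value P_0 rather than the sample proportion \hat{p}, because the test is carried out under the assumption that the null hypothesis is true.
  3. The expected number of successes (n \times P_0) and the expected number of failures (n \times (1 - P_0)) must both be at least ten.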