Week 7: Single Population Hypothesis Testing Notes

Learning Outcomes and Course Objectives

  • LO1: Develop an understanding of the concepts of point estimation.

  • LO2: Understand the methodology used to estimate the population mean ($\mu$).

  • LO3: Understand the methodology used to estimate the population proportion ($p$).

  • LO4: Understand the fundamental basics of hypothesis testing.

Overview of Weekly Content

  • Application of the principles of sampling distributions to estimate population parameters.

  • Understanding the basic steps involved in Hypothesis Testing.

  • Specific application of hypothesis testing for a population proportion ($p$).

  • Specific application of hypothesis testing for a population mean ($\mu$).

Case Study: Regional vs. Urban Migration and Wages

This week uses a specific researcher case study to illustrate hypothesis testing principles:

  • Background: A researcher finds that approximately 23.4% of working people previously lived in regional Australia. It is believed this proportion has increased following the COVID-19 pandemic.

  • Claims: There are claims that pandemic disruptions and restrictions were greatest in capital cities, while job growth was strongest in other (regional) areas.

  • National Data: The national annual wage for 2023 is estimated to be AU$98,176.

  • Population of Interest: All working people in Australia.

  • Sample Characteristics: To test these claims, the researcher collects a random sample of $n = 1,000$ individuals working in 2023.

  • Variables Recorded:

    • Region: Coded as 1 if the individual lives in regional Australia and 0 if they live in an urban area.

    • Wage: Annual wage recorded in dollars.

Summary Statistics and Data Description

The following table summarizes the data collected from the sample of $n = 1,000$:

| Statistic          | Wage_All      | Wage_Urban    | Wage_Regional | No_People_Regional |
|--------------------|---------------|---------------|---------------|--------------------|
| Mean               | 98,108.2      | 100,438.3     | 90,927.8      | 0.27               |
| Median             | 97,801.4      | 100,542.8     | 91,643.8      |                    |
| Standard Deviation | 25,711.6      | 26,198.8      | 22,736.9      |                    |
| Sample Variance    | 661,084,628.0 | 686,377,080.2 | 516,967,633.9 |                    |
| Range              | 188,796.7     | 188,874.0     | 119,487.1     |                    |
| Minimum            | 14,760.4      | 14,514.9      | 31,059.2      |                    |
| Maximum            | 203,557.1     | 203,388.9     | 150,546.3     |                    |
| n                  | 1,000         | 730           | 270           | 270                |

From Sampling Distribution to Estimation

Sample statistics serve as reliable and consistent estimators for population parameters:

  • Sample Mean ($\bar{X}$):

    • It is an unbiased estimator: $E(\bar{X}) = \mu$.

    • Variance: $\text{var}(\bar{X}) = \frac{\sigma^2}{n}$.

    • Consistency: The variance converges toward zero as the sample size ($n$) grows.

  • Sample Proportion ($\hat{p}$):

    • It is an unbiased estimator: $E(\hat{p}) = p$.

    • Variance: $\text{var}(\hat{p}) = \frac{p(1-p)}{n}$.

    • Consistency: The variance converges toward zero as the sample size ($n$) grows.
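
The unbiasedness and consistency properties above can be illustrated with a small simulation. The sketch below is only an illustration under stated assumptions: the function name `simulate_sample_proportions`, the sample sizes (50 and 500), the number of replications (500), and the seed are all hypothetical choices, and the population proportion is set to the case-study value 0.234.

```python
import random
import statistics


def simulate_sample_proportions(p, n, reps, seed=0):
    """Draw `reps` samples of size `n` from a Bernoulli(p) population
    and return the sample proportion of each sample."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return [sum(rng.random() < p for _ in range(n)) / n for _ in range(reps)]


p = 0.234  # population proportion assumed for the simulation
for n in (50, 500):
    p_hats = simulate_sample_proportions(p, n, reps=500)
    print(
        f"n={n}: mean of p_hat = {statistics.fmean(p_hats):.4f}, "
        f"simulated var = {statistics.pvariance(p_hats):.6f}, "
        f"theoretical var p(1-p)/n = {p * (1 - p) / n:.6f}"
    )
```

The average of the simulated proportions should sit near $p$ for both sample sizes, while the simulated variance should be noticeably smaller for the larger $n$, in line with $\frac{p(1-p)}{n}$.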

Steps of Hypothesis Testing

  1. State the Hypotheses: Define the Null ($H_0$) and Alternative ($H_A$) hypotheses about a population parameter based on the research claim (not the sample statistic).

  2. Specify the Decision Rule: Define the criteria for rejecting the null hypothesis, including the significance level ($\alpha$) and the critical value or p-value.

  3. Calculate the Test Statistic: Compare the sample estimate with the hypothesized value. Identify the appropriate distribution (e.g., $t_{n-1}$, the $t$ distribution with $n - 1$ degrees of freedom).

  4. Apply the Decision Rule: Based on the evidence and the threshold, decide to either reject or not reject the null hypothesis.

  5. Make the Decision and Draw Conclusions: Interpret the statistical result within the context of the real-world problem.

Hypothesis Testing: Population Proportions

Step 1: Set up the Hypothesis
  • Null Hypothesis ($H_0$): The statement assumed true. For proportion $p$, it is typically set as equality to a value.

    • $H_0: p = 0.234$ (The proportion is the same as the pre-pandemic level).

  • Alternative Hypothesis ($H_A$): The statement considered if $H_0$ is rejected.

    • Upper-tail Test: $H_A: p > 0.234$ (Used to test if the proportion has increased).

    • Lower-tail Test: $H_A: p < 0.234$ (Used to test if the proportion has decreased).

    • Two-tail Test: $H_A: p \neq 0.234$ (Used to test if the proportion is different).

Step 2: Decision Rule (Critical Value Approach)
  • Significance Level ($\alpha$): A predetermined threshold (e.g., 0.01, 0.05, 0.10).

  • Example calculation: For $\alpha = 0.05$ and $df = n - 1 = 999$, the upper-tail critical value from Excel is T.INV(0.95, 999), which is approximately 1.645.

  • Decision Rule: Reject $H_0$ if the test statistic exceeds 1.645.

Step 3: Calculate the Test Statistic
  • The test statistic for a population proportion is: $t = \frac{\hat{p} - p_0}{\sqrt{\frac{1}{n}\hat{p}(1 - \hat{p})}}$

  • Where:

    • $\hat{p}$ is the sample proportion ($\frac{270}{1000} = 0.270$).

    • $p_0$ is the hypothesized proportion (0.234).

    • The standard error is $\text{se}(\hat{p}) = \sqrt{\frac{1}{1000} \times 0.270 \times (1 - 0.270)} = 0.0140$.

  • Calculation: $t = \frac{0.270 - 0.234}{0.0140} = 2.564$ (computed with the unrounded standard error).
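
The arithmetic in Step 3 can be checked with a few lines of Python. This is a minimal sketch rather than a prescribed method: the function name `proportion_test_statistic` is hypothetical, and it follows the notes in using $\hat{p}$ (not $p_0$) inside the standard error.

```python
import math


def proportion_test_statistic(successes, n, p0):
    """Return (p_hat, standard error, test statistic) for a one-sample
    proportion test, with the standard error based on p_hat."""
    p_hat = successes / n
    se = math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat, se, (p_hat - p0) / se


# 270 of the 1,000 sampled workers live in regional Australia; p0 = 0.234.
p_hat, se, t_stat = proportion_test_statistic(270, 1000, 0.234)
print(f"p_hat = {p_hat:.3f}, se = {se:.4f}, t = {t_stat:.3f}")
```

Keeping the standard error unrounded matters here: dividing 0.036 by the rounded value 0.0140 would give about 2.571 instead of 2.564.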

Step 4 & 5: Decision and Conclusion
  • Since 2.564 > 1.645, the test statistic falls in the rejection region.

  • Decision: Reject $H_0$.

  • Conclusion: At a 5% significance level, the sample provides sufficient evidence that the proportion of working people living in regional areas has increased from the pre-COVID proportion of 23.4%.

Hypothesis Testing Using the P-value

  • P-value Definition: The p-value is the probability, computed assuming the null hypothesis is true, of observing a test statistic at least as extreme as the one obtained from the sample.

  • Decision Rule: Reject $H_0$ if $p\text{-value} < \alpha$.

  • Calculation (Upper-tail): $p\text{-value} = P(t_{n-1} > t)$.

  • Example Calculation: Using the previous proportion test ($t = 2.564$ and $df = 999$):

    • Excel: =1 - T.DIST(2.564, 999, TRUE) returns 0.0052.

  • Since $0.0052 < 0.05$, the decision is to Reject $H_0$.
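
For readers without Excel, the upper-tail p-value can be approximated in plain Python. The sketch below is an assumption-laden illustration, not the method used in the notes: it integrates the Student $t$ density numerically with Simpson's rule, and the names `t_pdf` and `t_upper_tail` and the step count of 2000 are hypothetical choices.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with `df` degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))


def t_upper_tail(t, df, steps=2000):
    """Approximate P(T > t) using symmetry and Simpson's rule on [0, |t|].
    `steps` must be even."""
    a = abs(t)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central = total * h / 3  # integral of the density from 0 to |t|
    return 0.5 - central if t >= 0 else 0.5 + central


p_value = t_upper_tail(2.564, 999)
print(f"p-value = {p_value:.4f}, reject H0 at 5%: {p_value < 0.05}")
```

The result should land close to the 0.0052 obtained from Excel, leading to the same decision.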

Hypothesis Testing: Population Mean (Two-tailed Test)

Scenario: Is the average wage in urban areas different from $100,000?
  • Hypotheses:

    • $H_0: \mu_2 = 100,000$

    • $H_A: \mu_2 \neq 100,000$

  • Parameters: $E(X_2) = \mu_2$, $\text{Var}(X_2) = \sigma_2^2$.

Test Statistic and Distribution
  • Formula: $t = \frac{\bar{X}_2 - \mu_0}{\text{se}(\bar{X}_2)}$ where $\text{se}(\bar{X}_2) = \frac{s}{\sqrt{n_2}}$.

  • Sample data for urban workers: $n_2 = 730$, $\bar{X}_2 = 100,438.3$, $s = 26,198.80$.

  • Calculation:

    • $\text{se}(\bar{X}_2) = \frac{26,198.80}{\sqrt{730}} = 969.6611$

    • $t = \frac{100,438.3 - 100,000}{969.6611} = 0.4520$

  • Distribution: the test statistic follows $t(n_2 - 1) = t(729)$.

Decision Rule and Decision
  • Critical Value Approach: For $\alpha = 0.05$, the critical values are $\pm 1.963$. Reject $H_0$ if $|t| > 1.963$.

  • P-value Approach: $p\text{-value} = 2 \times P(t_{729} > 0.452) = 0.6514$ (calculated via Excel T.DIST.2T(ABS(0.452), 729)).

  • Result: $|t| = 0.4520 < 1.963$, and the p-value $0.6514 > 0.05$.

  • Decision: Do Not Reject $H_0$.

  • Conclusion: At a 5% significance level, there is insufficient evidence to conclude the average wage in urban areas is different from $100,000.
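
The two-tailed test above can be sketched in Python as follows. One loud assumption: the Python standard library has no $t$ distribution, so this sketch substitutes the standard normal for $t(729)$. With 729 degrees of freedom the two are very close, so the critical value (about 1.960 instead of 1.963) and the p-value differ only slightly from the Excel figures. The function name `two_tailed_mean_test` is hypothetical.

```python
import math
from statistics import NormalDist


def two_tailed_mean_test(x_bar, s, n, mu0, alpha=0.05):
    """Two-tailed test of H0: mu = mu0 using a large-sample normal
    approximation in place of the t(n - 1) distribution."""
    se = s / math.sqrt(n)
    t_stat = (x_bar - mu0) / se
    z = NormalDist()  # stand-in for t(n - 1); reasonable only for large n
    critical = z.inv_cdf(1 - alpha / 2)
    p_value = 2 * (1 - z.cdf(abs(t_stat)))
    return {
        "se": se,
        "t": t_stat,
        "critical": critical,
        "p_value": p_value,
        "reject": abs(t_stat) > critical,
    }


# Urban workers: n = 730, sample mean 100,438.3, sample sd 26,198.80.
result = two_tailed_mean_test(100_438.3, 26_198.80, 730, 100_000)
for name, value in result.items():
    print(f"{name}: {value}")
```

For small samples the normal stand-in would understate the critical value, so a proper $t$ quantile (for example from Excel's T.INV.2T) should be used instead.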

Hypothesis Testing: Population Mean (Lower-tail Test)

Scenario: Is the average wage in regional areas less than $100,000?
  • Hypotheses:

    • $H_0: \mu_1 = 100,000$

    • $H_A: \mu_1 < 100,000$

  • Sample data for regional workers: $n_1 = 270$, $\bar{X}_1 = 90,927.84$, $s_1 = 22,736.92$.

Test Statistic calculation
  • $t = \frac{90,927.84 - 100,000}{\frac{22,736.92}{\sqrt{270}}} = -6.55$

Decision Rule and Decision
  • Critical Value: For $\alpha = 0.05$ and $df = 269$, the critical value is $-1.6505$ (using Excel T.INV(0.05, 269)).

  • Rule: Reject $H_0$ if $t < -1.6505$.

  • Decision: Since $-6.55 < -1.6505$, Reject $H_0$.

  • P-value: $P(t_{269} < -6.55) \approx 0.0000$, which is less than 0.05.

  • Conclusion: At a 5% significance level, there is sufficient evidence that the mean wage of working people in regional areas is less than $100,000.
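
The lower-tail calculation can be reproduced with a short script. This is a sketch under stated assumptions: the function name `lower_tail_mean_test` is hypothetical, and the critical value $-1.6505$ is taken from the Excel result quoted above rather than computed in Python.

```python
import math


def lower_tail_mean_test(x_bar, s, n, mu0, critical_value):
    """Return (test statistic, reject?) for a lower-tail test of
    H0: mu = mu0 against HA: mu < mu0."""
    se = s / math.sqrt(n)
    t_stat = (x_bar - mu0) / se
    # Lower-tail rule: reject when the statistic falls below the critical value.
    return t_stat, t_stat < critical_value


# Regional workers: n = 270, sample mean 90,927.84, sample sd 22,736.92.
t_stat, reject = lower_tail_mean_test(
    90_927.84, 22_736.92, 270, 100_000, critical_value=-1.6505
)
print(f"t = {t_stat:.4f}, reject H0: {reject}")
```

The unrounded statistic is about $-6.556$, in line with the $-6.55$ reported above and far below the critical value.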