Week 7: Single Population Hypothesis Testing Notes

Learning Outcomes and Course Objectives

LO1: Develop an understanding of the concepts of point estimation.
LO2: Understand the methodology used to estimate the population mean ( $\mu$ ).
LO3: Understand the methodology used to estimate the population proportion ( $p$ ).
LO4: Understand the fundamental basics of hypothesis testing.

Overview of Weekly Content

Application of the principles of sampling distributions to estimate population parameters.
Understanding the basic steps involved in Hypothesis Testing.
Specific application of hypothesis testing for a population proportion ( $p$ ).
Specific application of hypothesis testing for a population mean ( $\mu$ ).

Case Study: Regional vs. Urban Migration and Wages

This week uses a research case study to illustrate hypothesis testing principles:

Background: A researcher finds that approximately $23.4\%$ of working people previously lived in regional Australia. It is believed this proportion has increased following the COVID-19 pandemic.
Claims: There are claims that pandemic disruptions and restrictions were greatest in capital cities, while job growth was strongest in other (regional) areas.
National Data: The national annual wage for 2023 is estimated to be $\text{AU\$98,176}$ .
Population of Interest: All working people in Australia.
Sample Characteristics: To test these claims, the researcher collects a random sample of $n = 1,000$ individuals working in 2023.
Variables Recorded:
- Region: Coded as $1$ if the individual lives in regional Australia and $0$ if they live in an urban area.
- Wage: Annual wage recorded in dollars.

Summary Statistics and Data Description

The following table summarizes the data collected from the sample of $n = 1,000$ :

Statistic	Wage_All	Wage_Urban	Wage_Regional	No_People_Regional
Mean	$98,108.2$	$100,438.3$	$90,927.8$	$0.27$
Median	$97,801.4$	$100,542.8$	$91,643.8$
Standard Deviation	$25,711.6$	$26,198.8$	$22,736.9$
Sample Variance	$661,084,628.0$	$686,377,080.2$	$516,967,633.9$
Range	$188,796.7$	$188,874.0$	$119,487.1$
Minimum	$14,760.4$	$14,514.9$	$31,059.2$
Maximum	$203,557.1$	$203,388.9$	$150,546.3$
n	$1,000$	$730$	$270$	$270$

From Sampling Distribution to Estimation

Sample statistics serve as reliable and consistent estimators for population parameters:

Sample Mean ( $\bar{X}$ ):
- It is an unbiased estimator: $E(\bar{X}) = \mu$ .
- Variance: $\text{var}(\bar{X}) = \frac{\sigma^2}{n}$ .
- Consistency: The variance converges toward zero as the sample size ( $n$ ) grows.
Sample Proportion ( $\hat{p}$ ):
- It is an unbiased estimator: $E(\hat{p}) = p$ .
- Variance: $\text{var}(\hat{p}) = \frac{p(1-p)}{n}$ .
- Consistency: The variance converges toward zero as the sample size ( $n$ ) grows.
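
The unbiasedness and variance results for $\hat{p}$ can be checked by simulation. The following is a minimal sketch, not part of the original notes: it assumes a "true" proportion of $0.234$ purely for illustration, draws repeated samples of $n = 1,000$ using Python's standard library, and compares the simulated mean and variance of $\hat{p}$ with $p$ and $\frac{p(1-p)}{n}$. The seed and number of replications are arbitrary choices.

```python
import random
import statistics

rng = random.Random(7)   # fixed seed so the run is repeatable (arbitrary choice)
p = 0.234                # assumed "true" population proportion, for illustration only
n = 1000                 # sample size used in the case study
reps = 2000              # number of simulated samples (arbitrary choice)

p_hats = []
for _ in range(reps):
    # Each draw is 1 (regional) with probability p, otherwise 0 (urban).
    successes = sum(1 for _ in range(n) if rng.random() < p)
    p_hats.append(successes / n)

mean_p_hat = statistics.fmean(p_hats)     # should be close to p (unbiasedness)
var_p_hat = statistics.pvariance(p_hats)  # should be close to p(1-p)/n
theory_var = p * (1 - p) / n

print(f"mean of p-hat: {mean_p_hat:.4f} (p = {p})")
print(f"variance of p-hat: {var_p_hat:.6f} (theory = {theory_var:.6f})")
```

Rerunning with a larger n should shrink both the simulated and the theoretical variance, which is the consistency property stated above.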

Steps of Hypothesis Testing

State the Hypotheses: Define the Null ( $H_0$ ) and Alternative ( $H_A$ ) hypotheses about a population parameter based on the research claim (not the sample statistic).
Specify the Decision Rule: Define the criteria for rejecting the null hypothesis, including the significance level ( $\alpha$ ) and the critical value or p-value.
Calculate the Test Statistic: Compare the sample estimate with the hypothesized value. Identify the appropriate distribution (e.g., $t_{n-1}$ , the $t$ distribution with $n - 1$ degrees of freedom).
Apply the Decision Rule: Based on the evidence and the threshold, decide to either reject or not reject the null hypothesis.
Make the Decision and Draw Conclusions: Interpret the statistical result within the context of the real-world problem.

Hypothesis Testing: Population Proportions

Step 1: Set up the Hypothesis

Null Hypothesis ( $H_0$ ): The statement assumed true. For proportion $p$, it is typically set as equality to a value.
- $H_0: p = 0.234$ (The proportion is the same as the pre-pandemic level).
Alternative Hypothesis ( $H_A$ ): The statement considered if $H_0$ is rejected.
- Upper-tail Test: $H_A: p > 0.234$ (Used to test if the proportion has increased).
- Lower-tail Test: $H_A: p < 0.234$ (Used to test if the proportion has decreased).
- Two-tail Test: $H_A: p \neq 0.234$ (Used to test if the proportion is different).

Step 2: Decision Rule (Critical Value Approach)

Significance Level ( $\alpha$ ): A predetermined threshold (e.g., $0.01, 0.05, 0.10$ ).
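Example calculation: For an upper-tail test with $\alpha = 0.05$ and $df = n - 1 = 999$ , the critical value using Excel is T.INV(0.95, 999) $\approx 1.646$ , essentially the standard normal value of $1.645$ that these notes use.
Decision Rule: Reject $H_0$ if the test statistic $> 1.645$ .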
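
The Excel lookup can be cross-checked without a statistics package. The sketch below is an illustration under stated assumptions, not the method used in the notes: it integrates the $t$ density with Simpson's rule and inverts the result by bisection, in place of Excel's T.INV. The helper names t_pdf, t_cdf, and t_inv are hypothetical.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def t_cdf(x, df, steps=2000):
    """P(T <= x): Simpson's rule on [0, |x|], then symmetry (steps must be even)."""
    a = abs(x)
    if a == 0:
        return 0.5
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x > 0 else 0.5 - area

def t_inv(prob, df, lo=-50.0, hi=50.0):
    """Quantile by bisection: the x for which t_cdf(x, df) equals prob."""
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < prob:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

alpha = 0.05
critical_value = t_inv(1 - alpha, 999)   # upper-tail critical value, df = 999
print(f"critical value: {critical_value:.4f}")
```

With $df = 999$ the $t$ distribution is almost indistinguishable from the standard normal, which is why the critical value sits so close to $1.645$ .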

Step 3: Calculate the Test Statistic

The test statistic for a population proportion is: $t = \frac{\hat{p} - p_0}{\sqrt{\frac{1}{n}\hat{p}(1 - \hat{p})}}$
Where:
- $\hat{p}$ is the sample proportion ( $\frac{270}{1000} = 0.270$ ).
- $p_0$ is the hypothesized proportion ( $0.234$ ).
- The standard error is $\text{se}(\hat{p}) = \sqrt{\frac{1}{1000}0.270(1 - 0.270)} = 0.0140$ .
Calculation: $t = \frac{0.270 - 0.234}{0.0140} = 2.564$
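
The arithmetic in Step 3 can be reproduced in a few lines. This is a small sketch using only the figures given in the notes (270 regional workers out of 1,000 and $p_0 = 0.234$ ); the variable names are illustrative.

```python
import math

n = 1000          # sample size
regional = 270    # number of sampled workers living in regional Australia
p0 = 0.234        # hypothesized (pre-pandemic) proportion

p_hat = regional / n                      # sample proportion
se = math.sqrt(p_hat * (1 - p_hat) / n)   # standard error as defined in the notes
t_stat = (p_hat - p0) / se                # test statistic

print(f"p_hat = {p_hat:.3f}, se = {se:.4f}, t = {t_stat:.3f}")
```

Note that the reported $t = 2.564$ uses the unrounded standard error; dividing by the rounded $0.0140$ gives a slightly larger value.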

Steps 4 and 5: Decision and Conclusion

Since $2.564 > 1.645$ , the test statistic falls in the rejection region.
Decision: Reject $H_0$ .
Conclusion: At a $5\%$ significance level, the sample provides sufficient evidence that the proportion of working people living in regional Australia has increased from the pre-COVID proportion of $23.4\%$ .

Hypothesis Testing Using the P-value

P-value Definition: The p-value is the probability, computed assuming the null hypothesis is true, of observing a test statistic at least as extreme as the one obtained from the sample.
Decision Rule: Reject $H_0$ if $p\text{-value} < \alpha$ .
Calculation (Upper-tail): $p\text{-value} = P(t_{n-1} > t)$ .
Example Calculation: Using the previous proportion test ( $t = 2.564$ and $df = 999$ ):
- Excel: = 1 - T.DIST(2.564, 999, TRUE) = 0.0052.
Since $0.0052 < 0.05$ , the decision is to reject $H_0$ .
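
The upper-tail p-value can also be approximated outside Excel. The sketch below assumes only Python's standard library and replaces T.DIST with a Simpson's-rule integration of the $t$ density; the helper names t_pdf and t_cdf are hypothetical and this is one possible numerical approach, not the one used in the notes.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def t_cdf(x, df, steps=2000):
    """P(T <= x): Simpson's rule on [0, |x|], then symmetry (steps must be even)."""
    a = abs(x)
    if a == 0:
        return 0.5
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x > 0 else 0.5 - area

t_stat = 2.564    # test statistic from the proportion test
df = 999          # n - 1
alpha = 0.05

p_value = 1 - t_cdf(t_stat, df)   # upper-tail probability P(T > t)
reject = p_value < alpha

print(f"p-value = {p_value:.4f}, reject H0: {reject}")
```

The p-value and critical value approaches always lead to the same decision; the p-value additionally shows how far inside the rejection region the statistic falls.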

Hypothesis Testing: Population Mean (Two-tailed Test)

Scenario: Is the average wage in urban areas different from $\text{\$100,000}$ ?

Hypotheses:
- $H_0: \mu_2 = 100,000$
- $H_A: \mu_2 \neq 100,000$
Parameters: $E(X_2) = \mu_2$ , $\text{Var}(X_2) = \sigma_2^2$ .

Test Statistic and Distribution

Formula: $t = \frac{\bar{X}_2 - \mu_0}{\text{se}(\bar{X}_2)}$ where $\text{se}(\bar{X}_2) = \frac{s}{\sqrt{n_2}}$ .
Sample data for urban workers: $n_2 = 730$ , $\bar{X}_2 = 100,438.3$ , $s = 26,198.80$ .
Calculation:
- $\text{se}(\bar{X}_2) = \frac{26,198.80}{\sqrt{730}} = 969.6611$
- $t = \frac{100,438.3 - 100,000}{969.6611} = 0.4520$
Distribution: Under $H_0$ , the test statistic follows a $t$ distribution with $n_2 - 1 = 729$ degrees of freedom, written $t_{729}$ .

Decision Rule and Decision

Critical Value Approach: For $\alpha = 0.05$ , the critical values are $\pm 1.963$ . Reject $H_0$ if $|t| > 1.963$ .
P-value Approach: $p\text{-value} = 2 \times P(t_{729} > 0.452) = 0.6514$ (calculated via Excel T.DIST.2T(ABS(0.452), 729)).
Result: $|t| = 0.4520 < 1.963$ , and $0.6514 > 0.05$ .
Decision: Do Not Reject $H_0$ .
Conclusion: At a $5\%$ significance level, there is insufficient evidence to conclude the average wage in urban areas is different from $\text{\$100,000}$ .
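
The whole two-tailed test can be sketched end to end. The example below uses the urban summary statistics from the notes and the stated critical value of $1.963$ ; the p-value is approximated with a Simpson's-rule integration of the $t$ density rather than Excel's T.DIST.2T, and the helper names t_pdf and t_cdf are hypothetical.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def t_cdf(x, df, steps=2000):
    """P(T <= x): Simpson's rule on [0, |x|], then symmetry (steps must be even)."""
    a = abs(x)
    if a == 0:
        return 0.5
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x > 0 else 0.5 - area

# Urban sample summary statistics from the notes
n2, x_bar2, s2 = 730, 100_438.3, 26_198.80
mu0 = 100_000          # hypothesized mean wage
alpha = 0.05
critical_value = 1.963  # two-tailed critical value quoted in the notes

se = s2 / math.sqrt(n2)
t_stat = (x_bar2 - mu0) / se
p_value = 2 * (1 - t_cdf(abs(t_stat), n2 - 1))   # both tails

reject_by_critical = abs(t_stat) > critical_value
reject_by_p_value = p_value < alpha

print(f"se = {se:.4f}, t = {t_stat:.4f}, p-value = {p_value:.4f}")
print(f"reject H0: {reject_by_critical} (critical value), {reject_by_p_value} (p-value)")
```

Doubling the one-tail probability is what makes the test two-sided: a sample mean far below $\text{\$100,000}$ would count as evidence against $H_0$ just as much as one far above it.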

Hypothesis Testing: Population Mean (Lower-tail Test)

Scenario: Is the average wage in regional areas less than $\text{\$100,000}$ ?

Hypotheses:
- $H_0: \mu_1 = 100,000$
- $H_A: \mu_1 < 100,000$
Sample data for regional workers: $n_1 = 270$ , $\bar{X}_1 = 90,927.84$ , $s_1 = 22,736.92$ .

Test Statistic Calculation

$t = \frac{90,927.84 - 100,000}{\frac{22,736.92}{\sqrt{270}}} = -6.55$

Decision Rule and Decision

Critical Value: For $\alpha = 0.05$ and $df = 269$ , the critical value is $-1.6505$ (using Excel T.INV(0.05, 269)).
Rule: Reject $H_0$ if $t < -1.6505$ .
Decision: Since $-6.55 < -1.6505$ , reject $H_0$ .
P-value: $P(t_{269} < -6.55) \approx 0.0000$ , which is less than $0.05$ .
Conclusion: At a $5\%$ significance level, there is sufficient evidence that the mean wage of working people in regional areas is less than $\text{\$100,000}$ .
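
The lower-tail calculation follows the same pattern with the inequality reversed. This short sketch uses the regional summary statistics and the critical value of $-1.6505$ exactly as quoted in the notes; the variable names are illustrative.

```python
import math

# Regional sample summary statistics from the notes
n1, x_bar1, s1 = 270, 90_927.84, 22_736.92
mu0 = 100_000             # hypothesized mean wage
critical_value = -1.6505  # lower-tail critical value quoted in the notes

se = s1 / math.sqrt(n1)          # standard error of the sample mean
t_stat = (x_bar1 - mu0) / se     # negative because the sample mean is below mu0
reject = t_stat < critical_value # lower-tail rule: reject for small t

print(f"se = {se:.2f}, t = {t_stat:.3f}, reject H0: {reject}")
```

The computed statistic is roughly $-6.56$ , which the notes report as $-6.55$ ; either way it lies far below the critical value, so the decision is unaffected.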
