9. Inference about 2 Population Means

Types of Questions Addressed

  • Comparative Analysis:
    • Incomes: Are incomes in Ontario comparable to those in Quebec?
    • Loneliness: Who is lonelier, Gen Z or older Canadians?
    • Political Orientation: Does political orientation differ by gender in Canada?
  • Statistical Testing: 2-sample tests for differences between populations.
    • Core question: Are two population means equal or not?
    • Manual calculation: Only the t-statistic is calculated by hand; all other calculations are done in Stata.
    • Set up tests and interpret the results meaningfully.

Steps in Hypothesis Testing

  1. Assumption Check: Ensure the assumptions for the test are met.
  2. Choose Significance Level (α): Commonly set to 0.05.
  3. State Hypotheses:
    • Null Hypothesis (H0) vs. Alternative Hypothesis (H1).
  4. Compute Test Statistic: Calculate the t-statistic using the formula:
    [ t = \frac{\bar{x}_1 - \bar{x}_2}{\text{s.e.}} ]
  5. Find the p-value: Determine the p-value associated with the test statistic.
  6. Formal Conclusion: Compare the p-value with α:
    • If ( p < \alpha ): Reject the null hypothesis.
    • If ( p \geq \alpha ): Do not reject the null hypothesis.
  7. Interpret Results: Provide a plain English interpretation of the conclusion.
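
Steps 2 through 6 can be sketched in a few lines of Python. This is only an illustration, not the course's Stata workflow: the summary statistics are made-up numbers, the function name `two_sample_test` is hypothetical, the standard error is assumed to be the unpooled one, and the p-value uses the large-sample normal approximation rather than the exact t-distribution.

```python
from math import sqrt
from statistics import NormalDist


def two_sample_test(n1, mean1, sd1, n2, mean2, sd2, alpha=0.05):
    """Large-sample two-sided test of H0: mu1 = mu2 (hypothetical helper)."""
    # Step 4: unpooled standard error of the difference in sample means
    se = sqrt(sd1**2 / n1 + sd2**2 / n2)
    t = (mean1 - mean2) / se
    # Step 5: two-sided p-value from the normal approximation (large samples only)
    p = 2 * (1 - NormalDist().cdf(abs(t)))
    # Step 6: formal conclusion, reject H0 when p < alpha
    reject = p < alpha
    return t, p, reject


# Made-up summary statistics for two independent samples of 400 each
t, p, reject = two_sample_test(400, 52.0, 10.0, 400, 50.0, 10.0)
print(f"t = {t:.2f}, p = {p:.4f}, reject H0: {reject}")
```

Step 7 is still up to the analyst: the code returns a decision, not a plain-English interpretation.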

2-Sample T-Test Assumptions

  1. Samples: Both samples must be Simple Random Samples (SRS) and sampled independently.
    • Not suitable for matched pairs (e.g., husband-wife).
  2. Variable Type: The variable being compared must be continuous, and the grouping variable must be dichotomous (a categorical variable with exactly two categories).

2-Sample T-Test

Hypotheses

  • Null Hypothesis (H0): ( \mu_1 = \mu_2 ) or ( \mu_1 - \mu_2 = 0 ) (the two population means are equal).
  • Alternative Hypothesis (H1): ( \mu_1 \neq \mu_2 ) (the means are not equal).

Test Statistic Calculation

  • Formula: [ t = \frac{\bar{x}_1 - \bar{x}_2}{\text{s.e.}} ]
  • Distribution: The test statistic follows the t-distribution. In large samples with α = 0.05, the critical value is t ≈ ±1.96.
  • Decision Rule: If |t| > 1.96, then p-value < α → reject the null hypothesis.
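
The notes do not spell out the standard error in the formula. Assuming the unpooled (unequal-variance) form, which matches the advice elsewhere in these notes not to pool variances, it can be written out and applied to made-up numbers as follows. The values 72, 68, 10, 12, 100 and 120 are illustrative only.

```latex
\[
\text{s.e.} = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}
\]
% Illustrative values: \bar{x}_1 = 72, \bar{x}_2 = 68, s_1 = 10, s_2 = 12, n_1 = 100, n_2 = 120
\[
\text{s.e.} = \sqrt{\frac{10^2}{100} + \frac{12^2}{120}} = \sqrt{1 + 1.2} = \sqrt{2.2} \approx 1.483,
\qquad
t = \frac{72 - 68}{1.483} \approx 2.70
\]
```

With samples this large, a t-statistic of about 2.70 is beyond the critical value for α = 0.05, so the null hypothesis would be rejected in this illustration.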

Conclusion and Interpretation

  • If ( p < \alpha ): Reject H0.
    • Interpretation: The two population means differ significantly.
  • If ( p \geq \alpha ): Do not reject H0.
    • Interpretation: There is no statistically significant evidence that the two population means differ.

P-Value Calculation and Degrees of Freedom

  • Degrees of Freedom: Calculated using Satterthwaite's formula (complex, so usually left to software).
  • In practice, run tests in Stata, which works out the degrees of freedom and the p-value for any sample size:
    • For large samples, the normal cutpoints are a good approximation:
      • ±1.645 for 90% confidence (α = 0.10)
      • ±1.96 for 95% confidence (α = 0.05)
      • ±2.58 for 99% confidence (α = 0.01)
  • Do not pool variances; use the 'unequal' option in Stata for valid results.
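
For readers curious about what the software is doing, here is a minimal Python sketch of Satterthwaite's approximation for the degrees of freedom. The function name `satterthwaite_df` and the sample sizes and standard deviations are hypothetical; in practice Stata reports this value when the unequal option is used.

```python
def satterthwaite_df(n1, sd1, n2, sd2):
    """Approximate degrees of freedom for the unequal-variance t-test."""
    v1 = sd1**2 / n1  # estimated variance of the first sample mean
    v2 = sd2**2 / n2  # estimated variance of the second sample mean
    return (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))


# Made-up example: a small, high-variance group and a large, low-variance group
df = satterthwaite_df(20, 15.0, 200, 5.0)
print(round(df, 1))
```

In this illustration the result is roughly 19, far below n1 + n2 - 2 = 218, which is why pooling variances can be misleading when the groups differ a lot in size and spread.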

Interpretation of Results

  • Rejecting the Null: The observed difference between the group means is statistically significant, i.e., unlikely to be due to chance alone.
  • Failing to Reject the Null: No significant difference was observed; a difference of this size is consistent with what chance would produce under H0.

Relationship Between Confidence Level and α

  • The confidence level of a confidence interval (CI) is the long-run proportion of intervals, built this way from repeated samples, that contain the population parameter.
  • α (alpha) is the proportion of such intervals that do not contain the parameter: ( \alpha = 1 - \text{confidence level} ).
  • These values are complementary.
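
The link between a confidence level, α, and the large-sample normal cutpoint can be checked with Python's standard library. This is a small sketch; the function name `normal_cutpoint` is hypothetical, and it assumes a two-sided test, so α is split equally between the two tails.

```python
from statistics import NormalDist


def normal_cutpoint(confidence_level):
    """Two-sided standard normal cutpoint for a given confidence level."""
    alpha = 1 - confidence_level
    # Put alpha/2 in each tail, so the cutpoint is the (1 - alpha/2) quantile
    return NormalDist().inv_cdf(1 - alpha / 2)


for level in (0.90, 0.95, 0.99):
    alpha = 1 - level
    z = normal_cutpoint(level)
    print(f"{level:.0%} confidence -> alpha = {alpha:.2f}, cutpoint = +/-{z:.3f}")
```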

Sampling Distributions and Hypothesis Tests

  • For hypothesis tests, we assume under the null that the samples come from populations with the same mean (H0 states ( \mu_1 = \mu_2 )).
  • The test assesses whether the observed difference in sample means would be unusual if this assumption were true.
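
A small simulation can make this idea concrete. The sketch below draws both samples from the same normal population, so H0 is true by construction, and counts how often the t-statistic lands beyond the large-sample cutoff of 1.96. The function name `rejection_rate`, the sample size of 50, the number of simulations, and the seed are all arbitrary choices for illustration.

```python
import random
from math import sqrt
from statistics import mean, stdev


def rejection_rate(n_sims=2000, n=50, cutoff=1.96, seed=0):
    """Share of simulated tests with |t| > cutoff when H0 is true."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_sims):
        # Both samples come from the same population (mean 0, sd 1), so H0 holds
        a = [rng.gauss(0, 1) for _ in range(n)]
        b = [rng.gauss(0, 1) for _ in range(n)]
        se = sqrt(stdev(a) ** 2 / n + stdev(b) ** 2 / n)
        t = (mean(a) - mean(b)) / se
        if abs(t) > cutoff:
            rejections += 1
    return rejections / n_sims


print(rejection_rate())
```

The printed share should sit close to 0.05: when the null is true, a t-statistic that extreme turns up only about 5% of the time, which is what makes it "unusual".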

Additional Information on Matched Pairs

  • Matched pairs refer to observations that are naturally paired (e.g., pre-test and post-test).
  • The approach here involves calculating differences for each pair and applying a one-sample t-test on these differences:
    • Null Hypothesis: ( \mu = 0 ) (the mean difference is zero).
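
The matched-pairs approach can be sketched in Python as follows. The pre-test and post-test scores are made up, and the function name `paired_t` is hypothetical; the point is only that the two columns collapse into one column of differences before the one-sample t-statistic is computed.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(before, after):
    """t-statistic for H0: mean difference = 0, from paired observations."""
    # One difference per pair (after minus before)
    diffs = [a - b for a, b in zip(after, before)]
    se = stdev(diffs) / sqrt(len(diffs))  # standard error of the mean difference
    return mean(diffs) / se, len(diffs) - 1  # t-statistic and degrees of freedom


# Made-up pre-test and post-test scores for the same five people
pre = [60, 72, 55, 80, 68]
post = [66, 75, 59, 82, 73]
t, df = paired_t(pre, post)
print(f"t = {t:.2f} on {df} degrees of freedom")
```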

Stata Commands for T-tests

  • To perform a 2-sample t-test:
    ttest variable, by(group_var) unequal
  • Where "variable" is continuous and "group_var" identifies group categories (e.g., gender).
  • You can also run the test from summary statistics (sample size, mean, and standard deviation for each group):
    ttesti n1 xbar1 s1 n2 xbar2 s2, unequal
  • Both commands automatically display descriptive statistics and confidence intervals, including one for the difference in means.