Inference for Two Independent Means: Comprehensive Study Notes

Overview of Two-Population Inference for Means

  • The study of two-population inference (Section 12.1) focuses on comparing a single quantitative variable across two distinct categories.
  • This marks a shift from comparing a sample mean ($\bar{x}$) to a fixed numerical value (e.g., testing whether a mean equals exactly 30 minutes) to comparing the means of two independent groups against each other.
  • Variable Structure: Each analysis involves one categorical variable (defining the groups) and one quantitative variable (the measurement being compared).
        - Example: Comparing the Essex campus to the Dundalk campus. The categorical variable is the "Campus Type" and the quantitative variable is the "Commute Time."
        - Example: Comparing GPA by gender. The categorical variable is "Gender" and the quantitative variable is "GPA."
  • Application Examples:
        - Effectiveness of a placebo versus a medical treatment.
        - Popularity levels of two different political candidates.
        - Wait times in teller services comparing a single line versus individual lines.
        - Longevity of different battery brands.
        - Cholesterol levels in patients on medication versus traditional treatment.
  • Note on Proportions: While a difference in proportions is a standard statistical topic, it was removed from this semester's curriculum ("on the chopping block") to prioritize means.

Hypotheses and Notation for Two Means

  • Notation:
        - Population Means: $\mu_1$ and $\mu_2$.
        - Sample Means: $\bar{x}_1$ and $\bar{x}_2$.
        - Sample Standard Deviations: $s_1$ and $s_2$.
        - Sample Sizes: $n_1$ and $n_2$.
        - Subscripts are essential to differentiate between the two distinct populations.
  • Null Hypothesis ($H_0$):
        - The null hypothesis always assumes no difference between the two populations: $H_0: \mu_1 = \mu_2$.
        - It can also be expressed as $H_0: \mu_1 - \mu_2 = 0$. This version is helpful for understanding software inputs in tools like R-Guru, where the value compared is zero.
  • Alternative Hypothesis ($H_a$):
        - This identifies the nature of the suspected difference:
            - Two-tailed (Difference): $H_a: \mu_1 \neq \mu_2$.
            - Right-tailed (Greater than): $H_a: \mu_1 > \mu_2$.
            - Left-tailed (Less than): $H_a: \mu_1 < \mu_2$.
  • Importance of Order: The direction of the inequality depends on which group is assigned as population 1 and population 2. If groups are switched, the inequality must be reversed to maintain the logical claim.
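
As a small illustration of why the order matters, take two groups labeled A and B (hypothetical labels, not from the notes) and the claim that group A has the larger mean. The same claim can be written in equivalent forms:

$$\mu_A > \mu_B \iff \mu_A - \mu_B > 0 \iff \mu_B - \mu_A < 0$$

If A is assigned as population 1, the alternative is right-tailed ($H_a: \mu_1 > \mu_2$). If B is assigned as population 1 instead, the same claim becomes left-tailed ($H_a: \mu_1 < \mu_2$).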

Visualization and Box Plots

  • Box plots are an effective visual tool for comparing a quantitative variable across categories.
  • They allow for a quick assessment of the center (median) and the spread (interquartile range) of data.
  • Interpretation:
        - In a teller service wait-time example, a single-line median of 4.5 minutes is compared to an individual-line median of 6 minutes.
        - If box plots show significant overlap, a formal hypothesis test is required to determine if the 1.5-minute difference is statistically significant.
        - If the boxes were entirely separated along the y-axis, the difference would be visually obvious and likely significant without further testing.
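
The overlap idea can be sketched numerically. The snippet below is a minimal illustration, not part of the course materials: the wait times are invented (chosen only so that the medians match the 4.5 and 6 minutes mentioned above), and the helper names `five_number_summary` and `boxes_overlap` are hypothetical. It computes the quantities a box plot draws and checks whether the two boxes (Q1 to Q3) overlap.

```python
from statistics import quantiles

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max), the values a box plot displays."""
    q1, med, q3 = quantiles(values, n=4, method="inclusive")
    return min(values), q1, med, q3, max(values)

def boxes_overlap(a, b):
    """True if the interquartile boxes (Q1..Q3) of the two samples overlap."""
    _, a_q1, _, a_q3, _ = five_number_summary(a)
    _, b_q1, _, b_q3, _ = five_number_summary(b)
    return a_q1 <= b_q3 and b_q1 <= a_q3

# Hypothetical wait times in minutes, invented for illustration only.
single_line = [3.8, 4.1, 4.3, 4.5, 4.5, 4.8, 5.2, 5.6, 6.0]
individual_lines = [3.2, 4.4, 5.1, 5.8, 6.0, 6.3, 7.0, 7.9, 9.1]

print("single line:     ", five_number_summary(single_line))
print("individual lines:", five_number_summary(individual_lines))
print("boxes overlap:   ", boxes_overlap(single_line, individual_lines))
```

When the boxes overlap, as they do for this made-up data, the picture alone does not settle the question and a formal test is the next step.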

Conditions for Hypothesis Testing

  • Before conducting a t-test for two means, each sample must satisfy at least one of the following conditions:
        - Sample Size: The sample size is at least 30 ($n_1 \ge 30$ for the first sample, $n_2 \ge 30$ for the second).
        - Normality: If a sample size is less than 30, the population it comes from must be approximately normal (bell-shaped), with no heavy skew or extreme outliers.
  • In many textbook scenarios where sample sizes are small, practitioners must explicitly state the assumption that the data is normally distributed.

The Test Statistic and R-Guru Procedure

  • Test Statistic Formula:
        - The difference in sample means is divided by the standard error of the difference:
        - $t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$
  • R-Guru Workflow:
        1. Navigate to Analysis -> Mean Inference -> Two Population.
        2. Under the Summary tab, enter labels and data for both factors:
            - Factor 1: $\bar{x}_1$, $s_1$, and $n_1$.
            - Factor 2: $\bar{x}_2$, $s_2$, and $n_2$.
        3. Under the Population 1 & 2 tab:
            - Select Test for Hypothesis.
            - Set the difference value to 0.
            - Select the appropriate inequality ($<$, $>$, or $\neq$).
            - Select the t-statistic.
            - Set the significance level ($\alpha$). Default is 0.05 (5%) if not specified.
  • Decision Rules (Same as Chapter 11):
        - If p-value $\le \alpha$: Reject the null hypothesis ($H_0$).
        - If p-value $> \alpha$: Fail to reject the null hypothesis ($H_0$).
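
The formula and decision rule above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a description of what R-Guru does internally: the function names `two_sample_t` and `decide` are hypothetical, the input numbers are invented, and the Welch-Satterthwaite degrees of freedom are an assumption chosen because the test statistic uses an unpooled standard error.

```python
import math

def two_sample_t(xbar1, s1, n1, xbar2, s2, n2):
    """Return (t, df) for an unpooled two-sample t-test from summary statistics."""
    v1 = s1 ** 2 / n1  # estimated variance of the first sample mean
    v2 = s2 ** 2 / n2  # estimated variance of the second sample mean
    t = (xbar1 - xbar2) / math.sqrt(v1 + v2)
    # Welch-Satterthwaite approximation (an assumption, see the note above).
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df

def decide(p_value, alpha=0.05):
    """Apply the decision rule: reject H0 when the p-value is at most alpha."""
    return "reject H0" if p_value <= alpha else "fail to reject H0"

# Hypothetical summary statistics, invented for illustration only.
t, df = two_sample_t(52.0, 6.0, 36, 49.0, 8.0, 64)
print(f"t = {t:.3f}, df = {df:.1f}")
print(decide(0.006))        # a p-value below the default alpha of 0.05
print(decide(0.016, 0.01))  # a p-value above a stricter alpha of 0.01
```

Working from summary statistics mirrors the Summary tab workflow, where only the mean, standard deviation, and sample size of each group are entered.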

Case Study 1: Senior vs. Freshman Study Habits

  • Context: A study compares the average time spent studying per week by seniors versus freshmen.
  • Hypotheses:
        - $H_0: \mu_1 = \mu_2$ (No difference).
        - $H_a: \mu_1 > \mu_2$ (Seniors study more than freshmen).
  • Data Provided:
        - Seniors (Group 1): $\bar{x}_1 = 15.6$ hours, $s_1 = 3.9$, $n_1 = 60$.
        - Freshmen (Group 2): $\bar{x}_2 = 13.7$ hours, $s_2 = 4.8$, $n_2 = 75$.
  • Analysis:
        - Level of significance ($\alpha$): 0.05.
        - p-value: 0.006.
  • Conclusion: Since $0.006 < 0.05$, reject the null hypothesis. There is sufficient evidence to suggest that seniors, on average, study more than freshmen.
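
As a rough check on the reported p-value, the test statistic can be worked out by hand from the summary data, using the unpooled standard error. The notes do not report this value, so treat the arithmetic below as a sketch:

$$t = \frac{15.6 - 13.7}{\sqrt{\frac{3.9^2}{60} + \frac{4.8^2}{75}}} = \frac{1.9}{\sqrt{0.2535 + 0.3072}} \approx \frac{1.9}{0.749} \approx 2.54$$

A statistic this far into the right tail, with samples of this size, is in line with the small p-value of 0.006 reported above.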

Case Study 2: Cordless Phone Range Comparison

  • Context: Comparing the long-distance range of two cordless phone brands.
  • Data Provided:
        - Phone 1: $\bar{x}_1 = 1390$ units, $s_1 = 36$, $n_1 = 5$.
        - Phone 2: $\bar{x}_2 = 1340$ units, $s_2 = 33$, $n_2 = 11$.
  • Conditions: Since sample sizes are small ($n < 30$), we must assume the data is normally distributed.
  • Analysis:
        - Null Hypothesis ($H_0$): $\mu_1 = \mu_2$.
        - Alternative Hypothesis ($H_a$): $\mu_1 > \mu_2$ (Claim: Phone 1 is better than Phone 2).
        - Significance Level ($\alpha$): 0.01 (1%).
        - Resulting p-value was found to be greater than 0.01.
  • Conclusion: Fail to reject the null hypothesis. At the 1% significance level, there is not enough evidence to suggest Phone 1 has a longer average range than Phone 2, even though a 5% or 10% level might have yielded a different result.
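
The sensitivity to the significance level can be sketched with the summary data from this case study. The snippet below is a standard-library approximation, not the course software: the helper names `t_pdf` and `right_tail_p` are hypothetical, the p-value is obtained by numerically integrating the t density with Simpson's rule, and the Welch-Satterthwaite degrees of freedom are an assumption, so the exact p-value may differ slightly from what R-Guru reports.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def right_tail_p(t, df, steps=2000):
    """P(T > t) for t >= 0, via Simpson's rule on [0, t] (steps must be even)."""
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3

# Summary statistics from the cordless phone case study.
xbar1, s1, n1 = 1390, 36, 5
xbar2, s2, n2 = 1340, 33, 11

v1, v2 = s1 ** 2 / n1, s2 ** 2 / n2
t = (xbar1 - xbar2) / math.sqrt(v1 + v2)
# Welch-Satterthwaite degrees of freedom: an assumption about the software.
df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
p = right_tail_p(t, df)

print(f"t = {t:.2f}, df = {df:.2f}")
for alpha in (0.01, 0.05, 0.10):
    verdict = "reject H0" if p <= alpha else "fail to reject H0"
    print(f"alpha = {alpha:.2f}: {verdict}")
```

Under these assumptions the approximate p-value falls between 0.01 and 0.05, which matches the conclusion above: the test fails to reject at the 1% level but would reject at 5% or 10%.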

Case Study 3: Cholesterol Medication vs. Placebo

  • Context: Testing whether a drug produces a greater decrease in cholesterol than a placebo.
  • Data Provided:
        - Drug (Group 1): $\bar{x}_1 = 22.9$, $s_1 = 4.4$, $n_1 = 49$.
        - Placebo (Group 2): $\bar{x}_2 = 20.9$, $s_2 = 2.5$, $n_2 = 35$.
  • Analysis:
        - Hypotheses: $H_0: \mu_1 = \mu_2$; $H_a: \mu_1 > \mu_2$.
        - Significance Level ($\alpha$): 0.05.
        - Test Statistic ($t$): 2.64.
        - p-value: 0.005.
  • Conclusion: Since $0.005 < 0.05$, reject the null hypothesis. There is enough evidence to suggest the drug results in a greater average decrease in cholesterol compared to the placebo.
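
The reported test statistic can be checked by hand with the unpooled formula. This sketch takes $s_2 = 2.5$ for the placebo group, which is an assumption: it is the value that reproduces the reported $t = 2.64$.

$$t = \frac{22.9 - 20.9}{\sqrt{\frac{4.4^2}{49} + \frac{2.5^2}{35}}} = \frac{2.0}{\sqrt{0.3951 + 0.1786}} \approx \frac{2.0}{0.757} \approx 2.64$$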

Questions & Discussion

  • Student Inquiry on R-Guru Settings: A student asked about the "greater than" inequality setting in R-Guru when comparing the means in the senior/freshman study.
  • Response: The instructor confirmed that the setting should indicate that the difference ($\mu_1 - \mu_2$) is greater than zero, which aligns with the hypothesis that the first group's mean is larger than the second's.
  • Student Inquiry on Variances: A student asked about assuming population variances are equal for the cordless phone problem.
  • Response: The instructor noted that while the book may suggest this, in this course, standard deviations are typically treated as unknown and variances are not assumed to be equal for these specific t-tests.
  • Student Inquiry on p-value meaning: A student asked for clarification on what the p-value means in context.
  • Response: The instructor explained that it determines whether the data provides enough evidence to reject the null hypothesis based on its size relative to $\alpha$. Small p-values indicate strong evidence against the null.