Hypothesis Testing: Two Samples - Part 1 Study Notes
Introduction to Two-Sample Hypothesis Testing
Extension from One Sample: Previously, we focused on hypothesis testing for a single sample from one population. Now, the discussion extends to cases with two samples, aiming to understand the difference between two populations.
Consistency of Logic: All subsequent hypothesis testing techniques, including two-sample tests, are built upon the same fundamental logic established for one-sample tests.
Two Versions: There will be two different versions of the two-sample hypothesis test.
Example: The Gender Pay Gap in the United States
Definition: The gender pay gap is the ratio of female to male median yearly earnings among full-time, year-round workers.
Unadjusted Salary Comparison (2015 Data):
The average woman's unadjusted annual salary was cited as to of that of the average man's.
This implies women earned between and cents for every dollar earned by a man.
Relevance: This is a direct comparison of mean salaries, which will be analyzed using a t-test.
Adjusted Salary Comparison:
This comparison accounts for factors like college major, occupation, working hours, and parental leave choices made by male and female workers.
Multiple studies found that after adjusting for these factors, pay rates varied by or, more commonly, females earned cents for every dollar earned by their male counterparts.
Methodology: Techniques for such comparisons with control variables are covered in future lectures on multiple regression.
Fairer Comparison: This adjusted comparison is considered much fairer as it controls for many confounding factors.
Remaining Gap: The remaining of the gap (the difference between cents and cents) has been speculated to originate from two primary sources:
1. Deficiency in salary negotiation skills.
2. Gender discrimination.
Negotiation Skills Insight: Research, such as Linda Babcock's book "Women Don't Ask," suggests women often do not receive raises because they do not ask for them. This highlights a practical implication for women to actively seek raises they believe they deserve.
Comparing Mean Salaries: The Statistical Approach
Objective: Test whether the observed differences between two sample means are statistically significant or merely due to random sampling error.
Data Collection: Drawing two separate samples, e.g., a sample of men's salaries and a sample of women's salaries, from their respective populations.
Sample Means: Compute sample averages ($\bar{x}_{men}$ and $\bar{x}_{women}$) and compare them.
The Problem of Sampling Error: Even if there were no real difference in population means, it's highly unlikely that two sample means would be exactly identical due to inherent sampling error.
Core Question: Are the observed sample mean differences 'significantly' different, or are they just 'noise' from random sampling error?
Null Hypothesis (for Two Samples):
The hypothesis is made about the population parameters (the $\mu$'s), not the sample statistics (the $\bar{x}$'s).
It assumes no real difference between the two population means: $\mu_1 = \mu_2$ (or $\mu_1 - \mu_2 = 0$).
Logic: Under the assumption of the null hypothesis (no real difference), we calculate how likely it is to observe the difference found between our two sample means.
Sampling Distribution of Mean Differences:
This requires a new sampling distribution, specifically the distribution of the difference between two sample means.
Theoretical Construction: Imagine a world where the null hypothesis is true ($\mu_1 = \mu_2$).
1. Repeatedly select a pair of samples (one from population 1, one from population 2).
2. Compute the mean for each sample ($\bar{x}_1$ and $\bar{x}_2$).
3. Calculate the difference: $\bar{x}_1 - \bar{x}_2$.
4. Record this difference.
5. Repeat this process numerous times to build a distribution of these differences.
This distribution will show the probability of getting various differences between two samples purely by sampling error when there is no true population difference.
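The construction above can be sketched as a small simulation. This is a minimal illustration rather than part of the original notes: the population mean (50,000), standard deviation (10,000), sample sizes (40 each), and number of repetitions (5,000) are all made-up values, and both populations are assumed to be normal with identical means so that the null hypothesis holds.

```python
import random
import statistics


def simulate_mean_differences(pop_mean, pop_sd, n1, n2, reps, seed=0):
    """Simulate x-bar_1 - x-bar_2 when both populations share the same mean."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    diffs = []
    for _ in range(reps):
        # Step 1: draw one sample from each (identical) population.
        sample1 = [rng.gauss(pop_mean, pop_sd) for _ in range(n1)]
        sample2 = [rng.gauss(pop_mean, pop_sd) for _ in range(n2)]
        # Steps 2-4: compute both means, take the difference, record it.
        diffs.append(statistics.fmean(sample1) - statistics.fmean(sample2))
    return diffs  # Step 5: the collected differences form the distribution


# Hypothetical inputs chosen only for illustration.
diffs = simulate_mean_differences(
    pop_mean=50_000, pop_sd=10_000, n1=40, n2=40, reps=5_000
)
print("mean of differences:", round(statistics.fmean(diffs), 1))
print("standard deviation of differences:", round(statistics.stdev(diffs), 1))
```

The simulated differences cluster around zero, and their standard deviation approximates the standard error of the mean difference, even though no real population difference exists.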
Decision Making: If the observed sample mean difference is highly unlikely under the null hypothesis (i.e., its probability is very low as indicated by the sampling distribution), we infer that the initial hypothesis (no difference) might be wrong, suggesting a significant difference exists.
Why Significance Testing?
Business or policy decisions often have significant costs and resource implications.
Significance tests ensure decisions are based on reliable evidence, not just superficial differences that could be due to chance.
It prevents allocating resources to solve a problem that might not truly exist.
Examples of Two-Sample Testing:
Comparing the mean number of patients admitted on Friday versus Monday at a hospital.
Evaluating the mean increase in sales for salespeople before and after a new training program.
Determining which of two manufacturing processes is more efficient by comparing the mean number of items manufactured.
The Sampling Distribution of Sample Mean Differences
Visual Representation: Similar in concept to the one-sample sampling distribution but specifically for the difference between two sample means.
One-Sample Sampling Distribution (Review):
Statistic: Sample mean ($\bar{x}$).
Parameter: Population mean ($\mu$).
Mean of Distribution: $\mu_{\bar{x}} = \mu$ (the average of sample means equals the population mean).
Standard Deviation: Called the standard error, denoted as $\sigma_{\bar{x}}$.
Shape: Approximately normal for sufficiently large samples, due to the Central Limit Theorem (CLT).
Two-Sample Sampling Distribution:
Statistic: The difference between two sample means ($\bar{x}_1 - \bar{x}_2$).
Parameter: The true difference between two population means ($\mu_1 - \mu_2$).
Null Hypothesis: In almost all two-sample comparison cases, we hypothesize $\mu_1 - \mu_2 = 0$. This centers the distribution at zero.
Mean of Distribution (under Null): $\mu_{\bar{x}_1 - \bar{x}_2} = 0$.
Standard Deviation: Called the standard error of the mean difference, denoted as $\sigma_{\bar{x}_1 - \bar{x}_2}$.
It combines the variability of the two samples, with each sample's contribution weighted by its sample size.
The exact formula is complex but is handled by statistical software.
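For reference, one common form of this standard error for independent samples is sketched below. The notes do not say which version they use, so this unpooled form is an assumption; $s_1$ and $s_2$ denote the two sample standard deviations and $n_1$ and $n_2$ the two sample sizes.

```latex
\sigma_{\bar{x}_1 - \bar{x}_2} \approx \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}
```

Larger samples shrink both terms under the square root, so the sampling distribution of the difference becomes narrower as $n_1$ and $n_2$ grow.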
Shape: Normal, an extension of the Central Limit Theorem, which is beneficial for p-value calculation.
P-value Calculation: With the mean, standard deviation, and normal shape, we have sufficient information to calculate p-values, indicating the probability of observing our sample difference if the true population difference were zero.
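The calculation can be sketched with Python's standard library. This is a minimal illustration under the normal-shape assumption described above; the observed difference (4,000) and standard error (2,000) are hypothetical numbers, and the function name is invented for this example. With small samples, statistical software typically uses a t distribution instead of the normal curve.

```python
from statistics import NormalDist


def two_sided_p_value(observed_diff, standard_error):
    """Probability of a difference at least this extreme if mu_1 - mu_2 = 0."""
    # Under the null the distribution is centered at zero, so the z-score
    # is simply the observed difference divided by the standard error.
    z = observed_diff / standard_error
    # Two-sided: count both tails beyond |z| of the standard normal curve.
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Hypothetical values chosen only for illustration.
p = two_sided_p_value(observed_diff=4_000, standard_error=2_000)
print("two-sided p-value:", round(p, 4))
```

A small p-value means a difference this large would rarely arise from sampling error alone, which is the evidence used to reject the hypothesis of no difference.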
Paired vs. Independent Samples: A Critical Distinction
Importance: Correctly identifying sample type (paired or independent) is crucial for valid test results.
Dependent (Paired) Samples:
Definition: Measurements are paired for one set of items, meaning there's a direct connection between an observation in one sample and an observation in the other.
Common Element: Often involves a 'before and after' time element (e.g., productivity before and after training).
Other Connections: Can also be connected in other ways, such as comparing the performance of the exact same team using two different methods.
Key Question: Does each data point in one sample have a specific, corresponding data point in the other sample that it is logically linked to?
Independent Samples:
Definition: Measurements in one sample are unrelated to measurements in the other sample.
No Direct Link: There is no natural or logical pairing between observations across the two groups.
Example: Comparing the salaries of men and women where individuals are randomly selected from each population without any direct pairing.
Key Characteristic: Different individuals or entities are observed in each group.
Why the Distinction Matters:
The statistical tests and formulas used for paired samples are different from those used for independent samples.
Paired Samples: Utilize a 'difference score' approach, focusing on the variation within pairs, which typically reduces variability and increases statistical power.
Independent Samples: Compare the means of two entirely separate groups, and the variability of each group contributes to the overall sampling error independently.
Using the wrong test can lead to incorrect conclusions regarding statistical significance.
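The contrast can be illustrated by computing both test statistics on the same data. This is a minimal sketch, not the notes' own procedure: the weekly sales figures are invented, the function names are hypothetical, and the independent version uses the unpooled (Welch-style) standard error as an assumption.

```python
import math
import statistics


def paired_t(before, after):
    """t statistic built from difference scores (paired samples)."""
    diffs = [a - b for a, b in zip(after, before)]
    # Only the variation of the within-pair differences enters the error term.
    se = statistics.stdev(diffs) / math.sqrt(len(diffs))
    return statistics.fmean(diffs) / se


def independent_t(group1, group2):
    """t statistic treating the two groups as unrelated (unpooled form)."""
    # Each group's own variability contributes to the standard error.
    se = math.sqrt(
        statistics.variance(group1) / len(group1)
        + statistics.variance(group2) / len(group2)
    )
    return (statistics.fmean(group2) - statistics.fmean(group1)) / se


# Hypothetical weekly sales for the same five salespeople,
# measured before and after a training program.
before = [20, 35, 50, 65, 80]
after = [23, 37, 54, 68, 85]

print("paired t:", round(paired_t(before, after), 2))
print("independent t:", round(independent_t(before, after), 2))
```

Because the salespeople differ greatly from one another but each improves by a similar amount, the paired statistic comes out far larger than the independent one. Treating these paired data as independent would bury a consistent improvement under between-person variability.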
Vocabulary to Know
Gender Pay Gap: The ratio of female to male median yearly earnings among full-time, year-round workers.
Unadjusted Salary Comparison: A direct comparison of mean salaries without accounting for confounding factors like college major, occupation, or working hours.
Adjusted Salary Comparison: A comparison of salaries that accounts for factors such as college major, occupation, working hours, and parental leave choices to provide a fairer assessment.
Sampling Error: The inherent variability that causes two sample means to be unlikely to be exactly identical, even if there were no real difference in population means.
Null Hypothesis (for Two Samples): The hypothesis that assumes no real difference between two population means; generally stated as $\mu_1 = \mu_2$. It is made about population parameters, not sample statistics.
Sampling Distribution of Mean Differences: A theoretical distribution of the differences between two sample means, constructed by repeatedly selecting pairs of samples (one from each population) and computing the difference between their means. It helps determine the probability of observing a sample difference purely by sampling error when there is no true population difference.
Standard Error of the Mean Difference: The standard deviation of the sampling distribution of the difference between two sample means, denoted as $\sigma_{\bar{x}_1 - \bar{x}_2}$.
Paired (Dependent) Samples: Measurements where there is a direct, logical connection between an observation in one sample and an observation in the other, often involving 'before and after' scenarios or comparisons of the same entities under different conditions.
Independent Samples: Measurements in one sample are unrelated to measurements in the other sample, meaning there is no natural or logical pairing between observations across the two groups.