Two-Sample Tests of Hypothesis Study Notes Handout
Introduction to Two-Sample Hypothesis Testing
Conceptual Overview: In prior studies of hypothesis testing, researchers typically compared a single sample of data against a known or assumed population standard (a single numerical value). Two-sample hypothesis testing expands this concept by comparing two separate populations.
General Objective: The goal is to determine whether a statistically significant difference exists between the population means (μ1 and μ2) of two groups.
Methodology: Two independent random samples are selected, one from each population, to determine whether an observed difference is statistically significant or merely due to random chance.
Real-World Applications:
* Comparing real estate sales prices between male and female agents.
* Comparing average call counts between morning and afternoon customer service shifts.
* Determining if different supermarket checkout procedures result in different average times.
Independent Samples: Known Population Standard Deviations
Core Condition: This test is applied when samples are selected independently from two populations, both populations are assumed to follow a normal distribution, and the population standard deviations (σ1 and σ2) are known.
The Test Statistic (z): The z-distribution is used. The formula for the test statistic is:
$$z = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}$$
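As a minimal sketch, the z statistic for two independent samples with known population standard deviations can be computed as below. The function name `two_sample_z` and all input values are hypothetical, chosen only for illustration; they are not taken from these notes.

```python
import math


def two_sample_z(mean1, sigma1, n1, mean2, sigma2, n2):
    """z statistic for two independent samples with known population sigmas."""
    # Standard error of the difference between the two sample means.
    standard_error = math.sqrt(sigma1**2 / n1 + sigma2**2 / n2)
    return (mean1 - mean2) / standard_error


# Hypothetical inputs (assumed for illustration), e.g. times in minutes.
z = two_sample_z(mean1=5.50, sigma1=0.40, n1=50,
                 mean2=5.30, sigma2=0.30, n2=100)
print(f"z = {z:.3f}")
```

The computed z is then compared with the critical value set by the significance level and the direction of the alternate hypothesis.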
Example 1: Supermarket Checkout Methods
Scenario: FoodTown supermarket compares "Standard" cashier-assisted checkout (S) with "Fast Lane" self-checkout (F). The goal is to see if the standard method takes longer.
Step 1: Hypotheses:
* Null Hypothesis (H0): μS≤μF (Standard is not slower than Fast Lane).
* Alternate Hypothesis (H1): μS>μF (Standard is slower than Fast Lane).
Step 2: Significance Level: α=0.01.
Step 3: Test Statistic: z distribution is used because distributions are assumed normal and σ is known.
Step 4: Decision Rule: Upper-tailed test. The critical z value is 2.326. Reject H0 if z>2.326.
Step 5 & 6: Result: The computed z value is 3.123. Because 3.123>2.326, the null hypothesis is rejected. The difference of 0.20 minutes is statistically significant.
P-value Reasoning: The p-value is the probability of obtaining a z-value larger than 3.123 when the null is true.
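The decision rule and p-value for Example 1 can be checked with Python's standard library. This is a sketch that takes the reported z = 3.123 and α = 0.01 as given; the variable names are illustrative.

```python
from statistics import NormalDist

standard_normal = NormalDist()  # mean 0, standard deviation 1

alpha = 0.01
z_computed = 3.123  # value reported in Example 1

# Upper-tail critical value: the z with area alpha to its right.
z_critical = standard_normal.inv_cdf(1 - alpha)

# Upper-tail p-value: P(Z > z_computed) when the null hypothesis is true.
p_value = 1 - standard_normal.cdf(z_computed)

reject_h0 = z_computed > z_critical
print(f"critical z = {z_critical:.3f}, p-value = {p_value:.4f}, reject H0: {reject_h0}")
```

Because the p-value is smaller than α, it leads to the same decision as the critical-value rule: reject H0.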
Example 2: Sales Associates Gender Comparison
Scenario: Tom Sevits (Appliance Patch) wants to know if men sell less than women on average.
Decision: Computed z=20 (absolute value). Since ∣20∣>1.96, reject H0. There is a significant difference.
Independent Samples: Unknown Population Standard Deviations (Equal Variances)
Core Condition: Population standard deviations are unknown, but it is assumed that σ1=σ2. In this scenario, we use the t distribution and a "pooled" estimate of the variance.
Pooled Sample Variance (sp²): A weighted mean of the two sample variances (s1² and s2²), where the weights are the degrees of freedom provided by each sample (n1−1 and n2−1).
Requirements and Assumptions
Sampled populations are approximately normally distributed.
Sampled populations are independent.
Standard deviations of the two populations are equal (σ1=σ2).
Pooled Test Formulas
Degrees of Freedom: df=n1+n2−2
Pooled Variance Equation:
$$s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$$
Test Statistic (t):
$$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{s_p^2\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$$
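The pooled formulas above can be sketched in a few lines of Python. The function name `pooled_t` and the two small samples are hypothetical and serve only to show the order of the calculations.

```python
import math
from statistics import mean, variance


def pooled_t(sample1, sample2):
    """Pooled-variance t statistic for two independent samples.

    Assumes both populations are approximately normal with equal variances.
    Returns (t, degrees of freedom, pooled variance).
    """
    n1, n2 = len(sample1), len(sample2)
    df = n1 + n2 - 2
    # Sample variances use the n - 1 divisor.
    s1_sq = variance(sample1)
    s2_sq = variance(sample2)
    # Weighted mean of the sample variances, weighted by degrees of freedom.
    pooled_var = ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / df
    standard_error = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    t = (mean(sample1) - mean(sample2)) / standard_error
    return t, df, pooled_var


# Hypothetical data (assumed for illustration only).
group_1 = [10, 12, 14, 16, 18]
group_2 = [9, 11, 13, 15]
t_stat, df, pooled_var = pooled_t(group_1, group_2)
print(f"t = {t_stat:.3f}, df = {df}, pooled variance = {pooled_var:.3f}")
```

The resulting t is compared with the critical value from the t distribution with n1+n2−2 degrees of freedom.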
Example 5: Owens Lawn Care Mounting Procedures
Scenario: Comparing Welles method (W) vs. Atkins method (A) for engine mounting speed.
Decision: Computed t=1.67. Since 1.67<1.711, fail to reject H0. The data do not provide sufficient evidence that software issues take longer.
Hypothesis Testing for Dependent Samples (Paired Observations)
Core Condition: Samples are not independent; they are related or matched. This is often termed a "paired sample."
Mechanism: Instead of comparing two group means directly, we analyze the distribution of the differences (d) between each pair of observations. This reduces the problem to a one-sample test of the differences.
Notation: μd represents the population mean of the distribution of differences.
Assumption: The distribution of population differences is approximately normal.
Test Statistic for Paired Samples
$$t = \frac{\bar{d}}{s_d / \sqrt{n}}$$
Variables:
* n: The number of paired observations.
* df: n−1.
* dˉ: The mean of the differences between paired observations.
* sd: The standard deviation of the differences.
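A minimal sketch of the paired calculation, starting from raw paired observations, might look like this. The function name `paired_t` and the data are hypothetical.

```python
import math
from statistics import mean, stdev


def paired_t(first, second):
    """Paired t statistic computed from the differences within each pair.

    Returns (t, degrees of freedom).
    """
    if len(first) != len(second):
        raise ValueError("paired samples must have the same length")
    differences = [a - b for a, b in zip(first, second)]
    n = len(differences)
    d_bar = mean(differences)   # mean of the differences
    s_d = stdev(differences)    # sample standard deviation of the differences
    t = d_bar / (s_d / math.sqrt(n))
    return t, n - 1


# Hypothetical before/after measurements on the same five subjects.
before = [10, 12, 9, 14, 11]
after = [8, 11, 9, 10, 9]
t_stat, df = paired_t(before, after)
print(f"t = {t_stat:.3f}, df = {df}")
```

Only the differences enter the calculation, which is why the paired test reduces to a one-sample test.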
Example 8: Real Estate Appraisal Consistency
Scenario: Nickel Savings and Loan uses two firms (Schadek and Bowyer) to appraise the same 10 homes to check for consistency.
Data: n=10,dˉ=4.6,sd=4.402.
Hypotheses: H0:μd=0,H1:μd≠0 (Two-tailed).
Parameters: α=0.05,df=9. Critical t=±2.262.
Result: Computed t=4.6/(4.402/√10)≈3.30. Since 3.30>2.262, the null hypothesis is rejected. There is a significant difference between the two appraisal firms.
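The arithmetic in Example 8 can be verified directly from the summary values given above (n, d̄, sd, and the critical t). The variable names are illustrative.

```python
import math

# Summary values stated in Example 8.
n = 10
d_bar = 4.6
s_d = 4.402
t_critical = 2.262  # two-tailed, alpha = 0.05, df = 9

# Standard error of the mean difference, then the test statistic.
standard_error = s_d / math.sqrt(n)
t_computed = d_bar / standard_error

# Two-tailed rule: reject H0 when |t| exceeds the critical value.
reject_h0 = abs(t_computed) > t_critical
print(f"t = {t_computed:.2f}, reject H0: {reject_h0}")
```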
Practice Scenarios: Independent vs. Dependent Samples
Before and After Software: Measuring productivity scores for the same employees before and after a change. (Dependent / Paired).
Two Separate Schools: Comparing test scores of two unique groups of students from School A and School B. (Independent).
Diet Plan Weight Loss: Measuring weight of the same participants before and after a 3-month diet. (Dependent / Paired).
Two Fertilizers: Measuring yields from two separate groups of plants. (Independent).
Therapy Anxiety Levels: Measuring anxiety for the same group of patients before and after therapy. (Dependent / Paired).
New Technology Productivity: Comparing two different groups of employees after technology implementation. (Independent).
Multiple Choice Concept Check
Null Hypothesis in Two-Sample Mean Test: Generally H0:μ1=μ2.
Use of t-distribution: Used when the population variance (or standard deviation) is unknown.
Variance Assumption Importance: Knowing if variances are equal determines the choice of the test statistic (pooled vs. separate) and the calculation of degrees of freedom.
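To illustrate how the variance assumption changes the degrees of freedom, the sketch below contrasts the pooled df with a separate-variance calculation. These notes do not give the separate-variance formulas; the sketch assumes the commonly used Welch statistic with the Satterthwaite approximation for degrees of freedom. The function name `welch_t` and the summary values are hypothetical.

```python
import math


def welch_t(mean1, var1, n1, mean2, var2, n2):
    """Separate-variance (Welch) t statistic and approximate df.

    Assumption: Satterthwaite approximation for the degrees of freedom.
    """
    v1 = var1 / n1
    v2 = var2 / n2
    t = (mean1 - mean2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return t, df


# Hypothetical summary statistics with visibly unequal sample variances.
n1, n2 = 5, 4
t_welch, df_welch = welch_t(mean1=14, var1=16, n1=n1,
                            mean2=12, var2=4, n2=n2)
df_pooled = n1 + n2 - 2  # df used when equal variances are assumed

print(f"Welch t = {t_welch:.3f}, Welch df = {df_welch:.2f}, pooled df = {df_pooled}")
```

With unequal sample variances the approximate df is typically smaller than n1+n2−2, which raises the critical value and makes the test more conservative.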
Formula Sheet Summary Reference
Independent known σ: $z = \dfrac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}$