Theorem: The distribution of the sample mean ($\bar{x}$) is approximately Normal.
Its mean equals the population mean $\mu$.
Its standard deviation equals $\sigma/\sqrt{n}$, where $\sigma$ is the population standard deviation and $n$ is the sample size.
The sample mean is distributed as $N(\mu, \sigma/\sqrt{n})$.
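A quick simulation can make the $\sigma/\sqrt{n}$ claim concrete. The sketch below is only an illustration: the Uniform(0, 1) population, the sample size of 25, the number of repetitions, and the function name `sd_of_sample_means` are all assumptions chosen for the example, not part of the notes.

```python
import math
import random
import statistics


def sd_of_sample_means(n, reps=5000, seed=0):
    """Draw `reps` samples of size `n` from Uniform(0, 1) and return
    the standard deviation of the resulting sample means."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    means = [
        statistics.fmean(rng.random() for _ in range(n))
        for _ in range(reps)
    ]
    return statistics.stdev(means)


n = 25
sigma = math.sqrt(1 / 12)            # population SD of Uniform(0, 1)
theoretical = sigma / math.sqrt(n)   # sigma / sqrt(n) from the theorem
simulated = sd_of_sample_means(n)

print(f"theoretical SD of the sample mean: {theoretical:.4f}")
print(f"simulated SD of the sample mean:   {simulated:.4f}")
```

The two printed values should be close to each other; they will not match exactly because the simulation uses a finite number of samples.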
Two Sample T-Tests
Used to compare two groups to see if their means are different.
Paired vs. Unpaired Data
Paired Data
Observations in each dataset are paired.
Samples have the same size.
Examples:
Comparing student performance from midterm to final.
Comparing textbook prices at the bookstore and on Amazon.
Unpaired Data
Observations in each dataset don’t have a natural pairing.
Sample sizes don’t need to be the same.
Examples:
Comparing student scores from Fall and Spring semesters.
Comparing overall average prices on Amazon and at the university bookstore.
Inference for Paired Data
Take the difference of paired values and treat that as one sample.
Run a one-sample t-test on the differences.
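The paired procedure above can be sketched in a few lines. The scores below are made-up midterm and final grades for four students, and `paired_t` is a hypothetical helper name; the sketch only computes the t-statistic and degrees of freedom, leaving the p-value to a t-table or a statistics package.

```python
import math
import statistics


def paired_t(first, second):
    """Return (t statistic, degrees of freedom) for paired data,
    testing whether the mean difference (second - first) is zero."""
    if len(first) != len(second):
        raise ValueError("paired samples must have the same size")
    # Step 1: reduce the two paired samples to one sample of differences.
    diffs = [b - a for a, b in zip(first, second)]
    n = len(diffs)
    # Step 2: one-sample t statistic on the differences.
    mean_diff = statistics.fmean(diffs)
    se = statistics.stdev(diffs) / math.sqrt(n)
    return mean_diff / se, n - 1


# Hypothetical scores for the same four students.
midterm = [70, 80, 90, 60]
final = [75, 82, 95, 68]

t_stat, df = paired_t(midterm, final)
print(f"t = {t_stat:.3f}, df = {df}")
```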
Hypothesis Testing for Two Sample Means
Steps are similar to one-sample tests.
Formulate hypotheses.
Prepare (check conditions, set alpha level).
Calculate the t-statistic.
Calculate the p-value and interpret results.
Hypotheses
Null Hypothesis: $H_0: \mu_1 = \mu_2$ or $H_0: \mu_1 - \mu_2 = 0$
Alternative Hypothesis: $H_A: \mu_1 \neq \mu_2$ or $H_A: \mu_1 - \mu_2 \neq 0$
Calculating the Test Statistic
Point estimate: Difference in sample means ($\bar{x}_1 - \bar{x}_2$).
Degrees of freedom: Smaller of $(n_1 - 1)$ and $(n_2 - 1)$.
Standard error: $SE = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$, where $s_1$ and $s_2$ are the sample standard deviations, and $n_1$ and $n_2$ are the sample sizes.
Test statistic: $t = \frac{\bar{x}_1 - \bar{x}_2}{SE}$
Example: Song Length by Genre
Question: Are folk songs on average longer than classic pop and rock songs?
Data:
226 classic pop and rock songs: mean duration = 222.45 seconds, standard deviation = 91.37 seconds.
122 folk songs: mean duration = 232.2 seconds, standard deviation = 73.33 seconds.
Steps:
Formulate Hypotheses.
Check Conditions (random sample, less than 10% of songs).
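The remaining steps (test statistic and p-value) can be sketched from the summary statistics given above. Two things here are assumptions rather than part of the notes: the helper name `two_sample_t`, and the use of a Normal approximation for the one-sided p-value (reasonable as a rough check with 121 degrees of freedom, but a t-distribution would be the exact reference).

```python
import math
from statistics import NormalDist


def two_sample_t(mean1, sd1, n1, mean2, sd2, n2):
    """Return (t statistic, standard error, degrees of freedom) for the
    difference mean1 - mean2, using the smaller of n1 - 1 and n2 - 1."""
    se = math.sqrt(sd1**2 / n1 + sd2**2 / n2)
    t = (mean1 - mean2) / se
    df = min(n1 - 1, n2 - 1)
    return t, se, df


# Group 1: folk songs. Group 2: classic pop and rock songs.
t_stat, se, df = two_sample_t(232.2, 73.33, 122, 222.45, 91.37, 226)

# One-sided alternative (folk songs are longer), so use the upper tail.
# Assumption: Normal approximation to the t-distribution.
p_approx = 1 - NormalDist().cdf(t_stat)

print(f"SE = {se:.2f}, t = {t_stat:.2f}, df = {df}, p (approx.) = {p_approx:.2f}")
```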
Example: Find a 95% confidence interval for the difference in song lengths.
Conditions are met.
Calculate the interval.
Interpretation: We are 95% confident that the difference in mean song length (folk minus classic pop and rock) is between -8.07 and 27.57 seconds.
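The interval can be reproduced from the summary statistics as point estimate ± t* × SE. In this sketch the critical value t* = 1.98 is taken as the approximate 97.5th percentile of a t-distribution with 121 degrees of freedom (a table lookup, hard-coded here), and `diff_ci` is a hypothetical helper name.

```python
import math

# Assumption: approximate 97.5th percentile of t with df = min(121, 225) = 121.
T_STAR = 1.98


def diff_ci(mean1, sd1, n1, mean2, sd2, n2, t_star):
    """Confidence interval for mean1 - mean2 from summary statistics."""
    se = math.sqrt(sd1**2 / n1 + sd2**2 / n2)
    diff = mean1 - mean2
    margin = t_star * se
    return diff - margin, diff + margin


# Folk songs minus classic pop and rock songs.
low, high = diff_ci(232.2, 73.33, 122, 222.45, 91.37, 226, T_STAR)
print(f"95% CI: ({low:.2f}, {high:.2f}) seconds")
```

Because the interval contains 0, these data are consistent with no difference in mean song length.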
Power
Framework to test if there is a difference between sample means.
Illustrative Example: Pharmaceutical Company Testing New Drug
Pharmaceutical company develops a new drug for lowering blood pressure.
They conduct a clinical trial.
Recruit people taking a standard blood pressure medication.
Control group continues current medication (with generic-looking pills).
Researchers want to run the trial on patients with systolic blood pressures between 140 and 180 mmHg.
Previous studies suggest:
Standard deviation of patients’ blood pressures will be about 12 mmHg.
The distribution of patient blood pressures will be approximately symmetric.
If we had 100 patients per group, the approximate standard error would be: $SE = \sqrt{\frac{12^2}{100} + \frac{12^2}{100}} = 1.70$.
Detecting a Difference
Determine values of $\bar{x}_{\text{treatment}} - \bar{x}_{\text{control}}$ that would lead to rejecting the null hypothesis.
Assume α=0.05 (two-sided test).
Reject if the difference is in the lower 2.5% or upper 2.5%.
Assuming a Normal distribution, any difference below $-1.96 \times 1.70 = -3.332$ or above $1.96 \times 1.70 = 3.332$ would be in the rejection region.
Suppose the new drug reduces blood pressure by 3 mmHg relative to the standard medication.
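The standard error and the rejection cutoffs can be checked numerically. This is a small sketch using Python's standard-library `NormalDist`; the variable names are illustrative. Note that the notes round the standard error to 1.70 before multiplying, which gives 3.332, while carrying full precision gives a slightly smaller cutoff.

```python
import math
from statistics import NormalDist

sigma = 12    # assumed SD of blood pressure in each group (mmHg)
n = 100       # patients per group
alpha = 0.05  # two-sided significance level

# Standard error of the difference in group means.
se = math.sqrt(sigma**2 / n + sigma**2 / n)

# Critical z value that leaves alpha / 2 in each tail (about 1.96).
z_crit = NormalDist().inv_cdf(1 - alpha / 2)

# Differences beyond +/- cutoff fall in the rejection region.
cutoff = z_crit * se

print(f"SE = {se:.2f} mmHg")
print(f"reject H0 if the difference is below {-cutoff:.2f} or above {cutoff:.2f} mmHg")
```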
Finding the Probability of Detecting a Difference
Called the "power of a test."
Depends on the size of a difference we want to detect, sample size, and standard deviation.
Effect size is the difference we are looking for.
Connection with Type II Error
Type I error: Reject the null hypothesis when it’s actually true.
Type II error: Fail to reject the null hypothesis when it’s not true.
α = probability of making Type I error.
β = probability of making a Type II error.
Power of a test = 1−β
We can set the probability of making a Type I error using the alpha level.
We have less control over the probability of making a Type II error, but we can measure it and account for it using the power.
Using Power Calculations
Determine the power of a test to find a sample size that gives enough power to detect a minimum effect size.
Power in Blood Pressure Medication Test
With a sample size of 100 in each group to detect an effect size of 3 mmHg, the power of the test was 0.42.
Increasing the sample size raises the power, making an effect of this size more likely to be detected.
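The 0.42 figure can be reproduced under the Normal model used above: the power is the probability that the observed difference lands in the rejection region when the true difference is -3 mmHg. The function name `power_two_sided` and its parameters are assumptions made for this sketch.

```python
import math
from statistics import NormalDist


def power_two_sided(effect, sigma, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-sample test under a Normal
    model with equal group sizes and a common standard deviation."""
    se = math.sqrt(2 * sigma**2 / n_per_group)
    cutoff = NormalDist().inv_cdf(1 - alpha / 2) * se
    # Sampling distribution of the difference when the effect is real.
    true_dist = NormalDist(mu=effect, sigma=se)
    # Probability of landing in either tail of the rejection region.
    return true_dist.cdf(-cutoff) + (1 - true_dist.cdf(cutoff))


# The drug lowers blood pressure, so the true difference is -3 mmHg.
power_100 = power_two_sided(effect=-3, sigma=12, n_per_group=100)
power_500 = power_two_sided(effect=-3, sigma=12, n_per_group=500)

print(f"power with 100 per group: {power_100:.2f}")
print(f"power with 500 per group: {power_500:.2f}")
```

Comparing the two printed values shows how a larger sample shrinks the standard error and so raises the power.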
Finding Sample Sizes for Blood Pressure Medication
Find the sample size that gives a power of 80% with an effect size of 3 mmHg.
Find the z-score in the true sampling distribution that gives us 80% below.
We need a standard error such that a z-score of 0.84 in the true sampling distribution falls at the same point as a z-score of -1.96 in the null sampling distribution, that is, $0.84 \times SE + 1.96 \times SE = 3$.
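The last step can be finished numerically: solve for the target standard error, then for the group size that produces it. This sketch uses the rounded z-scores from the notes (0.84 and 1.96) and assumes equal group sizes with a common standard deviation of 12 mmHg; `n_per_group` is a hypothetical helper name.

```python
import math


def n_per_group(effect, sigma, z_power=0.84, z_alpha=1.96):
    """Smallest equal group size giving the requested power, using
    SE = sqrt(2 * sigma^2 / n) and (z_power + z_alpha) * SE = |effect|."""
    se_target = abs(effect) / (z_power + z_alpha)
    # Solve sqrt(2 * sigma^2 / n) = se_target for n, then round up.
    return math.ceil(2 * sigma**2 / se_target**2)


se_target = 3 / (0.84 + 1.96)
n_needed = n_per_group(effect=3, sigma=12)

print(f"target SE = {se_target:.2f} mmHg")
print(f"patients needed per group = {n_needed}")
```

Using unrounded z-scores instead of 0.84 and 1.96 shifts the unrounded result slightly, so the final count can differ by one patient per group.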