Two Sample T-Tests and Power
Central Limit Theorem
- Conditions:
  - Random and independently selected sample.
  - Large sample size (more than 30).
  - Sample size less than 10% of the population.
- Theorem: Distribution of the sample mean (\bar{x}) is approximately Normal.
  - Mean equals the population mean \mu.
  - Standard deviation equals \frac{\sigma}{\sqrt{n}}, where \sigma is the population standard deviation and n is the sample size.
  - Sample mean is distributed as N(\mu, \frac{\sigma}{\sqrt{n}}).
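The theorem can be checked empirically. The sketch below is illustrative only: the exponential population (mean 1, standard deviation 1), the sample size of 50, and the helper name `simulate_sample_means` are all assumptions chosen for the demonstration, not part of the notes.

```python
import random
import statistics

def simulate_sample_means(n, num_samples, seed=0):
    """Draw num_samples samples of size n from a skewed population; return their means."""
    rng = random.Random(seed)
    # Assumed population: exponential with rate 1, so mu = 1 and sigma = 1.
    return [
        statistics.fmean(rng.expovariate(1.0) for _ in range(n))
        for _ in range(num_samples)
    ]

n = 50
means = simulate_sample_means(n=n, num_samples=5000)

mu, sigma = 1.0, 1.0
print("mean of sample means:", round(statistics.fmean(means), 3), "| theory:", mu)
print("sd of sample means:  ", round(statistics.stdev(means), 3),
      "| theory:", round(sigma / n ** 0.5, 3))
```

Even though the population is strongly skewed, the sample means should cluster around \mu with spread close to \frac{\sigma}{\sqrt{n}}.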
Two Sample T-Tests
- Used to compare two groups to see if their means are different.
Paired vs. Unpaired Data
- Paired Data
  - Observations in each dataset are paired.
  - Samples have the same size.
  - Examples:
    - Comparing student performance from midterm to final.
    - Comparing textbook prices at the bookstore and on Amazon.
- Unpaired Data
  - Observations in each dataset don’t have a natural pairing.
  - Sample sizes don’t need to be the same.
  - Examples:
    - Comparing student scores from Fall and Spring semesters.
    - Comparing overall average prices on Amazon and at the university bookstore.
Inference for Paired Data
- Take the difference of paired values and treat that as one sample.
- Run a one-sample t-test on the differences.
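A minimal sketch of the two steps above, using made-up midterm and final scores for five students. The data are hypothetical and serve only to show the mechanics.

```python
import math
import statistics

# Hypothetical paired scores for the same five students (illustrative only).
midterm = [72, 85, 90, 66, 78]
final = [75, 88, 86, 74, 81]

# Step 1: reduce the two paired samples to a single sample of differences.
diffs = [f - m for m, f in zip(midterm, final)]

# Step 2: one-sample t statistic for H0: mean difference = 0.
n = len(diffs)
mean_diff = statistics.fmean(diffs)
sd_diff = statistics.stdev(diffs)   # sample standard deviation of the differences
se = sd_diff / math.sqrt(n)
t_stat = mean_diff / se
df = n - 1

print(f"mean difference = {mean_diff:.2f}, SE = {se:.2f}, t = {t_stat:.2f}, df = {df}")
```

The p-value would then come from a t distribution with n - 1 degrees of freedom, exactly as in any one-sample test.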
Hypothesis Testing for Two Sample Means
- Steps are similar to one-sample tests.
  - Formulate hypotheses.
  - Prepare (check conditions, set alpha level).
  - Calculate the t-statistic.
  - Calculate the p-value and interpret results.
Hypotheses
- Null Hypothesis: H_0: \mu_1 = \mu_2 or H_0: \mu_1 - \mu_2 = 0
- Alternative Hypothesis: H_A: \mu_1 \neq \mu_2 or H_A: \mu_1 - \mu_2 \neq 0
Calculating the Test Statistic
- Point estimate: Difference in sample means ( \bar{x}_1 - \bar{x}_2 ).
- Degrees of freedom: Smaller of (n_1 - 1) and (n_2 - 1).
- Standard error: SE = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}, where s_1 and s_2 are the sample standard deviations, and n_1 and n_2 are the sample sizes.
- Test statistic: t = \frac{\bar{x}_1 - \bar{x}_2}{SE}
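The formulas above translate directly into a small helper. This is a sketch: the function name `two_sample_t` and the summary statistics passed to it are hypothetical. It uses the conservative degrees-of-freedom rule from these notes; statistical software often uses the Welch-Satterthwaite approximation instead, which can give a different df.

```python
import math

def two_sample_t(mean1, sd1, n1, mean2, sd2, n2):
    """Return (t statistic, standard error, conservative df) from summary statistics."""
    se = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    t_stat = (mean1 - mean2) / se
    df = min(n1 - 1, n2 - 1)   # smaller of (n1 - 1) and (n2 - 1)
    return t_stat, se, df

# Hypothetical summary statistics, chosen only to exercise the function.
t_stat, se, df = two_sample_t(mean1=10.0, sd1=3.0, n1=36, mean2=8.0, sd2=4.0, n2=64)
print(f"SE = {se:.3f}, t = {t_stat:.3f}, df = {df}")
```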
Example: Song Length by Genre
- Question: Are folk songs on average longer than classic pop and rock songs?
- Data:
  - 226 classic pop and rock songs: mean duration = 222.45 seconds, standard deviation = 91.37 seconds.
  - 122 folk songs: mean duration = 232.2 seconds, standard deviation = 73.33 seconds.
- Steps:
  - Formulate Hypotheses.
  - Check Conditions (random sample, less than 10% of songs).
  - Compute Test Statistic.
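The last step can be worked through with the data given. Treating folk songs as group 1 is an assumption made here so that the difference comes out positive.

```latex
\begin{align*}
\bar{x}_1 - \bar{x}_2 &= 232.2 - 222.45 = 9.75 \\
SE &= \sqrt{\frac{73.33^2}{122} + \frac{91.37^2}{226}}
    \approx \sqrt{44.08 + 36.94} \approx 9.00 \\
t &= \frac{9.75}{9.00} \approx 1.08, \qquad
df = \min(122 - 1,\ 226 - 1) = 121
\end{align*}
```

With df = 121, the two-sided critical value at \alpha = 0.05 (an assumed level) is roughly 1.98, so a t-statistic of about 1.08 would not lead to rejecting the null hypothesis.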
Confidence Intervals
- Formula: (\bar{x}_1 - \bar{x}_2) \pm t^*_{df} \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}
Example: Find a 95% confidence interval for the difference in song lengths.
- Conditions are met.
- Calculate the interval.
- Interpretation: We are 95% confident that the difference in mean song length (folk minus classic pop and rock) is between -8.07 and 27.57 seconds.
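The interval can be reproduced from the summary statistics. In this sketch the critical value `t_star = 1.98` is an assumption: an approximate t-table value for 95% confidence with df = 121, the smaller of 122 - 1 and 226 - 1.

```python
import math

# Summary statistics from the song-length example.
folk_mean, folk_sd, folk_n = 232.2, 73.33, 122
pop_mean, pop_sd, pop_n = 222.45, 91.37, 226

# Assumption: approximate critical value t* for 95% confidence, df = 121.
t_star = 1.98

diff = folk_mean - pop_mean
se = math.sqrt(folk_sd ** 2 / folk_n + pop_sd ** 2 / pop_n)
margin = t_star * se

lower, upper = diff - margin, diff + margin
print(f"95% CI for (folk - pop): ({lower:.2f}, {upper:.2f}) seconds")
```

Because the interval contains 0, the data are consistent with no difference in mean song length between the two genres.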
Power
- Framework for judging how likely a test is to detect a real difference between group means.
Illustrative Example: Pharmaceutical Company Testing New Drug
- Pharmaceutical company develops a new drug for lowering blood pressure.
- They conduct a clinical trial.
- Recruit people taking a standard blood pressure medication.
- Control group continues current medication (with generic-looking pills).
- Researchers want to run the trial on patients with systolic blood pressures between 140 and 180 mmHg.
- Previous studies suggest:
  - Standard deviation of patients’ blood pressures will be about 12 mmHg.
  - The distribution of patient blood pressures will be approximately symmetric.
- If we had 100 patients per group, the approximate standard error would be: SE = \sqrt{\frac{12^2}{100} + \frac{12^2}{100}} = 1.70.
Detecting a Difference
- Determine values of \bar{x}_{treatment} - \bar{x}_{control} that would lead to rejecting the null hypothesis.
- Assume \alpha = 0.05 (two-sided test).
- Reject if the difference is in the lower 2.5% or upper 2.5%.
- Assuming a Normal distribution, any difference below -1.96 * 1.70 = -3.332 or above 1.96 * 1.70 = 3.332 would be in the rejection region.
- Suppose the new drug reduces blood pressure by 3 mmHg relative to the standard medication.
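The rejection region above can be computed rather than read from a table. This is a sketch under the notes' assumptions (standard deviation of 12 mmHg in both groups, 100 patients per group, Normal sampling distribution); the variable names are illustrative.

```python
import math
from statistics import NormalDist

sd = 12.0           # assumed standard deviation in each group (mmHg)
n_per_group = 100
alpha = 0.05

se = math.sqrt(sd ** 2 / n_per_group + sd ** 2 / n_per_group)

# Two-sided test: alpha / 2 in each tail of the null distribution N(0, se).
z_crit = NormalDist().inv_cdf(1 - alpha / 2)
lower_cutoff = -z_crit * se
upper_cutoff = z_crit * se

print(f"SE = {se:.2f} mmHg")
print(f"reject H0 if difference < {lower_cutoff:.2f} or > {upper_cutoff:.2f} mmHg")
```

The unrounded cutoffs come out near 3.33 mmHg in magnitude, which agrees with the 3.332 above once the standard error and z-score are rounded to 1.70 and 1.96.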
Finding the Probability of Detecting a Difference
- Called the "power of a test."
- Depends on the size of a difference we want to detect, sample size, and standard deviation.
- Effect size is the difference we are looking for.
Connection with Type II Error
- Type I error: Reject the null hypothesis when it’s actually true.
- Type II error: Fail to reject the null hypothesis when it’s not true.
- \alpha = probability of making Type I error.
- \beta = probability of making a Type II error.
- Power of a test = 1 - \beta
- We can set the probability of making a Type I error using the alpha level.
- We have less control over the probability of making a Type II error, but we can measure it and account for it using the power.
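Both error rates can be estimated by simulation. The sketch below reuses the blood pressure setup (SE = 1.70, cutoffs at plus or minus 3.332, true effect of -3 mmHg) and draws observed differences directly from a Normal sampling distribution. The function name `rejection_rate`, the trial count, and the seed are assumptions.

```python
import random

def rejection_rate(true_diff, se, cutoff, trials=20000, seed=1):
    """Fraction of simulated trials whose observed difference lands in the rejection region."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(trials):
        observed = rng.gauss(true_diff, se)   # one simulated difference in sample means
        if abs(observed) > cutoff:
            rejections += 1
    return rejections / trials

se = 1.70        # standard error with 100 patients per group (from the notes)
cutoff = 3.332   # two-sided rejection cutoff at alpha = 0.05 (from the notes)

alpha_hat = rejection_rate(true_diff=0.0, se=se, cutoff=cutoff)    # H0 true: Type I error rate
power_hat = rejection_rate(true_diff=-3.0, se=se, cutoff=cutoff)   # H0 false: power
beta_hat = 1 - power_hat                                           # Type II error rate

print(f"estimated alpha = {alpha_hat:.3f}")
print(f"estimated power = {power_hat:.3f}, estimated beta = {beta_hat:.3f}")
```

The estimated Type I error rate should sit near the chosen \alpha, while \beta depends on the true effect, the sample size, and the standard deviation.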
Using Power Calculations
- Use power calculations to find a sample size that gives enough power to detect a minimum effect size.
Power in Blood Pressure Medication Test
- With 100 patients in each group and an effect size of 3 mmHg, the power of the test is about 0.42.
- Increase the sample size to detect this effect more reliably.
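The power figure can be reproduced with a Normal approximation, and the same calculation shows how power grows with sample size. This is a sketch: `power_two_sided` is a hypothetical helper, and the sample sizes other than 100 are arbitrary illustrations.

```python
import math
from statistics import NormalDist

def power_two_sided(effect, sd, n_per_group, alpha=0.05):
    """Normal-approximation power for a two-sided, two-sample test with equal group sizes."""
    se = math.sqrt(2 * sd ** 2 / n_per_group)
    cutoff = NormalDist().inv_cdf(1 - alpha / 2) * se
    true_dist = NormalDist(mu=effect, sigma=se)   # sampling distribution under the true effect
    # Probability that the observed difference falls in either rejection tail.
    return true_dist.cdf(-cutoff) + (1 - true_dist.cdf(cutoff))

for n in (100, 250, 500):
    print(n, round(power_two_sided(effect=-3.0, sd=12.0, n_per_group=n), 3))
```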
Finding Sample Sizes for Blood Pressure Medication
- Find the sample size that gives a power of 80% with an effect size of 3 mmHg.
- Find the z-score in the true sampling distribution that gives us 80% below.
- We need a standard error such that a z-score of 0.84 in the true sampling distribution (centered at -3 mmHg) falls at the same point as a z-score of -1.96 in the null sampling distribution (centered at 0).
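Setting the two positions equal gives -3 + 0.84 SE = -1.96 SE, so SE = 3 / (0.84 + 1.96). The sketch below solves for the sample size per group, assuming equal group sizes and a standard deviation of 12 mmHg in each group, as in the earlier standard error calculation. The variable names are illustrative.

```python
import math

effect = 3.0     # minimum effect size to detect (mmHg)
sd = 12.0        # assumed standard deviation in each group (mmHg)
z_power = 0.84   # z-score with 80% of the true sampling distribution below it (rounded)
z_alpha = 1.96   # two-sided critical z-score at alpha = 0.05 (rounded)

# The cutoff sits z_alpha SEs from 0 and z_power SEs from the true effect,
# so effect = (z_power + z_alpha) * SE.
target_se = effect / (z_power + z_alpha)

# SE = sqrt(2 * sd^2 / n)  =>  n = 2 * sd^2 / SE^2, rounded up to whole patients.
n_exact = 2 * sd ** 2 / target_se ** 2
n_per_group = math.ceil(n_exact)

print(f"target SE = {target_se:.3f} mmHg, n per group = {n_per_group} (from {n_exact:.2f})")
```

With the rounded z-scores this gives 251 patients per group. Using unrounded quantiles puts the exact value slightly above 251, so the answer is sensitive to rounding at this boundary.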