Two Sample T-Tests and Power

Central Limit Theorem

  • Conditions:
    • Random and independently selected sample.
    • Large sample size (more than 30).
    • Sample size less than 10% of the population.
  • Theorem: The distribution of the sample mean $\bar{x}$ is approximately Normal.
    • Mean equals the population mean $\mu$.
    • Standard deviation equals $\frac{\sigma}{\sqrt{n}}$, where $\sigma$ is the population standard deviation and $n$ is the sample size.
  • Sample mean is distributed as $N(\mu, \frac{\sigma}{\sqrt{n}})$.
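
The theorem above can be checked with a small simulation. This is only a sketch under assumptions that are not in the notes: the population is taken to be Exponential with rate 1 (so $\mu = 1$ and $\sigma = 1$), and the sample size, number of repetitions, and seed are arbitrary choices.

```python
import math
import random
import statistics

random.seed(0)  # fixed seed so the sketch is reproducible

n = 40          # sample size (above the "more than 30" guideline)
reps = 5000     # number of simulated samples (arbitrary choice)
mu = 1.0        # mean of the assumed Exponential(rate=1) population
sigma = 1.0     # standard deviation of that population

# Draw `reps` samples of size n from a skewed population and record each mean.
sample_means = [
    statistics.fmean([random.expovariate(1.0) for _ in range(n)])
    for _ in range(reps)
]

mean_of_means = statistics.fmean(sample_means)
sd_of_means = statistics.stdev(sample_means)
theoretical_se = sigma / math.sqrt(n)

print(f"mean of sample means: {mean_of_means:.3f} (population mean {mu})")
print(f"sd of sample means:   {sd_of_means:.3f} (sigma/sqrt(n) = {theoretical_se:.3f})")
```

Even though the assumed population is strongly skewed, the simulated sample means should center near $\mu$ with a spread close to $\frac{\sigma}{\sqrt{n}}$.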

Two Sample T-Tests

  • Used to compare two groups to see if their means are different.

Paired vs. Unpaired Data

  • Paired Data
    • Observations in each dataset are paired.
    • Samples have the same size.
    • Examples:
      • Comparing student performance from midterm to final.
      • Comparing textbook prices at the bookstore and on Amazon.
  • Unpaired Data
    • Observations in each dataset don’t have a natural pairing.
    • Sample sizes don’t need to be the same.
    • Examples:
      • Comparing student scores from Fall and Spring semesters.
      • Comparing overall average prices on Amazon and at the university bookstore.

Inference for Paired Data

  • Take the difference of paired values and treat that as one sample.
  • Run a one-sample t-test on the differences.
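
The two steps above can be sketched as follows. The scores are hypothetical, made up purely for illustration, and six students is far too few for real inference; the point is only the mechanics of reducing paired data to one sample of differences.

```python
import math
import statistics

# Hypothetical midterm and final scores for the same six students.
midterm = [72, 85, 90, 66, 78, 88]
final = [75, 89, 88, 70, 84, 91]

# Step 1: reduce the paired data to a single sample of differences.
differences = [f - m for m, f in zip(midterm, final)]

# Step 2: one-sample t-statistic for H0: mean difference = 0.
n = len(differences)
mean_diff = statistics.mean(differences)
sd_diff = statistics.stdev(differences)   # sample standard deviation
se_diff = sd_diff / math.sqrt(n)
t_stat = mean_diff / se_diff
df = n - 1

print(f"differences = {differences}")
print(f"mean = {mean_diff}, SE = {se_diff:.3f}, t = {t_stat:.3f}, df = {df}")
```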

Hypothesis Testing for Two Sample Means

  • Steps are similar to one-sample tests.
    • Formulate hypotheses.
    • Prepare (check conditions, set alpha level).
    • Calculate the t-statistic.
    • Calculate the p-value and interpret results.
Hypotheses
  • Null Hypothesis: $H_0: \mu_1 = \mu_2$ or $H_0: \mu_1 - \mu_2 = 0$
  • Alternative Hypothesis: $H_A: \mu_1 \neq \mu_2$ or $H_A: \mu_1 - \mu_2 \neq 0$
Calculating the Test Statistic
  • Point estimate: Difference in sample means, $\bar{x}_1 - \bar{x}_2$.
  • Degrees of freedom: Smaller of $n_1 - 1$ and $n_2 - 1$.
  • Standard error: $SE = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$, where $s_1$ and $s_2$ are the sample standard deviations, and $n_1$ and $n_2$ are the sample sizes.
  • Test statistic: $t = \frac{\bar{x}_1 - \bar{x}_2}{SE}$
Example: Song Length by Genre
  • Question: Are folk songs on average longer than classic pop and rock songs?
  • Data:
    • 226 classic pop and rock songs: mean duration = 222.45 seconds, standard deviation = 91.37 seconds.
    • 122 folk songs: mean duration = 232.2 seconds, standard deviation = 73.33 seconds.
  • Steps:
    • Formulate Hypotheses.
    • Check Conditions (random sample, less than 10% of songs).
    • Compute Test Statistic.
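
As a sketch of the last step, the test statistic can be computed directly from the summary statistics given in the example. The helper name `two_sample_t` is hypothetical, and treating folk as group 1 is an assumption about the direction of the difference.

```python
import math

def two_sample_t(mean1, sd1, n1, mean2, sd2, n2):
    """Return (t, SE, df) for the difference mean1 - mean2 from summary statistics."""
    se = math.sqrt(sd1**2 / n1 + sd2**2 / n2)
    t = (mean1 - mean2) / se
    df = min(n1 - 1, n2 - 1)  # conservative rule used in these notes
    return t, se, df

# Group 1 = folk, group 2 = classic pop and rock (figures from the example).
t_stat, se, df = two_sample_t(232.2, 73.33, 122, 222.45, 91.37, 226)
print(f"SE = {se:.2f}, t = {t_stat:.2f}, df = {df}")
```

The p-value would then come from a t distribution with `df` degrees of freedom, for example from a t-table or a statistics package; that step is left out here to keep the sketch dependency-free.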

Confidence Intervals

  • Formula: $\bar{x}_1 - \bar{x}_2 \pm t^{*}_{df} \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$, where $t^{*}_{df}$ is the critical t value for the chosen confidence level.
Example: Find a 95% confidence interval for the difference in song lengths.
  • Conditions are met.
  • Calculate the interval.
  • Interpretation: We are 95% confident that the difference in mean song length (folk minus classic pop and rock) is between -8.07 and 27.57 seconds.
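
The interval can be reproduced from the summary statistics of the song-length example. This is a sketch with one labeled assumption: the critical value `t_star = 1.98` is an approximate t-table value for 121 degrees of freedom, not something stated in the notes.

```python
import math

# Summary statistics from the song-length example.
mean_folk, sd_folk, n_folk = 232.2, 73.33, 122
mean_pop, sd_pop, n_pop = 222.45, 91.37, 226

# Assumption: t* = 1.98 approximates the 97.5th percentile of a t distribution
# with df = min(122, 226) - 1 = 121, read from a t-table rather than computed.
t_star = 1.98

point_estimate = mean_folk - mean_pop
se = math.sqrt(sd_folk**2 / n_folk + sd_pop**2 / n_pop)
margin = t_star * se
ci_low, ci_high = point_estimate - margin, point_estimate + margin

print(f"95% CI: ({ci_low:.2f}, {ci_high:.2f}) seconds")
```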

Power

  • A framework for determining how likely a test is to detect a difference between group means when one truly exists.

Illustrative Example: Pharmaceutical Company Testing New Drug

  • Pharmaceutical company develops a new drug for lowering blood pressure.
  • They conduct a clinical trial.
    • Recruit people taking a standard blood pressure medication.
    • Control group continues current medication (with generic-looking pills).
  • Researchers want to run the trial on patients with systolic blood pressures between 140 and 180 mmHg.
  • Previous studies suggest:
    • Standard deviation of patients’ blood pressures will be about 12 mmHg.
    • The distribution of patient blood pressures will be approximately symmetric.
  • If we had 100 patients per group, the approximate standard error would be: $SE = \sqrt{\frac{12^2}{100} + \frac{12^2}{100}} = 1.70$.

Detecting a Difference

  • Determine values of $\bar{x}_{treatment} - \bar{x}_{control}$ that would lead to rejecting the null hypothesis.
  • Assume $\alpha = 0.05$ (two-sided test).
  • Reject if the difference is in the lower 2.5% or upper 2.5%.
  • Assuming a Normal distribution, any difference below $-1.96 \times 1.70 = -3.332$ or above $1.96 \times 1.70 = 3.332$ would be in the rejection region.
  • Suppose the new drug reduces blood pressure by 3 mmHg relative to the standard medication.
Finding the Probability of Detecting a Difference
  • Called the "power of a test."
  • Depends on the size of a difference we want to detect, sample size, and standard deviation.
  • Effect size is the difference we are looking for.
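
For the blood pressure example, the power can be computed with the Normal approximation used in these notes. This is a minimal sketch: the effect is written as -3 mmHg (treatment minus control) on the assumption that a reduction means a negative difference, and the variable names are illustrative.

```python
import math
from statistics import NormalDist

sd, n_per_group = 12, 100   # values from the blood pressure example
effect = -3                 # treatment mean minus control mean, in mmHg
alpha = 0.05

se = math.sqrt(sd**2 / n_per_group + sd**2 / n_per_group)
z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # about 1.96
lower, upper = -z_crit * se, z_crit * se       # rejection boundaries

# Under the alternative, the difference in sample means is ~ N(effect, se).
alt = NormalDist(mu=effect, sigma=se)
power = alt.cdf(lower) + (1 - alt.cdf(upper))

print(f"SE = {se:.2f}, reject outside ({lower:.2f}, {upper:.2f})")
print(f"power = {power:.2f}")
```

With these inputs the probability of landing in the rejection region comes out to roughly 0.42, almost all of it from the lower tail.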

Connection with Type II Error

  • Type I error: Reject the null hypothesis when it’s actually true.
  • Type II error: Fail to reject the null hypothesis when it’s not true.
  • $\alpha$ = probability of making a Type I error.
  • $\beta$ = probability of making a Type II error.
  • Power of a test = $1 - \beta$
  • We can set the probability of making a Type I error using the alpha level.
  • We have less control over the probability of making a Type II error, but we can measure it and account for it using the power.
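
One way to see both error rates is to simulate the blood pressure trial many times. This is a sketch under several assumptions not in the notes: blood pressures are drawn from a Normal distribution centered at 160 mmHg, the standard deviation of 12 is treated as known, and the function name, repetition count, and seed are arbitrary.

```python
import math
import random
import statistics

random.seed(1)  # fixed seed so the sketch is reproducible

def rejection_rate(true_effect, n=100, sd=12, reps=2000, z_crit=1.96):
    """Fraction of simulated trials that reject H0: no difference in means."""
    se = math.sqrt(sd**2 / n + sd**2 / n)  # treats sd as known, as in the notes
    rejections = 0
    for _ in range(reps):
        # Assumed baseline of 160 mmHg; only the group difference matters.
        control = [random.gauss(160, sd) for _ in range(n)]
        treatment = [random.gauss(160 + true_effect, sd) for _ in range(n)]
        diff = statistics.fmean(treatment) - statistics.fmean(control)
        if abs(diff) > z_crit * se:
            rejections += 1
    return rejections / reps

type_1_rate = rejection_rate(true_effect=0)   # H0 true: rejecting is a Type I error
power = rejection_rate(true_effect=-3)        # H0 false: rejecting is correct
type_2_rate = 1 - power                       # failing to reject is a Type II error

print(f"Type I rate ~ {type_1_rate:.3f}, power ~ {power:.3f}, Type II rate ~ {type_2_rate:.3f}")
```

The simulated Type I rate should sit near the chosen $\alpha$, while the Type II rate depends on the effect size, sample size, and standard deviation.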

Using Power Calculations

  • Use power calculations to find the sample size that gives enough power to detect a minimum effect size.
Power in Blood Pressure Medication Test
  • With 100 patients in each group and an effect size of 3 mmHg, the power of the test is about 0.42.
  • Increase the sample size to make this effect more likely to be detected.
Finding Sample Sizes for Blood Pressure Medication
  • Find the sample size that gives a power of 80% with an effect size of 3 mmHg.
  • Find the z-score in the true sampling distribution that has 80% of the distribution below it.
    • We need a standard error such that the point with a z-score of 0.84 in the true sampling distribution coincides with the point with a z-score of $-1.96$ in the null sampling distribution.
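
The condition above can be turned into a sample size. This sketch uses the rounded z-scores from the notes and the standard deviation of 12 mmHg from the example; the resulting number of patients is a computed value, not one stated in the notes, and exact quantiles would give a slightly larger answer.

```python
import math

sd = 12          # mmHg, from the example
effect = 3       # mmHg, minimum effect size to detect
z_power = 0.84   # z-score with 80% of the true sampling distribution below it
z_alpha = 1.96   # two-sided 5% cutoff in the null sampling distribution

# The effect must span both distances: effect = (z_power + z_alpha) * SE.
se_needed = effect / (z_power + z_alpha)

# Solve SE = sqrt(sd^2/n + sd^2/n) for n, then round up to whole patients.
n_exact = 2 * sd**2 / se_needed**2
n_per_group = math.ceil(n_exact)

print(f"required SE = {se_needed:.3f}")
print(f"patients per group = {n_per_group}")
```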