Two Sample T-Tests and Power

Central Limit Theorem

  • Conditions:
    • Random and independently selected sample.
    • Large sample size (more than 30).
    • Sample size less than 10% of the population.
  • Theorem: Distribution of the sample mean (\bar{x}) is approximately Normal.
    • Mean equals the population mean \mu.
    • Standard deviation equals \frac{\sigma}{\sqrt{n}}, where \sigma is the population standard deviation and n is the sample size.
  • Sample mean is distributed as N(\mu, \frac{\sigma}{\sqrt{n}}).
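
A small simulation can make the theorem concrete. The sketch below is only illustrative: the exponential population, the sample size of 50, and the 5000 repetitions are assumptions chosen for the demo, not values from these notes.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is reproducible

# Hypothetical skewed population: exponential with mean 1,
# so mu = 1 and sigma = 1 (assumed for illustration only).
mu, sigma = 1.0, 1.0
n = 50              # sample size (more than 30)
num_samples = 5000  # number of repeated samples

# Draw many independent samples and record each sample mean.
sample_means = [
    statistics.fmean(random.expovariate(1 / mu) for _ in range(n))
    for _ in range(num_samples)
]

mean_of_means = statistics.fmean(sample_means)
sd_of_means = statistics.stdev(sample_means)
theoretical_se = sigma / n ** 0.5  # sigma / sqrt(n)

print(f"mean of sample means: {mean_of_means:.3f} (mu = {mu})")
print(f"sd of sample means:   {sd_of_means:.3f} (sigma/sqrt(n) = {theoretical_se:.3f})")
```

Even though the population is skewed, the sample means should center near \mu with spread close to \frac{\sigma}{\sqrt{n}}.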

Two Sample T-Tests

  • Used to compare two groups to see if their means are different.

Paired vs. Unpaired Data

  • Paired Data
    • Observations in each dataset are paired.
    • Samples have the same size.
    • Examples:
      • Comparing student performance from midterm to final.
      • Comparing textbook prices at the bookstore and on Amazon.
  • Unpaired Data
    • Observations in each dataset don’t have a natural pairing.
    • Sample sizes don’t need to be the same.
    • Examples:
      • Comparing student scores from Fall and Spring semesters.
      • Comparing overall average prices on Amazon and at the university bookstore.

Inference for Paired Data

  • Take the difference of paired values and treat that as one sample.
  • Run a one-sample t-test on the differences.
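
As a rough sketch of the two steps above, the code below takes paired differences and computes a one-sample t-statistic on them. The midterm and final scores are made-up numbers for eight hypothetical students, not data from these notes.

```python
import math
import statistics

# Hypothetical paired scores (same eight students, same order in both lists).
midterm = [72, 85, 90, 66, 78, 88, 95, 70]
final = [75, 83, 94, 70, 80, 91, 96, 74]

# Step 1: reduce the two paired samples to one sample of differences.
diffs = [f - m for f, m in zip(final, midterm)]

# Step 2: one-sample t-statistic for H0: mean difference = 0.
n = len(diffs)
mean_diff = statistics.fmean(diffs)
sd_diff = statistics.stdev(diffs)  # sample standard deviation
se = sd_diff / math.sqrt(n)
t_stat = mean_diff / se
df = n - 1

print(f"mean difference = {mean_diff:.3f}, t = {t_stat:.3f}, df = {df}")
```

The p-value would then come from a t-distribution with n - 1 degrees of freedom, which this sketch leaves out.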

Hypothesis Testing for Two Sample Means

  • Steps are similar to one-sample tests.
    • Formulate hypotheses.
    • Prepare (check conditions, set alpha level).
    • Calculate the t-statistic.
    • Calculate the p-value and interpret results.

Hypotheses

  • Null Hypothesis: H_0: \mu_1 = \mu_2 or H_0: \mu_1 - \mu_2 = 0
  • Alternative Hypothesis: H_A: \mu_1 \neq \mu_2 or H_A: \mu_1 - \mu_2 \neq 0

Calculating the Test Statistic

  • Point estimate: Difference in sample means ( \bar{x}_1 - \bar{x}_2 ).
  • Degrees of freedom: Smaller of (n_1 - 1) and (n_2 - 1).
  • Standard error: SE = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}, where s_1 and s_2 are the sample standard deviations, and n_1 and n_2 are the sample sizes.
  • Test statistic: t = \frac{\bar{x}_1 - \bar{x}_2}{SE}

Example: Song Length by Genre

  • Question: Are folk songs on average longer than classic pop and rock songs?
  • Data:
    • 226 classic pop and rock songs: mean duration = 222.45 seconds, standard deviation = 91.37 seconds.
    • 122 folk songs: mean duration = 232.2 seconds, standard deviation = 73.33 seconds.
  • Steps:
    • Formulate Hypotheses.
    • Check Conditions (random sample, less than 10% of songs).
    • Compute Test Statistic.
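
The test statistic for this example can be sketched from the summary statistics above. The direction of the difference (folk minus classic pop and rock) and the variable names are choices made for this illustration.

```python
import math

# Summary statistics from the notes.
n_pop, mean_pop, sd_pop = 226, 222.45, 91.37    # classic pop and rock
n_folk, mean_folk, sd_folk = 122, 232.2, 73.33  # folk

# Point estimate: difference in sample means (folk minus pop/rock).
point_estimate = mean_folk - mean_pop

# Standard error: sqrt(s1^2 / n1 + s2^2 / n2).
se = math.sqrt(sd_folk**2 / n_folk + sd_pop**2 / n_pop)

# Test statistic and conservative degrees of freedom.
t_stat = point_estimate / se
df = min(n_folk - 1, n_pop - 1)

print(f"difference = {point_estimate:.2f} s, SE = {se:.2f}, t = {t_stat:.2f}, df = {df}")
```

The p-value requires a t-distribution CDF, which the Python standard library does not provide; `scipy.stats.t.sf` is one option if SciPy is available.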

Confidence Intervals

  • Formula: \bar{x}_1 - \bar{x}_2 \pm t_{df} \times \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}

Example: Find a 95% confidence interval for the difference in song lengths.

  • Conditions are met.
  • Calculate the interval.
  • Interpretation: We are 95% confident that the difference in mean song length (folk minus classic pop and rock) is between -8.07 and 27.57 seconds.
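
A minimal sketch of the interval calculation, using the song summary statistics given earlier. The critical value t* = 1.98 is an assumed t-table value for df = 121 at 95% confidence; `scipy.stats.t.ppf(0.975, 121)` would give a more precise figure if SciPy is available.

```python
import math

# Summary statistics from the song example.
n_pop, mean_pop, sd_pop = 226, 222.45, 91.37    # classic pop and rock
n_folk, mean_folk, sd_folk = 122, 232.2, 73.33  # folk

t_star = 1.98  # assumed critical value for df = 121, 95% confidence

point_estimate = mean_folk - mean_pop
se = math.sqrt(sd_folk**2 / n_folk + sd_pop**2 / n_pop)
margin = t_star * se

lower = point_estimate - margin
upper = point_estimate + margin

print(f"95% CI for the difference: ({lower:.2f}, {upper:.2f}) seconds")
```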

Power

  • Framework for asking how likely a test is to detect a real difference between group means.

Illustrative Example: Pharmaceutical Company Testing New Drug

  • Pharmaceutical company develops a new drug for lowering blood pressure.
  • They conduct a clinical trial.
    • Recruit people taking a standard blood pressure medication.
    • Control group continues current medication (with generic-looking pills).
  • Researchers want to run the trial on patients with systolic blood pressures between 140 and 180 mmHg.
  • Previous studies suggest:
    • Standard deviation of patients’ blood pressures will be about 12 mmHg.
    • The distribution of patient blood pressures will be approximately symmetric.
  • If we had 100 patients per group, the approximate standard error would be: SE = \sqrt{\frac{12^2}{100} + \frac{12^2}{100}} = 1.70.

Detecting a Difference

  • Determine values of \bar{x}_{treatment} - \bar{x}_{control} that would lead to rejecting the null hypothesis.
  • Assume \alpha = 0.05 (two-sided test).
  • Reject if the difference is in the lower 2.5% or upper 2.5%.
  • Assuming a Normal distribution, any difference below -1.96 * 1.70 = -3.332 or above 1.96 * 1.70 = 3.332 would be in the rejection region.
  • Suppose the new drug reduces blood pressure by 3 mmHg relative to the standard medication.
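
The rejection region can be sketched with the standard library's `statistics.NormalDist`. The variable names are illustrative. Because the code keeps the unrounded standard error and critical value, the cutoffs come out slightly different from the hand calculation that uses 1.70 and 1.96.

```python
import math
from statistics import NormalDist

sd = 12            # assumed standard deviation of blood pressure (mmHg)
n_per_group = 100  # patients in each group
alpha = 0.05       # two-sided significance level

# Standard error of the difference in sample means.
se = math.sqrt(sd**2 / n_per_group + sd**2 / n_per_group)

# Critical z-value that leaves alpha/2 in each tail of the null distribution.
z_crit = NormalDist().inv_cdf(1 - alpha / 2)

lower_cutoff = -z_crit * se
upper_cutoff = z_crit * se

print(f"SE = {se:.2f} mmHg")
print(f"reject H0 if the difference is below {lower_cutoff:.2f} or above {upper_cutoff:.2f}")
```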

Finding the Probability of Detecting a Difference

  • Called the "power of a test."
  • Depends on the size of a difference we want to detect, sample size, and standard deviation.
  • Effect size is the difference we are looking for.

Connection with Type II Error

  • Type I error: Reject the null hypothesis when it’s actually true.
  • Type II error: Fail to reject the null hypothesis when it’s not true.
  • \alpha = probability of making Type I error.
  • \beta = probability of making a Type II error.
  • Power of a test = 1 - \beta
  • We can set the probability of making a Type I error using the alpha level.
  • We have less control over the probability of making a Type II error, but we can measure it and account for it using the power.

Using Power Calculations

  • Determine the power of a test to find a sample size that gives enough power to detect a minimum effect size.

Power in Blood Pressure Medication Test

  • With a sample size of 100 in each group to detect an effect size of 3 mmHg, the power of the test was 0.42.
  • To detect this effect more reliably, increase the sample size, which shrinks the standard error and raises the power.
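
The power figure can be checked with a short calculation under the Normal approximation used in these notes. This is a sketch with illustrative variable names: it assumes the true difference is -3 mmHg (the new drug lowers blood pressure by 3 mmHg) and a standard deviation of 12 mmHg in both groups.

```python
import math
from statistics import NormalDist

sd = 12            # standard deviation of blood pressure (mmHg)
n_per_group = 100  # patients in each group
alpha = 0.05       # two-sided significance level
effect = -3        # assumed true difference, treatment minus control (mmHg)

se = math.sqrt(sd**2 / n_per_group + sd**2 / n_per_group)
z_crit = NormalDist().inv_cdf(1 - alpha / 2)
lower_cutoff, upper_cutoff = -z_crit * se, z_crit * se

# Sampling distribution of the difference if the true effect is -3 mmHg.
true_dist = NormalDist(mu=effect, sigma=se)

# Power: probability that the observed difference lands in the rejection region.
power = true_dist.cdf(lower_cutoff) + (1 - true_dist.cdf(upper_cutoff))

print(f"power = {power:.2f}")
```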

Finding Sample Sizes for Blood Pressure Medication

  • Find the sample size that gives a power of 80% with an effect size of 3 mmHg.
  • Find the z-score in the true sampling distribution that leaves 80% of the distribution below it: z = 0.84.
  • We need a standard error such that the point 0.84 standard errors above the center of the true sampling distribution (-3 mmHg) coincides with the point 1.96 standard errors below the center of the null sampling distribution (0 mmHg).
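
One way to carry this condition through to a sample size is sketched below. It is an illustration rather than the notes' own derivation: it uses the rounded z-values 0.84 and 1.96 from the text, and it assumes equal group sizes and a standard deviation of 12 mmHg in both groups. The condition -3 + 0.84 \times SE = -1.96 \times SE gives SE = 3 / (0.84 + 1.96), and SE = \sqrt{\frac{12^2}{n} + \frac{12^2}{n}} is then solved for n.

```python
import math

sd = 12         # standard deviation of blood pressure (mmHg)
effect = 3      # minimum effect size to detect (mmHg)
z_power = 0.84  # z-score with 80% of the true distribution below it (rounded)
z_alpha = 1.96  # critical z-value for a two-sided test at alpha = 0.05 (rounded)

# The cutoff sits 1.96 SE from 0 and 0.84 SE from the true mean,
# so the two distances together must span the effect size.
required_se = effect / (z_power + z_alpha)

# SE^2 = 2 * sd^2 / n  =>  n = 2 * sd^2 / SE^2
n_exact = 2 * sd**2 / required_se**2

# Round up: a smaller n would give less than the target power.
n_per_group = math.ceil(n_exact)

print(f"required SE = {required_se:.3f} mmHg")
print(f"n per group = {n_exact:.2f}, rounded up to {n_per_group}")
```

Using unrounded z-values instead of 0.84 and 1.96 shifts the answer slightly, so the result is best read as an approximate target.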