Hypothesis Testing: Independent and Paired Samples

Hypothesis Testing: Two Samples

This lecture, part of BNAN 562, focuses on hypothesis testing involving two samples, specifically the independent samples t-test and the paired t-test.

Independent Samples t-test

Introduction and Purpose
  • The independent samples t-test is used to determine if there is a significant difference between the means of two unrelated (independent) groups.
  • Example Question: Are the first-year Northlake MBA GPAs for US students different than for foreign students?
Step 1: Formulate the Null and Alternative Hypotheses
  • Null Hypothesis ($H_0$): Assumes no difference between the population means.
    • $H_0: \mu_{US} - \mu_{foreign} = 0$ (Mean MBA GPAs for US students are not different than foreign students).
  • Alternative Hypothesis ($H_a$): Claims there is a difference between the population means.
    • $H_a: \mu_{US} - \mu_{foreign} \neq 0$ (Mean MBA GPAs for US students are different than foreign students).
  • This is a two-tailed test because we are not hypothesizing a specific direction (i.e., not saying US GPAs are higher or lower, just different).
  • If a significant difference is found, one would then look at descriptive statistics (means) to determine the direction.
Step 2: Select the Significance Level
  • The significance level ($\alpha$) is chosen by comparing the costs of Type I and Type II errors.
  • In this example, the traditional alpha level is set at $\alpha = .05$ (5 percent).
Sampling Distribution of Mean Differences
  • Every significance test has a sampling distribution in its background.
  • For the independent samples t-test, this is the sampling distribution of the differences between the means.
  • This distribution shows the probability of observing a difference between two sample means of a certain size, assuming the null hypothesis is true (i.e., there is no real difference between the population means, and any observed difference is due to sampling error).
  • Under $H_0$, the expected difference is zero, meaning observed differences close to zero are more likely, and differences further from zero are less likely.
  • The $\alpha = .05$ significance level defines two rejection regions (for a two-tailed test), one in each tail of the distribution, totaling 5% of the area (2.5% per tail). With $df = 200$, as in this example, they begin at about $\pm 1.97$ standard errors from zero; the exact cutoff comes from the t-table and depends on the degrees of freedom.
  • If the observed difference falls within the middle area, we retain $H_0$. If it falls into one of the tails (rejection regions), we reject $H_0$ in favor of $H_a$.
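The idea of a sampling distribution under the null hypothesis can be illustrated with a small simulation. This is only a sketch under stated assumptions: both groups are drawn from the same normal population, and the population mean (3.17), standard deviation (0.32), number of repetitions, and function names are hypothetical choices made for illustration, not values from the lecture. Only the sample sizes (172 and 30) and the ±1.97 cutoff come from the example.

```python
import math
import random


def sample_mean_and_variance(values):
    """Return the mean and the unbiased (n - 1) sample variance."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / (n - 1)
    return mean, variance


def simulate_null_t_values(n1=172, n2=30, mu=3.17, sigma=0.32,
                           reps=2000, seed=0):
    """Draw both groups from the SAME population (H0 true) many times
    and return the standardized mean difference from each repetition.
    mu, sigma, reps, and seed are illustrative assumptions."""
    rng = random.Random(seed)
    t_values = []
    for _ in range(reps):
        group1 = [rng.gauss(mu, sigma) for _ in range(n1)]
        group2 = [rng.gauss(mu, sigma) for _ in range(n2)]
        mean1, var1 = sample_mean_and_variance(group1)
        mean2, var2 = sample_mean_and_variance(group2)
        # Pooled variance and standard error of the mean difference
        pooled_var = ((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2)
        std_error = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
        t_values.append((mean1 - mean2) / std_error)
    return t_values


t_values = simulate_null_t_values()
share_in_tails = sum(abs(t) > 1.97 for t in t_values) / len(t_values)
print(f"Share of simulated differences beyond +/-1.97: {share_in_tails:.3f}")
```

When the null hypothesis is true, the share landing in the two tails should be close to 5%, which is what the rejection regions are designed to capture.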
Step 3: Select the Statistic and Calculate Its Value
  • To compare two unrelated sample means, an Independent Samples t-test is used (often referred to as "Equal Variances" in Excel or "Independent Samples Test" in SPSS).
  • The test assumes that the variances of the two populations (US and foreign students) are equal. In practice, the t-test is generally robust to violations of this assumption when sample sizes are large and not dramatically different.
  • The t-statistic is calculated using the formula: $t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{s_p^2 (1/n_1 + 1/n_2)}}$ where:
    • $(\bar{x}_1 - \bar{x}_2)$ is the observed difference between sample means.
    • $(\mu_1 - \mu_2)$ is the hypothesized difference under $H_0$ (which is $0$).
    • $s_p^2$ is the pooled sample variance.
    • $n_1$ and $n_2$ are the sample sizes.
  • This process standardizes the observed difference into standard deviation units of the sampling distribution, similar to what's done for one-sample tests.
  • Example Calculation: For observed means $\bar{x}_1 = 3.18$ (US) and $\bar{x}_2 = 3.14$ (Foreign), a pooled variance of $0.103$ and sample sizes $n_1 = 172$, $n_2 = 30$ yield:
    $t = \frac{(3.18 - 3.14) - 0}{\sqrt{0.103 (1/172 + 1/30)}} = \frac{0.04}{0.063} \approx 0.64$
    (The value 0.64 matches the software output, which uses unrounded inputs; the rounded figures shown here give about 0.63.)
  • Degrees of Freedom (df): Calculated as $df = n_1 + n_2 - 2$. For this example, $df = 172 + 30 - 2 = 200$. Two degrees of freedom are lost because two parameters (the mean of each group) are estimated from the samples.
  • Statistical software (e.g., Excel, SPSS) performs these calculations automatically.
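The Step 3 arithmetic can be sketched in a few lines of Python. The function names below are hypothetical, and `pooled_variance` uses the standard pooled-variance formula, which the lecture does not write out (the example simply supplies the pooled value 0.103). Because the inputs are the rounded figures from the example, the result is about 0.63 rather than the 0.643 that software reports from unrounded data.

```python
import math


def pooled_variance(var1, n1, var2, n2):
    """Standard pooled variance: a weighted average of the two sample
    variances, weighted by each group's degrees of freedom."""
    return ((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2)


def pooled_t(mean1, mean2, sp2, n1, n2, hypothesized_diff=0.0):
    """Return (t, standard error, df) for the equal-variances t-test,
    given summary statistics and an already pooled variance sp2."""
    std_error = math.sqrt(sp2 * (1 / n1 + 1 / n2))
    t = ((mean1 - mean2) - hypothesized_diff) / std_error
    df = n1 + n2 - 2
    return t, std_error, df


# Rounded summary values from the lecture example (US vs. foreign students)
t_stat, std_error, df = pooled_t(3.18, 3.14, 0.103, 172, 30)
print(f"t = {t_stat:.2f}, standard error = {std_error:.4f}, df = {df}")
```

Working from summary statistics keeps the sketch aligned with the hand calculation; with raw GPA data one would first compute each group's mean and variance and pass the variances through `pooled_variance`.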
Step 4: Identify the Critical Value for the Test Statistic and State the Decision
  • There are two methods to make a decision:
    • Method 1: Using p-value vs. $\alpha$ (Recommended for ease of use with software output)
      • If the two-tailed p-value is less than $\alpha = .05$, reject the null hypothesis. Conclude there is a significant difference between the mean MBA GPAs.
      • If the two-tailed p-value is greater than or equal to $\alpha = .05$, retain the null hypothesis. Conclude there is insufficient evidence of a significant difference.
      • If $H_0$ is rejected, compare sample means ($\bar{x}_{US}$ vs. $\bar{x}_{foreign}$) to determine the direction of the difference.
    • Method 2: Using t-statistic vs. t-critical values
      • Identify the critical t-values ($t_{crit}$) from a t-table for the given degrees of freedom and alpha level. For $df = 200$ and $\alpha = .05$ (two-tailed), $t_{crit} = \pm 1.97$.
      • If the absolute value of the calculated t-statistic, $|t|$, is greater than $|t_{crit}|$ (i.e., $t > 1.97$ or $t < -1.97$), reject the null hypothesis.
      • If $|t|$ is less than or equal to $|t_{crit}|$ (i.e., $-1.97 \le t \le 1.97$), retain the null hypothesis.
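The two decision rules can be written as small functions, which makes the handling of the boundary cases explicit. This is a minimal sketch; the function names are hypothetical, and the p-values passed in below (0.03 and 0.05) are made-up inputs chosen only to show a rejection and the equality case. The t-statistic 0.64 and critical value 1.97 are taken from the example.

```python
def decide_by_p_value(p_value, alpha=0.05):
    """Method 1: reject H0 only when the two-tailed p-value is
    strictly less than alpha."""
    return "reject H0" if p_value < alpha else "retain H0"


def decide_by_critical_value(t_stat, t_crit):
    """Method 2: reject H0 only when |t| strictly exceeds |t_crit|."""
    return "reject H0" if abs(t_stat) > abs(t_crit) else "retain H0"


# Illustrative (made-up) p-values: one below alpha, one exactly at alpha
print(decide_by_p_value(0.03))
print(decide_by_p_value(0.05))

# Values from the lecture example: t = 0.64, t_crit = 1.97
print(decide_by_critical_value(0.64, 1.97))
```

Both methods give the same decision whenever the p-value and the critical value come from the same t-distribution, so in practice either one is enough.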
Step 5: Reaching a Conclusion
  • From Excel Output:
    • Calculated t-statistic: $t_{200} = 0.643194$.
    • Two-tailed p-value: $p = 0.520835$.
    • Critical t-value (two-tailed, $\alpha = .05$): $t_{crit} = 1.971896$.
  • Using Method 1 (p-value): $p = 0.52$ is greater than $\alpha = .05$. Therefore, retain the null hypothesis.
  • Using Method 2 (t-statistic): $t = 0.64$ is between $-1.97$ and $1.97$. Therefore, retain the null hypothesis.
  • Conclusion: There is no evidence of a statistically significant difference in the mean MBA GPAs between US and foreign students.
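For readers curious where the p-value and critical value in the software output come from, the sketch below recovers both from the t-distribution using only the Python standard library. It is an illustration, not how Excel or SPSS compute them: the density is integrated numerically with Simpson's rule, and the critical value is found by bisection. All function names and the step and iteration counts are assumptions chosen for this example.

```python
import math


def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    log_const = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                 - 0.5 * math.log(df * math.pi))
    return math.exp(log_const - (df + 1) / 2 * math.log1p(x * x / df))


def two_tailed_p(t_stat, df, steps=2000):
    """Two-tailed p-value: 1 minus the area between -|t| and +|t|.
    The area is found with Simpson's rule (steps must be even)."""
    upper = abs(t_stat)
    if upper == 0:
        return 1.0
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area_zero_to_t = total * h / 3
    return max(0.0, 1.0 - 2.0 * area_zero_to_t)


def critical_t(alpha, df, low=0.0, high=50.0, iterations=60):
    """Find the positive t whose two-tailed p-value equals alpha,
    by bisection on the interval [low, high]."""
    for _ in range(iterations):
        mid = (low + high) / 2
        if two_tailed_p(mid, df) > alpha:
            low = mid   # p still too large: move further into the tail
        else:
            high = mid
    return (low + high) / 2


p_value = two_tailed_p(0.643194, 200)
t_crit = critical_t(0.05, 200)
print(f"p = {p_value:.4f}, t_crit = {t_crit:.4f}")
```

The results should agree closely with the output listed above (a p-value near 0.52 and a critical value near 1.97), which is a useful check that the table lookup and the software are describing the same distribution.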
Step 6: Making a Business-Related Decision
  • Since no statistically significant difference was found between the mean first-year MBA GPAs of US and foreign students, a business decision could be to pool all applicants regardless of citizenship and not use citizenship status as a factor in the selection process.
Notes on "Independent" t-tests
  • Nomenclature: