Hypothesis Testing: Independent and Paired Samples

Hypothesis Testing: Two Samples

This lecture, part of BNAN 562, focuses on hypothesis testing involving two samples, specifically the independent samples t-test and the paired t-test.

Independent Samples t-test

Introduction and Purpose

The independent samples t-test is used to determine if there is a significant difference between the means of two unrelated (independent) groups.
Example Question: Are the first-year Northlake MBA GPAs for US students different than for foreign students?

Step 1: Formulate the Null and Alternative Hypotheses

Null Hypothesis ( $H_0$ ): Assumes no difference between the population means.
- $H_0: \mu_{US} - \mu_{foreign} = 0$ (Mean MBA GPAs for US students are not different than foreign students).
Alternative Hypothesis ( $H_a$ ): Claims there is a difference between the population means.
- $H_a: \mu_{US} - \mu_{foreign} \neq 0$ (Mean MBA GPAs for US students are different than foreign students).
This is a two-tailed test because we are not hypothesizing a specific direction (i.e., not saying US GPAs are higher or lower, just different).
If a significant difference is found, one would then look at descriptive statistics (means) to determine the direction.

Step 2: Select the Significance Level

The significance level ( $\alpha$ ) is chosen by comparing the costs of Type I and Type II errors.
In this example, the traditional alpha level is set at $\alpha = .05$ (5 percent).

Sampling Distribution of Mean Differences

Every significance test has a sampling distribution in its background.
For the independent samples t-test, this is the sampling distribution of the differences between the means.
This distribution shows the probability of observing a difference between two sample means of a certain size, assuming the null hypothesis is true (i.e., there is no real difference between the population means, and any observed difference is due to sampling error).
Under $H_0$ , the expected difference is zero, meaning observed differences close to zero are more likely, and differences further from zero are less likely.
The $\alpha = .05$ significance level defines two rejection regions (for a two-tailed test), one in each tail of the distribution, together covering 5% of the area. With roughly 200 degrees of freedom, as in this example, the regions begin about $\pm 1.97$ standard errors from zero; the exact cutoff comes from the t-table and depends on the degrees of freedom.
If the observed difference falls within the middle area, we retain $H_0$ . If it falls into one of the tails (rejection regions), we reject $H_0$ in favor of $H_a$ .
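The idea of a sampling distribution under $H_0$ can be illustrated with a small simulation. The sketch below is not part of the lecture: it draws both groups from the same normal population, so the null hypothesis is true by construction, and counts how often the standardized difference lands beyond $\pm 1.97$. The sample sizes (172 and 30) and the cutoff come from the example in these notes; the population mean (3.15), standard deviation (0.32), number of repetitions, and seed are assumptions chosen only for illustration.

```python
import math
import random

def mean_and_var(xs):
    """Return the sample mean and the sample variance (n - 1 denominator)."""
    n = len(xs)
    m = math.fsum(xs) / n
    v = math.fsum((x - m) ** 2 for x in xs) / (n - 1)
    return m, v

def pooled_t(sample1, sample2):
    """Pooled-variance t statistic for H0: mu1 - mu2 = 0."""
    n1, n2 = len(sample1), len(sample2)
    m1, v1 = mean_and_var(sample1)
    m2, v2 = mean_and_var(sample2)
    sp2 = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
    return (m1 - m2) / math.sqrt(sp2 * (1 / n1 + 1 / n2))

def simulate_null(n1=172, n2=30, reps=2000, t_crit=1.97, seed=1):
    """Simulate t statistics when both groups share one population."""
    rng = random.Random(seed)
    t_values = []
    for _ in range(reps):
        # Hypothetical population: mean 3.15, SD 0.32 (assumed, not from the notes).
        g1 = [rng.gauss(3.15, 0.32) for _ in range(n1)]
        g2 = [rng.gauss(3.15, 0.32) for _ in range(n2)]
        t_values.append(pooled_t(g1, g2))
    rejected = sum(1 for t in t_values if abs(t) > t_crit)
    return t_values, rejected / reps

t_values, rejection_rate = simulate_null()
print(f"mean t under H0: {math.fsum(t_values) / len(t_values):.3f}")
print(f"share of |t| > 1.97: {rejection_rate:.3f}")
```

If the reasoning above holds, the simulated t values should center near zero and roughly 5% of them should fall in the two rejection regions, which is what a Type I error rate of $\alpha = .05$ means.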

Step 3: Select the Statistic and Calculate Its Value

To compare two unrelated sample means, an Independent Samples t-test is used (labeled "t-Test: Two-Sample Assuming Equal Variances" in Excel's Data Analysis tools and "Independent Samples Test" in SPSS).
The test assumes that the variances of the two populations (US and foreign students) are equal. In practice, the t-test is generally robust to violations of this assumption when sample sizes are large and not dramatically different.
The t-statistic is calculated using the formula $t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{s_p^2 (1/n_1 + 1/n_2)}}$ , where:
- $(\bar{x}_1 - \bar{x}_2)$ is the observed difference between sample means.
- $(\mu_1 - \mu_2)$ is the hypothesized difference under $H_0$ (which is $0$ ).
- $s_p^2$ is the pooled sample variance.
- $n_1$ and $n_2$ are the sample sizes.
This process standardizes the observed difference into standard deviation units of the sampling distribution, similar to what's done for one-sample tests.
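Example Calculation: For observed means $\bar{x}_1 = 3.18$ (US) and $\bar{x}_2 = 3.14$ (Foreign), a pooled variance of $s_p^2 = 0.103$ , and sample sizes $n_1 = 172$ and $n_2 = 30$ :
$t = \frac{(3.18 - 3.14) - 0}{\sqrt{0.103 (1/172 + 1/30)}} = \frac{0.04}{0.063} \approx 0.64$
(The rounded inputs shown give about 0.63; the value 0.64 is what the software reports from the unrounded data.)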
Degrees of Freedom (df): Calculated as $df = n_1 + n_2 - 2$ . For this example, $df = 172 + 30 - 2 = 200$ . Two degrees of freedom are lost because two sample means, one per group, are estimated from the data.
Statistical software (e.g., Excel, SPSS) performs these calculations automatically.
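For readers who want to see the arithmetic of Step 3 spelled out, here is a minimal Python sketch of the formula using only summary statistics. The function names are hypothetical, chosen for this illustration. The notes do not give the pooled-variance formula or the per-group variances, so `pooled_variance` uses the standard textbook definition and is shown only for completeness; the worked call uses the pooled value 0.103 quoted above.

```python
import math

def pooled_variance(var1, var2, n1, n2):
    """Standard pooled variance: df-weighted average of the two sample variances."""
    return ((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2)

def pooled_t_from_summary(mean1, mean2, sp2, n1, n2, hypothesized_diff=0.0):
    """Return (t, standard error, df) for the equal-variance two-sample t-test."""
    se = math.sqrt(sp2 * (1 / n1 + 1 / n2))        # SE of the mean difference
    t = ((mean1 - mean2) - hypothesized_diff) / se  # standardized difference
    df = n1 + n2 - 2
    return t, se, df

# Rounded summary values from the example in the notes.
t, se, df = pooled_t_from_summary(3.18, 3.14, 0.103, 172, 30)
print(f"SE = {se:.4f}, t = {t:.2f}, df = {df}")
```

With these rounded inputs the sketch gives a standard error of about 0.0635 and a t of about 0.63 on 200 degrees of freedom, slightly below the 0.64 that software reports from the unrounded data.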

Step 4: Identify the Critical Value for the Test Statistic and State the Decision

There are two methods to make a decision:
- Method 1: Using p-value vs. $\alpha$ (Recommended for ease of use with software output)
 - If the two-tailed p-value is less than $\alpha = .05$ , reject the null hypothesis. Conclude there are significant differences between the mean MBA GPAs.
 - If the two-tailed p-value is greater than or equal to $\alpha = .05$ , retain the null hypothesis. Conclude there is not sufficient evidence of a significant difference.
 - If $H_0$ is rejected, compare sample means ( $\bar{x}_{US}$ vs. $\bar{x}_{foreign}$ ) to determine the direction of the difference.
- Method 2: Using t-statistic vs. t-critical values
 - Identify the critical t-values ( $t_{crit}$ ) from a t-table for the given degrees of freedom and alpha level. For $df = 200$ and $\alpha = .05$ (two-tailed), $t_{crit} = \pm 1.97$ .
 - If the absolute value of the calculated t-statistic, $|t|$ , is greater than $|t_{crit}|$ (i.e., $t > 1.97$ or $t < -1.97$ ), reject the null hypothesis.
 - If $|t|$ is less than or equal to $|t_{crit}|$ (i.e., $-1.97 \le t \le 1.97$ ), retain the null hypothesis.
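The two decision rules above can be written as a pair of small functions. This is only a sketch: the function names are hypothetical, and the inputs in the usage lines are illustrative values picked to exercise each branch, not results from the lecture data.

```python
def decide_by_p_value(p_value, alpha=0.05):
    """Method 1: reject H0 only when the two-tailed p-value is below alpha."""
    return "reject H0" if p_value < alpha else "retain H0"

def decide_by_critical_value(t_stat, t_crit):
    """Method 2: reject H0 only when |t| exceeds the two-tailed critical value."""
    return "reject H0" if abs(t_stat) > abs(t_crit) else "retain H0"

# Illustrative inputs (assumed values, one per branch).
print(decide_by_p_value(0.03))                # below alpha
print(decide_by_p_value(0.05))                # exactly alpha counts as "retain"
print(decide_by_critical_value(-2.10, 1.97))  # large negative t, lower tail
print(decide_by_critical_value(1.97, 1.97))   # on the boundary counts as "retain"
```

Both functions treat the boundary case as "retain", matching the "greater than or equal to" and "less than or equal to" wording of the rules. For the same test and the same $\alpha$ , the two methods always lead to the same decision.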

Step 5: Reaching a Conclusion

From Excel Output:
- Calculated t-statistic: $t_{200} = 0.643194$ .
- Two-tailed p-value: $p = 0.520835$ .
- Critical t-value (two-tailed, $\alpha = .05$ ): $t_{crit} = 1.971896$ .
Using Method 1 (p-value): $p = 0.52$ is greater than $\alpha = .05$ . Therefore, retain the null hypothesis.
Using Method 2 (t-statistic): $t = 0.64$ is between $-1.97$ and $1.97$ . Therefore, retain the null hypothesis.
Conclusion: There is no evidence of a statistically significant difference in the mean MBA GPAs between US and foreign students.
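As a cross-check on the Excel figures, the p-value and critical value can be approximated directly from the t distribution. The sketch below is an illustration, not the method Excel uses: it integrates the Student t density numerically with Simpson's rule and finds the critical value by bisection, using only the Python standard library. The function names, step count, and search bounds are assumptions made for this example.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_const = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                 - 0.5 * math.log(df * math.pi))
    return math.exp(log_const - (df + 1) / 2 * math.log1p(x * x / df))

def two_tailed_p(t_stat, df, steps=2000):
    """Two-tailed p-value: 1 minus the area between -|t| and +|t|."""
    upper = abs(t_stat)
    if upper == 0:
        return 1.0
    h = upper / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_area = 2 * (h / 3) * total  # density is symmetric around 0
    return max(0.0, 1.0 - central_area)

def critical_t(alpha, df, lo=0.0, hi=50.0):
    """Bisection search for the t value whose two-tailed p-value equals alpha."""
    for _ in range(60):
        mid = (lo + hi) / 2
        if two_tailed_p(mid, df) > alpha:
            lo = mid  # p still too large, so the cutoff lies further out
        else:
            hi = mid
    return (lo + hi) / 2

p = two_tailed_p(0.643194, 200)
t_crit = critical_t(0.05, 200)
print(f"p = {p:.4f}, t_crit = {t_crit:.4f}")
```

The results should land close to the Excel output above (a p-value near 0.52 and a critical value near 1.97), so both methods again point to retaining the null hypothesis.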

Step 6: Making a Business-Related Decision

Since no statistically significant difference was found between the mean first-year MBA GPAs of US and foreign students, a business decision could be to pool all applicants regardless of citizenship and not use citizenship status as a factor in the selection process.

Notes on "Independent" t-tests

Nomenclature: