Applied Business Statistics: Hypothesis Testing - Two Samples (Part 1)

Hypothesis Testing: Two Samples - Part 1: Introduction and Concepts

This week's lecture extends hypothesis testing from one sample to two samples, focusing on comparing two means from two distinct populations. The fundamental logic of hypothesis testing remains consistent.

Comparing Two Means: The Gender Pay Gap Example

Definition of Gender Pay Gap: The ratio of female to male median yearly earnings among full-time, year-round workers in the United States.
Unadjusted Comparison: In $2015$, the average woman's unadjusted annual salary was cited as $78\%$ to $82\%$ of an average man's, meaning women made $78$ to $82$ cents for every dollar earned by a man. This directly involves comparing mean salaries.
- Such comparisons of mean salaries can be analyzed using a t-test.
Adjusted Comparison: After accounting for factors like college major, occupation, working hours, and parental leave, multiple studies found the pay gap narrowed to between $5\%$ and $6.6\%$, with women earning roughly $94$ cents for every dollar earned by their male counterparts.
- Techniques like multiple regression are used to control for these confounding variables.
Remaining Gap: The residual $6\%$ difference, even after adjustments, is widely speculated to stem from two primary sources:
- Deficiency in salary negotiation skills.
- Gender discrimination.
- Recommendation: The book "Women Don't Ask" by Linda Babcock suggests that women often don't receive raises because they don't actively ask for them.

How to Test Differences Between Two Means

Data Collection: Requires collecting data from two distinct populations (e.g., men and women) to form two separate samples (e.g., sample of men's salaries, sample of women's salaries).
Comparison of Sample Averages: Compute the mean ( $\bar{x}_{men}$ and $\bar{x}_{women}$ ) from each sample and compare them.
Sampling Error: Even if there's no real difference in population means, sample means are highly unlikely to be exactly identical due to random sampling error.
Key Question: Are the observed differences between sample means 'significantly different', or are they merely the result of random sampling error (i.e., 'noise')?
Null Hypothesis ( $H_0$ ): In two-sample hypothesis testing, the null hypothesis typically assumes no real difference between the two population means. Expressed as: $\mu_{men} - \mu_{women} = 0$
- Hypotheses are formulated about population parameters ( $\mu$ ), not sample statistics ( $\bar{x}$ ) (which are directly observable).
Logic: If the null hypothesis (no real difference) is true, how likely is it to observe the specific difference between the two sample means that we found?
Need for a New Sampling Distribution: To answer this, we need a sampling distribution of the sample mean differences.
Theoretical Construction of the Sampling Distribution of Mean Difference:
1. Assume the null hypothesis is true (no population difference).
2. Repeatedly draw two new samples (e.g., one of men, one of women).
3. Compute the mean for each sample.
4. Calculate the difference between these two sample means.
5. Repeat this process many, many times.
6. The distribution formed by all these calculated differences is the sampling distribution of the mean difference. This distribution indicates the probability of observing various differences just by random sampling error under the null hypothesis.
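The six steps above can be sketched as a small simulation. This is a minimal illustration, not part of the lecture: the population mean, standard deviation, sample sizes, and the "observed" difference of 5000 are all invented values, and both populations are assumed to be normal with the same mean (the null hypothesis).

```python
import random
import statistics

def simulate_mean_differences(n_men, n_women, mu, sigma, n_reps, seed=0):
    """Simulate the sampling distribution of (mean_men - mean_women) under H0."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_reps):
        # Steps 1-2: under H0, both samples come from populations with the same mean.
        men = [rng.gauss(mu, sigma) for _ in range(n_men)]
        women = [rng.gauss(mu, sigma) for _ in range(n_women)]
        # Steps 3-4: compute each sample mean and record their difference.
        diffs.append(statistics.fmean(men) - statistics.fmean(women))
    # Steps 5-6: the collected differences approximate the sampling distribution.
    return diffs

# All numbers below are hypothetical.
diffs = simulate_mean_differences(n_men=50, n_women=50, mu=60000, sigma=15000, n_reps=2000)
center = statistics.fmean(diffs)   # should be near 0 under H0
spread = statistics.stdev(diffs)   # simulated standard error of the difference

# Share of simulated differences at least as extreme as a hypothetical observed one.
observed = 5000
p_sim = sum(abs(d) >= observed for d in diffs) / len(diffs)

print(f"mean of simulated differences: {center:.1f}")
print(f"simulated standard error: {spread:.1f}")
print(f"share with |difference| >= {observed}: {p_sim:.3f}")
```

The share printed on the last line plays the role of a two-sided p-value: it estimates how often random sampling error alone would produce a difference as large as the one observed.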
Why a Significance Test?:
- The results of two-sample comparisons often drive business or policy decisions (e.g., changing admission processes, implementing new training programs).
- These decisions are important and often costly.
- Significance tests ensure that observed differences are not just random chance, preventing the allocation of resources to solve problems that might not truly exist.
Other Examples of Two-Sample Testing:
- Mean number of patients admitted on Friday vs. Monday.
- Mean increase in sales for salespeople before and after a new training program.
- Mean items manufactured with two different processes to evaluate efficiency.

Sampling Distribution of Sample Mean Differences

This new sampling distribution is analogous to the one used for one-sample hypothesis testing but applies to the difference between two means.

For One Sample Hypothesis Testing (Review):
- Statistic: The sample mean, $\bar{x}$ .
- Parameter: The population mean, $\mu$.
- Mean of the sampling distribution ( $\mu_{\bar{x}}$ ) is equal to the population mean ( $\mu$ ).
- Standard Deviation of the sampling distribution is called the standard error, denoted by $\sigma_{\bar{x}}$ .
- Shape: Approximately normal for sufficiently large samples, due to the Central Limit Theorem (CLT).
For Two Sample Hypothesis Testing (New):
- Statistic: The difference between two sample means, $\bar{x}_1 - \bar{x}_2$.
- Parameter: The difference between two population means, $\mu_1 - \mu_2$.
- Null Hypothesis: Assumes no difference in population means, i.e., $\mu_1 - \mu_2 = 0$. This simplifies the expected mean of the sampling distribution to zero under the null.
- Mean of the sampling distribution of differences is equal to the real difference between the two populations, which we assume is zero under the null hypothesis.
- Standard Deviation of the sampling distribution is the standard error of the difference, denoted as $\sigma_{(\bar{x}_1 - \bar{x}_2)}$.
  - This combines the variability and the sizes of the two samples. For independent samples it is $\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}$, with the sample standard deviations substituted in practice. It is generally calculated by statistical software.
- Shape: Normal, representing an extension of the Central Limit Theorem. This normally distributed shape, along with the mean and standard deviation, provides sufficient information to calculate p-values.
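
As a rough sketch of how the mean, standard error, and normal shape combine into a p-value, the snippet below assumes independent samples and uses the standard error $\sqrt{s_1^2/n_1 + s_2^2/n_2}$ with a large-sample normal approximation. Statistical software typically uses a t distribution instead, so its p-values will differ slightly for small samples. The function names and all salary figures are hypothetical.

```python
import math
from statistics import NormalDist

def standard_error_of_difference(s1, n1, s2, n2):
    """Standard error of (x̄1 - x̄2) for two independent samples."""
    return math.sqrt(s1**2 / n1 + s2**2 / n2)

def two_sided_p_value(mean1, mean2, s1, n1, s2, n2):
    """Two-sided p-value under H0: mu1 - mu2 = 0, using a normal approximation."""
    se = standard_error_of_difference(s1, n1, s2, n2)
    # Distance of the observed difference from the hypothesized 0, in standard errors.
    z = (mean1 - mean2 - 0) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))

# Hypothetical summary statistics for two samples of 100 salaries each.
se = standard_error_of_difference(s1=15000, n1=100, s2=12000, n2=100)
p = two_sided_p_value(mean1=62000, mean2=58000, s1=15000, n1=100, s2=12000, n2=100)

print(f"standard error of the difference: {se:.2f}")
print(f"two-sided p-value: {p:.4f}")
```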

Paired vs. Independent Samples

The distinction between paired (dependent) and independent samples is crucial for selecting the correct statistical test and obtaining valid results.

Dependent (Paired) Samples:
- Involve paired measurements from the same set of items or individuals.
- Often incorporate a time element (e.g., 'before' and 'after' a treatment or intervention).
- Can also be connected in other ways (e.g., comparing performance of the same team using two different methods).
- Key Question: "Are the two samples connected?"
- If yes, use a paired t-test.
- Examples:
  - Mean salaries for husbands vs. wives (connected due to shared household, potential correlation).
  - Employee well-being measured before and after a training program (same individuals).
Independent Samples:
- Involve measurements from two entirely separate and unrelated groups or populations.
- Key Question: "Are the two samples connected?"
- If no, use an independent t-test.
- Example: Mean GPAs for entering freshmen at the University of Arizona (UA) and Arizona State University (ASU) (these groups are distinct and not linked).

The Impact of Data Collection on Sample Type

Whether samples are dependent or independent is determined by the method of data collection. The same research question could potentially be explored using either paired or independent samples, depending on the study design.

Weight Loss Programs:
- Comparing two separate groups of people (e.g., one group gets diet A, another gets diet B): Independent samples.
- Comparing the same person's weight before and after a reduction program: Paired samples.
Performance Evaluation (e.g., manufacturing methods):
- Comparing two separate groups using two different manufacturing methods: Independent samples.
- Comparing the performance of the same individuals using method 1 vs. method 2: Paired samples.
Sales Bonus Programs:
- Comparing two different groups of salespeople (one with bonuses, one without): Independent samples.
- Comparing the sales of the same person with and without a bonus: Paired samples.
Accounting Regulation (Sarbanes-Oxley Act of $2002$ ):
- Comparing mean financial disclosures of one group of firms before SOX with those of a different group of firms after SOX: Independent samples.
- Comparing the financial disclosures of the same firms before and after SOX: Paired samples.
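
To illustrate why the study design matters, the sketch below computes a test statistic for the same two columns of numbers in two ways: once as if they were unrelated groups, and once as paired measurements. The weights are invented, and the two helper functions are illustrative rather than a reference implementation: `independent_t` is a Welch-style statistic (no equal-variance assumption), and `paired_t` works on per-person differences.

```python
import math
import statistics

def independent_t(sample1, sample2):
    """Welch-style t statistic treating the two samples as unrelated groups."""
    se = math.sqrt(
        statistics.variance(sample1) / len(sample1)
        + statistics.variance(sample2) / len(sample2)
    )
    return (statistics.fmean(sample1) - statistics.fmean(sample2)) / se

def paired_t(before, after):
    """t statistic computed on the per-subject differences (before - after)."""
    if len(before) != len(after):
        raise ValueError("paired samples need one 'before' and one 'after' per subject")
    diffs = [b - a for b, a in zip(before, after)]
    se = statistics.stdev(diffs) / math.sqrt(len(diffs))
    return statistics.fmean(diffs) / se

# Hypothetical weights (kg) for the same six people before and after a program.
before = [82.0, 95.5, 70.2, 88.1, 101.3, 76.4]
after = [80.1, 93.0, 69.5, 85.9, 98.8, 75.0]

t_ind = independent_t(before, after)   # ignores the pairing
t_pair = paired_t(before, after)       # uses the pairing

print(f"independent-samples t: {t_ind:.2f}")
print(f"paired-samples t: {t_pair:.2f}")
```

With these made-up numbers the paired statistic is much larger, because pairing removes the large person-to-person variation in weight and leaves only the variation in each person's change. Applying the independent test to paired data would hide an effect that the paired test detects.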

Northlake MBA Program Case Study

This case involves an MBA program director seeking to increase selectivity and reputation, using historical applicant data to refine the admission process.

Dataset Variables:
- ID: Applicant identifier.
- UGPA: Undergraduate GPA (an indicator for selection).
- MBA GPA: GPA at the end of the first year (an indicator of in-program performance, comparable across candidates).
- Foreign: A binary ( $1/0$ ) variable indicating whether the student is international or domestic.

Questions for Analysis from the Northlake MBA Dataset

Are the 1st year Northlake MBA GPAs for US students different than for foreign students?
- Why compare averages? We are interested in the typical academic performance of students from two distinct populations (US vs. foreign).
- Why a significance test? The decision to adjust admission criteria based on this difference is a business decision with significant cost and logistical implications, requiring strong evidence to justify changes.
- Why an independent t-test? US students and foreign students are two separate, unconnected groups. Their GPAs are not linked by any intrinsic pairing.
Are the 1st year MBA GPAs higher than the students’ undergraduate GPAs?
- Why compare averages? We want to understand the typical change in academic performance (GPA) for an average student as they transition from undergraduate to MBA studies.
- Why a significance test? Business decisions related to program design or support services might hinge on whether students generally improve or decline academically, or stay consistent. A significance test prevents conclusions based on random sample fluctuation.
- Why a paired t-test? The undergraduate GPA and the 1st year MBA GPA belong to the same student. This is a 'before and after' type of comparison, where measurements are intrinsically connected or dependent.
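
The mapping from the dataset to the two tests can be sketched as follows. The rows are invented, the key `MBA_GPA` is a hypothetical spelling of the MBA GPA variable, and the coding `Foreign = 1` for international students is an assumption, since the notes do not say which value means which.

```python
import statistics

# Hypothetical rows; field names mirror the dataset variables, values are invented.
# Assumption: Foreign == 1 marks an international student, 0 a US student.
applicants = [
    {"ID": 1, "UGPA": 3.2, "MBA_GPA": 3.5, "Foreign": 0},
    {"ID": 2, "UGPA": 3.6, "MBA_GPA": 3.7, "Foreign": 0},
    {"ID": 3, "UGPA": 3.0, "MBA_GPA": 3.4, "Foreign": 1},
    {"ID": 4, "UGPA": 3.8, "MBA_GPA": 3.6, "Foreign": 1},
    {"ID": 5, "UGPA": 3.4, "MBA_GPA": 3.6, "Foreign": 0},
    {"ID": 6, "UGPA": 3.1, "MBA_GPA": 3.2, "Foreign": 1},
]

# Question 1: two separate groups of students, so split the rows and
# compare group means (input to an independent t-test).
us_gpas = [row["MBA_GPA"] for row in applicants if row["Foreign"] == 0]
foreign_gpas = [row["MBA_GPA"] for row in applicants if row["Foreign"] == 1]
group_mean_diff = statistics.fmean(us_gpas) - statistics.fmean(foreign_gpas)

# Question 2: two measurements on each student, so compute one difference
# per row and analyze those differences (input to a paired t-test).
gpa_changes = [row["MBA_GPA"] - row["UGPA"] for row in applicants]
mean_change = statistics.fmean(gpa_changes)

print(f"US minus foreign mean MBA GPA: {group_mean_diff:.2f}")
print(f"mean change from UGPA to MBA GPA: {mean_change:.2f}")
```

In the first question each student contributes to exactly one group, while in the second every student contributes one difference, which is what makes the samples independent in one case and paired in the other.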