Mean Comparison (T-Tests)

Introduction to Mean Comparisons and the Null Hypothesis

Central Objective: The primary goal in comparing the means of two groups is to determine whether those means are substantively the same or whether the difference between them is statistically significant.
Null Hypothesis ($H_0$): In most statistical tests comparing group means, the null hypothesis assumes that there is no difference between the means of the two groups.
Alternative Hypothesis: The researcher usually tests whether a difference exists, often one triggered by an experimental stimulus or a specific group characteristic.
Purpose of Comparison: Researchers compare groups to see if they look substantively different. For example, in a survey experiment, a researcher might investigate if an experimental group (exposed to a stimulus) changes its perceptions or behavior compared to a control group (not exposed to the stimulus).

Independent Samples T-Test

Definition: An independent samples t-test is used when comparing two distinct, independent samples that do not overlap.
Application in Survey Experiments: It is frequently used to compare the outcome of interest (dependent variable) between an experimental group and a control group. Examples include comparisons of perceptions, opinions, or a willingness to act.
Use in Checking Randomization: A critical use of the independent samples t-test is to verify that random assignment worked. Researchers conduct a "difference of means" t-test on control variables (e.g., age, race, education) across groups. In this scenario, the researcher actually hopes for no significant difference, which indicates that the experimental and control groups are substantively the same on factors that might affect the outcome and so helps isolate the effect of the independent variable.
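The randomization check described above can be sketched in a few lines of Python. This is a minimal illustration, not the procedure used in any particular study: the covariate values are hypothetical, the helper names (`welch_t`, `balance_check`) are invented for this example, and the cutoff of 1.96 is a large-sample approximation to the 0.05 threshold that is too lenient for groups this small.

```python
import math
import statistics


def welch_t(a, b):
    """Difference of means divided by the standard error of that difference."""
    se = math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    return (statistics.mean(a) - statistics.mean(b)) / se


def balance_check(treated, untreated, critical=1.96):
    """Return {covariate: (t, balanced)}; balanced means |t| is below the cutoff."""
    report = {}
    for name in treated:
        t = welch_t(treated[name], untreated[name])
        report[name] = (t, abs(t) < critical)
    return report


# Hypothetical covariate values for each group (not data from any study).
treatment = {
    "age": [34, 45, 29, 52, 41, 38],
    "education_years": [12, 16, 14, 16, 12, 18],
}
control = {
    "age": [36, 43, 31, 50, 40, 37],
    "education_years": [12, 16, 14, 14, 12, 18],
}

for covariate, (t, balanced) in balance_check(treatment, control).items():
    status = "looks balanced" if balanced else "possible imbalance"
    print(f"{covariate}: t = {t:.2f} ({status})")
```

Here a "balanced" result is the desired outcome: it means the check found no evidence that the groups differ on that covariate.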
Versatility of Groups: Independent t-tests can be applied to various units of analysis beyond survey respondents, such as comparing two groups of countries (e.g., democracies vs. non-democracies) to see if they differ in economic development levels.

Paired Samples T-Test

Definition: A paired samples t-test looks at the difference of means within the same units or between units that are logically paired together.
Change Over Time: The most common application is comparing the same person or case at two different points in time (Time A vs. Time B).
- Example: Testing the effectiveness of an education policy by comparing a student's test scores before and after the implementation of the policy.
Non-Temporal Pairing: Data can be paired without a time element if the units are linked.
- Example: Comparing the political identity (partisanship) of college students against the partisanship of their parents. Each child is paired with their specific parent, making it a paired samples test rather than an independent one.
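A paired test works on the difference within each pair rather than on two separate group means. The standard paired statistic, which these notes do not write out, is $t = \frac{\bar{d}}{s_d / \sqrt{n}}$, where $\bar{d}$ is the mean of the within-pair differences, $s_d$ is their standard deviation, and $n$ is the number of pairs. The sketch below applies it to hypothetical before/after quiz scores; the function name `paired_t` and the data are illustrative assumptions.

```python
import math
import statistics


def paired_t(before, after):
    """Return (t, degrees of freedom) for paired observations."""
    if len(before) != len(after):
        raise ValueError("each 'before' value needs a matching 'after' value")
    # One difference per unit: the same student at two points in time.
    diffs = [a - b for b, a in zip(before, after)]
    n = len(diffs)
    standard_error = statistics.stdev(diffs) / math.sqrt(n)
    return statistics.mean(diffs) / standard_error, n - 1


# Hypothetical scores for five students (not data from any study).
before = [4, 5, 3, 4, 6]
after = [5, 6, 5, 4, 7]

t, df = paired_t(before, after)
print(f"t = {t:.3f} with {df} degrees of freedom")
```

Because each unit serves as its own comparison, stable differences between units (a strong student versus a weak one) drop out of the calculation.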

The Mathematical Formula for the T-Test

The Symbolism of Mean: The variable representing the mean is denoted as $\bar{x}$ (referred to as "x-bar").
The Standard Formula: The t-score (t) is the difference between the two group means divided by the standard error of that difference, which reflects the spread of the data.
- $t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$
Components of the Formula:
- $\bar{x}_1$: Mean of the first group.
- $\bar{x}_2$: Mean of the second group.
- $s_1^2$: Variance of the first group (standard deviation squared).
- $s_2^2$: Variance of the second group (standard deviation squared).
- $n_1$: Sample size of the first group.
- $n_2$: Sample size of the second group.
Denominator Explanation: The denominator factors in the spread of the data or the error around the mean, providing context for whether the difference in means is large relative to the variability within the groups.
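To make the formula concrete, here is a small Python sketch that computes the t-score term by term from two samples. The function name `t_score` and the data are hypothetical; the arithmetic follows the formula above, with the sample variance (dividing by $n - 1$) standing in for $s^2$.

```python
import math
import statistics


def t_score(group1, group2):
    """t = (mean1 - mean2) / sqrt(var1 / n1 + var2 / n2)."""
    mean1, mean2 = statistics.mean(group1), statistics.mean(group2)
    var1, var2 = statistics.variance(group1), statistics.variance(group2)
    n1, n2 = len(group1), len(group2)
    # Denominator: the spread of each group, scaled by its sample size.
    standard_error = math.sqrt(var1 / n1 + var2 / n2)
    return (mean1 - mean2) / standard_error


# Hypothetical outcome scores (not data from any study).
experimental = [3, 4, 5, 6, 7]
control = [1, 2, 3, 4, 5]

print(f"t = {t_score(experimental, control):.2f}")
```

Holding the difference in means fixed, a larger spread within the groups makes the denominator grow and the t-score shrink.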
Calculation Method: While t-tests can be calculated by hand or using a spreadsheet, statistical software like Stata provides automated outputs including the p-value.
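For readers curious about what the software does to turn a t-score into a p-value, the sketch below integrates the Student's t density numerically using only the Python standard library. It is an illustration of the idea, not Stata's actual routine. The function names are hypothetical, and the degrees of freedom are taken as an input because the right value depends on which variant of the test is run, a detail these notes do not cover.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))


def two_sided_p(t, df, steps=2000):
    """Probability, under the null, of a t-score at least as extreme as |t|."""
    upper = abs(t)
    if upper == 0:
        return 1.0
    h = upper / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        weight = 4 if i % 2 else 2  # odd interior points get weight 4
        total += weight * t_pdf(i * h, df)
    area_zero_to_t = total * h / 3
    # The density is symmetric, so the central area is twice this value.
    return max(0.0, 1.0 - 2.0 * area_zero_to_t)


# Hypothetical input: a t-score of 2.228 with 10 degrees of freedom.
print(f"p = {two_sided_p(2.228, 10):.4f}")
```

In practice the p-value is read straight from the software output; the point here is only that it is the tail area beyond the observed t-score.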

Case Study: Threat Perception and Support for Torture

Authors: Conrad, Croco, Gomez, and Moore.
Research Question: Does the identity of a detainee (race/ethnicity) and the nature of the allegation (crime vs. terrorism) affect American public support for torture?
Methodology: A nationally representative survey experiment. Narratives were varied using specific names associated with identities and specific allegations.
Identities/Names Used:
- Caucasian frame (example: William Shaw).
- Latino frame (example: Hector Gonzalez).
- Arabic frame.
Findings (Crime vs. Terror Frame):
- For Caucasian detainees: Average support for torture was slightly higher in the terror frame than in the crime frame, but the difference was not statistically significant (difference of means $= 0.331$; $p = 0.1233$). This implies that, for Caucasian suspects, the type of allegation (terror vs. crime) does not meaningfully change public support for torture in the population.
- For Latino detainees: Support for torture increased significantly in the terror frame compared to the crime frame (p < 0.01).
- For Arab detainees: This group received the highest levels of support for torture overall. When Arab and Caucasian suspects accused of the same crime were compared, support for torture was significantly higher for the Arab suspect, even when the allegation was held constant (the crime frame).
Visual Representation: The data is often shown via plots with a center bar (mean) and boxes/whiskers (confidence intervals).
- Significant Difference Indicator: If the confidence intervals for two groups do not overlap, there is typically a statistically significant difference.
- Overlapping intervals mean the plot alone cannot establish a difference; the difference may not be statistically significant, and a formal test is needed to decide.
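The overlap check on such a plot can be mimicked numerically. The sketch below builds an approximate 95% confidence interval around each group mean and tests whether two intervals overlap. It assumes the large-sample critical value of 1.96 (small samples would call for a wider interval), and the data and function names are hypothetical.

```python
import math
import statistics

Z_95 = 1.96  # large-sample critical value for a 95% interval (an assumption)


def ci95(sample):
    """Approximate 95% confidence interval (low, high) for the sample mean."""
    center = statistics.mean(sample)
    margin = Z_95 * statistics.stdev(sample) / math.sqrt(len(sample))
    return (center - margin, center + margin)


def intervals_overlap(a, b):
    """True when intervals a and b share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]


# Hypothetical support scores for three groups (not data from any study).
group_a = [1, 2, 3, 4, 5]
group_b = [6, 7, 8, 9, 10]
group_c = [3, 4, 5, 6, 7]

print("A vs B overlap:", intervals_overlap(ci95(group_a), ci95(group_b)))
print("A vs C overlap:", intervals_overlap(ci95(group_a), ci95(group_c)))
```

Non-overlap is a quick visual signal of a significant difference, but overlap does not settle the question either way, so the t-test itself remains the deciding evidence.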

Case Study: Randomization Checks in NGO Survey Experiments

Context: A survey experiment about advocacy organizations with three treatment groups (individual frame, group frame, organizational frame) and one control group.
Goal of T-Test: To ensure that random assignment produced groups that were substantively the same on control variables: age, education, party, gender, race/ethnicity, and income.
Findings (Environmental NGO):
- Most control variables showed no significant difference between groups (p > 0.05).
- Exception: For the "White" race variable, the comparison between the control group and the "group frame" produced a p-value of approximately $0.160$ (not significant), and another comparison came close to the threshold at $0.06$. One comparison did yield a significant difference (p < 0.05), which points to a possible randomization problem for that demographic in that subset.
Findings (School/Backpack NGO):
- A secondary study used a hypothetical NGO providing school resources to children.
- Outcome: No statistically significant differences were found on any control variable (age, education, party, etc.) between the groups. This indicates effective random distribution, allowing researchers to rule out these factors as causal explanations for findings on the dependent variable.

Case Study: Learning Simulations in International Relations

Researchers: Krain and Lantis.
Study Design: Compared the effectiveness of an active learning simulation versus traditional lecture/discussion across two different topics (torture/human rights and nuclear proliferation).
Paired Samples T-Test Results:
- Measured knowledge before and after the simulation for the same students.
- Pre-test mean score: $4.38$ out of $6$.
- Post-test mean score: $5.25$ out of $6$.
- Result: Statistically significant positive effect (p < 0.05). Knowledge increased within the group.
Independent Samples T-Test Results:
- Compared students in the simulation (experimental) vs. students in the lecture (control).
- Result: No statistically significant difference in final quiz grades between simulation and lecture groups (p > 0.05). Both methods appear to have increased knowledge to a similar degree.
Perception vs. Knowledge: While raw scores were similar, students in the simulation group reported a significantly greater sense of having improved their understanding of the material (p < 0.05) than students in the control group.

Practical Stata Instruction

Software Installation: Students are required to install Stata (licensed to the College of Wooster).
Licensing Process:
- Access the university technology page and navigate to Stata.
- Download the version corresponding to the OS (Mac, Windows, or 64-bit Linux).
- Use the provided Serial Number and Authorization Code from the "Stata licensing information" link.
Program Utility: Stata allows for running commands to generate chi-square tests, difference of means t-tests, and descriptive statistics like mean and standard deviation without manual calculation.

Questions & Discussion

Q: Do we need to calculate t-tests by hand for the exam?
A: No, students are not expected to calculate t-tests manually in this class, but they must be able to interpret the output and explain what a t-score or p-value is communicating.
Q: What is the general rule of thumb for p-values in Political Science?
A: The standard criterion is $0.05$. If the p-value is less than $0.05$, the null hypothesis is rejected, and the result is considered statistically significant. Other fields, like Psychology, may use different standards.
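The decision rule in this answer is simple enough to write as a one-line function. The sketch below applies it to p-values quoted earlier in these notes, plus one hypothetical value (0.03) added only to show a rejection; the function name `decide` is illustrative.

```python
ALPHA = 0.05  # conventional threshold named in the answer above


def decide(p_value, alpha=ALPHA):
    """Return the conclusion about the null hypothesis for a given p-value."""
    return "reject H0" if p_value < alpha else "fail to reject H0"


# p-values quoted in these notes, plus one hypothetical value (0.03).
examples = [
    ("Caucasian detainee, terror vs. crime frame", 0.1233),
    ("White variable, control vs. group frame", 0.160),
    ("near-threshold comparison", 0.06),
    ("hypothetical example", 0.03),
]

for label, p in examples:
    print(f"{label}: p = {p} -> {decide(p)}")
```

Note that $p = 0.06$ still fails to reject the null under this rule, even though it sits close to the threshold.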
Q: Does a significant p-value tell you how strong an effect is?
A: No. Significance only tells you if you can reject the null hypothesis (i.e., whether the difference is likely to exist in the population). It does not indicate the magnitude or strength of the effect.
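A quick numerical illustration of this point: in the t-score formula, the same difference in means produces a much larger t-score once the samples are large, even though the size of the effect has not changed. The sketch below uses hypothetical numbers and, for simplicity, assumes both groups share the same variance and sample size.

```python
import math


def t_equal_groups(mean_diff, variance, n_per_group):
    """t-score when both groups share the same variance and sample size."""
    standard_error = math.sqrt(variance / n_per_group + variance / n_per_group)
    return mean_diff / standard_error


# Hypothetical numbers: the same 0.5-point difference at two sample sizes.
t_small = t_equal_groups(0.5, 4.0, 20)
t_large = t_equal_groups(0.5, 4.0, 2000)

print(f"n = 20 per group:   t = {t_small:.2f}")
print(f"n = 2000 per group: t = {t_large:.2f}")
```

The difference is 0.5 points in both cases. Only the certainty about it changes, which is why the size of the difference in means has to be judged separately from the p-value.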
Q: Discussion on Stata installation issues:
- Student concern: Problems with license codes on Mac vs. PC.
- Resolution: Students should copy and paste the codes (Serial, Code, Authorization) directly to avoid typos. One student found that the name or organization field might need to be filled in (College of Wooster).
Personal Aside: Brief mention of the New York "ecosystem," jokes about rats, and references to "Sunflower" the cat.