Chi Square Hypothesis Testing
Chapter 10: Hypothesis Testing IV (Chi Square)
Chapter Outline
- Introduction
- Bivariate Table
- The Logic of Chi Square (\chi^2) test
- Chi Square Test for Independence
- The Five-Step Model
- Computation of Chi Square (\chi^2)
Bivariate Table
- Columns represent values of the independent variable.
- Rows represent values of the dependent variable.
- Cells represent the intersections of columns and rows.
- Each cell reports the number of times each combination of values occurred.
Bivariate Table Structure
| | Column 1 | Column 2 | |
|---|---|---|---|
| Row 1 | cell a | cell b | Row Marginal 1 |
| Row 2 | cell c | cell d | Row Marginal 2 |
| | Column Marginal 1 | Column Marginal 2 | N (Total) |
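As a rough illustration of how such a table is tallied, the sketch below counts how often each combination of values occurs and derives the marginals. The list `cases` is hypothetical and invented purely for illustration; the names `cells`, `row_marginals`, and `col_marginals` are likewise assumptions, not part of the chapter.

```python
from collections import Counter

# Hypothetical observations (made up for illustration only):
# each pair is (row value, column value) for one case.
cases = [
    ("Row 1", "Column 1"), ("Row 1", "Column 1"), ("Row 1", "Column 2"),
    ("Row 2", "Column 1"), ("Row 2", "Column 2"), ("Row 2", "Column 2"),
    ("Row 2", "Column 2"),
]

# Each cell holds the number of times a combination of values occurred.
cells = Counter(cases)

# Marginals are the row and column totals; N is the number of cases.
row_marginals = Counter(row for row, _ in cases)
col_marginals = Counter(col for _, col in cases)
n = len(cases)

print("cell a:", cells[("Row 1", "Column 1")])
print("row marginals:", dict(row_marginals))
print("column marginals:", dict(col_marginals))
print("N:", n)
```

Both sets of marginals sum to N, which is a quick check that the table was tallied correctly.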
Basic Logic of \chi^2 Test
- Chi-Square is a test of significance based on a bivariate table.
- Most often, both variables are categorical (i.e., nominal or ordinal).
- We are looking for significant differences between the observed frequencies (f_o) and the frequencies we would expect (f_e) if the two variables were independent.
Example of \chi^2 Test
- Is there any statistically significant relationship between gender and party identification?
- Data is based on the 1991 General Social Survey.
Example Data: Party Identification by Gender
| Party Identification | Females | Males | Total |
|---|---|---|---|
| Democrat | 279 | 165 | 444 |
| Independent | 73 | 47 | 120 |
| Republican | 225 | 191 | 416 |
| Total | 577 | 403 | N = 980 |
Step 1: Assumptions and Test Requirements
- An independent random sample.
- Both variables are categorical (as is typical for the chi-square test).
- No assumption is made about the shape of the population distribution (the chi-square test is a non-parametric test).
Step 2: State the Null and Research Hypotheses (H0 and H1)
- H_0: Party identification and gender are independent. (Gender has no effect on party identification).
- H_1: Party identification and gender are dependent. (Gender has some effect on party identification).
Step 3: Select Sampling Distribution and Establish the Critical Region
- Sampling Distribution = \chi^2
- Use the table in Appendix C (“Distribution of Chi Square”) to find \chi^2 (critical):
- df = (r-1)(c-1) = (3-1)(2-1) = 2
- If we set \alpha = 0.05, \chi^2 (critical) = 5.991
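The table lookup can be cross-checked with a few lines of Python. This is only a sketch: it relies on the fact that for df = 2 the chi-square upper-tail probability is exp(-x/2), so the critical value has the closed form -2 ln(alpha). That shortcut does not hold for other degrees of freedom, and the function names here are hypothetical.

```python
import math

def degrees_of_freedom(rows, cols):
    """df = (r - 1)(c - 1) for an r-by-c bivariate table."""
    return (rows - 1) * (cols - 1)

def chi2_critical_df2(alpha):
    """Critical value for df = 2 ONLY.

    With df = 2, P(chi2 > x) = exp(-x / 2), so x = -2 * ln(alpha).
    For any other df, use a printed table or a statistics library.
    """
    return -2.0 * math.log(alpha)

df = degrees_of_freedom(3, 2)        # 3 party categories, 2 genders
critical = chi2_critical_df2(0.05)   # alpha = 0.05

print(df, round(critical, 3))  # 2 5.991
```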
Step 4: Compute the Test Statistic (\chi^2)
\chi^2 (obtained) = \sum \frac{(f_o - f_e)^2}{f_e}
Step 4: Compute the Test Statistic (\chi^2) - Expected Frequencies
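- Compute the expected frequency (f_e) for each cell: f_e = (row marginal * column marginal) / N
  - Democrat, Females: (444*577)/980 = 261.4
  - Democrat, Males: (444*403)/980 = 182.6
  - Independent, Females: (120*577)/980 = 70.7
  - Independent, Males: (120*403)/980 = 49.3
  - Republican, Females: (416*577)/980 = 244.9
  - Republican, Males: (416*403)/980 = 171.1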
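The same arithmetic can be sketched in Python. Each expected frequency is the cell's row total times its column total, divided by N; the variable names are illustrative assumptions, and the counts are the observed frequencies from the example table.

```python
# Observed frequencies from the example (rows: party, columns: females, males).
observed = [
    [279, 165],  # Democrat
    [73, 47],    # Independent
    [225, 191],  # Republican
]

row_totals = [sum(row) for row in observed]        # row marginals
col_totals = [sum(col) for col in zip(*observed)]  # column marginals
n = sum(row_totals)                                # N

# f_e = (row marginal * column marginal) / N for every cell
expected = [[r * c / n for c in col_totals] for r in row_totals]

for row in expected:
    print([round(fe, 1) for fe in row])
```

The expected frequencies in each row and column sum to the same marginals as the observed frequencies.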
Step 4: Compute the Test Statistic (\chi^2): Party Identification by Gender (with Expected Frequencies)
| Party Identification | Females | Males | Total |
|---|---|---|---|
| Democrat | 279 (261.4) | 165 (182.6) | 444 |
| Independent | 73 (70.7) | 47 (49.3) | 120 |
| Republican | 225 (244.9) | 191 (171.1) | 416 |
| Total | 577 | 403 | N = 980 |

(Expected frequencies are in parentheses.)
Step 4: Compute the Test Statistic (\chi^2) - Calculation Table
| f_o | f_e | f_o - f_e | (f_o - f_e)^2 | (f_o - f_e)^2 / f_e |
|---|---|---|---|---|
| 279 | 261.4 | 17.6 | 309.8 | 1.19 |
| 165 | 182.6 | -17.6 | 309.8 | 1.70 |
| 73 | 70.7 | 2.3 | 5.29 | 0.07 |
| 47 | 49.3 | -2.3 | 5.29 | 0.11 |
| 225 | 244.9 | -19.9 | 396.0 | 1.62 |
| 191 | 171.1 | 19.9 | 396.0 | 2.31 |
| 980 | 980 | 0 | | \chi^2 (obtained) = 7.0 |
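The calculation table can be reproduced with a short Python sketch. The function name `chi_square` is a hypothetical helper, not something defined in the chapter. Because it keeps the expected frequencies unrounded, its result comes out slightly above the hand-computed 7.0 but agrees with it to one decimal place.

```python
def chi_square(observed):
    """Sum of (f_o - f_e)^2 / f_e over all cells of a bivariate table."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)

    total = 0.0
    for i, row in enumerate(observed):
        for j, fo in enumerate(row):
            fe = row_totals[i] * col_totals[j] / n  # expected frequency
            total += (fo - fe) ** 2 / fe
    return total

# Observed frequencies: rows are Democrat, Independent, Republican;
# columns are females, males.
obtained = chi_square([[279, 165], [73, 47], [225, 191]])

print(round(obtained, 1))  # 7.0
```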
Step 5: Interpret Results and Make a Decision
- \chi^2 (critical) = 5.991
- \chi^2 (obtained) = 7.0
- Because 7.0 > 5.991, the test statistic falls in the critical region. Therefore, we reject H_0.
- There is a significant relationship between gender and party identification (or gender and party identification are dependent).
Interpreting Chi-Square Test
- The chi-square test tells us only whether the variables are independent or not.
- It does not tell us the pattern or the nature of the relationship.
- To investigate the pattern, use observed frequencies to compute percentages within each column and compare across the columns.
Interpreting Chi-Square Test: Party Identification by Gender (Column Percentages)
| Party Identification | Females | Males | Total |
|---|---|---|---|
| Democrat | 279 (48.4%) | 165 (40.9%) | 444 |
| Independent | 73 (12.7%) | 47 (11.7%) | 120 |
| Republican | 225 (39.0%) | 191 (47.4%) | 416 |
| Total | 577 | 403 | N = 980 |

(Use observed frequencies; column percentages are in parentheses.)
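A minimal sketch of the column-percentage step follows: each observed frequency is divided by its column total and multiplied by 100. The dictionary layout and variable names are illustrative assumptions.

```python
# Observed frequencies by party identification: (females, males).
observed = {
    "Democrat": (279, 165),
    "Independent": (73, 47),
    "Republican": (225, 191),
}

# Column totals: number of females and number of males.
col_totals = [sum(counts[j] for counts in observed.values()) for j in range(2)]

# Percentage of each column that falls in each row.
column_pct = {
    party: tuple(round(100 * fo / total, 1) for fo, total in zip(counts, col_totals))
    for party, counts in observed.items()
}

for party, (females, males) in column_pct.items():
    print(f"{party}: females {females}%, males {males}%")
```

Comparing across the columns, a larger share of females than males identify as Democrats, while a larger share of males than females identify as Republicans.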
Other Limitations of Chi-Square Test
- Like other tests of significance, chi-square is sensitive to sample size.
- For a given pattern of cell percentages, obtained chi-square increases in proportion to N.
- With large samples, even trivial relationships may be statistically significant.
- Remember: significance is not the same thing as importance.
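The sensitivity to sample size can be seen in a small sketch. Here every observed frequency from the gender and party identification example is doubled, which leaves the column percentages unchanged. The doubled table is hypothetical, not real survey data, and `chi_square` is an illustrative helper name.

```python
def chi_square(observed):
    """Sum of (f_o - f_e)^2 / f_e over all cells of a bivariate table."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return sum(
        (fo - row_totals[i] * col_totals[j] / n) ** 2
        / (row_totals[i] * col_totals[j] / n)
        for i, row in enumerate(observed)
        for j, fo in enumerate(row)
    )

observed = [[279, 165], [73, 47], [225, 191]]

# Hypothetical table: same percentages, twice the sample size.
doubled = [[2 * fo for fo in row] for row in observed]

base = chi_square(observed)
larger = chi_square(doubled)

# Ratio of the two statistics; doubling N doubles chi square.
print(round(larger / base, 6))
```

The strength of the relationship is identical in both tables, yet the obtained chi-square is twice as large in the second one, which is why a significant result in a large sample is not necessarily an important one.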
Lab Exercise & Homework
- Lab Exercise 7 (ungraded)
  - 10.11 (p. 285)
  - Conduct the chi-square test and also compute the column percentages
- HW8 (graded)
  - 10.12 (p. 286)
  - Conduct the chi-square test and also compute the column percentages
- Formula 10.4 in “The Limitations of the Chi Square Test” (p. 280) is not covered.