Study Notes on Chi-Square Test for Independence
Chapter 17: Chi-Square
Chi-Square Test for Independence
Overview: This section examines whether there is a relationship between two categorical variables.
Categorical Variables
Definition: Categorical variables represent data that fall into categories, which can be either nominal (e.g., political affiliation) or ordinal (e.g., academic ranking).
Examples:
Political affiliation
Academic major
Verdict choice (e.g., guilty vs. not guilty)
Data Representation: Frequencies are recorded (e.g., number of Psychology majors, number of people selecting a guilty verdict), not means.
Two-Dimensional Matrix Presentation
Matrix Structure: A two-way table in which the rows are the categories of one variable and the columns are the categories of the other.
Example Variables:
Variable 1: "Do you have superpowers?" (Responses: yes/no)
Variable 2: "What is your favorite ice cream flavor?" (Options: chocolate or vanilla)
Observed Frequencies: Numbers inside the table representing the actual counts (e.g., number of people with superpowers who like chocolate).
Total Counts: Sums of rows and columns are shown outside the table.
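As a minimal sketch of how such a table could be tallied from raw responses: the row totals (30 with powers, 70 without) and column totals (60 chocolate, 40 vanilla) match the example used later in these notes, but the four cell counts are hypothetical, since the notes do not list them. The variable names are likewise illustrative.

```python
from collections import Counter

# Hypothetical raw data: one (has_powers, favorite_flavor) pair per respondent.
# The cell counts 23/7/37/33 are an assumption, not taken from the notes.
responses = (
    [("yes", "chocolate")] * 23
    + [("yes", "vanilla")] * 7
    + [("no", "chocolate")] * 37
    + [("no", "vanilla")] * 33
)

counts = Counter(responses)
rows = ["yes", "no"]               # Variable 1: superpowers
cols = ["chocolate", "vanilla"]    # Variable 2: favorite flavor

# Observed frequencies: the numbers inside the table.
observed = [[counts[(r, c)] for c in cols] for r in rows]

# Total counts: the sums shown outside the table.
row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

print(observed, row_totals, col_totals, n)
```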
Chi-Square Test for Independence
Independence Concept: Two variables are independent if the frequency distribution of one variable is the same within every category of the other.
Independence Example: If 60% of the dataset prefers chocolate and powers are independent of flavor preference, then:
60% of people with powers prefer chocolate
60% of people without powers prefer chocolate
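The 60% example can be checked numerically. The sketch below uses illustrative counts for a table that is perfectly independent (30 people with powers, 70 without, 60% chocolate in both groups); the function name `row_proportions` is an assumption made for this example.

```python
def row_proportions(table):
    """Divide each cell by its row total, giving the flavor split within each group."""
    return [[cell / sum(row) for cell in row] for row in table]

# Illustrative counts: rows = has powers / no powers, columns = chocolate / vanilla.
independent_table = [[18, 12], [42, 28]]

props = row_proportions(independent_table)
for label, row in zip(["Has powers", "No powers"], props):
    print(label, [round(p, 2) for p in row])
```

When the variables are independent, every row shows the same proportions as the dataset as a whole.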
Hypothesis Testing Steps
State the Hypotheses:
Null Hypothesis (H0): Powers and flavor preference are independent.
Alternative Hypothesis (H1): Powers and flavor preference are NOT independent.
Calculate Degrees of Freedom (df) & Critical Region:
Formula for df: $df = (R - 1)(C - 1)$
Where R is the number of rows, C is the number of columns.
For the example: R = 2 (yes/no), C = 2 (chocolate/vanilla)
Calculation: $df = (2 - 1)(2 - 1) = 1$
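A small sketch of the df step, assuming the standard formula df = (R - 1)(C - 1) for a test of independence. The function name and the 3 x 4 table are hypothetical, added only to show how df grows with table size.

```python
def degrees_of_freedom(n_rows, n_cols):
    """df for a chi-square test of independence on an R x C table."""
    return (n_rows - 1) * (n_cols - 1)

df_example = degrees_of_freedom(2, 2)   # powers (yes/no) x flavor (chocolate/vanilla)
df_larger = degrees_of_freedom(3, 4)    # hypothetical 3 x 4 table

print(df_example, df_larger)
```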
Calculate Statistics:
Find expected frequencies (Fe).
Expected Frequencies (Fe)
Definition: Expected frequencies are the anticipated counts under the null hypothesis assuming independence of variables.
Calculation Methods:
Version 1:
i. Calculate percentages for columns (e.g., percentage of the dataset that likes chocolate).
Example Calculation of Percentages:
Total liking chocolate = 60 out of 100 = 0.60 or 60%.
Total liking vanilla = 40 out of 100 = 0.40 or 40%.
Remark: Keep four decimal places in the proportions (e.g., 0.6000) and do not round them further before computing expected frequencies.
Actual Calculation of Expected Frequencies
For each row: Multiply column percentage by row total.
Example calculations:
Expected frequency for "Has Powers" liking chocolate: $0.60 \times 30 = 18$
(from 30 people who have powers)
Expected frequency for "No Powers" liking chocolate: $0.60 \times 70 = 42$
(from 70 people who do not have powers)
Expected Frequencies for Vanilla:
Powers: $0.40 \times 30 = 12$
No Powers: $0.40 \times 70 = 28$
Validation: Check that the expected frequencies in each row and each column sum to the corresponding row and column totals.
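The Version 1 procedure (column proportion times row total) and the validation step can be sketched as follows. The totals come from the example in these notes; the variable names are illustrative. The same numbers result from (row total x column total) / N, which is algebraically equivalent.

```python
row_totals = [30, 70]   # has powers, no powers
col_totals = [60, 40]   # chocolate, vanilla
n = sum(row_totals)     # 100 respondents

# Step i: column proportions (share of the whole dataset choosing each flavor).
col_props = [c / n for c in col_totals]

# Step ii: for each row, multiply each column proportion by the row total.
expected = [[p * r for p in col_props] for r in row_totals]

# Validation: expected frequencies must reproduce the row and column totals.
rows_ok = all(abs(sum(row) - t) < 1e-9 for row, t in zip(expected, row_totals))
cols_ok = all(abs(sum(col) - t) < 1e-9 for col, t in zip(zip(*expected), col_totals))

print([[round(e, 4) for e in row] for row in expected], rows_ok, cols_ok)
```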
Calculate Chi-Square Statistic
Formula: $\chi^2 = \sum \frac{(O - E)^2}{E}$
Where $O$ is the observed frequency and $E$ is the expected frequency (Fe) of a cell, and the sum runs over every cell in the table.
Example Calculation:
Compute (O - E)^2 / E for each cell, then sum the results across all cells to obtain the chi-square value. Because each term is a squared difference divided by a positive expected frequency, chi-square is always non-negative.
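The cell-by-cell summation can be sketched as below. The observed counts are hypothetical: the notes do not list them, so values were chosen that fit the stated totals (30/70 and 60/40) and happen to reproduce the chi-square of 4.96 reported in these notes. The expected counts follow from multiplying column proportions by row totals.

```python
def chi_square(observed, expected):
    """Sum (O - E)^2 / E over every cell of the table."""
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )

# Rows: has powers / no powers. Columns: chocolate / vanilla.
observed = [[23, 7], [37, 33]]    # hypothetical observed counts (assumption)
expected = [[18, 12], [42, 28]]   # 0.60 and 0.40 times row totals of 30 and 70

chi2 = chi_square(observed, expected)
print(round(chi2, 2))
```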
Steps for Decision Making about H0
Compare Chi-square Statistic: Compare the obtained chi-square with the critical value from the chi-square distribution table for the given df.
Example Result: For the example data, the obtained chi-square = 4.96 and the critical value = 3.84 (df = 1, alpha = .05).
Conclusion: Reject H0 if the obtained statistic exceeds the critical value; otherwise, fail to reject H0. Here 4.96 > 3.84, so H0 is rejected.
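A sketch of the decision rule, using the obtained and critical values from the example. The p-value helper is an added illustration, not part of the notes: it relies on the fact that a chi-square variable with df = 1 is a squared standard normal, so it is valid for df = 1 only. Both function names are hypothetical.

```python
import math

def decide(chi2_obtained, chi2_critical):
    """Reject H0 only when the obtained statistic exceeds the critical value."""
    return "reject H0" if chi2_obtained > chi2_critical else "fail to reject H0"

def p_value_df1(chi2_obtained):
    """Upper-tail probability for chi-square with df = 1 (not valid for other df)."""
    return math.erfc(math.sqrt(chi2_obtained / 2))

decision = decide(4.96, 3.84)
p = p_value_df1(4.96)
print(decision, round(p, 3))
```

For tables with more than one degree of freedom, the p-value would need a general chi-square distribution function from a statistics package instead.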
Reporting and Interpreting Results
Report findings in APA style: e.g., "We conducted a chi-square test for independence to evaluate whether superpowers (present, absent) was related to ice cream flavor preference (chocolate, vanilla)."
Describe the significance of the results and the interpretation of the percentage differences between categories across different groups. Include exact chi-square, df, and p-values in reporting.
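As one possible way to assemble the statistics line of a report, the sketch below formats chi-square, df, N, and p in the pattern commonly used in APA-style write-ups. The function name is hypothetical, and the p-value of .026 is an assumption computed for a chi-square of 4.96 with df = 1; it is not stated in the notes.

```python
def apa_chi_square(chi2, df, n, p):
    """Format a chi-square result, dropping the leading zero from p."""
    if p < 0.001:
        p_text = "p < .001"
    else:
        p_text = "p = " + f"{p:.3f}".lstrip("0")
    return f"χ²({df}, N = {n}) = {chi2:.2f}, {p_text}"

report = apa_chi_square(4.96, 1, 100, 0.026)
print(report)
```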
Effect Size Measurement
Phi (φ) Statistic: Utilized when both categorical variables are dichotomous; computes the strength of association.
Formula: $\phi = \sqrt{\frac{\chi^2}{N}}$, where $N$ is the total sample size.
Cramér's V: Used when the matrix is larger than 2x2, providing an alternative measure of effect size:
Formula: $V = \sqrt{\frac{\chi^2}{N \cdot df^*}}$, where $N$ is the total sample size and $df^*$ is the smaller of (R-1) or (C-1).
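Both effect sizes can be sketched in a few lines, assuming the standard definitions: phi is the square root of chi-square divided by N, and Cramér's V also divides by the smaller of (R-1) and (C-1). The function names are illustrative; the inputs 4.96 and N = 100 come from the example in these notes.

```python
import math

def phi(chi2, n):
    """Effect size for a 2 x 2 table."""
    return math.sqrt(chi2 / n)

def cramers_v(chi2, n, n_rows, n_cols):
    """Effect size for larger tables; df_star is the smaller of (R-1) and (C-1)."""
    df_star = min(n_rows - 1, n_cols - 1)
    return math.sqrt(chi2 / (n * df_star))

phi_example = phi(4.96, 100)
v_example = cramers_v(4.96, 100, 2, 2)   # reduces to phi for a 2 x 2 table
print(round(phi_example, 3), round(v_example, 3))
```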
SPSS Output Notes
Use SPSS to run the analysis; its output includes observed frequencies, the chi-square test, and effect sizes. Interpret and report the results in the context of the study.