Study Notes on Chi-Square Test for Independence

Chapter 17: Chi-Square

Chi-Square Test for Independence

  • Overview: This section examines whether there is a relationship between two categorical variables.

Categorical Variables
  • Definition: Categorical variables represent data that fall into categories, which can be either nominal (e.g., political affiliation) or ordinal (e.g., academic ranking).

  • Examples:

    • Political affiliation

    • Academic major

    • Verdict choice (e.g., guilty vs. not guilty)

  • Data Representation: Frequencies are recorded (e.g., number of Psychology majors, number of people selecting a guilty verdict), not means.

Two-Dimensional Matrix Presentation
  • Matrix Structure: A contingency table crosses the categories of one variable (rows) with the categories of the other variable (columns).

  • Example Variables:

    • Variable 1: "Do you have superpowers?" (Responses: yes/no)

    • Variable 2: "What is your favorite ice cream flavor?" (Options: chocolate or vanilla)

  • Observed Frequencies: Numbers inside the table representing the actual counts (e.g., number of people with superpowers who like chocolate).

  • Total Counts: Sums of rows and columns are shown outside the table.
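The following is a minimal Python sketch of how raw categorical responses become the frequency matrix described above. It assumes hypothetical raw records built only to reproduce the worked example's counts (23, 7, 37, 33, taken from the chi-square calculation later in these notes). The names `responses`, `observed`, `row_totals` and `col_totals` are illustrative.

```python
from collections import Counter

# Hypothetical raw responses, built only to reproduce the example's counts:
# one (has_powers, favorite_flavor) pair per person, 100 people in total.
responses = (
    [("yes", "chocolate")] * 23
    + [("yes", "vanilla")] * 7
    + [("no", "chocolate")] * 37
    + [("no", "vanilla")] * 33
)

rows = ["yes", "no"]              # Variable 1: "Do you have superpowers?"
cols = ["chocolate", "vanilla"]   # Variable 2: favorite ice cream flavor

# Observed frequencies: count the people falling into each (row, column) cell.
counts = Counter(responses)
observed = [[counts[(r, c)] for c in cols] for r in rows]

# Marginal totals shown outside the table: row sums, column sums, grand total.
row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

print(f"{'':>6}{'choc':>6}{'van':>6}{'total':>7}")
for label, row, total in zip(rows, observed, row_totals):
    print(f"{label:>6}{row[0]:>6}{row[1]:>6}{total:>7}")
print(f"{'total':>6}{col_totals[0]:>6}{col_totals[1]:>6}{n:>7}")
```

Only the counts are stored, never means, which matches the point that categorical data are summarized as frequencies.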

Chi-Square Test for Independence
  • Independence Concept: Variables are considered independent if the frequency distribution of one variable does not affect the frequency distribution of the other.

  • Independence Example: If 60% of the whole sample prefers chocolate and powers are independent of flavor preference, then:

    • 60% of people with powers prefer chocolate

    • 60% of people without powers prefer chocolate
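To make the independence idea concrete, here is a small Python sketch that compares the overall chocolate proportion with the proportion inside each row. It assumes the worked example's observed counts (23, 7, 37, 33); the variable names are illustrative only.

```python
# Observed counts from the worked example
# (rows: has powers yes/no; columns: chocolate, vanilla).
observed = [[23, 7], [37, 33]]

n = sum(sum(row) for row in observed)
overall_choc = sum(row[0] for row in observed) / n   # 60 / 100

# Proportion preferring chocolate within each row (conditional on powers).
choc_given_row = [row[0] / sum(row) for row in observed]

print(f"overall: {overall_choc:.4f}")
for label, p in zip(["powers", "no powers"], choc_given_row):
    print(f"{label}: {p:.4f}")
```

Under exact independence both row proportions would equal the overall 0.60. In these data they are about 0.7667 and 0.5286, and the chi-square test asks whether a gap this large exceeds what sampling variability alone would produce.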

Hypothesis Testing Steps

  1. State the Hypotheses:

    • Null Hypothesis (H0): Powers and flavor preference are independent.

    • Alternative Hypothesis (H1): Powers and flavor preference are NOT independent.

  2. Calculate Degrees of Freedom (df) & Critical Region:

    • Formula for df: $df = (R-1)(C-1)$

    • Where R is the number of rows, C is the number of columns.

    • For the example: R = 2 (yes/no), C = 2 (chocolate/vanilla)

    • Calculation: $df = (2-1)(2-1) = 1 \times 1 = 1$

  3. Calculate Statistics:

    • Find expected frequencies (Fe).
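The df rule in step 2, and the critical value used later, can be sketched in Python as below. This is a sketch under one assumption: the critical-value shortcut holds only for df = 1, because a chi-square variable with 1 df is the square of a standard normal variable. For other df, read the table or use a library routine such as SciPy's `scipy.stats.chi2.ppf(1 - alpha, df)`. The function name `degrees_of_freedom` is hypothetical.

```python
from statistics import NormalDist

def degrees_of_freedom(n_rows: int, n_cols: int) -> int:
    """df = (R - 1)(C - 1) for an R x C contingency table."""
    return (n_rows - 1) * (n_cols - 1)

df = degrees_of_freedom(2, 2)   # yes/no by chocolate/vanilla

# df = 1 only: chi-square(1) = Z^2, so the upper-alpha critical value
# equals the squared two-tailed z critical value.
alpha = 0.05
z = NormalDist().inv_cdf(1 - alpha / 2)
critical_df1 = z ** 2

print(df, round(critical_df1, 2))
```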

Expected Frequencies (Fe)
  • Definition: Expected frequencies are the anticipated counts under the null hypothesis assuming independence of variables.

  • Calculation Methods:

    • Version 1:
      i. Calculate percentages for columns (e.g., percentage of the dataset that likes chocolate).

    • Example Calculation of Percentages:

    • Total liking chocolate = 60 out of 100 = 0.60 or 60%.

    • Total liking vanilla = 40 out of 100 = 0.40 or 40%.

    • Remark: Keep proportions to four decimal places (e.g., 0.6000) and avoid rounding them further before computing expected frequencies.
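A minimal Python sketch of the column-proportion step follows, assuming the example's column totals (60 chocolate, 40 vanilla, n = 100). The dictionary names are illustrative.

```python
# Column totals and grand total from the example.
col_totals = {"chocolate": 60, "vanilla": 40}
n = sum(col_totals.values())

# Column proportions, kept to four decimal places as the remark recommends.
col_props = {flavor: round(total / n, 4) for flavor, total in col_totals.items()}
print(col_props)
```

With these round totals the four-decimal rule changes nothing. With a split such as 37 of 70, the proportion would be kept as 0.5286 rather than shortened to 0.53.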

Actual Calculation of Expected Frequencies
  • For each cell: Multiply the column proportion by the row total.

  • Example calculations:

    • Expected frequency for "Has Powers" liking chocolate:

    • $0.60 \times 30 = 18$ (from 30 people who have powers)

    • Expected frequency for "No Powers" liking chocolate:

    • $0.60 \times 70 = 42$ (from 70 people who do not have powers)

  • Expected Frequencies for Vanilla:

    • Powers: $0.40 \times 30 = 12$

    • No Powers: $0.40 \times 70 = 28$

  • Validation: Check that the expected frequencies in each row sum to that row's total and that those in each column sum to that column's total.
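The expected-frequency calculation and the validation check can be sketched in Python as follows. It assumes the worked example's observed counts. Each cell uses "column proportion times row total", which is the same as row total times column total divided by n. Names are illustrative.

```python
observed = [[23, 7], [37, 33]]   # rows: powers yes/no; cols: chocolate, vanilla

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Expected frequency (Fe) for each cell: column proportion * row total.
expected = [[(ct / n) * rt for ct in col_totals] for rt in row_totals]

# Validation: expected counts must reproduce the observed marginal totals.
for i, rt in enumerate(row_totals):
    assert abs(sum(expected[i]) - rt) < 1e-9
for j, ct in enumerate(col_totals):
    assert abs(sum(row[j] for row in expected) - ct) < 1e-9

print(expected)
```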

Calculate Chi-Square Statistic
  • Formula: $\chi^2 = \sum \frac{(O - E)^2}{E}$

    • Where O is the observed frequency and E is the expected frequency (Fe) for a cell; the sum runs over every cell of the table.

  • Example Calculation:

    • $\chi^2 = \frac{(23-18)^2}{18} + \frac{(7-12)^2}{12} + \frac{(37-42)^2}{42} + \frac{(33-28)^2}{28}$

    • Every squared difference here is 25, so the four terms are 1.3889 + 2.0833 + 0.5952 + 0.8929, which sum to a chi-square of about 4.96. Because each term is a squared difference divided by a positive expected count, chi-square is never negative.
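The summation above can be sketched in Python, assuming the observed and expected tables from the worked example. The list `terms` holds one (O - E)^2 / E value per cell.

```python
observed = [[23, 7], [37, 33]]
expected = [[18, 12], [42, 28]]

# Sum (O - E)^2 / E over every cell of the table.
terms = [
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
]
chi_square = sum(terms)

print([round(t, 4) for t in terms], round(chi_square, 2))
```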

Steps for Decision Making about H0
  1. Compare Chi-square Statistic: Compare the obtained chi-square with the critical value from the chi-square distribution table for the given df.

  2. Example Result: For these data, the obtained chi-square = 4.96; the critical value for df = 1 at α = .05 is 3.84.

  3. Conclusion: Reject H0 if the obtained statistic exceeds the critical value; otherwise, fail to reject H0. Here 4.96 > 3.84, so H0 is rejected: powers and flavor preference are not independent.
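The decision rule can be sketched in Python as below, assuming α = .05 and the example's values. The p-value line relies on a df = 1 shortcut (a chi-square variable with 1 df is a squared standard normal), so it is not valid for larger tables. It is included only to show that the critical-value rule and the p-value rule agree.

```python
from statistics import NormalDist

chi_square = 4.96   # obtained statistic from the worked example
critical = 3.84     # table value for df = 1, alpha = .05
alpha = 0.05

# Critical-value rule: reject H0 when the statistic exceeds the critical value.
reject_by_critical = chi_square > critical

# Equivalent p-value rule (df = 1 only): P(chi2_1 > x) = 2 * P(Z > sqrt(x)).
p_value = 2 * (1 - NormalDist().cdf(chi_square ** 0.5))
reject_by_p = p_value < alpha

print("reject H0" if reject_by_critical else "fail to reject H0")
```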

Reporting and Interpreting Results
  • Report findings in APA style: e.g., "We conducted a chi-square test for independence to evaluate whether having superpowers (present, absent) was related to ice cream flavor preference (chocolate, vanilla)."

  • State whether the result is significant and give the exact chi-square value, df, sample size, and p-value (e.g., $\chi^2(1, N = 100) = 4.96, p = .026$). Then interpret the percentage differences across groups: 76.67% of people with powers (23 of 30) prefer chocolate, compared with 52.86% of people without powers (37 of 70).
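Here is a hedged Python sketch that assembles the APA-style statistics string and the group percentages directly from the observed table. It assumes the worked example's counts and uses the df = 1 normal shortcut for p, which does not apply to larger tables. The formatting choices (two decimals for chi-square, three for p with no leading zero) follow common APA practice. Very small values would instead be reported as p < .001, which this sketch does not handle.

```python
from statistics import NormalDist

observed = [[23, 7], [37, 33]]   # rows: powers yes/no; cols: chocolate, vanilla
row_totals = [sum(r) for r in observed]
col_totals = [sum(c) for c in zip(*observed)]
n = sum(row_totals)
df = (len(observed) - 1) * (len(observed[0]) - 1)

# Chi-square from the table, using E = row total * column total / n.
chi_square = 0.0
for i, row in enumerate(observed):
    for j, o in enumerate(row):
        e = row_totals[i] * col_totals[j] / n
        chi_square += (o - e) ** 2 / e

# p-value via the df = 1 normal shortcut (not valid for larger tables).
p = 2 * (1 - NormalDist().cdf(chi_square ** 0.5))

# APA-style pieces: chi-square to two decimals, p with no leading zero.
p_text = f"{p:.3f}".lstrip("0")
report = f"χ²({df}, N = {n}) = {chi_square:.2f}, p = {p_text}"

# Percentage preferring chocolate within each group, for the interpretation.
pct_choc = [100 * row[0] / sum(row) for row in observed]

print(report)
print(f"chocolate: {pct_choc[0]:.2f}% with powers, {pct_choc[1]:.2f}% without")
```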

Effect Size Measurement
  • Phi (φ) Statistic: Used when both categorical variables are dichotomous (a 2x2 table); measures the strength of association.

    • Formula: $\phi = \sqrt{\frac{\chi^2}{n}}$, where n is the total sample size.

  • Cramér's V: Used when the table is larger than 2x2, providing an alternative measure of effect size:

    • Formula: $V = \sqrt{\frac{\chi^2}{n \cdot df^*}}$, where $df^*$ is the smaller of (R - 1) and (C - 1).
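Both effect-size formulas can be sketched in Python as follows, assuming the example's chi-square (4.96) and n (100). The function names `phi` and `cramers_v` are illustrative, not from any particular library.

```python
import math

def phi(chi_square: float, n: int) -> float:
    """Phi coefficient for a 2x2 table: sqrt(chi^2 / n)."""
    return math.sqrt(chi_square / n)

def cramers_v(chi_square: float, n: int, n_rows: int, n_cols: int) -> float:
    """Cramér's V: sqrt(chi^2 / (n * df*)), with df* = min(R - 1, C - 1)."""
    df_star = min(n_rows - 1, n_cols - 1)
    return math.sqrt(chi_square / (n * df_star))

chi_square, n = 4.96, 100   # values from the worked example
print(round(phi(chi_square, n), 2), round(cramers_v(chi_square, n, 2, 2), 2))
```

For a 2x2 table df* = 1, so Cramér's V reduces to φ. Both come out to about 0.22 for the example.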

SPSS Output Notes
  • Use SPSS's Crosstabs procedure for the analysis. It reports observed frequencies (and, optionally, expected frequencies), the chi-square test, and effect sizes such as phi and Cramér's V. Interpret and report these outputs in the context of the study.