Chi-Square Test: Contingency Tables

  • Used to determine whether there is an association between two categorical variables.
    • Example: Personality (Introvert, Extrovert) and Colour Preference (Red, Yellow, Green, Blue).

Application

  • The Chi-Square test may be used to investigate the association between personality and colour preference.
  • Note: The Chi-Square test may be used for ordinal data, but the test will treat the ordinal data as categorical. In R, it is possible to modify the Chi-Square test using the linear-by-linear option to ensure the order is taken into account.

Hypotheses

  • Null Hypothesis (H_0): There is no association between the variables.
  • Alternative Hypothesis (H_1): There is an association between the variables.
  • The method is based on comparing observed frequencies with the frequencies you would expect to get by chance.

Test Statistic

  • For a table with r rows and c columns, the Chi-Square statistic is calculated as:
    x^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}
    where:

    • O_{ij} represents the observed frequency.
    • E_{ij} represents the expected frequency.
  • x^2 approximately follows a \chi^2 distribution with (r-1)(c-1) degrees of freedom.

Expected Frequencies

  • The expected frequency E_{ij} is calculated as: E_{ij} = \frac{Y_{i.} \times Y_{.j}}{n} where:
    • Y_{i.} gives the row totals.
    • Y_{.j} gives the column totals.
    • n is the total number of observations.
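
The two formulas above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function names and the counts in `observed` are hypothetical (they do not come from these notes), and only the standard library is used.

```python
def expected_frequencies(observed):
    """Return E_ij = (row total i * column total j) / n for every cell."""
    row_totals = [sum(row) for row in observed]          # Y_i.
    col_totals = [sum(col) for col in zip(*observed)]    # Y_.j
    n = sum(row_totals)                                  # total observations
    return [[r * c / n for c in col_totals] for r in row_totals]


def chi_square_statistic(observed):
    """Sum (O_ij - E_ij)^2 / E_ij over all cells of the table."""
    expected = expected_frequencies(observed)
    return sum(
        (o - e) ** 2 / e
        for o_row, e_row in zip(observed, expected)
        for o, e in zip(o_row, e_row)
    )


# Hypothetical counts (not from the source): rows are Introvert, Extrovert;
# columns are Red, Yellow, Green, Blue.
observed = [
    [20, 6, 30, 44],
    [180, 34, 50, 36],
]

expected = expected_frequencies(observed)
x2 = chi_square_statistic(observed)
df = (len(observed) - 1) * (len(observed[0]) - 1)

print("expected:", expected)
print("x^2 =", round(x2, 3), "on", df, "degrees of freedom")
```

Each row of `expected` sums to the same row total as `observed`, which is a quick sanity check on the calculation.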

Evaluation

  • We evaluate x^2 using tables of the \chi^2 distribution with (r-1)(c-1) degrees of freedom, rejecting H_0 when x^2 exceeds the critical value at the chosen significance level.
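
As an alternative to printed tables, the upper-tail probability of the \chi^2 distribution can be computed directly. The sketch below swaps the table lookup for a closed-form recurrence on the degrees of freedom. The function name `chi2_upper_tail` and the value `x2 = 71.2` are assumptions made for illustration, and the 0.05 level is only an example.

```python
import math


def chi2_upper_tail(x, df):
    """P(chi-square with df degrees of freedom > x), for integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        q, nu = math.exp(-half), 2             # exact tail for df = 2
    else:
        q, nu = math.erfc(math.sqrt(half)), 1  # exact tail for df = 1
    # Step up two degrees of freedom at a time:
    # Q(x | nu + 2) = Q(x | nu) + (x/2)^(nu/2) * exp(-x/2) / Gamma(nu/2 + 1)
    while nu < df:
        q += half ** (nu / 2.0) * math.exp(-half) / math.gamma(nu / 2.0 + 1.0)
        nu += 2
    return q


x2, r, c = 71.2, 2, 4            # hypothetical statistic and table size
df = (r - 1) * (c - 1)
p_value = chi2_upper_tail(x2, df)
reject_h0 = p_value < 0.05       # example significance level

print(f"df = {df}, p = {p_value:.3g}, reject H_0: {reject_h0}")
```

Comparing the p-value with 0.05 is equivalent to comparing x^2 with the tabulated 5% critical value for the same degrees of freedom.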

Yates' Continuity Correction

  • For 2x2 frequency tables (where degrees of freedom, df = (r-1)(c-1) = 1), the Chi-Square test tends to produce overly significant results, i.e. it rejects H_0 too often when H_0 is true.
  • In such cases, we apply Yates' Continuity Correction to the test statistic:
    x^2_{\text{corrected}} = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(|O_{ij} - E_{ij}| - 0.5)^2}{E_{ij}}
  • Yates' continuity correction is also applied to the x^2 goodness-of-fit test when df = K-1 = 1.
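
To see the effect of the correction, the sketch below computes x^2 for a 2x2 table with and without subtracting 0.5 from each |O_{ij} - E_{ij}|. The function name and the counts are hypothetical and were chosen only so that the two versions fall on opposite sides of the 5% critical value for df = 1 (3.841).

```python
def chi_square_2x2(observed, yates=False):
    """x^2 for a 2x2 table; with yates=True subtract 0.5 from each |O - E|."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    total = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / n
            diff = abs(o - e) - (0.5 if yates else 0.0)
            total += diff ** 2 / e
    return total


# Hypothetical 2x2 counts (not from the source).
table = [
    [12, 8],
    [5, 15],
]

uncorrected = chi_square_2x2(table)
corrected = chi_square_2x2(table, yates=True)

print("uncorrected x^2 =", round(uncorrected, 3))
print("corrected x^2   =", round(corrected, 3))
```

With these counts the uncorrected statistic exceeds 3.841 while the corrected one does not, which shows how the correction makes the test more conservative.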

Effect Size: Strength of Association

  • Chi-Square tests do not tell us how strong an association is; therefore, consider effect size measures.
  • Phi Coefficient (\phi):
    • Used for 2x2 tables only.
      \phi = \sqrt{\frac{x^2}{n}}
    • Guidelines:
      • Small: 0.1
      • Medium: 0.3
      • Large: 0.5
  • Cramer's V:
    • Can be used with 2 categorical variables when each variable has 2 or more categories.
      V = \sqrt{\frac{x^2}{n \times df_v}} where df_v = \min(c-1, r-1)
    • 0 \leq V \leq 1. When V = 0, there is no association between the variables. V = 1 only when each variable is completely determined by the other.
    • Guidelines:

      df_v      Small   Medium   Large
      1 (2x2)   0.1     0.3      0.5
      2         0.07    0.21     0.35
      3         0.06    0.17     0.29
      4         0.05    0.15     0.25
      5         0.05    0.13     0.22

  • Odds Ratio:
    • Consider the following table:

                 Outcome A   Outcome B   Totals
      Group 1    A_1         B_1         N_1
      Group 2    A_2         B_2         N_2
      Totals     N_A         N_B

    • The odds ratio (OR) is given by:
      OR = \frac{A_1 / B_1}{A_2 / B_2} = \frac{A_1 B_2}{A_2 B_1}

Evaluation of Odds Ratio

  • OR = 1: Belonging to Group 1 rather than Group 2 does not change the odds of Outcome A.
  • OR > 1: Belonging to Group 1 rather than Group 2 increases the odds of Outcome A.
  • OR < 1: Belonging to Group 1 rather than Group 2 decreases the odds of Outcome A.
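
The three effect size measures above (phi, Cramer's V and the odds ratio) can be sketched as follows. All function names and counts are hypothetical. The sketch assumes the uncorrected x^2 is used inside phi and V, which these notes do not state explicitly.

```python
import math


def chi_square_statistic(observed):
    """Uncorrected x^2 = sum of (O - E)^2 / E over all cells."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return sum(
        (o - r * c / n) ** 2 / (r * c / n)
        for row, r in zip(observed, row_totals)
        for o, c in zip(row, col_totals)
    )


def phi_coefficient(table_2x2):
    """phi = sqrt(x^2 / n), intended for 2x2 tables only."""
    n = sum(map(sum, table_2x2))
    return math.sqrt(chi_square_statistic(table_2x2) / n)


def cramers_v(observed):
    """V = sqrt(x^2 / (n * min(r - 1, c - 1)))."""
    n = sum(map(sum, observed))
    df_v = min(len(observed) - 1, len(observed[0]) - 1)
    return math.sqrt(chi_square_statistic(observed) / (n * df_v))


def odds_ratio(table_2x2):
    """OR = (A_1 * B_2) / (A_2 * B_1); rows are groups, columns are outcomes."""
    (a1, b1), (a2, b2) = table_2x2
    return (a1 * b2) / (a2 * b1)


# Hypothetical counts (not from the source).
table_2x2 = [[12, 8], [5, 15]]                        # Group 1, Group 2
table_2x4 = [[20, 6, 30, 44], [180, 34, 50, 36]]      # 2 rows, 4 columns

phi = phi_coefficient(table_2x2)
v = cramers_v(table_2x4)
oratio = odds_ratio(table_2x2)

print("phi =", round(phi, 3))
print("V   =", round(v, 3))
print("OR  =", oratio)
```

For a 2x2 table `cramers_v` and `phi_coefficient` return the same value, because df_v = 1 in that case.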

Post Hoc Tests

  • A significant x^2 indicates an association between the variables, but it does not show which cells of the table are responsible for it.
  • In R, we can use post hoc tests to investigate further. We will use the standardised residuals approach.
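
These notes do not give the residual formula, so the sketch below assumes the adjusted form, (O_{ij} - E_{ij}) / \sqrt{E_{ij} (1 - Y_{i.}/n)(1 - Y_{.j}/n)}, and uses a rough cut-off of 2 in absolute value to flag cells. The function name, the cut-off and the counts are assumptions made for illustration only.

```python
import math


def standardised_residuals(observed):
    """Adjusted residual for each cell: (O - E) / sqrt(E * (1 - p_row) * (1 - p_col))."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    residuals = []
    for i, row in enumerate(observed):
        out_row = []
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / n
            variance = e * (1 - row_totals[i] / n) * (1 - col_totals[j] / n)
            out_row.append((o - e) / math.sqrt(variance))
        residuals.append(out_row)
    return residuals


# Hypothetical counts (not from the source): rows are Introvert, Extrovert;
# columns are Red, Yellow, Green, Blue.
observed = [
    [20, 6, 30, 44],
    [180, 34, 50, 36],
]

residuals = standardised_residuals(observed)

# Flag cells whose residual is large in absolute value (assumed cut-off of 2).
flagged = [
    (i, j)
    for i, row in enumerate(residuals)
    for j, value in enumerate(row)
    if abs(value) > 2
]

for row in residuals:
    print([round(value, 2) for value in row])
print("cells with |residual| > 2:", flagged)
```

A positive residual means the cell has more observations than expected under H_0, and a negative residual means fewer.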

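The Likelihood Ratio

  • An alternative to the x^2 test uses a model based on maximum-likelihood theory:
    L_{x^2} = 2 \left\{ \sum_{i=1}^{r} \sum_{j=1}^{c} y_{ij} \ln \left( \frac{y_{ij}}{E_{ij}} \right) \right\}
    where y_{ij} is the observed frequency in cell (i, j) (written O_{ij} above) and E_{ij} is the expected frequency.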

Evaluation of Likelihood Ratio

  • Evaluate L_{x^2} in the same way as x^2, against the \chi^2 distribution with (r-1)(c-1) degrees of freedom.
  • Example: Following Example 3.6, we find L_{x^2} = 0.42.
  • Again, \chi^2_{0.05, 1} = 3.841 > 0.42 = L_{x^2}. Do not reject H_0 ⇒ there does not appear to be an association between Education Levels and Department.
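
The likelihood ratio statistic can be sketched in the same style as x^2. The counts below are hypothetical and are not the data of Example 3.6, which is not reproduced in this section. Skipping cells with a zero count is an assumption of the sketch, based on y \ln y tending to 0 as y tends to 0.

```python
import math


def likelihood_ratio_statistic(observed):
    """L = 2 * sum of y_ij * ln(y_ij / E_ij) over all cells."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    total = 0.0
    for i, row in enumerate(observed):
        for j, y in enumerate(row):
            if y > 0:  # a zero count contributes nothing to the sum
                e = row_totals[i] * col_totals[j] / n
                total += y * math.log(y / e)
    return 2.0 * total


# Hypothetical 2x2 counts (not from the source).
table = [
    [12, 8],
    [5, 15],
]

lr = likelihood_ratio_statistic(table)
critical_value = 3.841           # chi-square critical value at 0.05 with df = 1
reject_h0 = lr > critical_value

print("L =", round(lr, 3), "reject H_0:", reject_h0)
```

When the observed counts match the expected counts exactly, every logarithm is zero and the statistic is 0.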

Violation of the expected frequency assumption

Fisher's Exact Test