Chi Square Test of Homogeneity Notes

Chi Square Test of Homogeneity

  • Definition: A chi square test of homogeneity compares the distribution of counts for two or more groups on the same categorical variable. It generalizes the two proportion z test.

  • Objective: To determine whether the proportion falling in each category of the categorical variable is the same for every group.

  • Test Statistic: The test statistic calculated (chi squared statistic) is the same as the one for the goodness of fit test.

  • Key Differences:

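    • In the test of homogeneity, we ask whether the groups share the same distribution (for example, whether choices have changed from one group to the next), so no hypothesized model supplies the proportions.
    • The expected counts are derived directly from the data (percentages of totals) instead of from a model.
    • The degrees of freedom are calculated differently: (rows - 1) times (columns - 1), rather than the number of categories minus 1.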
  • Assumptions and Conditions:

    • Counted Data Condition: All data must consist of counts (whole numbers, not measurements).
    • Randomization Condition: Random sampling is needed only if the results are to be generalized to a larger population.
    • Expected Cell Frequency Condition: Each expected cell count must be at least 5.

Expected Counts Calculation

  • To find expected counts:
    • Use the formula: $E_{ij} = \frac{(Row\ Total_i) \times (Column\ Total_j)}{Grand\ Total}$, where $E_{ij}$ represents the expected count for the cell in row $i$ and column $j$.
    • This is proportional reasoning: each cell receives the fraction of its column total that its row holds of the grand total, which speeds up the calculations.
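
The expected count formula above can be sketched in a few lines of Python. The function name `expected_counts` and the 2 by 3 table are made up for illustration and are not data from these notes.

```python
def expected_counts(observed):
    """Return the table of expected counts for a two-way table of observed counts.

    Each expected count is (row total * column total) / grand total.
    """
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


# Hypothetical counts: 2 outcomes (rows) by 3 groups (columns).
observed = [[10, 20, 30],
            [20, 20, 20]]

expected = expected_counts(observed)
for row in expected:
    print(row)
```

Here both row totals are 60 and the column totals are 30, 40, and 50, so each row of expected counts is 15, 20, 25.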

Chi Square Statistic Calculation

  • The chi square statistic is calculated as:
    $\chi^2 = \sum \frac{(O_{ij} - E_{ij})^2}{E_{ij}}$, summed over all cells, where $O_{ij}$ is the observed frequency and $E_{ij}$ is the expected frequency.

  • Degrees of Freedom: Calculated as
    $df = (Number\ of\ Rows - 1) \times (Number\ of\ Columns - 1)$
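
As a rough illustration of the two formulas above, the sketch below computes the chi square statistic and the degrees of freedom for a small table. The function name `chi_square_statistic` and the counts are hypothetical, chosen only to keep the arithmetic easy to follow.

```python
def chi_square_statistic(observed):
    """Return (chi-square statistic, degrees of freedom) for a two-way table."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)

    chi2 = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / grand_total  # expected count
            chi2 += (o - e) ** 2 / e                         # cell contribution

    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return chi2, df


# Hypothetical counts: 2 rows by 3 columns.
observed = [[10, 20, 30],
            [20, 20, 20]]

chi2, df = chi_square_statistic(observed)
print(chi2, df)
```

For this made-up table the expected counts are 15, 20, 25 in each row, so the statistic is 25/15 + 0 + 25/25 for each row, about 5.33 in total, with (2 - 1)(3 - 1) = 2 degrees of freedom.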

Using a Calculator

  • To perform a chi square test:
    1. Enter the observed values into a matrix (matrix A is recommended).
    2. Run the chi square test from the calculator's stat tests menu; it calculates the expected values automatically.
    3. The output reports the chi square value, the p-value, and the degrees of freedom.
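
For readers working without the calculator, the same three outputs can be approximated in plain Python. This is a minimal sketch, not the calculator's own algorithm: the helper `chi2_upper_tail` is an assumption of this example that evaluates the chi square upper tail through a series for the regularized incomplete gamma function, and the observed matrix is hypothetical.

```python
import math


def chi2_upper_tail(x, df):
    """Approximate P(chi-square with df degrees of freedom >= x).

    Uses the power series for the regularized lower incomplete gamma
    function; adequate for the moderate values seen in classroom problems.
    """
    a, z = df / 2.0, x / 2.0
    if z <= 0:
        return 1.0
    term = 1.0 / a
    total = term
    n = 1
    while term > 1e-16 * total and n < 10000:
        term *= z / (a + n)
        total += term
        n += 1
    lower = total * math.exp(-z + a * math.log(z) - math.lgamma(a))
    return max(0.0, 1.0 - lower)


def chi_square_test(observed):
    """Return (chi2, p_value, df, expected) for a matrix of observed counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    expected = [[r * c / grand_total for c in col_totals] for r in row_totals]
    chi2 = sum((o - e) ** 2 / e
               for o_row, e_row in zip(observed, expected)
               for o, e in zip(o_row, e_row))
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return chi2, chi2_upper_tail(chi2, df), df, expected


# Hypothetical observed matrix (the role played by matrix A on the calculator).
observed = [[10, 20, 30],
            [20, 20, 20]]

chi2, p_value, df, expected = chi_square_test(observed)
print(f"chi2 = {chi2:.3f}, p = {p_value:.4f}, df = {df}")
```

As on the calculator, only the observed counts are entered; the expected counts are computed from the row and column totals along the way.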

Examining Residuals

  • Upon rejecting the null hypothesis, examine standardized residuals:
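    • Formula for standardized residual: $Standardized\ Residual = \frac{(O_{ij} - E_{ij})}{\sqrt{E_{ij}}}$
    • Standardizing makes residuals comparable across cells with very different expected values (e.g. a cell with an expected count of 100 vs one with an expected count of 3).
  • Standardized residuals that are larger in absolute value contribute more to the chi square statistic; the sign shows whether the observed count fell above or below the expected count.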
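
The residual formula above can be sketched as follows. The function name `standardized_residuals` and the counts are hypothetical and serve only to show the calculation.

```python
import math


def standardized_residuals(observed):
    """Return (O - E) / sqrt(E) for every cell of a two-way table."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)

    residuals = []
    for i, row in enumerate(observed):
        res_row = []
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / grand_total  # expected count
            res_row.append((o - e) / math.sqrt(e))
        residuals.append(res_row)
    return residuals


# Hypothetical counts: 2 rows by 3 columns.
observed = [[10, 20, 30],
            [20, 20, 20]]

residuals = standardized_residuals(observed)
for row in residuals:
    print([round(r, 2) for r in row])
```

Squaring each standardized residual gives that cell's contribution to the chi square statistic, which is why the cells with the largest absolute residuals are the ones to examine first.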

Sample Problem: Comparing Treatments for Depression

  • Experimental Design: 90 subjects assigned randomly to three groups: placebo (control), St. John's wort, and prescription drug POSRX.

  • Hypotheses:

    • Null Hypothesis: The rate of recurrence for depression is the same across all treatments.
    • Alternate Hypothesis: The rate of recurrence differs for at least one treatment.
  • Expected Values:

    • For the first cell: $E = \frac{(Row\ Total) \times (Column\ Total)}{Grand\ Total} = \frac{60 \times 30}{90} = 20$
    • Similar calculations yield expected counts of 20, 20, 20 across the three treatments in the first row of the table and 10, 10, 10 in the second row.
  • Conditions Verification:

    • All expected values are at least 5.
    • The data are counts of subjects in each category.
    • Subjects were randomly assigned.
  • Chi Square Value Calculation:

    • The calculations yield a chi square statistic of 8.4.
  • P-Value Calculation:

    • Using a calculator with df = (2 - 1)(3 - 1) = 2, the p-value is approximately 0.015.
  • Conclusion: Since the p-value is small (about 0.015), we reject the null hypothesis. There is strong evidence that the treatments differ in how effectively they prevent depression recurrence, with POSRX noted as the most effective treatment.
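
The figures in this sample problem can be checked with a short Python sketch. It uses only numbers stated or implied in the notes: column totals of 30 per treatment and row totals of 60 and 30 follow from the expected counts, and the statistic 8.4 is taken as given because the observed counts are not listed here. The shortcut for the p-value relies on the fact that with 2 degrees of freedom the chi square upper tail equals exp(-x/2).

```python
import math

# Margins implied by the notes: 90 subjects, 30 per treatment,
# and expected counts of 20 and 10 in the two rows.
row_totals = [60, 30]
col_totals = [30, 30, 30]   # placebo, St. John's wort, POSRX
grand_total = 90

expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

# Degrees of freedom for a 2-row by 3-column table.
df = (len(row_totals) - 1) * (len(col_totals) - 1)

# Statistic reported in the notes (not recomputed, since the
# observed counts are not given).
chi2 = 8.4

# With df = 2 the upper-tail probability has the closed form exp(-chi2 / 2).
p_value = math.exp(-chi2 / 2)

print(expected)
print(df, round(p_value, 3))
```

The sketch reproduces the expected counts of 20 and 10, the 2 degrees of freedom, and a p-value that rounds to 0.015.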