Chi Square Test of Homogeneity Notes

Definition: A chi square test of homogeneity compares the distribution of counts across two or more groups on the same categorical variable. It generalizes the two proportion z test.
Objective: To determine whether the proportion falling in each category of the categorical variable is the same for every group.
Test Statistic: The chi square statistic is calculated the same way as in the goodness of fit test.
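Key Differences from the goodness of fit test:
- In the test of homogeneity, we ask whether the distribution is the same across groups, so no hypothesized model is fitted to the data.
- The expected counts are derived directly from the data (as percentages of the totals) instead of from a model.
- The degrees of freedom are calculated differently: (rows - 1) × (columns - 1) rather than (number of categories - 1).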
Assumptions and Conditions:
- Counted Data Condition: All data must consist of counts (whole numbers, not measurements).
- Randomization Condition: Needed only when generalizing the results to a larger population.
- Expected Cell Frequency Condition: Each expected cell count must be at least 5.

To find expected counts:
- Use the formula: $E_{ij} = \frac{(Row\ Total_i) \times (Column\ Total_j)}{Grand\ Total}$, where $E_{ij}$ is the expected count for the cell in row $i$ and column $j$.
- This is proportional reasoning: each cell receives its row's share of the grand total, applied to its column total.
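As a minimal sketch of the expected count formula, the function below computes every $E_{ij}$ from the row and column totals of a two-way table. The function name `expected_counts` and the observed counts are hypothetical, invented only to illustrate the arithmetic.

```python
def expected_counts(observed):
    """Return expected counts E_ij = (row total i) * (column total j) / grand total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


# Hypothetical observed counts (illustration only, not from the notes)
observed = [[10, 20, 30],
            [10, 10, 20]]

expected = expected_counts(observed)
for row in expected:
    print(row)
```

Here the row totals are 60 and 40 and the column totals are 20, 30, and 50, so the first expected count is 60 × 20 / 100 = 12.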

The chi square statistic is calculated as:
$\chi^2 = \sum \frac{(O_{ij} - E_{ij})^2}{E_{ij}}$ where $O_{ij}$ is the observed frequency and $E_{ij}$ is the expected frequency.
Degrees of Freedom: Calculated as
$df = (Number\ of\ Rows - 1) \times (Number\ of\ Columns - 1)$
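The statistic and the degrees of freedom can be sketched together in a few lines of plain Python. The function name `chi_square_homogeneity` and the observed counts are hypothetical and serve only as an illustration of the two formulas above.

```python
def chi_square_homogeneity(observed):
    """Return (chi square statistic, degrees of freedom) for a two-way table."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)

    chi2 = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / grand_total  # expected count E_ij
            chi2 += (o - e) ** 2 / e                         # this cell's contribution

    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return chi2, df


# Hypothetical observed counts (illustration only, not from the notes)
observed = [[10, 20, 30],
            [10, 10, 20]]

chi2, df = chi_square_homogeneity(observed)
print(round(chi2, 4), df)
```

For this 2 × 3 table the degrees of freedom are (2 - 1) × (3 - 1) = 2.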

To perform a chi square test:
1. Enter the observed values into a matrix (matrix A is recommended).
2. Run the chi square test from the calculator's stat tests menu, which calculates the expected values automatically.
3. The output reports the chi square value, p-value, and degrees of freedom.

Upon rejecting the null hypothesis, examine standardized residuals:
- Formula for standardized residual: $Standardized\ Residual = \frac{(O_{ij} - E_{ij})}{\sqrt{E_{ij}}}$
- Assists in comparing residuals from cells with different expected values (e.g. comparing count 100 vs count 3).
Larger standardized residuals indicate a greater impact on the overall result.
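A short sketch of the standardized residual calculation follows. The function name `standardized_residuals` and the observed counts are hypothetical, chosen only to show the computation.

```python
import math


def standardized_residuals(observed):
    """Return (O_ij - E_ij) / sqrt(E_ij) for every cell of a two-way table."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)

    residuals = []
    for i, row in enumerate(observed):
        res_row = []
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / grand_total  # expected count E_ij
            res_row.append((o - e) / math.sqrt(e))
        residuals.append(res_row)
    return residuals


# Hypothetical observed counts (illustration only, not from the notes)
observed = [[10, 20, 30],
            [10, 10, 20]]

residuals = standardized_residuals(observed)
for row in residuals:
    print([round(r, 3) for r in row])
```

The squared standardized residuals add up to the chi square statistic, which is why the cells with the largest residuals contribute most to the result.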

Experimental Design: 90 subjects assigned randomly to three groups: placebo (control), St. John's wort, and prescription drug POSRX.
Hypotheses:
- Null Hypothesis: The rate of recurrence for depression is the same across all treatments.
- Alternate Hypothesis: The rate of recurrence differs for at least one treatment.
Expected Values:
- For the first cell: $E = \frac{(Row\ Total) \times (Column\ Total)}{Grand\ Total} = 20$
- Similar calculations yield expected counts of 20, 20, 20 for the first row and 10, 10, 10 for the second row.
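The expected counts can be checked with a few lines of Python. The margins below are assumptions, since the notes do not list them: three equal groups of 30 subjects, and outcome row totals of 60 and 30, which are the values implied by the stated expected counts of 20 and 10.

```python
# Assumed margins (not stated in the notes; inferred from the expected counts)
column_totals = [30, 30, 30]  # placebo, St. John's wort, POSRX
row_totals = [60, 30]         # the two outcome rows
grand_total = 90              # total number of subjects

# E = (row total) * (column total) / (grand total) for every cell
expected = [[r * c / grand_total for c in column_totals] for r in row_totals]

for row in expected:
    print(row)
```

Under these assumptions the first cell is 60 × 30 / 90 = 20 and each cell in the second row is 30 × 30 / 90 = 10.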
Conditions Verification:
- All expected values are at least 5.
- The data are counts.
- Subjects were randomly assigned.
Chi Square Value Calculation:
- The calculations yield a chi square statistic of 8.4.
P-Value Calculation:
- Using the calculator to determine the p-value yields approximately 0.015.
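The calculator's p-value can be reproduced by hand in this case. Assuming the table has two outcome rows and three treatment columns, the degrees of freedom are (2 - 1) × (3 - 1) = 2, and for 2 degrees of freedom the upper tail probability of the chi square distribution has the closed form $e^{-x/2}$. The function name `chi2_upper_tail_df2` is hypothetical.

```python
import math


def chi2_upper_tail_df2(x):
    """P(chi square >= x) when df = 2; this closed form holds only for df = 2."""
    return math.exp(-x / 2)


p_value = chi2_upper_tail_df2(8.4)
print(round(p_value, 3))
```

For other degrees of freedom this shortcut does not apply, and a calculator or statistics library is the practical route.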
Conclusion: Since the p-value of about 0.015 is small (below the usual 0.05 level), we reject the null hypothesis. There is strong evidence that the treatments differ in effectiveness at preventing depression recurrence, with POSRX appearing to be the most effective treatment.