module 3 stat section B3

Analysis of Categorical Data - Module 3, Part 3

Overview of Categorical Data Analysis

  • Focus on various measures to analyze categorical data.

  • Chi-square test of independence is a helpful but limited tool.

  • Emphasis on bivariate analysis (analyzing two variables).

Options for Assessing Associations

  • Different methods for association analysis between two categorical variables:

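    • Difference in Proportions: The proportion with the outcome in one group minus the proportion in the other. If there's no difference, it equals 0.

    • Relative Risk: The ratio of the risk (probability of the outcome) in one group to the risk in the other. If there's no difference, relative risk equals 1.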

    • Odds Ratio: The ratio of the odds of an outcome in one group to the odds in the other. Similar to relative risk, if there's no difference, the odds ratio equals 1.
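
For reference, the three measures can be written compactly. The symbols p_1 and p_2 are notation assumed here (not taken from the lecture) for the proportion with the outcome in group 1 and group 2:

```latex
\[
\text{Difference in proportions} = p_1 - p_2, \qquad
\text{RR} = \frac{p_1}{p_2}, \qquad
\text{OR} = \frac{p_1 / (1 - p_1)}{p_2 / (1 - p_2)}
\]
```

When the two groups have the same proportion, the difference is 0 and both ratios equal 1.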

Example of Relative Risk Calculation

  • Study Group Data:

    • 39 patients on penicillin, 44 on placebo.

    • 32 out of 39 on penicillin improved, 25 out of 44 on placebo improved.

    • Relative Risk Calculation:

      • RR = (32/39) / (25/44) = 1.44.

    • Interpretation: Patients on penicillin are 1.44 times as likely to improve as those on placebo.
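
The arithmetic above can be checked with a short script. This is a minimal sketch using the counts from the study group data; the function name `relative_risk` is a hypothetical helper introduced only for illustration.

```python
def relative_risk(events_a, n_a, events_b, n_b):
    """Return (risk_a, risk_b, risk_difference, relative_risk) for two groups."""
    risk_a = events_a / n_a          # proportion with the outcome in group A
    risk_b = events_b / n_b          # proportion with the outcome in group B
    return risk_a, risk_b, risk_a - risk_b, risk_a / risk_b


# Counts from the notes: 32 of 39 improved on penicillin, 25 of 44 on placebo.
risk_pen, risk_pla, diff, rr = relative_risk(32, 39, 25, 44)

print(f"Risk (penicillin): {risk_pen:.3f}")
print(f"Risk (placebo):    {risk_pla:.3f}")
print(f"Difference in proportions: {diff:.3f}")
print(f"Relative risk: {rr:.2f}")
```

The same function also returns the difference in proportions, so both measures can be read from one call.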

Example of Odds Ratio Calculation

  • Odds for Penicillin Group: 32 improved, 7 did not --> Odds = 32/7.

  • Odds for Placebo Group: 25 improved, 19 did not --> Odds = 25/19.

  • Odds Ratio Calculation:

    • OR = (32/7) / (25/19) = 3.47.

    • Interpretation: The odds of improvement for patients on penicillin are 3.47 times the odds for patients on placebo.
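
The odds ratio can be recomputed directly from the four cell counts. This is a minimal sketch; the variable names are illustrative, and the cross-product form (a*d)/(b*c) is included only as an equivalent way to get the same ratio.

```python
# Cell counts from the notes.
pen_improved, pen_not = 32, 7      # penicillin group
pla_improved, pla_not = 25, 19     # placebo group

# Odds = number with the outcome divided by number without it.
odds_pen = pen_improved / pen_not
odds_pla = pla_improved / pla_not

# Odds ratio as a ratio of the two odds.
odds_ratio = odds_pen / odds_pla

# Equivalent cross-product form for a 2x2 table.
cross_product = (pen_improved * pla_not) / (pen_not * pla_improved)

print(f"Odds (penicillin): {odds_pen:.3f}")
print(f"Odds (placebo):    {odds_pla:.3f}")
print(f"Odds ratio:        {odds_ratio:.2f}")
print(f"Cross-product OR:  {cross_product:.2f}")
```

The odds ratio is further from 1 than the relative risk for the same data because improvement is a common outcome in both groups.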

Analyzing Repeated Measurements of Categorical Data

  • Challenges: Repeated measures can violate independence assumption.

  • Example: Assessing vitamin A levels before and after a community intervention.

    • Categories for vitamin A levels: Normal, Mild deficiency, Severe deficiency.

  • A standard two-way contingency table analysis treats every observation as independent, so it may not accurately reflect data in which the same subjects are measured repeatedly.
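
With repeated measures, the unit of analysis is the pair of observations from one subject, not each observation separately. The sketch below tallies (baseline, follow-up) pairs into a paired table. The six pairs are hypothetical data invented for illustration; only the three category names come from the notes.

```python
from collections import Counter

LEVELS = ["Normal", "Mild deficiency", "Severe deficiency"]

# Hypothetical (baseline, follow-up) pairs, one pair per subject.
pairs = [
    ("Severe deficiency", "Mild deficiency"),
    ("Mild deficiency", "Normal"),
    ("Normal", "Normal"),
    ("Mild deficiency", "Mild deficiency"),
    ("Severe deficiency", "Normal"),
    ("Normal", "Mild deficiency"),
]

# Count how many subjects fall in each (baseline, follow-up) combination.
counts = Counter(pairs)

# Rows are baseline status, columns are follow-up status.
paired_table = [[counts[(before, after)] for after in LEVELS] for before in LEVELS]

for before, row in zip(LEVELS, paired_table):
    print(f"{before:<18} {row}")

# Each subject appears exactly once, so the table total equals the number of subjects.
total = sum(sum(row) for row in paired_table)
unchanged = sum(paired_table[i][i] for i in range(len(LEVELS)))
print(f"Subjects: {total}, unchanged: {unchanged}, changed: {total - unchanged}")
```

Subjects on the diagonal kept the same status; subjects off the diagonal changed status between the two measurements.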

McNemar's Test of Association

  • Appropriate for analyzing changes in categorical data when repeated measures are involved.

  • Applicable strictly to two-by-two contingency tables:

    • Result Table: Each row represents baseline status, and each column represents follow-up status after intervention.

    • Focus on the off-diagonal entries, which count the subjects whose status changed between baseline and follow-up; the diagonal entries count subjects whose status stayed the same.

  • McNemar's Test Formula:

    • Chi-square style calculation considering only the off-diagonal entries.

    • Formula: \( \chi^2 = \frac{(b-c)^2}{b+c} \), where b and c are the off-diagonal entries.
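
The formula can be sketched in a few lines of Python. The 2x2 counts below are hypothetical (the notes do not give the vitamin A table), and `mcnemar_test` is an illustrative helper, not a library function. The p-value assumes the statistic is compared against a chi-square distribution with 1 degree of freedom, whose upper tail can be computed from the standard library as erfc(sqrt(x/2)). No continuity correction is applied.

```python
import math


def mcnemar_test(b, c):
    """Return (chi-square statistic, p-value) from the two off-diagonal counts."""
    if b + c == 0:
        raise ValueError("No discordant pairs: the statistic is undefined.")
    chi2 = (b - c) ** 2 / (b + c)
    # Upper-tail probability of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical paired table: rows = baseline status, columns = follow-up status.
table = [
    [30, 5],    # baseline category 1: 30 unchanged, 5 changed to category 2
    [20, 45],   # baseline category 2: 20 changed to category 1, 45 unchanged
]

b = table[0][1]  # off-diagonal: category 1 -> category 2
c = table[1][0]  # off-diagonal: category 2 -> category 1

chi2, p_value = mcnemar_test(b, c)
print(f"chi-square = {chi2:.2f}, p-value = {p_value:.4f}")
```

The diagonal counts (30 and 45 here) do not enter the statistic at all; only the subjects who changed status contribute.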

Statistical Significance

  • A large chi-square value (and correspondingly small p-value) indicates a statistically significant change between baseline and follow-up.

  • Example result: a p-value of 0.00455 indicates a significant change in vitamin A levels post-intervention.

  • Caution: Ensure the direction of change is not preset or biased by study design.

Importance of Group Categorization

  • Grouping categories should be clinically relevant, not biased towards achieving significance.

  • Ensure no group has expected counts below 5 to comply with the chi-square test's assumptions.

  • Independence of samples must be maintained for validity.
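
The expected-count rule can be checked before running a chi-square test on independent samples. This minimal sketch computes expected counts as (row total × column total) / grand total, using the penicillin and placebo counts from the earlier example; `expected_counts` is a hypothetical helper name.

```python
def expected_counts(table):
    """Expected cell counts under independence for a two-way table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


# Rows: penicillin, placebo. Columns: improved, did not improve.
observed = [
    [32, 7],
    [25, 19],
]

expected = expected_counts(observed)
for row in expected:
    print([round(cell, 2) for cell in row])

# Flag the table if any expected count falls below 5.
smallest = min(min(row) for row in expected)
print(f"Smallest expected count: {smallest:.2f}")
print("Rule satisfied" if smallest >= 5 else "Consider regrouping categories")
```

Note that the check uses expected counts, not observed counts: the observed count of 7 in one cell is not what the rule is about.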

Summary

  • Key techniques for assessing relationships in categorical data include relative risk, odds ratios, and McNemar's test for repeated measures.

  • Importance of careful experimental design and data categorization to ensure meaningful results.

  • Transition to the next topic: analysis of continuous data will be covered in the following segment, starting with the one-sample t-test.