Chi-Squared Test and Cohort Studies


Chi-Squared Test (\chi^2)

  • Non-parametric test.
  • Used to analyze categorical data (e.g., cases vs. non-cases).
  • Data are commonly organized in a row x column (r x c) table, also known as a cross-classification or contingency table (e.g., a 2x2 table).
  • Valid only when the cells contain frequencies (counts); proportions, means, or physical measurements cannot be used.
  • Detects associations between row and column data but does not indicate the strength of the association.
  • More accurate with large expected frequencies:
    • Every expected cell frequency should be at least 1.
    • At least 80% of expected cell frequencies should be at least 5.
  • Produces a \chi^2 statistic with (r-1) x (c-1) degrees of freedom.
  • Use a \chi^2 table to find the p-value corresponding to the statistic and its degrees of freedom.
  • Used to determine whether an association between exposure and disease (shown by measures of association such as the Risk Ratio or Odds Ratio) is due to chance alone.
  • If the p-value is < 0.05, the association is unlikely to be due to chance, suggesting a relationship between exposure and disease.
  • Several versions of the chi-square test exist; none is used when the sample size is < 30. For samples < 30, the Fisher exact test is used instead.
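The rules of thumb above can be checked mechanically before running the test. The sketch below is a minimal illustration, not a standard routine: the helper names (`expected_counts`, `degrees_of_freedom`, `expected_counts_ok`, `choose_test`) are hypothetical, and `choose_test` simply combines the < 30 sample-size rule with the expected-frequency rule listed above. The example table uses the vanilla ice cream counts from the Risk Ratio example later in these notes.

```python
from typing import List

Table = List[List[int]]

def expected_counts(table: Table) -> List[List[float]]:
    """Expected count for each cell: (row total * column total) / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]

def degrees_of_freedom(table: Table) -> int:
    """(r - 1) x (c - 1) for an r x c table."""
    return (len(table) - 1) * (len(table[0]) - 1)

def expected_counts_ok(table: Table) -> bool:
    """Rule of thumb: every expected count >= 1 and at least 80% of them >= 5."""
    cells = [e for row in expected_counts(table) for e in row]
    all_at_least_1 = all(e >= 1 for e in cells)
    share_at_least_5 = sum(e >= 5 for e in cells) / len(cells)
    return all_at_least_1 and share_at_least_5 >= 0.8

def choose_test(table: Table) -> str:
    """Hypothetical helper: fall back to the Fisher exact test for small samples."""
    n = sum(sum(row) for row in table)
    if n < 30 or not expected_counts_ok(table):
        return "Fisher exact test"
    return "chi-square test"

# Rows: ate / did not eat vanilla ice cream; columns: ill / not ill.
table = [[43, 11], [3, 18]]
print(degrees_of_freedom(table))  # 1
print(expected_counts_ok(table))  # True (expected counts are 33.12, 20.88, 12.88, 8.12)
print(choose_test(table))         # chi-square test
```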

Example

  • Degrees of freedom: 1
  • \chi^2 statistic: 5.03
  • P-value < 0.05
  • Indicates an association between "X" and "Y".
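As a check on this example, the p-value for \chi^2 = 5.03 with 1 degree of freedom can be computed directly instead of read from a table. The sketch below assumes exactly 1 degree of freedom: in that case the \chi^2 variable is the square of a standard normal variable, so the upper-tail probability reduces to the complementary error function. For other degrees of freedom a library routine such as SciPy's `scipy.stats.chi2.sf(statistic, df)` would be needed.

```python
import math

def chi_square_p_value_df1(statistic: float) -> float:
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom.

    With 1 df, chi-square = Z^2 for a standard normal Z, so
    P(X > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    """
    return math.erfc(math.sqrt(statistic / 2))

p = chi_square_p_value_df1(5.03)
print(round(p, 3))  # about 0.025
print(p < 0.05)     # True: the association is unlikely to be due to chance
```

The familiar table cutoff of 3.84 for 1 degree of freedom corresponds to p = 0.05 under this formula.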

Analytic Epidemiology Studies: Cohort Study

  • A group of persons (cohort) without the disease is followed over time.
  • One subgroup is exposed, and one is not exposed.
  • Examines whether the exposure of interest is associated with the disease. A cohort study can be:
    • Prospective: data are collected going forward in time.
    • Retrospective (historical): at least some data are collected from the past (e.g., foodborne illness cases from a church supper).
  • Because the cohort (the population at risk) is known, attack rates can be calculated and used to identify the likely risk factor for disease.

*Attack Rate Formula:
Attack Rate = \frac{\text{number of new cases during a given time period}}{\text{population at risk at the beginning of the time period}}
(usually multiplied by 100 and expressed as a percentage)
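The attack rate formula translates directly into a one-line calculation. The sketch below is a minimal illustration with a hypothetical function name, `attack_rate`, returning a percentage; the sample counts are the vanilla ice cream figures from the Risk Ratio example later in these notes (43 ill of 54 who ate it, 3 ill of 21 who did not).

```python
def attack_rate(new_cases: int, population_at_risk: int) -> float:
    """Attack rate as a percentage: new cases during the period divided by the
    population at risk at the start of the period, times 100."""
    if population_at_risk <= 0:
        raise ValueError("population at risk must be positive")
    return 100 * new_cases / population_at_risk

print(round(attack_rate(43, 54), 1))  # 79.6 (ate vanilla ice cream)
print(round(attack_rate(3, 21), 1))   # 14.3 (did not eat it)
```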

Food Borne Illness

For foodborne illness investigations:

  • Identify the food with the highest attack rate among those who ate it.
  • Identify the food with the lowest attack rate among those who did not eat it.
  • Identify the food eaten by the most cases.
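The three screening criteria can be computed for each food item from its 2x2 counts. The sketch below is illustrative only: `food_summary` is a hypothetical helper, and the counts are the vanilla ice cream figures used in the Risk Ratio example later in these notes. Repeating it for every food item and comparing the three values reproduces the screening described above.

```python
def food_summary(ate_ill: int, ate_well: int, not_ate_ill: int, not_ate_well: int) -> dict:
    """Return the three quantities used to screen one food item (as proportions)."""
    return {
        # attack rate among those who ate the food (should be high for the vehicle)
        "attack_rate_ate": ate_ill / (ate_ill + ate_well),
        # attack rate among those who did not eat it (should be low)
        "attack_rate_not_ate": not_ate_ill / (not_ate_ill + not_ate_well),
        # share of all cases who ate the food (should be most of them)
        "share_of_cases_who_ate": ate_ill / (ate_ill + not_ate_ill),
    }

summary = food_summary(43, 11, 3, 18)
for name, value in summary.items():
    print(f"{name}: {value:.0%}")
# attack_rate_ate: 80%
# attack_rate_not_ate: 14%
# share_of_cases_who_ate: 93%
```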

Example Problem

  • Calculate Attack Rate for:
    • Persons who ate Food A
    • Persons who did not eat Food A
  • What do we discover?
    • The attack rate is high among those exposed to Food A
    • The attack rate is low among those not exposed to Food A
    • Most cases (48/50) were exposed to Food A

Church Supper Example

  • In an outbreak of gastroenteritis following a church supper, attack rates were calculated for those who did and did not eat each of the 14 food items.
  • The most likely vehicle is vanilla ice cream because it has the highest attack rate (80%) for those who ate it and the lowest for those who did not (14%).

Risk Ratio (RR) or Relative Risk

  • A measure of association assesses the strength of association between exposure & disease.
  • The best measure of association for a cohort study is the Risk Ratio (RR) or Relative Risk
  • Ratio of the incidence rate of a disease or health outcome in an exposed group to the incidence rate of the disease or condition in a non-exposed group

Formula

RR = \frac{\frac{A}{A+B}}{\frac{C}{C+D}}

Where:

  • A = Exposed and Diseased
  • B = Exposed and Not Diseased
  • C = Not Exposed and Diseased
  • D = Not Exposed and Not Diseased

Interpretation of Risk Ratio

  • If RR = 1.0, the risk is equal for those exposed & not exposed; there is no association between the exposure and the disease.
  • If RR > 1.0, the risk is greater for those exposed, and the exposure is more likely to be a cause of the disease. For example, if RR = 3, those exposed have 3 times the risk of disease of those not exposed.
  • If RR < 1.0, the risk is lower for those exposed, and the exposure may be protective against the disease.
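The RR formula and the interpretation rules above can be sketched together. The function names `risk_ratio` and `interpret_rr` are hypothetical; the arguments follow the A, B, C, D cell labels defined above, and the example counts are those of the RR calculation below (43/54 exposed ill, 3/21 unexposed ill).

```python
def risk_ratio(a: int, b: int, c: int, d: int) -> float:
    """RR = [A / (A + B)] / [C / (C + D)] using the 2x2 cell labels above."""
    risk_exposed = a / (a + b)
    risk_unexposed = c / (c + d)
    return risk_exposed / risk_unexposed

def interpret_rr(rr: float) -> str:
    """Map an RR value to the interpretation rules listed above."""
    if rr > 1.0:
        return "increased risk among the exposed"
    if rr < 1.0:
        return "decreased risk (exposure may be protective)"
    return "no association"

rr = risk_ratio(43, 11, 3, 18)
print(round(rr, 2))      # 5.57
print(interpret_rr(rr))  # increased risk among the exposed
```

Without intermediate rounding the RR is 903/162, about 5.57; rounding the two risks to 0.8 and 0.14 before dividing gives 5.7.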

Relative Risk

  • A relative risk GREATER than 1 means the risk is INCREASED
  • A relative risk of 1.0 means there is NO association between the risk factor and the disease
  • A relative risk LESS than 1 means the risk is DECREASED

Example (RR Calculation)

  • RR = (43/54) / (3/21) ≈ 0.796 / 0.143 ≈ 5.6 (rounding the risks to 0.8 and 0.14 first gives 5.7)
  • Since RR > 1.0, risk is greater for those exposed & the exposure is more likely the cause of disease.

95% Confidence Interval (CI)

  • The range of values of the measure of association (RR or OR), computed so that across repeated studies 95% of such intervals would contain the true measure of association
  • The investigator is 95% “confident” the range contains the true measure of association
  • The 95% CI of a statistically significant (p-value < 0.05) association between exposure & disease will NOT contain the value 1.0
  • An RR or OR of 1.0 represents the null hypothesis: no difference in risk (or estimated risk) between exposed & unexposed groups, and no relationship between exposure & disease
  • Either the RR or the OR can be used as the measure of association for a cohort study, but the RR is a direct measure of risk
  • The Odds Ratio (OR) is an estimate of the Risk Ratio (covered in more detail in Part 9)
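The link between the 95% CI and the null value of 1.0 can be shown with a worked interval. The sketch below uses one common approximation for the RR, the log (Katz) method, ln(RR) ± 1.96 × sqrt(1/A − 1/(A+B) + 1/C − 1/(C+D)); this is an assumption, since statistical packages may use other methods and report slightly different limits. The helper name `rr_confidence_interval` is hypothetical, and the counts are those of the RR example above.

```python
import math

def rr_confidence_interval(a: int, b: int, c: int, d: int, z: float = 1.96):
    """Approximate CI for the risk ratio using the log (Katz) method.

    z = 1.96 gives a 95% interval. Cells follow the A, B, C, D labels above.
    """
    rr = (a / (a + b)) / (c / (c + d))
    se_log_rr = math.sqrt(1 / a - 1 / (a + b) + 1 / c - 1 / (c + d))
    lower = math.exp(math.log(rr) - z * se_log_rr)
    upper = math.exp(math.log(rr) + z * se_log_rr)
    return rr, lower, upper

rr, lower, upper = rr_confidence_interval(43, 11, 3, 18)
print(f"RR = {rr:.2f}, 95% CI {lower:.2f} to {upper:.2f}")  # RR = 5.57, 95% CI 1.94 to 16.03
excludes_null = not (lower <= 1.0 <= upper)
print(excludes_null)  # True: 1.0 lies outside the interval, consistent with p < 0.05
```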

Statistical Significance

  • A test of statistical significance determines the likelihood or probability that an association between exposure & disease is due to chance alone

Steps

  1. Assume the null hypothesis (H0): exposure & disease are NOT related (any observed association is due to chance)
  2. State the alternative hypothesis (HA): exposure & disease are related
  3. Choose an appropriate statistical test (e.g., the chi-square test) to calculate a test statistic, which corresponds to a p (probability) value
  4. Select a p-value to serve as the cutoff (significance level), commonly 0.05, which represents a 5% chance of wrongly rejecting a true null hypothesis

Steps Cont'd

  • If the test statistic corresponds to a p-value > 0.05, chance alone may explain the association between exposure & disease
    • We would fail to reject H0 (and would not accept HA)
  • If the test statistic corresponds to a p-value < 0.05, the association is unlikely to be due to chance alone, and exposure & disease are considered related
    • We would reject H0 and accept HA
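The decision rule in these steps amounts to comparing the p-value with the pre-selected cutoff. The sketch below is a minimal illustration with a hypothetical function name, `hypothesis_decision`; it treats a p-value exactly equal to the cutoff as "fail to reject", a convention the notes above do not specify.

```python
def hypothesis_decision(p_value: float, alpha: float = 0.05) -> str:
    """Compare a p-value with the pre-selected cutoff (alpha)."""
    if p_value < alpha:
        return "reject H0: association unlikely to be due to chance alone"
    return "fail to reject H0: chance alone may explain the association"

print(hypothesis_decision(0.025))  # reject H0: ...
print(hypothesis_decision(0.40))   # fail to reject H0: ...
```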

Chi-Square Calculation

Example: Chi-square test & an air pollution study

*Air pollution study

A random sample of 200 households was selected from each of 2 communities (Community A and Community B)

A respondent in each household was asked whether anyone in the household was bothered by air pollution
H0: the proportion bothered by air pollution in Community A equals the proportion bothered by air pollution in Community B
HA: the proportion bothered by air pollution in Community A does not equal the proportion bothered by air pollution in Community B
Significance level (p-value cutoff) = 0.05

  • Do we reject the null hypothesis? Yes: the proportions are not equal; one community has a greater proportion of households bothered by air pollution than the other.
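The chi-square statistic itself is the sum over all cells of (observed − expected)² / expected, with expected = row total × column total / grand total. The air pollution counts are not reproduced in these notes, so the sketch below demonstrates the calculation on the church supper vanilla ice cream table from the RR example (43/54 exposed ill, 3/21 unexposed ill); for the air pollution study the rows would be Community A and B and the columns bothered / not bothered. The function name `chi_square_statistic` is hypothetical, and the `erfc` shortcut for the p-value is valid only for 1 degree of freedom (a 2x2 table).

```python
import math

def chi_square_statistic(table):
    """Pearson chi-square: sum over cells of (observed - expected)^2 / expected."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat

table = [[43, 11], [3, 18]]
stat = chi_square_statistic(table)
p = math.erfc(math.sqrt(stat / 2))  # upper-tail p-value for 1 df only
print(round(stat, 2))  # 27.22
print(p < 0.05)        # True: reject H0
```

SciPy's `scipy.stats.chi2_contingency(table, correction=False)` should give the same uncorrected statistic; its default `correction=True` applies the Yates continuity correction to 2x2 tables and gives a smaller value.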