Chi-Squared Test and Cohort Studies
Chi-Squared Test (\chi^2)
- Non-parametric test.
- Used to analyze categorical data (e.g., cases vs. non-cases).
- Commonly uses a row x column (r x c) table to organize data, also known as cross-classification or contingency table (e.g., 2x2 table).
- Only valid when frequencies are used in the cells; proportions, means, or physical measurements are not valid.
- Detects associations between row and column data but does not indicate the strength of the association.
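- More accurate with large frequencies; the two guidelines below refer to the expected frequency in each cell.
- Every expected cell frequency should be at least 1.
- 80% or more of the expected cell frequencies should be at least 5.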
- Produces a \chi^2 statistic and degrees of freedom, (r-1) x (c-1).
- Use the \chi^2 table to determine the p-value.
- Used to assess whether an observed association between exposure and disease (shown by a measure of association such as the Risk Ratio or Odds Ratio) could be due to chance alone.
- If the p-value is < 0.05, the association is unlikely to be due to chance alone, suggesting a relationship between disease and exposure.
- Several variants of the chi-square test exist; none is used when the sample size is < 30. Use the Fisher exact test for samples < 30.
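As a minimal sketch of how the \chi^2 statistic and degrees of freedom described above come out of an r x c table of counts, the Python below computes the expected frequencies, the statistic, and a check of the two expected-frequency guidelines. The function names and the example counts are hypothetical, chosen only for illustration, and no continuity correction is applied.

```python
def chi_square(table):
    """Pearson chi-square for an r x c table of observed counts (a list of rows)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    # Expected count in each cell = (row total * column total) / grand total
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    stat = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, df, expected


def expected_counts_ok(expected):
    """True if every expected count is >= 1 and at least 80% of them are >= 5."""
    cells = [e for row in expected for e in row]
    share_at_least_5 = sum(e >= 5 for e in cells) / len(cells)
    return min(cells) >= 1 and share_at_least_5 >= 0.8


# Hypothetical 2x2 table: rows = exposed / not exposed, columns = cases / non-cases
observed = [[30, 20], [15, 35]]
stat, df, expected = chi_square(observed)
print(f"chi-square = {stat:.2f}, df = {df}, guidelines met: {expected_counts_ok(expected)}")
```

The statistic and df would then be looked up in a \chi^2 table to obtain the p-value.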
Example
- Degrees of freedom: 1
- \chi^2 statistic: 5.03
- P-value < 0.05
- Indicates an association between "X" and "Y".
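With 1 degree of freedom, the p-value can also be computed directly instead of being read from a table, because a \chi^2 variable with 1 df is the square of a standard normal variable. A minimal sketch using only Python's standard library follows; the function name is an assumption, and the shortcut applies to df = 1 only.

```python
import math


def p_value_df1(chi2_stat):
    """Upper-tail p-value for a chi-square statistic with 1 degree of freedom."""
    # P(chi2_1 > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2))
    return math.erfc(math.sqrt(chi2_stat / 2))


p = p_value_df1(5.03)
print(f"p = {p:.3f}")
print("significant at 0.05" if p < 0.05 else "not significant at 0.05")
```

For the statistic in this example the result is roughly 0.025, which agrees with the stated p-value < 0.05. The same conclusion follows from the table: 5.03 exceeds 3.84, the critical value for df = 1 at the 0.05 level.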
Analytic Epidemiology Studies: Cohort Study
- A group of persons (cohort) without the disease are followed over time.
- One subgroup is exposed, and one is not exposed.
- Examines if the exposure of interest is associated with the disease.
- Prospective: Data is collected going forward in time.
- Retrospective (historical): At least some data is collected from the past (e.g., foodborne illness cases from a church supper).
- The cohort or population at risk is known, thus attack rates can be used to identify the likely risk factor for disease.
Attack Rate Formula:
Attack Rate = \frac{\text{new cases during a given time period}}{\text{population at risk at the beginning of the time period}}
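The attack rate formula can be sketched as a small Python helper. The function name, the input checks, and the example counts are assumptions made for illustration.

```python
def attack_rate(new_cases, population_at_risk):
    """New cases in the period divided by the population at risk at its start."""
    if population_at_risk <= 0:
        raise ValueError("population at risk must be positive")
    if not 0 <= new_cases <= population_at_risk:
        raise ValueError("new cases must be between 0 and the population at risk")
    return new_cases / population_at_risk


# Hypothetical counts: 12 new cases among 80 persons at risk
rate = attack_rate(12, 80)
print(f"Attack rate = {rate:.1%}")
```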
Foodborne Illness
For foodborne illness investigations:
- Identify the food with the highest attack rate among those who ate it.
- Identify the food with the lowest attack rate among those who did not eat it.
- Identify the food eaten by the most cases.
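The three criteria above can be tabulated per food item. The sketch below uses entirely hypothetical foods and counts, and ranking by the difference between the two attack rates is one convenient way to combine the first two criteria, not a prescribed method.

```python
# Hypothetical counts for illustration only (not from a real outbreak).
# food: (ill who ate, total who ate, ill who did not eat, total who did not eat)
foods = {
    "chicken": (45, 60, 5, 40),
    "salad": (27, 50, 23, 50),
    "cake": (14, 30, 36, 70),
}


def food_summary(name, ill_ate, n_ate, ill_not, n_not):
    """Attack rates for eaters and non-eaters, plus the share of cases who ate the food."""
    ar_ate = ill_ate / n_ate
    ar_not = ill_not / n_not
    return {
        "food": name,
        "ar_ate": ar_ate,
        "ar_not": ar_not,
        "difference": ar_ate - ar_not,
        "share_of_cases": ill_ate / (ill_ate + ill_not),
    }


summaries = [food_summary(name, *counts) for name, counts in foods.items()]
# Largest gap between eaters and non-eaters first
ranked = sorted(summaries, key=lambda s: s["difference"], reverse=True)
for s in ranked:
    print(
        f"{s['food']}: ate {s['ar_ate']:.0%}, did not eat {s['ar_not']:.0%}, "
        f"cases who ate it {s['share_of_cases']:.0%}"
    )
```

In this made-up data the first-ranked food also accounts for most of the cases, so all three criteria point to the same item.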
Example Problem
- Calculate Attack Rate for:
- Persons who ate Food A
- Persons who did not eat Food A
- What do we discover?
- The attack rate is high among those exposed to Food A
- The attack rate is low among those not exposed to Food A
- Most cases (48/50) were exposed to Food A
Church Supper Example
- In an outbreak of gastroenteritis following a church supper, attack rates were calculated for those who did and did not eat each of the 14 food items.
- The most likely vehicle is vanilla ice cream because it has the highest attack rate (80%) for those who ate it and the lowest for those who did not (14%).
Risk Ratio (RR) or Relative Risk
- A measure of association assesses the strength of association between exposure & disease.
- The best measure of association for a cohort study is the Risk Ratio (RR) or Relative Risk
- Ratio of the incidence rate of a disease or health outcome in an exposed group to the incidence rate of the disease or condition in a non-exposed group
RR = \frac{\frac{A}{A+B}}{\frac{C}{C+D}}
Where:
- A = Exposed and Diseased
- B = Exposed and Not Diseased
- C = Not Exposed and Diseased
- D = Not Exposed and Not Diseased
Interpretation of Risk Ratio
- If RR = 1.0, the risk is equal for those exposed & not exposed; therefore, the exposure is not associated with the disease.
- If RR > 1.0, the risk is greater for those exposed, and the exposure is more likely the cause of the disease. For example, if RR = 3, then those exposed are 3 times as likely to develop the disease as those not exposed.
- If RR < 1.0, the risk is lower for those exposed, and the exposure is likely protective against the disease.
Relative Risk
- A relative risk GREATER than 1 means the risk is INCREASED
- A relative risk of 1.0 means there is NO association between the risk factor and the disease
- A relative risk LESS than 1 means the risk is DECREASED
Example (RR Calculation)
- RR = (43/54) / (3/21) = 0.796 / 0.143 = 5.6 (rounding the two attack rates to 0.8 and 0.14 first gives 5.7)
- Because RR > 1.0, the risk is greater for those exposed & the exposure is more likely the cause of disease.
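The calculation above can be reproduced with a short Python sketch that follows the RR formula and the A, B, C, D cell labels. The function name is an assumption. B = 11 and D = 18 are derived from the denominators in the example (54 - 43 and 21 - 3). Without rounding the two attack rates first, the ratio comes out near 5.57 rather than 5.7.

```python
def risk_ratio(a, b, c, d):
    """RR = [A / (A + B)] / [C / (C + D)] for a 2x2 exposure-by-disease table."""
    risk_exposed = a / (a + b)
    risk_unexposed = c / (c + d)
    return risk_exposed / risk_unexposed


# A = 43 exposed and diseased, B = 11 exposed and not diseased,
# C = 3 not exposed and diseased, D = 18 not exposed and not diseased
rr = risk_ratio(43, 11, 3, 18)
print(f"RR = {rr:.2f}")
```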
95% Confidence Interval (CI)
- Is the range of values of the measure of association (RR or OR) that has a 95% chance of containing the true measure of association
- Investigator is 95% “confident” the range contains the true measure of association
- For a statistically significant (p-value < 0.05) association between exposure & disease, the 95% CI will NOT contain the value 1.0
- A RR or OR of 1.0 = the null hypothesis: no difference in risk or estimated risk for exposed & unexposed groups, no relationship between exposure & disease
- Either the RR or the OR can be a measure of association for a cohort study, but the RR is a direct measure of risk
- The Odds Ratio (OR) is an estimate of the Risk Ratio (RR) (covered in more detail in Part 9)
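These notes do not say how the 95% CI is calculated. One common large-sample approach, shown here as an assumption and not as the method used in the course, builds the interval on the log scale with SE(ln RR) = sqrt(1/A - 1/(A+B) + 1/C - 1/(C+D)). The sketch applies it to the cell counts implied by the RR example (43/54 and 3/21). The function name is hypothetical.

```python
import math


def risk_ratio_ci(a, b, c, d, z=1.96):
    """Risk ratio with an approximate 95% CI from the log-scale standard error."""
    rr = (a / (a + b)) / (c / (c + d))
    se_log_rr = math.sqrt(1 / a - 1 / (a + b) + 1 / c - 1 / (c + d))
    lower = math.exp(math.log(rr) - z * se_log_rr)
    upper = math.exp(math.log(rr) + z * se_log_rr)
    return rr, lower, upper


rr, lower, upper = risk_ratio_ci(43, 11, 3, 18)
print(f"RR = {rr:.2f}, 95% CI = ({lower:.2f}, {upper:.2f})")
print("CI excludes 1.0" if lower > 1.0 or upper < 1.0 else "CI includes 1.0")
```

Under this approximation the interval runs from roughly 1.9 to 16, so it does not contain 1.0. The interval is wide because only 3 of the unexposed persons became ill.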
Statistical Significance
- A test of statistical significance determines the likelihood or probability that an association between exposure & disease is due to chance alone
Steps
- Assume the null hypothesis (H0): exposure & disease are NOT related (any observed association is due to chance)
- Alternate hypothesis (HA): exposure & disease are related
- Choose an appropriate statistical test (e.g., Chi-Square test) to calculate a test statistic, which corresponds to a p (probability) value
- Select a p-value to serve as the cutoff (significance) point (commonly 0.05, which represents a 5% likelihood of rejecting the null hypothesis when it is actually true)
Steps Cont'd
- If the test statistic corresponds to a p-value > 0.05, chance alone could plausibly explain the observed relationship between exposure & disease
- We would not reject H0 (informally: reject HA and accept H0)
- If the test statistic corresponds to a p-value < 0.05, the association between exposure & disease is unlikely to be due to chance alone
- We would reject H0 and accept HA
Chi-Square Calculation
Example: Chi-square test & an air pollution study
A random sample of 200 households was selected from each of 2 communities (Community A and B)
A respondent in each household was asked whether or not anyone in the household was bothered by air pollution
H0: the proportion bothered by air pollution in Community A is equal to the proportion bothered by air pollution in Community B
HA: the proportion bothered by air pollution in Community A is not equal to the proportion bothered by air pollution in Community B
p value = 0.05
- Do we reject the null hypothesis? Yes: the proportions are not equal; one community has a greater proportion of households bothered by air pollution than the other.
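The survey counts for the two communities are not given in these notes, so the sketch below uses hypothetical counts (80 of 200 households bothered in Community A, 50 of 200 in Community B) purely to show how the test and the decision rule fit together. It uses the 2x2 shortcut formula \chi^2 = n(ad - bc)^2 / [(a+b)(c+d)(a+c)(b+d)] and the 1-df p-value identity. The function name is an assumption.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Chi-square statistic (no continuity correction) and p-value for a 2x2 table."""
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    p_value = math.erfc(math.sqrt(stat / 2))  # upper tail, valid for 1 df only
    return stat, p_value


# Hypothetical counts: (bothered, not bothered) for Community A, then Community B
stat, p_value = chi_square_2x2(80, 120, 50, 150)
alpha = 0.05
reject_h0 = p_value < alpha
print(f"chi-square = {stat:.2f}, p = {p_value:.4f}, reject H0: {reject_h0}")
```

With these made-up counts the p-value falls well below 0.05, so H0 (equal proportions) would be rejected. The real study's statistic and p-value depend on its actual counts.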