Chi-Squared Test and Cohort Studies

Chi-Squared Test (\chi^2)

Non-parametric test.
Used to analyze categorical data (e.g., cases vs. non-cases).
Commonly uses a row x column (r x c) table to organize data, also known as cross-classification or contingency table (e.g., 2x2 table).
Only valid when frequencies are used in the cells; proportions, means, or physical measurements are not valid.
Detects associations between row and column data but does not indicate the strength of the association.
More accurate with large frequencies. As a rule of thumb for the expected cell frequencies:
- All should be at least 1.
- 80% or more should be at least 5.
Produces a \chi^2 statistic and degrees of freedom ((r-1)x(c-1)).
Use the \chi^2 table to determine the p-value.
Used to determine if an association between exposure and disease (shown by measures of association like Risk Ratio or Odds Ratio) is due to chance alone.
If the p-value is < 0.05, the association is unlikely due to chance, suggesting a relationship between disease and exposure.
Several types of chi-square test exist; they are not used when the sample size is < 30, where the Fisher exact test is used instead.
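
As a minimal sketch of how the \chi^2 statistic and degrees of freedom come out of an r x c table of counts, the Python below derives expected frequencies from the row and column totals and checks the two frequency rules of thumb. The function names and the example counts are hypothetical, chosen only for illustration, and applying the rules to expected (rather than observed) frequencies is an assumption about what the notes intend.

```python
def chi_square(table):
    """Return (chi2, df, expected) for an r x c table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    # Expected count for each cell = row total * column total / grand total
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    chi2 = sum(
        (obs - exp) ** 2 / exp
        for obs_row, exp_row in zip(table, expected)
        for obs, exp in zip(obs_row, exp_row)
    )
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df, expected


def frequencies_large_enough(expected):
    """Check: every expected count >= 1 and at least 80% of them >= 5."""
    cells = [e for row in expected for e in row]
    all_at_least_1 = all(e >= 1 for e in cells)
    share_at_least_5 = sum(e >= 5 for e in cells) / len(cells)
    return all_at_least_1 and share_at_least_5 >= 0.8


# Hypothetical counts: rows = exposed / not exposed, columns = cases / non-cases
observed = [[10, 20], [20, 10]]
chi2, df, expected = chi_square(observed)
print(round(chi2, 2), df, frequencies_large_enough(expected))
```

The statistic and df would then be looked up in a \chi^2 table to obtain the p-value.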

Example

Degrees of Freedom: 1
\chi^2 statistic: 5.03
P-value < 0.05
Indicates an association between "X" and "Y".
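
For 1 degree of freedom the p-value can also be computed directly rather than read from a \chi^2 table, because a \chi^2 variable with 1 degree of freedom is the square of a standard normal variable. A minimal Python sketch follows; the function name is hypothetical, and the shortcut applies only when the degrees of freedom equal 1.

```python
import math


def p_value_df1(chi2):
    """Upper-tail p-value for a chi-square statistic with 1 degree of freedom."""
    # P(Z^2 > chi2) = 2 * P(Z > sqrt(chi2)) = erfc(sqrt(chi2 / 2))
    return math.erfc(math.sqrt(chi2 / 2))


p = p_value_df1(5.03)
print(f"p = {p:.3f}")  # about 0.025, which is below 0.05
```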

Analytic Epidemiology Studies: Cohort Study

A group of persons (cohort) without the disease are followed over time.
One subgroup is exposed, and one is not exposed.
Examines if the exposure of interest is associated with the disease.
- Prospective: Data is collected going forward in time.
- Retrospective (historical): At least some data is collected from the past (e.g., foodborne illness cases from a church supper).
The cohort or population at risk is known, thus attack rates can be used to identify the likely risk factor for disease.

Attack Rate Formula:
Attack Rate = \frac{\text{new cases during a given time period}}{\text{population at risk at the beginning of the time period}}
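
A small Python sketch of the attack rate calculation. The function name is hypothetical, and the counts are the 43 of 54 and 3 of 21 that appear in the RR calculation example in these notes.

```python
def attack_rate(new_cases, population_at_risk):
    """New cases during the period divided by the population at risk at its start."""
    if population_at_risk <= 0:
        raise ValueError("population at risk must be positive")
    return new_cases / population_at_risk


ate = attack_rate(43, 54)          # exposed group
did_not_eat = attack_rate(3, 21)   # unexposed group
print(f"{ate:.0%} vs {did_not_eat:.0%}")
```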

Food Borne Illness

For foodborne illness investigations:

Identify the food with the highest attack rate if eaten.
Identify the food with the lowest attack rate if not eaten.
Identify the food eaten by the most cases.
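
The three checks above can be sketched in Python as follows. All food names and counts here are hypothetical and invented purely for illustration; they are not data from any outbreak described in these notes.

```python
# Hypothetical counts per food:
# (ill who ate, total who ate, ill who did not eat, total who did not eat)
foods = {
    "Food A": (48, 60, 2, 40),
    "Food B": (30, 55, 20, 45),
    "Food C": (25, 50, 25, 50),
}


def summarize(foods):
    """Return (name, attack rate if eaten, attack rate if not eaten, cases who ate)."""
    rows = []
    for name, (ill_ate, n_ate, ill_not, n_not) in foods.items():
        rows.append((name, ill_ate / n_ate, ill_not / n_not, ill_ate))
    return rows


rows = summarize(foods)
highest_if_eaten = max(rows, key=lambda r: r[1])[0]
lowest_if_not_eaten = min(rows, key=lambda r: r[2])[0]
eaten_by_most_cases = max(rows, key=lambda r: r[3])[0]
print(highest_if_eaten, lowest_if_not_eaten, eaten_by_most_cases)
```

When all three checks point to the same food, as they do for these made-up counts, that food is the most likely vehicle.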

Example Problem

Calculate Attack Rate for:
- Persons who ate Food A
- Persons who did not eat Food A
What do we discover?
- The attack rate is high among those exposed to Food A
- The attack rate is low among those not exposed to food A
- Most cases (48/50) were exposed to Food A

Church Supper Example

In an outbreak of gastroenteritis following a church supper, attack rates were calculated for those who did and did not eat each of the 14 food items.
The most likely vehicle is vanilla ice cream because it has the highest attack rate (80%) for those who ate it and the lowest for those who did not (14%).

Risk Ratio (RR) or Relative Risk

A measure of association assesses the strength of association between exposure & disease.
The best measure of association for a cohort study is the Risk Ratio (RR) or Relative Risk
Ratio of the incidence rate of a disease or health outcome in an exposed group to the incidence rate of the disease or condition in a non-exposed group

Formula

RR = \frac{\frac{A}{A+B}}{\frac{C}{C+D}}

Where:

A = Exposed and Diseased
B = Exposed and Not Diseased
C = Not Exposed and Diseased
D = Not Exposed and Not Diseased
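
A minimal Python sketch of the formula, using the A, B, C, D cell definitions above. The function name is hypothetical, and the example cells are assumed for illustration: 43 ill of 54 exposed and 3 ill of 21 unexposed, which gives B = 11 and D = 18.

```python
def risk_ratio(a, b, c, d):
    """RR from 2x2 cells: a, b = exposed ill / not ill; c, d = unexposed ill / not ill."""
    risk_exposed = a / (a + b)
    risk_unexposed = c / (c + d)
    return risk_exposed / risk_unexposed


rr = risk_ratio(43, 11, 3, 18)
print(round(rr, 2))
```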

Interpretation of Risk Ratio

If RR = 1.0, the risk is equal for both those exposed & not exposed; therefore, the exposure is not associated with the disease.
If RR > 1.0, the risk is greater for those exposed, and the exposure is more likely the cause of the disease. For example, if RR = 3, then those exposed are 3 times as likely to develop the disease as those not exposed.
If RR < 1.0, the exposure is likely protective from the disease.

Relative Risk

A relative risk:
- GREATER than 1 means the risk is INCREASED
- of 1.0 means there is NO association between the risk factor and the disease
- LESS than 1 means the risk is DECREASED

Example (RR Calculation)

RR = (43/54) / (3/21) = 0.80 / 0.14 = 5.7 (about 5.6 if the attack rates are not rounded first)
Because RR > 1.0, risk is greater for those exposed & the exposure is more likely the cause of disease.

95% Confidence Interval (CI)

Is the range of values of the measure of association (RR or OR) that have a 95% chance of containing the true measure of association
Investigator is 95% “confident” the range contains the true measure of association
For a statistically significant (p-value < 0.05) association between exposure & disease, the 95% CI will NOT contain the value 1.0
A RR or OR of 1.0 = the null hypothesis, no difference in risk or estimated risk for exposed & unexposed groups, no relationship between exposure & disease
Either can be a measure of association for a cohort study, but the RR is a direct measure of risk
The Odds Ratio (OR) is an estimate of the Risk Ratio (RR) (covered in more detail in Part 9)
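
The notes do not say how the 95% CI is calculated. As one possible sketch, the Python below uses a commonly taught approximation based on the standard error of ln(RR); this choice of method, the function name, and the example cells (43 ill of 54 exposed, 3 ill of 21 unexposed) are all assumptions made for illustration.

```python
import math


def risk_ratio_ci(a, b, c, d, z=1.96):
    """Approximate 95% CI for RR using the standard error of ln(RR)."""
    rr = (a / (a + b)) / (c / (c + d))
    se_log_rr = math.sqrt(1 / a - 1 / (a + b) + 1 / c - 1 / (c + d))
    lower = math.exp(math.log(rr) - z * se_log_rr)
    upper = math.exp(math.log(rr) + z * se_log_rr)
    return rr, lower, upper


rr, lower, upper = risk_ratio_ci(43, 11, 3, 18)
# The association is statistically significant when the interval excludes 1.0
excludes_null = not (lower <= 1.0 <= upper)
print(round(rr, 2), round(lower, 2), round(upper, 2), excludes_null)
```

For these cells the interval lies entirely above 1.0, so it is consistent with a statistically significant association.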

Statistical Significance

A test of statistical significance determines the likelihood or probability that an association between exposure & disease is due to chance alone

Steps

Assume the null hypothesis (H0): exposure & disease are NOT related (the association could be due to chance)
Alternate hypothesis (HA): exposure & disease are related
Choose an appropriate statistical test (e.g., Chi-square test) to calculate a test statistic which corresponds to a p (probability) value
A p-value is selected to serve as a cutoff (significance) point (commonly 0.05, which represents a 5% likelihood of being wrong in rejecting the null hypothesis)

Steps Cont'd

If the test statistic corresponds to a p-value > 0.05, then chance alone could plausibly explain the relationship between exposure & disease
- We would fail to reject the H0 (often loosely stated as accepting the H0)
If the test statistic corresponds to a p-value < 0.05, the association between exposure & disease is unlikely to be due to chance alone
- We would reject the H0 and accept the HA
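
The steps can be sketched end to end in Python for a 2x2 table. This is an illustration under stated assumptions rather than a prescribed procedure: the function names are hypothetical, the statistic uses the usual 2x2 shortcut formula without a continuity correction, and the example cells (43 ill of 54 exposed, 3 ill of 21 unexposed) are assumed for illustration.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Chi-square statistic (no continuity correction) for a 2x2 table."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


def assess_association(a, b, c, d, alpha=0.05):
    """Return (chi2, p, decision) using alpha as the significance cutoff."""
    chi2 = chi_square_2x2(a, b, c, d)
    # df = (2-1) x (2-1) = 1, so the p-value is erfc(sqrt(chi2 / 2))
    p = math.erfc(math.sqrt(chi2 / 2))
    decision = "reject H0" if p < alpha else "fail to reject H0"
    return chi2, p, decision


chi2, p, decision = assess_association(43, 11, 3, 18)
print(round(chi2, 2), decision)
```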

Chi-Square Calculation

Example: Chi-square test & an air pollution study

Air pollution study

A random sample of 200 households was selected from each of 2 communities (Community A and B)

A respondent in each household was asked whether or not anyone in the household was bothered by air pollution
H0: proportion bothered by air pollution in Community A is equal to the proportion bothered by air pollution in Community B
HA: proportion bothered by air pollution in Community A is not equal to the proportion bothered by air pollution in Community B
p value = 0.05

Do we reject the null hypothesis? Yes: the proportions are not equal; one community has a greater proportion bothered by air pollution than the other.
