Cohort Study Notes

Cohort Study Design

Overview

A cohort study is an observational study that follows a group of individuals over time to examine the relationship between an exposure or risk factor and the development of a particular outcome or disease.

Key Features

Temporality: Cohort studies can be prospective or retrospective (historical).
- Prospective: Exposure and outcome are measured after the study begins.
- Retrospective: Exposure and outcome have already occurred at the beginning of the study; data is collected from historical records.
Observational Study: Investigators do not assign exposures.

Types of Studies

Descriptive Study: No comparison group.
- Case study, case series, ecological, cross-sectional
Analytical Study: With comparison group.
- Case-control, cohort

Steps in Conducting Cohort Studies

Identify: Define a cohort and identify individuals free of the outcome of interest.
Measure: Measure exposure(s) of interest.
Follow up: Follow up over time to see who develops the outcome(s) of interest.

Design Considerations

How long should the follow-up be?
How frequently do you need to measure exposure?
Can you ensure that study subjects do not have the disease at the beginning of the study?

Prospective vs. Retrospective Cohort Studies

Prospective Cohort Study:
- Starts with exposed and unexposed individuals.
- Follows them over time to compare the incidence of the outcome of interest.
- Exposure and outcome are measured after the study begins.
Retrospective Cohort Study:
- Identifies individuals who have already been exposed in the past and compares them to unexposed individuals.
- Exposure and outcome have already occurred (or are occurring) at the beginning of the study.
- Data is collected retrospectively from medical records, historical documents, or other sources.

Examples of Cohort Studies

Rancho Bernardo Study of Healthy Aging:
- Community-based study focused on cardiovascular disease, diabetes, and cognitive function.
- Established between 1972 and 1974.
- Enrolled approximately 10,000 adults aged 30 to 79 years (82%).
- Follow-up interviews every 4 years, with biological measurements and surveys.
- Annual follow-up for vital status via mail or phone (death certificates and cause of death).
- More than 450 studies have been published based on this cohort.
Dutch Famine Study – Hongerwinter (Retrospective):
- Occurred from October 1944 to May 1945 in the Netherlands.
- Studied the impact of acute maternal malnutrition on gestation and health (cardiovascular and metabolic disease).
- Harsh winters coupled with WW2, bad crops, and embargo on food transport.
- 4.5 million people affected by famine in a country of 9 million, forced to live on rations of 400 to 800 calories per day.
- 22,000 deaths.
- Consisted of 2,414 babies born alive in Amsterdam.
- Cohort has been traced and studied since 1994, with repeated measures.
- Blood, urine, buccal swabs were collected, and functional testing of heart, lungs, and kidney was performed.
Efficacy of semester-dependent mRNA vaccination on anti-SARS-CoV-2 antibody response:
- Compared Moderna and Pfizer vaccines.
- Sample collection timepoints: Baseline (PD0), Days post dose 1 (PD1), Days post dose 2 (PD2).

Advantages of Cohort Studies

Risk factors (exposure) known before disease.
Can calculate incidence and relative risk.
Multiple outcomes can be examined.
If the cohort is not selected based on a specific exposure (e.g., entire town followed), then multiple exposures can also be studied.
Well-suited for studying exposures that might be rare in the general population (e.g., occupational hazards).
Easier to explain to the lay public than some other designs.

Disadvantages of Cohort Studies

Prospective studies often long and expensive.
Quality and completeness of exposure data may be imperfect, especially for retrospective cohort studies.
Changes in diagnostic criteria or methods over time can complicate analysis.
A large number of subjects is required if the outcome is uncommon.
Not good for “rare” diseases (possibly nothing to analyze).
Administrative challenges: loss of staff, funding, high costs.

Loss to Follow-Up

Especially a problem with long follow-up time.
Threatens validity.
- Non-differential (across exposure groups): Random loss, in theory, not a big problem.
- Differential (across exposure groups): Potential for bias.
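
To make the distinction concrete, the sketch below applies different retention patterns to a made-up cohort and recomputes the risk ratio among the subjects who remain. All counts and retention fractions are hypothetical, chosen only so the arithmetic is easy to follow; the function name `observed_risk_ratio` is likewise an illustration, not a standard routine.

```python
def observed_risk_ratio(cohort, retention):
    """Risk ratio among subjects who remain under follow-up.

    cohort:    {group: (cases, non_cases)} true counts with complete follow-up
    retention: {group: (fraction of cases kept, fraction of non-cases kept)}
    """
    risks = {}
    for group, (cases, non_cases) in cohort.items():
        keep_cases, keep_non_cases = retention[group]
        seen_cases = cases * keep_cases
        seen_non_cases = non_cases * keep_non_cases
        risks[group] = seen_cases / (seen_cases + seen_non_cases)
    return risks["exposed"] / risks["unexposed"]


# Hypothetical cohort: 1,000 per group, true risks of 20% and 10% (true RR = 2.0).
cohort = {"exposed": (200, 800), "unexposed": (100, 900)}

# No loss at all.
complete = {"exposed": (1.0, 1.0), "unexposed": (1.0, 1.0)}
# Loss unrelated to exposure or outcome: every subject has a 30% chance of dropping out.
non_differential = {"exposed": (0.7, 0.7), "unexposed": (0.7, 0.7)}
# Loss related to both: exposed subjects who develop the outcome drop out more often.
differential = {"exposed": (0.5, 0.9), "unexposed": (0.9, 0.9)}

scenarios = [
    ("complete follow-up", complete),
    ("non-differential loss", non_differential),
    ("differential loss", differential),
]
for label, retention in scenarios:
    print(f"{label}: RR = {observed_risk_ratio(cohort, retention):.2f}")
```

In this toy setup the non-differential scenario leaves the risk ratio at 2.0 (only the sample size shrinks), while the differential scenario pulls the observed risk ratio down to about 1.22.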

Measures of Risk and Impact

Relative Risk (RR)
Risk Ratio
Rate Ratio
Risk Differences
Attributable Risk (AR) / AR percent (AR%)
Population Attributable Risk (PAR) / PAR percent (PAR%)

Relative Risk (RR) / Risk Ratio

Measures the strength of the association.
The larger the relative risk, the stronger the association between the risk factor and the outcome.
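During a cohort study, we observe the cohort for the development of an outcome/disease.
Cumulative Incidence (CI), where A and B are the exposed subjects with and without the outcome, and C and D are the unexposed subjects with and without the outcome: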
- $CI_{exposed} = \frac{A}{A + B}$
- $CI_{unexposed} = \frac{C}{C + D}$
Risk Ratio (Relative Risk):
- $Risk \ Ratio = \frac{CI_{exposed}}{CI_{unexposed}}$
Incidence Rate:
- $Incidence \ Rate_{exposed} = \frac{A}{\text{person-time}_{exposed}}$
- $Incidence \ Rate_{unexposed} = \frac{C}{\text{person-time}_{unexposed}}$
Rate Ratio:
- $Rate \ Ratio = \frac{Incidence \ Rate_{exposed}}{Incidence \ Rate_{unexposed}}$
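
The formulas above translate directly into a few small functions. This is a minimal sketch: the function names are illustrative, the cell labels follow the usual 2x2 layout (a, b = exposed with and without the outcome; c, d = unexposed with and without the outcome), and the counts and person-years in the usage lines are hypothetical.

```python
def cumulative_incidence(cases, non_cases):
    """Proportion of an initially disease-free group that develops the outcome."""
    return cases / (cases + non_cases)


def risk_ratio(a, b, c, d):
    """Ratio of cumulative incidence in the exposed to that in the unexposed."""
    return cumulative_incidence(a, b) / cumulative_incidence(c, d)


def incidence_rate(cases, person_time):
    """New cases per unit of person-time at risk."""
    return cases / person_time


def rate_ratio(a, person_time_exposed, c, person_time_unexposed):
    """Ratio of the incidence rate in the exposed to that in the unexposed."""
    return incidence_rate(a, person_time_exposed) / incidence_rate(c, person_time_unexposed)


# Hypothetical cohort: 100 exposed (30 cases), 100 unexposed (15 cases).
a, b, c, d = 30, 70, 15, 85
print(f"Risk ratio: {risk_ratio(a, b, c, d):.2f}")

# Hypothetical follow-up: 250 person-years in the exposed, 300 in the unexposed.
print(f"Rate ratio: {rate_ratio(a, 250, c, 300):.2f}")
```

The risk ratio uses people as the denominator, while the rate ratio uses person-time, so the two can differ when follow-up time is unequal between groups, as it is in these made-up numbers.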

Interpretation of Relative Risk

RR = 1: No or negligible difference in risk.
- Incidence in each group is the same.
- No apparent association between the exposure and disease.
RR > 1: A positive association between the exposure and the disease.
- Suggests increased risk of the outcome in the exposed group.
- The exposure might be a cause of the disease.
RR < 1: A negative association between the exposure and the disease.
- Suggests reduced risk of the outcome in the exposed group.
- Exposure might be protective against the disease.

Example Calculation

    |             | Lung cancer: yes | Lung cancer: no | Total |
    | :---------- | :--------------- | :-------------- | :---- |
    | Smokers     | 4                | 16              | 20    |
    | Non-Smokers | 2                | 18              | 20    |
    | Total       | 6                | 34              | 40    |

Incidence of disease among smokers: 4/20
Incidence of disease among non-smokers: 2/20
Relative Risk (Risk Ratio) of lung cancer in smokers (vs non-smokers):
- $RR = \frac{\frac{4}{20}}{\frac{2}{20}} = 2.0$
- Smokers were 2 times as likely to develop lung cancer as non-smokers.
- Smokers had 2 times the risk of disease compared to non-smokers.

Risk Differences (RD) or Attributable Risk (AR)

If something is attributable to an event, situation, or person, it is likely that it was caused by that event, situation, or person.
Example: 10,000 deaths per year from chronic lung disease are attributable to smoking.
Answers the question: What is the incidence of disease in the exposed portion (AR) or the total population (PAR) that is due to the exposure?
It is the amount of disease incidence in the exposed (AR) or the total population (PAR) that would be eliminated if the exposure were removed.
AR addresses excess risk in the exposed due to the exposure.
- $AR = CI_{exposed} - CI_{unexposed}$
- $AR\% = \frac{CI_{exposed} - CI_{unexposed}}{CI_{exposed}} \times 100$
PAR addresses excess risk in the population due to the exposure.
- $PAR = CI_{total \ pop.} - CI_{unexposed}$
- $PAR\% = \frac{CI_{total \ pop.} - CI_{unexposed}}{CI_{total \ pop.}} \times 100$
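
As a rough illustration, the four measures can be computed from the cells of a 2x2 table. The sketch below reuses the smoking and lung cancer counts from the earlier example calculation; the function name `attributable_measures` is hypothetical. It also assumes the study sample stands in for the total population, which only holds if the proportion exposed in the cohort mirrors that of the population.

```python
def attributable_measures(a, b, c, d):
    """AR, AR%, PAR and PAR% from 2x2 counts.

    a, b: exposed with / without the outcome
    c, d: unexposed with / without the outcome
    """
    ci_exposed = a / (a + b)
    ci_unexposed = c / (c + d)
    ci_total = (a + c) / (a + b + c + d)  # assumes the sample represents the population

    ar = ci_exposed - ci_unexposed
    par = ci_total - ci_unexposed
    return {
        "AR": ar,
        "AR%": ar / ci_exposed * 100,
        "PAR": par,
        "PAR%": par / ci_total * 100,
    }


# Smokers: 4 cases, 16 non-cases. Non-smokers: 2 cases, 18 non-cases.
results = attributable_measures(4, 16, 2, 18)
for name, value in results.items():
    print(f"{name}: {value:.3f}")
```

With these counts, AR is 0.10 (AR% = 50%) and PAR is 0.05 (PAR% is about 33%): the exposure accounts for half of the risk among smokers but only a third of the risk in the whole group, because half of the group is unexposed.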

Relative Risk vs. Attributable Risk

Relative Risk: Oral contraceptives are associated with a two-fold higher risk of heart attacks. Strength of association. A measure of association that may suggest, but does not by itself prove, causality.
Attributable Risk: Oral contraceptives increase the risk of heart attacks by 2 per million women per year. Public health impact.

Examples

Relative risk calculation may identify that screen time is a major risk factor that leads to suicide in teenage girls.
Risk difference (AR% or PAR%) can estimate the % reduction of suicide in teenage girls if we were to eliminate screen time at an early age.

Population Attributable Risk % (PAR%)

Allows us to estimate the percentage of the outcome in the total population that could be eliminated if the exposure were removed.

Study Design Summary

	Observational	Experimental
Type of Study	Cross-Sectional	Interventional
	Case-Control
	Cohort
Recruitment based on	Exposure/Outcome
Timeline	Prevalence
	Incidence
Measure of Association
Identify Causality
Measure of Impact (Risk Difference)

Example: Low Birth Weight and Smoking

In a particular year, there were 1000 births. 72 had low birthweights, and 158 had mothers who smoked during pregnancy. Of the mothers who smoked, 19 gave birth to low-birth-weight babies.

Table:

	Low birth weight: yes	Low birth weight: no	Total
Smoked during pregnancy	19	139	158
Did not smoke	53	789	842
Total	72	928	1000

Calculations:

Risk of low birth weight in smokers: 19/158 = 12.0%
Risk of low birth weight in non-smokers: 53/842 = 6.3%
Relative Risk: 12.0% / 6.3% = 1.9

Interpretation:

Women who smoke while pregnant are about twice as likely to give birth to low-birth-weight babies as those who do not smoke.
Population Attributable Risk: 72/1000 - 53/842 = 0.009, or about 9 per 1,000 births.
The excess risk of low birth weight in the total population that is attributable to smoking is about 9 per 1,000 births.
Population Attributable Risk %: (72/1000 - 53/842) / (72/1000) = 12.6%
Of the 72 low-birth-weight cases, including those born to both smoking and non-smoking mothers, about 9 cases (roughly 12.6%) can be attributed to smoking. This calculation helps estimate the percentage of cases in the total population that might be prevented by removing the exposure.
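
The whole example can be checked by rebuilding the 2x2 table from the four counts given in the problem statement and recomputing each measure. This is a sketch for verification only; the variable names are illustrative.

```python
# Counts stated in the example.
births = 1000
low_bw = 72           # all low-birth-weight babies
smokers = 158         # mothers who smoked during pregnancy
low_bw_smokers = 19   # low-birth-weight babies born to smokers

# Derive the remaining cells of the 2x2 table from the totals.
a = low_bw_smokers                # smoker, low birth weight
b = smokers - a                   # smoker, normal birth weight
c = low_bw - a                    # non-smoker, low birth weight
d = (births - smokers) - c        # non-smoker, normal birth weight

risk_smokers = a / (a + b)
risk_non_smokers = c / (c + d)
risk_total = low_bw / births

rr = risk_smokers / risk_non_smokers
par = risk_total - risk_non_smokers
par_pct = par / risk_total * 100

print(f"Table cells: a={a}, b={b}, c={c}, d={d}")
print(f"Risk in smokers: {risk_smokers:.1%}")
print(f"Risk in non-smokers: {risk_non_smokers:.1%}")
print(f"Relative risk: {rr:.2f}")
print(f"PAR: {par * 1000:.1f} per 1,000 births")
print(f"PAR%: {par_pct:.1f}%")
```

Carrying unrounded risks through the calculation avoids the small discrepancies that appear when the rounded percentages are divided by hand.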