Cohort Study Notes

Cohort Study Design

Overview

  • A cohort study is an observational study that follows a group of individuals over time to examine the relationship between an exposure or risk factor and the development of a particular outcome or disease.

Key Features

  • Temporality: Cohort studies can be prospective or retrospective (historical).
    • Prospective: Exposure and outcome are measured after the study begins.
    • Retrospective: Exposure and outcome have already occurred at the beginning of the study; data is collected from historical records.
  • Observational Study: Investigators do not assign exposures.

Types of Studies

  • Descriptive Study: No comparison group.
    • Case study, case series, ecological, cross-sectional
  • Analytical Study: With comparison group.
    • Case-control, cohort

Steps in Conducting Cohort Studies

  1. Identify: Define a cohort and identify individuals free of the outcome of interest.
  2. Measure: Measure exposure(s) of interest.
  3. Follow up: Follow up over time to see who develops the outcome(s) of interest.

Design Considerations

  • How long should the follow-up be?
  • How frequently do you need to measure exposure?
  • Can you ensure that study subjects do not have the disease at the beginning of the study?

Prospective vs. Retrospective Cohort Studies

  • Prospective Cohort Study:
    • Starts with exposed and unexposed individuals.
    • Follows them over time to compare the incidence of the outcome of interest.
    • Exposure and outcome are measured after the study begins.
  • Retrospective Cohort Study:
    • Identifies individuals who have already been exposed in the past and compares them to unexposed individuals.
    • Exposure and outcome have already occurred (or are occurring) at the beginning of the study.
    • Data is collected retrospectively from medical records, historical documents, or other sources.

Examples of Cohort Studies

  • Rancho Bernardo Study of Healthy Aging:
    • Community-based study focused on cardiovascular disease, diabetes, and cognitive function.
    • Established between 1972 and 1974.
    • Enrolled approximately 10,000 adults aged 30 to 79 years (82%).
    • Follow-up interviews every 4 years, with biological measurements and surveys.
    • Annual follow-up for vital status via mail or phone (death certificates and cause of death).
    • More than 450 studies have been published based on this cohort.
  • Dutch Famine Study – Hongerwinter (Retrospective):
    • Occurred from October 1944 to May 1945 in the Netherlands.
    • Studied the impact of acute maternal malnutrition on gestation and health (cardiovascular and metabolic disease).
    • Harsh winters coupled with WW2, bad crops, and embargo on food transport.
    • 4.5 million people affected by famine in a country of 9 million, forced to live on rations of 400 to 800 calories per day.
    • 22,000 deaths.
    • The cohort consisted of 2,414 babies born alive in Amsterdam.
    • The cohort has been traced and studied since 1994, with repeated measures.
    • Blood, urine, and buccal swabs were collected, and functional testing of the heart, lungs, and kidneys was performed.
  • Efficacy of semester-dependent mRNA vaccination on anti-SARS-CoV-2 antibody response:
    • Compared Moderna and Pfizer vaccines.
    • Sample collection timepoints: Baseline (PD0), Days post dose 1 (PD1), Days post dose 2 (PD2).

Advantages of Cohort Studies

  • Risk factors (exposures) are known before the disease develops.
  • Can calculate incidence and relative risk.
  • Multiple outcomes can be examined.
  • If the cohort is not selected based on a specific exposure (e.g., entire town followed), then multiple exposures can also be studied.
  • Well-suited for studying exposures that might be rare in the general population (e.g., occupational hazards).
  • Easier to explain to the lay public than some other designs.

Disadvantages of Cohort Studies

  • Prospective studies are often long and expensive.
  • Quality and completeness of exposure data may be imperfect, especially for retrospective cohort studies.
  • Changes in diagnostic criteria or methods over time can complicate analysis.
  • A large number of subjects is required if the outcome is uncommon.
  • Not good for “rare” diseases (possibly nothing to analyze).
  • Administrative challenges: loss of staff, funding, high costs.

Loss to Follow-Up

  • Especially a problem with long follow-up time.
  • Threatens validity.
    • Non-differential (similar across exposure groups): Loss is random and, in theory, not a big problem.
    • Differential (unequal across exposure groups): Potential for bias.

Measures of Risk and Impact

  • Relative Risk (RR)
  • Risk Ratio
  • Rate Ratio
  • Risk Differences
  • Attributable Risk (AR) / AR percent (AR%)
  • Population Attributable Risk (PAR) / PAR percent (PAR%)

Relative Risk (RR) / Risk Ratio

  • Measures the strength of the association.
  • The larger the relative risk, the stronger the association between the risk factor and the outcome.
  • During a cohort study, participants are observed for the development of an outcome/disease.
  • Cumulative Incidence (CI), where A and B are the exposed subjects with and without the outcome, and C and D are the unexposed subjects with and without the outcome:
    • $CI_{exposed} = \frac{A}{A + B}$
    • $CI_{unexposed} = \frac{C}{C + D}$
  • Risk Ratio (Relative Risk):
    • $Risk \ Ratio = \frac{CI_{exposed}}{CI_{unexposed}}$
  • Incidence Rate, where a and c are the numbers of new cases in the exposed and unexposed groups:
    • $Incidence \ Rate_{exposed} = \frac{a}{\text{person-time (exposed)}}$
    • $Incidence \ Rate_{unexposed} = \frac{c}{\text{person-time (unexposed)}}$
  • Rate Ratio:
    • $Rate \ Ratio = \frac{Incidence \ Rate_{exposed}}{Incidence \ Rate_{unexposed}}$
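
As a rough illustration of the incidence rate and rate ratio formulas above, here is a minimal Python sketch. The function names and the case and person-time figures are hypothetical, chosen only to show the arithmetic; they do not come from any study in these notes.

```python
def incidence_rate(cases: int, person_time: float) -> float:
    """New cases divided by the total person-time at risk."""
    if person_time <= 0:
        raise ValueError("person_time must be positive")
    return cases / person_time


def rate_ratio(cases_exposed: int, ptime_exposed: float,
               cases_unexposed: int, ptime_unexposed: float) -> float:
    """Incidence rate in the exposed divided by the rate in the unexposed."""
    rate_unexposed = incidence_rate(cases_unexposed, ptime_unexposed)
    if rate_unexposed == 0:
        raise ValueError("rate ratio is undefined when the unexposed rate is zero")
    return incidence_rate(cases_exposed, ptime_exposed) / rate_unexposed


# Hypothetical data (assumption): 30 cases over 1,000 person-years in the
# exposed group versus 10 cases over 1,000 person-years in the unexposed group.
ratio = rate_ratio(30, 1000.0, 10, 1000.0)
print(f"Rate ratio: {ratio:.1f}")
```

Unlike the risk ratio, the rate ratio uses person-time in the denominator, so subjects followed for different lengths of time each contribute only the time they were actually at risk.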

Interpretation of Relative Risk

  • RR = 1: No or negligible difference in risk.
    • Incidence in each group is the same.
    • No apparent association between the exposure and disease.
  • RR > 1: A positive association between the exposure and the disease.
    • Suggests increased risk of the outcome in the exposed group.
    • The exposure might be a cause of the disease.
  • RR < 1: A negative association between the exposure and the disease.
    • Suggests reduced risk of the outcome in the exposed group.
    • Exposure might be protective against the disease.

Example Calculation

    |             | Lung cancer: yes | Lung cancer: no | Total |
    | :---------- | :--------------- | :-------------- | :---- |
    | Smokers     | 4                | 16              | 20    |
    | Non-Smokers | 2                | 18              | 20    |
    | Total       | 6                | 34              | 40    |

  • Incidence of disease among smokers: 4/20
  • Incidence of disease among non-smokers: 2/20
  • Relative Risk (Risk Ratio) of lung cancer in smokers (vs non-smokers):
    • $RR = \frac{4/20}{2/20} = 2.0$
    • Smokers were 2 times as likely to develop lung cancer as non-smokers.
    • Equivalently, smokers had 2 times the risk of lung cancer compared with non-smokers.
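
The calculation above can be sketched in a few lines of Python. The helper names (`cumulative_incidence`, `risk_ratio`) are illustrative assumptions rather than part of any library; the counts are the ones from the smoking and lung cancer table.

```python
def cumulative_incidence(cases: int, non_cases: int) -> float:
    """Proportion of a group that developed the outcome."""
    total = cases + non_cases
    if total == 0:
        raise ValueError("group has no subjects")
    return cases / total


def risk_ratio(a: int, b: int, c: int, d: int) -> float:
    """Risk ratio from a 2x2 table.

    a, b: exposed subjects with / without the outcome
    c, d: unexposed subjects with / without the outcome
    """
    ci_unexposed = cumulative_incidence(c, d)
    if ci_unexposed == 0:
        raise ValueError("risk ratio is undefined when no unexposed subject has the outcome")
    return cumulative_incidence(a, b) / ci_unexposed


# Smokers: 4 with lung cancer, 16 without. Non-smokers: 2 with, 18 without.
rr = risk_ratio(4, 16, 2, 18)
print(f"RR = {rr:.1f}")  # RR = 2.0
```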

Risk Differences (RD) or Attributable Risk (AR)

  • If something is attributable to an event, situation, or person, it is likely that it was caused by that event, situation, or person.
  • Example: 10,000 deaths per year from chronic lung disease are attributable to smoking.
  • Answers the question: What is the incidence of disease in the exposed portion (AR) or the total population (PAR) that is due to the exposure?
  • It is the amount by which the incidence of a disease in the exposed (AR) or in the total population (PAR) would fall if the exposure were eliminated.
  • AR addresses excess risk in the exposed due to the exposure.
    • $AR = CI_{exposed} - CI_{unexposed}$
    • $AR\% = \frac{CI_{exposed} - CI_{unexposed}}{CI_{exposed}} \times 100$
  • PAR addresses excess risk in the population due to the exposure.
    • $PAR = CI_{total \ pop.} - CI_{unexposed}$
    • $PAR\% = \frac{CI_{total \ pop.} - CI_{unexposed}}{CI_{total \ pop.}} \times 100$
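
To make the four formulas concrete, the following Python sketch applies them to the smoking and lung cancer table from the relative risk example. The function names are hypothetical, and treating the 40 study subjects as the "total population" is an assumption made purely for illustration.

```python
def attributable_risk(ci_exposed: float, ci_unexposed: float) -> float:
    """Excess risk in the exposed group."""
    return ci_exposed - ci_unexposed


def attributable_risk_percent(ci_exposed: float, ci_unexposed: float) -> float:
    """Share of the risk in the exposed group that is due to the exposure."""
    return (ci_exposed - ci_unexposed) / ci_exposed * 100


def population_attributable_risk(ci_total: float, ci_unexposed: float) -> float:
    """Excess risk in the total population."""
    return ci_total - ci_unexposed


def population_attributable_risk_percent(ci_total: float, ci_unexposed: float) -> float:
    """Share of the risk in the total population that is due to the exposure."""
    return (ci_total - ci_unexposed) / ci_total * 100


ci_exposed = 4 / 20    # smokers
ci_unexposed = 2 / 20  # non-smokers
ci_total = 6 / 40      # everyone in the table (assumed to stand in for the population)

ar = attributable_risk(ci_exposed, ci_unexposed)
ar_pct = attributable_risk_percent(ci_exposed, ci_unexposed)
par = population_attributable_risk(ci_total, ci_unexposed)
par_pct = population_attributable_risk_percent(ci_total, ci_unexposed)

print(f"AR   = {ar:.2f}")        # AR   = 0.10
print(f"AR%  = {ar_pct:.1f}%")   # AR%  = 50.0%
print(f"PAR  = {par:.2f}")       # PAR  = 0.05
print(f"PAR% = {par_pct:.1f}%")  # PAR% = 33.3%
```

PAR% is smaller than AR% here because only half of the subjects in the table are exposed, so the exposure accounts for a smaller share of cases in the whole group than among smokers alone.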

Relative Risk vs. Attributable Risk

  • Relative Risk: Oral contraceptives are associated with a two-fold higher risk of heart attacks. Strength of association. A measure of association that can support, but does not by itself prove, causality.
  • Attributable Risk: Oral contraceptives increase the risk of heart attacks by 2 per million women per year. Public health impact.

Examples

  • A relative risk calculation may identify screen time as a major risk factor associated with suicide in teenage girls.
  • An attributable risk measure (AR% or PAR%) can estimate the percentage reduction in suicide among teenage girls if screen time were eliminated at an early age.

Population Attributable Risk % (PAR%)

  • Estimates the percentage of the outcome in the total population that could be eliminated if the exposure were removed.

Study Design Summary

  • Observational studies: cross-sectional, case-control, cohort.
  • Experimental studies: interventional.
  • Features compared across designs: recruitment based on exposure/outcome, timeline, prevalence, incidence, measure of association, ability to identify causality, and measure of impact (risk difference).

Example: Low Birth Weight and Smoking

In a particular year, there were 1,000 births. Of these babies, 72 had low birth weight and 158 had mothers who smoked during pregnancy. Of the mothers who smoked, 19 gave birth to low-birth-weight babies.

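Table:

    |                         | Low birth weight: yes | Low birth weight: no | Total |
    | :---------------------- | :-------------------- | :------------------- | :---- |
    | Smoked during pregnancy | 19                    | 139                  | 158   |
    | Did not smoke           | 53                    | 789                  | 842   |
    | Total                   | 72                    | 928                  | 1000  |
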
Calculations:
  • Risk of low birth weight in smokers: 19/158 ≈ 12.0%
  • Risk of low birth weight in non-smokers: 53/842 ≈ 6.3%
  • Relative Risk: (19/158) / (53/842) ≈ 1.9
Interpretation:
  • Women who smoke while pregnant are about twice as likely (RR ≈ 1.9) to give birth to low-birth-weight babies as those who do not smoke.
  • Population Attributable Risk: 72/1000 - 53/842 ≈ 0.009, or about 9 per 1,000 births.
  • The excess risk of low birth weight in the total population that is attributable to smoking is about 9 per 1,000 births.
  • Population Attributable Risk %: (72/1000 - 53/842) / (72/1000) × 100 ≈ 12.6% (12.5% when the excess is rounded to 9 cases out of 72).
    Of the 72 low-birth-weight cases, including those born to both smoking and non-smoking mothers, about 9 cases can be attributed to smoking. This calculation helps estimate the percentage of cases in the total population that might be prevented by removing the exposure.
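
The figures in this example can be checked with a short Python sketch. The variable names are illustrative assumptions; the counts come directly from the example, with the non-smoker counts derived by subtraction. Small differences from the rounded figures in the text come from rounding.

```python
# Counts given in the example
total_births = 1000
total_cases = 72        # low-birth-weight babies overall
exposed_total = 158     # mothers who smoked during pregnancy
exposed_cases = 19      # low-birth-weight babies among smokers

# Non-smoker counts follow by subtraction
unexposed_total = total_births - exposed_total  # 842
unexposed_cases = total_cases - exposed_cases   # 53

risk_exposed = exposed_cases / exposed_total
risk_unexposed = unexposed_cases / unexposed_total
risk_total = total_cases / total_births

rr = risk_exposed / risk_unexposed
par = risk_total - risk_unexposed
par_pct = par / risk_total * 100

print(f"Risk in smokers:      {risk_exposed:.1%}")    # 12.0%
print(f"Risk in non-smokers:  {risk_unexposed:.1%}")  # 6.3%
print(f"Relative risk:        {rr:.2f}")              # 1.91
print(f"PAR per 1,000 births: {par * 1000:.1f}")      # 9.1
print(f"PAR%:                 {par_pct:.1f}%")        # 12.6%
```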