Principles of Epidemiology in Public Health Practice - Complete Course Notes

Definition and Principles of Epidemiology

  • Etymology: The term is derived from the Greek words epi (on or upon), demos (people), and logos (the study of), translating to "the study of what befalls a population."
  • Formal Definition: Epidemiology is the study of the distribution and determinants of health-related states or events in specified populations, and the application of this study to the control of health problems.
  • Key Components of the Definition:
    • Study: Epidemiology is a scientific, data-driven discipline relying on systematic and unbiased approaches to collection, analysis, and interpretation of data. It employs causal reasoning based on hypotheses from biology, behavioral science, physics, and ergonomics.
    • Distribution: Refers to frequency (number of events and their relationship to population size via rates) and pattern (occurrence by time, place, and person).
    • Determinants: Factors (causes, risk factors) that bring about a change in a health condition. It assumes illness is not random but occurs when the right accumulation of factors exists in an individual.
    • Health-related states or events: Originally focused on communicable disease epidemics; now includes chronic diseases, injuries, birth defects, occupational health, environmental health, and behaviors (e.g., exercise, seat belt use).
    • Specified populations: Clinicians treat individuals (the patient); epidemiologists treat the community (the collective health of people).
    • Application: Epidemiology is both science and art, using scientific methods to "diagnose" the community's health and propose practical interventions.

Historical Evolution of Epidemiology

  • Hippocrates (Circa 400 B.C.): In the essay "On Airs, Waters, and Places," he suggested environmental and host behavioral factors influence disease, moving away from supernatural explanations.
  • John Graunt (1662): A London haberdasher who first quantified patterns of birth, death, and disease, noting disparities by sex, infant mortality, and urban/rural differences.
  • William Farr (1800s): The father of modern vital statistics and surveillance. He systematically collected and evaluated Britain's mortality statistics, reporting findings to authorities and the public.
  • John Snow (Mid-1800s): The father of field epidemiology. His 1854 investigation of the Golden Square cholera outbreak in London used a "spot map" to link deaths to the Broad Street pump. His second investigation compared mortality between the Lambeth Company (intake upstream from sewage) and the Southwark and Vauxhall Company (intake downstream).
    • Southwark & Vauxhall Mortality: 5.0 per 1,000 population.
    • Lambeth Mortality: 0.9 per 1,000 population.
  • Post-World War II Developments:
    • Application to chronic disease (e.g., Doll and Hill smoking/lung cancer studies; Framingham Heart Study).
    • Smallpox eradication (1960s-1970s).
    • Inclusion of injuries, violence, and molecular/genetic epidemiology (1980s-1990s).
    • Focus on bioterrorism and biologic warfare (Post-September 11, 2001).

Core Epidemiologic Functions

  • Public Health Surveillance: Ongoing systematic collection, analysis, interpretation, and dissemination of health data ("information for action").
  • Field Investigation: Responding to reports of cases or clusters to identify causes and prevent further spread ("shoe leather epidemiology").
  • Analytic Studies: Using comparison groups to evaluate hypotheses. Components include design (calculating sample sizes), conduct (ethical protocols), analysis (testing for significance), and interpretation.
  • Evaluation: Assessing the relevance, effectiveness, efficiency, and impact of health services.
    • Effectiveness: Ability of a program to produce results in the field.
    • Efficacy: Ability to produce results under ideal conditions.
    • Efficiency: Producing results with minimum expenditure of time and resources.
  • Linkages: Working in multidisciplinary teams; field epidemiology is a "team sport."
  • Policy Development: Providing recommendations based on findings to direct appropriate public health interventions.

The Epidemiologic Approach and Case Definitions

  • Primary Tasks: Count (cases), Divide (by denominators to find rates), and Compare (rates over time or between groups).
  • Case Definition: A set of standard criteria for classifying whether a person has a specific disease or condition.
    • Clinical Criteria: Confirmatory lab tests, combinations of symptoms (subjective), and signs (objective).
    • Time/Place/Person: Standardized limits reflecting the scope of an outbreak (e.g., "Resident of Winston-Salem with onset between October and January").
    • Sensitivity vs. Specificity:
      • Sensitive (Loose): Used for rare/severe diseases to capture every possible case (e.g., rubella defined as "any generalized rash illness").
      • Specific (Strict): Used in analytic studies to ensure participants truly have the disease (e.g., requiring positive lab culture for Salmonella).
    • Categories of Certainty: Confirmed (lab evidence), Probable (typical clinical features), Suspected (fewer features).
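
The case definition logic above can be sketched as a small classifier. This is a minimal illustration only: the `CaseReport` fields, the outbreak window, and the `typical` threshold of three clinical features are all hypothetical and do not come from any real case definition.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class CaseReport:
    """One reported illness; every field here is a hypothetical illustration."""
    resident: bool           # place/person criterion met?
    onset: date              # date of symptom onset (time criterion)
    lab_confirmed: bool      # confirmatory laboratory evidence
    clinical_features: int   # number of compatible signs and symptoms


def classify(report, start, end, typical=3):
    """Assign a certainty category; `typical` is an assumed threshold."""
    in_scope = report.resident and start <= report.onset <= end
    if not in_scope:
        return "not a case"
    if report.lab_confirmed:
        return "confirmed"
    if report.clinical_features >= typical:
        return "probable"
    if report.clinical_features >= 1:
        return "suspected"
    return "not a case"


# Hypothetical outbreak window and reports.
window = (date(2023, 10, 1), date(2024, 1, 31))
reports = [
    CaseReport(True, date(2023, 11, 5), True, 2),
    CaseReport(True, date(2023, 12, 1), False, 4),
    CaseReport(True, date(2024, 1, 10), False, 1),
    CaseReport(False, date(2023, 11, 20), True, 4),
]
labels = [classify(r, *window) for r in reports]
print(labels)
```

Lowering `typical` makes the definition more sensitive (more reports counted as probable); raising it makes the definition more specific.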

Summarizing Data: Variables and Frequency

  • Nominal-scale: Categories without numerical ranking (e.g., county of residence, male/female). Categories are qualitative.
  • Ordinal-scale: Categories that can be ranked but not evenly spaced (e.g., Stage I-IV cancer).
  • Interval-scale: Measured in equally spaced units without a true zero (e.g., date of birth).
  • Ratio-scale: Interval variable with a true zero point (e.g., height, duration of illness, induration in millimeters).
  • Frequency Distribution: Displays the values a variable can take and the number of persons with each value.
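
A frequency distribution as described above can be tabulated in a few lines. The sketch below uses made-up cancer stage data (an ordinal variable); the function name `frequency_distribution` is an illustrative choice.

```python
from collections import Counter

# Hypothetical data: cancer stage (an ordinal variable) for ten patients.
stages = ["I", "II", "II", "III", "I", "IV", "II", "I", "III", "II"]


def frequency_distribution(values, order):
    """Return (value, count, percent) rows in the given category order."""
    counts = Counter(values)
    total = len(values)
    return [(v, counts[v], 100 * counts[v] / total) for v in order]


rows = frequency_distribution(stages, order=["I", "II", "III", "IV"])
for value, count, percent in rows:
    print(f"Stage {value:<3} {count:>2}  {percent:5.1f}%")
```

Passing the category order explicitly keeps ordinal categories in rank order rather than in order of first appearance.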

Measures of Central Location

  • Arithmetic Mean: The average of all values, often called the "center of gravity."
    • Formula: $\text{Mean} = \frac{\sum x_i}{n}$
    • Centering Property: $\sum (x_i - \text{Mean}) = 0$
  • Median: The middle value of a set of data in rank order (the 50th percentile).
    • Position Formula: $\frac{n+1}{2}$; when $n$ is even, the median is the average of the two values on either side of this position.
  • Mode: The value that occurs most often. Distributions can be bimodal (two peaks) or have no mode.
  • Midrange: The midpoint of a set of observations.
    • Standard Formula: $\frac{\text{Minimum} + \text{Maximum}}{2}$
    • Age Formula (ages recorded as age at last birthday): $\frac{\text{Minimum} + \text{Maximum} + 1}{2}$
  • Geometric Mean: The mean of data measured on a logarithmic scale. Used for serial dilutions or assays.
    • Method A: $\text{Antilog}\left[\frac{\sum \log(x_i)}{n}\right]$
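
The measures of central location above can be checked with a short sketch. The data are hypothetical, the helper names (`midrange`, `geometric_mean`, `median_position`) are illustrative, and base-10 logs are an assumption; any log base gives the same geometric mean.

```python
import math
import statistics


def midrange(values, age_last_birthday=False):
    """Midpoint of the observations; add 1 to the sum for ages at last birthday."""
    extra = 1 if age_last_birthday else 0
    return (min(values) + max(values) + extra) / 2


def geometric_mean(values):
    """Antilog of the mean of the base-10 logs (values must be positive)."""
    mean_log = sum(math.log10(x) for x in values) / len(values)
    return 10 ** mean_log


def median_position(n):
    """Rank position of the median in n ordered observations."""
    return (n + 1) / 2


# Hypothetical data: incubation periods in days and reciprocal antibody titers.
days = [2, 3, 3, 4, 5, 7, 11]
titers = [8, 16, 32, 64, 128]

mean = statistics.mean(days)
median = statistics.median(days)
mode = statistics.mode(days)
deviations_sum = sum(x - mean for x in days)  # centering property: should be 0

print(mean, median, mode, midrange(days), median_position(len(days)))
print(round(geometric_mean(titers), 1))
```

For the skewed `days` data the mean is pulled toward the long right tail while the median is not, which is why the median is often preferred for skewed distributions.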

Measures of Spread

  • Range: The difference between the maximum and minimum values. Epidemiologists often report it as "from [min] to [max]."
  • Interquartile Range (IQR): The central portion of the distribution (25th to 75th percentile).
    • $Q_1 \text{ position} = \frac{n+1}{4}$
    • $Q_3 \text{ position} = \frac{3(n+1)}{4}$
  • Standard Deviation (SD): Measures how widely observations are distributed around the arithmetic mean.
    • Variance ($s^2$): $\frac{\sum (x_i - \bar{x})^2}{n-1}$
    • SD: $\sqrt{s^2}$
  • Standard Error of the Mean (SEM): Refers to variability in means of repeated samples.
    • Formula: $\frac{\text{SD}}{\sqrt{n}}$
  • Confidence Interval (CI): A range of values consistent with data, indicating the precision of an estimate.
    • 95% CI for Mean: $\text{Mean} \pm (1.96 \times \text{SEM})$
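
The measures of spread above can be sketched as follows. The data are hypothetical. Linear interpolation for fractional quartile positions is an assumption made here, since the notes give only the position formulas; the 1.96 multiplier is taken directly from the formula above.

```python
import math
import statistics


def value_at_position(ordered, position):
    """Value at a 1-based rank position; fractional positions are interpolated."""
    lower = math.floor(position)
    fraction = position - lower
    value = ordered[lower - 1]
    if fraction > 0:
        value += fraction * (ordered[lower] - ordered[lower - 1])
    return value


def quartiles(values):
    """Q1 and Q3 from the (n+1)/4 and 3(n+1)/4 position formulas."""
    ordered = sorted(values)
    n = len(ordered)
    return (value_at_position(ordered, (n + 1) / 4),
            value_at_position(ordered, 3 * (n + 1) / 4))


def mean_ci95(values):
    """Mean, SEM, and the 95% confidence limits Mean +/- 1.96 x SEM."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)          # sample SD, n - 1 denominator
    sem = sd / math.sqrt(len(values))
    return mean, sem, (mean - 1.96 * sem, mean + 1.96 * sem)


days = [2, 3, 3, 4, 5, 7, 11]  # hypothetical incubation periods (days)
q1, q3 = quartiles(days)
variance = statistics.variance(days)
mean, sem, (low, high) = mean_ci95(days)
print(f"range {min(days)} to {max(days)}, IQR {q1} to {q3}")
print(f"variance {variance:.2f}, SD {math.sqrt(variance):.2f}, SEM {sem:.2f}")
print(f"95% CI {low:.2f} to {high:.2f}")
```

Note that `statistics.stdev` and `statistics.variance` use the $n-1$ denominator, matching the variance formula above; `statistics.pstdev` would divide by $n$ instead.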

Morbidity Frequency Measures

  • Incidence Proportion (Attack Rate/Risk): Proportion of an initially disease-free population that develops disease over a specific period.
    • Formula: $\frac{\text{New cases identified during period}}{\text{Population at start of period}} \times 10^n$
  • Secondary Attack Rate: Measure of transmission within a closed group (e.g., household).
    • Formula: $\frac{\text{Cases among contacts of primary cases}}{\text{Total number of contacts}} \times 100\%$
  • Incidence Rate (Person-time Rate): Incorporates time directly into the denominator.
    • Formula: $\frac{\text{New cases during specified period}}{\sum \text{Time each person was observed}} \times 10^n$
  • Prevalence: Proportion of persons in a population who have a disease at a specific time (Point Prevalence) or over a period (Period Prevalence). Includes both new and pre-existing cases.
    • Formula: $\frac{\text{All new and pre-existing cases}}{\text{Population during same period}} \times 10^n$
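
The morbidity formulas above all share the shape numerator / denominator × $10^n$, which the sketch below makes explicit. All counts are hypothetical, and the function names and default multipliers are illustrative choices.

```python
def per(numerator, denominator, multiplier=1):
    """Generic frequency measure: numerator / denominator x 10^n."""
    return numerator * multiplier / denominator


def incidence_proportion(new_cases, population_at_start, multiplier=100):
    """Attack rate (risk) among those disease-free at the start."""
    return per(new_cases, population_at_start, multiplier)


def secondary_attack_rate(cases_among_contacts, total_contacts):
    """Percent of contacts of primary cases who became cases."""
    return per(cases_among_contacts, total_contacts, 100)


def incidence_rate(new_cases, observation_times, multiplier=1000):
    """Person-time rate: the denominator sums each person's observed time."""
    return per(new_cases, sum(observation_times), multiplier)


def prevalence(all_cases, population, multiplier=100):
    """New plus pre-existing cases over the population in the same period."""
    return per(all_cases, population, multiplier)


# Hypothetical numbers, chosen only to exercise each formula.
attack_rate = incidence_proportion(30, 120)      # 30 ill of 120 at risk
household_sar = secondary_attack_rate(12, 80)    # 12 ill of 80 contacts
rate = incidence_rate(2, [5, 5, 3, 2, 5])        # 2 cases over 20 person-years
point_prevalence = prevalence(50, 2000)          # 50 existing cases of 2,000

print(attack_rate, household_sar, rate, point_prevalence)
```

Only the incidence rate has units of time in its denominator, so its result here reads as cases per 1,000 person-years rather than as a percentage.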

Mortality Frequency Measures

  • Crude Death Rate: $\frac{\text{Total deaths during period}}{\text{Mid-interval population}} \times 10^n$
  • Cause-Specific Mortality: $\frac{\text{Deaths from specific cause}}{\text{Mid-interval population}} \times 100{,}000$
  • Age-Specific Mortality: $\frac{\text{Deaths in specific age group}}{\text{Population in same age group}} \times 10^n$
  • Infant Mortality Rate: $\frac{\text{Deaths among children under 1 year}}{\text{Number of live births reported}} \times 1{,}000$
  • Neonatal Mortality: Deaths from birth to day 27.
  • Postneonatal Mortality: Deaths from day 28 up to 1 year.
  • Maternal Mortality Rate: $\frac{\text{Pregnancy-related deaths}}{\text{Number of live births}} \times 100{,}000$
  • Case-Fatality Rate: $\frac{\text{Cause-specific deaths among incident cases}}{\text{Number of incident cases}} \times 100\%$
  • Proportionate Mortality: $\frac{\text{Deaths from a particular cause}}{\text{Total deaths from all causes}} \times 100\%$
  • Years of Potential Life Lost (YPLL): A measure of premature mortality.
    • Individual YPLL: $\text{Endpoint (e.g., 65)} - \text{Age at Death}$
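
A few of the mortality measures above, sketched with hypothetical counts. The function names are illustrative. The YPLL helper assumes that deaths at or after the endpoint contribute zero years, which is the usual convention.

```python
def rate(deaths, population, multiplier):
    """Generic mortality measure: deaths / population x multiplier."""
    return deaths * multiplier / population


def infant_mortality_rate(infant_deaths, live_births):
    """Deaths under 1 year of age per 1,000 live births."""
    return rate(infant_deaths, live_births, 1_000)


def case_fatality_rate(deaths_among_cases, incident_cases):
    """Percent of incident cases who die of the condition."""
    return rate(deaths_among_cases, incident_cases, 100)


def proportionate_mortality(cause_deaths, all_deaths):
    """Percent of all deaths attributed to one cause."""
    return rate(cause_deaths, all_deaths, 100)


def ypll(ages_at_death, endpoint=65):
    """Sum of (endpoint - age) over deaths that occurred before the endpoint."""
    return sum(endpoint - age for age in ages_at_death if age < endpoint)


# Hypothetical counts, chosen only to exercise each formula.
crude = rate(900, 100_000, 1_000)          # deaths per 1,000 population
imr = infant_mortality_rate(28, 4_000)
cfr = case_fatality_rate(6, 40)
pm = proportionate_mortality(225, 900)
years_lost = ypll([30, 50, 64, 70, 82])    # only the first three ages count

print(crude, imr, cfr, pm, years_lost)
```

Case-fatality and proportionate mortality look similar but answer different questions: the first divides by cases of the disease, the second by deaths from all causes.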

Measures of Association and Impact

  • Risk Ratio (Relative Risk, RR): Compares risk in the exposed group to risk in the unexposed group.
    • Formula: $\frac{\text{Risk (Exposed)}}{\text{Risk (Unexposed)}}$
  • Odds Ratio (OR): Used in case-control studies, where the size of the source population is unknown and risks cannot be calculated directly.
    • Formula (Cross-product): $\frac{ad}{bc}$, where in the $2 \times 2$ table $a$ = exposed cases, $b$ = exposed non-cases, $c$ = unexposed cases, and $d$ = unexposed non-cases.
  • Attributable Proportion: Percent of disease among the exposed that is due to the exposure.
    • Formula: $\frac{\text{Risk (Exposed)} - \text{Risk (Unexposed)}}{\text{Risk (Exposed)}} \times 100\%$
  • Vaccine Efficacy (VE): $\frac{\text{Risk (Unvaccinated)} - \text{Risk (Vaccinated)}}{\text{Risk (Unvaccinated)}}$, or $1 - RR$.
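
The measures of association above can be computed from a 2 × 2 table. The sketch assumes the usual cell layout (a = exposed and ill, b = exposed and well, c = unexposed and ill, d = unexposed and well); the counts and risks are hypothetical.

```python
def risk(cases, total):
    """Proportion of a group that developed disease."""
    return cases / total


def risk_ratio(a, b, c, d):
    """Risk in the exposed (a / (a + b)) over risk in the unexposed (c / (c + d))."""
    return risk(a, a + b) / risk(c, c + d)


def odds_ratio(a, b, c, d):
    """Cross-product ratio ad / bc."""
    return (a * d) / (b * c)


def attributable_proportion(risk_exposed, risk_unexposed):
    """Percent of disease among the exposed attributable to the exposure."""
    return (risk_exposed - risk_unexposed) / risk_exposed * 100


def vaccine_efficacy(risk_vaccinated, risk_unvaccinated):
    """1 - RR, with RR = risk in vaccinated / risk in unvaccinated."""
    return 1 - risk_vaccinated / risk_unvaccinated


# Hypothetical 2 x 2 table: 40 of 100 exposed ill, 10 of 100 unexposed ill.
a, b, c, d = 40, 60, 10, 90
rr = risk_ratio(a, b, c, d)
odds = odds_ratio(a, b, c, d)
ap = attributable_proportion(risk(a, a + b), risk(c, c + d))
ve = vaccine_efficacy(0.02, 0.10)  # hypothetical risks

print(round(rr, 2), round(odds, 2), round(ap, 1), round(ve, 2))
```

With these counts the odds ratio is noticeably larger than the risk ratio; the two approximate each other only when the disease is rare in both groups.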

Displaying Public Health Data

  • Tables: Self-explanatory displays with descriptive titles (What, Where, When).
    • One-variable table: Simple frequency distribution.
    • Two-variable table: Contingency table (e.g., $2 \times 2$).
  • Line Graphs: Arithmetic scale for trends; Semilogarithmic scale for wide ranges (orders of magnitude) or rates of change.
  • Histograms: Used for continuous variables; the epidemic curve (epi curve) is a histogram of cases by onset time.
  • Frequency Polygons: Use the midpoints of intervals connected by straight lines; the area under the line matches that of the corresponding histogram, so it stays proportional to the number of observations.
  • Bar Charts: For discrete categories.
    • Grouped: Adjoining bars comparing categories across series.
    • Stacked: Subdivided single bars showing component parts.
    • 100% Component: All bars same height to show percent distribution.
  • Specialized Displays:
    • Scatter Diagram: Portrays relationship between two continuous variables.
    • Population Pyramid: Two histograms (male/female) turned sideways to show age distribution.
    • Box Plot: Shows median, interquartile range (box), and full range (whiskers).
    • Phylogenetic Tree: Shows genetic relatedness of organisms.
    • Decision Tree: Logical sequence of choices (squares) and chance outcomes (circles).
  • Maps: Spot maps pinpoint individual cases; Area (choropleth) maps use shading to show rates.
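
The epidemic curve mentioned above is a histogram of cases by onset time. The sketch below bins hypothetical onset dates by day and prints a text version; the one-day interval and the function name `epi_curve` are illustrative assumptions.

```python
from collections import Counter
from datetime import date, timedelta


def epi_curve(onset_dates):
    """Count cases per day of onset, keeping days with zero cases."""
    counts = Counter(onset_dates)
    first, last = min(onset_dates), max(onset_dates)
    n_days = (last - first).days + 1
    days = [first + timedelta(days=i) for i in range(n_days)]
    return [(day, counts[day]) for day in days]


# Hypothetical onset dates from a line listing.
onsets = [
    date(2024, 1, 4), date(2024, 1, 3), date(2024, 1, 4), date(2024, 1, 5),
    date(2024, 1, 4), date(2024, 1, 5), date(2024, 1, 7),
]
curve = epi_curve(onsets)
for day, n in curve:
    print(f"{day:%b %d} | {'#' * n}")
```

Days with no cases are kept so the time axis stays continuous, which is what distinguishes a histogram from a bar chart of the observed dates.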

Public Health Surveillance

  • Definition: Continued watchfulness over the distribution and trends of incidence via collection, evaluation, and dissemination of reports.
  • Types of Surveillance:
    • Passive: Provider-initiated; health-care providers send reports based on rules.
    • Active: Health-department initiated; staff contact providers to solicit reports (e.g., during outbreaks).
    • Sentinel: Relies on a prearranged sample of dedicated providers.
    • Syndromic: Monitors real-time clinical features (e.g., "respiratory disease with fever") to detect bioterrorism or outbreaks before diagnosis.
  • NNDSS: National Notifiable Diseases Surveillance System. Reporting to state is mandatory by law, but reporting to CDC is voluntary.
  • Evaluation Attributes: Simplicity, Flexibility, Quality, Acceptability, Sensitivity, Predictive Value Positive (PVP), Representativeness, Stability, and Timeliness.
    • $PVP = \frac{\text{True Positives}}{\text{True Positives} + \text{False Positives}}$
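
Predictive value positive can be computed alongside sensitivity, another attribute in the list above. The counts are hypothetical; the sensitivity formula, true positives / (true positives + false negatives), is the standard definition and is not spelled out in these notes.

```python
def predictive_value_positive(true_pos, false_pos):
    """Proportion of reported cases that are true cases."""
    return true_pos / (true_pos + false_pos)


def sensitivity(true_pos, false_neg):
    """Proportion of true cases that the system detected (standard definition)."""
    return true_pos / (true_pos + false_neg)


# Hypothetical evaluation: 120 reports, 90 of them true cases, 10 cases missed.
tp, fp, fn = 90, 30, 10
pvp = predictive_value_positive(tp, fp)
sens = sensitivity(tp, fn)
print(f"PVP {pvp:.2f}, sensitivity {sens:.2f}")
```

A system with low PVP spends investigation resources on false alarms, while one with low sensitivity misses cases; loosening a case definition typically trades the first for the second.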

Investigating an Outbreak (The 13 Steps)

  1. Prepare for field work: Gather scientific knowledge, supplies, and formal agreements.
  2. Establish existence: Verify if the number of cases exceeds the expected level.
  3. Verify diagnosis: Review clinical and lab findings; talk to patients.
  4. Construct case definition: Set clinical, time, place, and person criteria.
  5. Find cases systematically: Use active or stimulated passive surveillance; record on a line listing.
  6. Perform descriptive epidemiology: Analyze by time (epi curve), place (spot map), and person.
  7. Develop hypotheses: Consider reservoir, mode of transmission, and usual candidates.
  8. Evaluate hypotheses epidemiologically: Formal testing using cohort or case-control studies.
  9. Reconsider/Refine hypotheses: If the first study is unrevealing, revisit the descriptive data and consider less obvious vehicles (e.g., marijuana identified as a Salmonella source).
  10. Compare/Reconcile with lab/environmental studies: Use environmental swabs or genetic sequencing.
  11. Implement control and prevention: Target the segment in the chain most susceptible to intervention.
  12. Initiate/Maintain surveillance: Monitor to ensure interventions worked.
  13. Communicate findings: Oral briefings and formal written reports.