Statistical Reasoning Lecture 5

Public Health Statistics: Dealing with the Element of Time

Overview

Presented by John McGready, PhD from Johns Hopkins University.
Aimed at understanding statistics in public health concerning time-to-event analyses and spatial data.
Emphasizes the importance of correctly handling time components in studies.

Learning Objectives

Spatial Data:
- Summarize event counts as event rates during defined observational periods.
- Standardize event counts by population size and observation time to obtain event rates.
Time-to-Event Outcomes (Individual Outcome Times Known):
- Distinguish between calendar time and study time scales.
- Define censoring in the context of time-to-event studies.
- Explain problems with ignoring time components or averaging follow-up times.
- Compute event incidence rates using event counts and cumulative follow-up times.

Event Rates for Data Without Known Event Times

For some datasets, individual event times are not recorded; events are instead counted within time intervals.
- Common in aggregated death or disease rates by geographical area.

Example: Lung Cancer Cases in Pennsylvania (2002 Data)

Data regarding incidences of lung cancer diagnoses in Pennsylvania, stratified by:
- County
- Sex
- Race
- Age Groups
Summarization method:
- Incidence Rate Calculation: Total Cases / Total Person-Time at Risk.

Incidence Rate Calculation for Lung Cancer in Pennsylvania

Data from 2002:
- Lung Cancer Diagnoses: 10,279 cases.
- Population: 12,281,054 residents, each treated as contributing one person-year at risk in 2002.
- Incidence Rate ( $I$ ):
  $I = \frac{10,279 \text{ cases}}{12,281,054 \text{ person-years}} \approx 0.0008 \text{ cases per person-year}$
Rescaled Incidence Rate: Events per 10,000 Person-Years:
- $I = 0.0008 \times 10,000 = 8 \text{ cases per 10,000 person-years}$
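As a minimal sketch of the calculation above, the following Python snippet computes the crude rate and rescales it. The helper name `incidence_rate` and its `per` parameter are illustrative choices, not part of the lecture, and the code assumes each resident contributes exactly one person-year in 2002. Without the intermediate rounding to 0.0008, the rescaled value comes out at roughly 8.4 rather than 8 cases per 10,000 person-years.

```python
def incidence_rate(cases, person_time, per=1):
    """Return events per `per` units of person-time (illustrative helper)."""
    if person_time <= 0:
        raise ValueError("person_time must be positive")
    return cases / person_time * per


cases_2002 = 10_279          # lung cancer diagnoses, Pennsylvania, 2002
person_years = 12_281_054    # assumption: one person-year per resident

rate = incidence_rate(cases_2002, person_years)
rate_per_10k = incidence_rate(cases_2002, person_years, per=10_000)

print(f"{rate:.6f} cases per person-year")
print(f"{rate_per_10k:.2f} cases per 10,000 person-years")
```

Rescaling only changes the units of the rate, not the underlying quantity, so the scaling factor can be chosen for readability.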

Understanding Incidence Rates

An incidence rate captures the occurrence of events per unit of time. It is distinct from a simple proportion.
- Percent interpretation: an incidence of 0.08% in 2002 means that about 0.08% of the population was newly diagnosed with lung cancer during that year.
Rates are typically small numbers and have different statistical properties than standard proportions.

Time-to-Event Data With Known Event Times

When individual event times are known, they are incorporated into incidence rate calculations, especially in longitudinal cohort studies.

Example: Primary Biliary Cirrhosis (PBC) Trial

Randomized trial in which deaths were recorded during follow-up.
Research focused on the effectiveness of D-penicillamine (DPCA).

Subject Illustrations in PBC Trial

Complete Observation: Subject who dies after 7 years in the study.
Censored Observations:
- Subject lost to follow-up after 2 years, still alive at last contact.
- Subject who remained in the study until its completion (3 years of follow-up) without experiencing the event.

Summarizing Time-to-Event Data

Numeric Summarizations:
- Option A: Binary treatment of death. Proportion of subjects who died:
  $\hat{p} = \frac{1}{3} \approx 0.33$
- Issue: It treats all participants as having the same time at risk, ignoring differences in follow-up time.
- Option B: Average follow-up time.
  $\bar{x} = \frac{7+2+3}{3} = 4 \text{ years}$
- Misleading: it reflects average follow-up time, not average time until death.
- Option C: Incidence rate reflecting deaths per total follow-up time.
- Construction:
  $I = \frac{1 \text{ death}}{(7 + 2 + 3) \text{ person-years}} \approx 0.083 \text{ deaths per person-year}$
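The three summaries can be compared side by side in a short Python sketch. The tuple layout `(follow-up years, died?)` is an illustrative encoding of the three subjects, not a format taken from the trial data.

```python
# (follow-up years, died?) for the three illustrative subjects
subjects = [
    (7, True),    # died after 7 years: complete observation
    (2, False),   # lost to follow-up at 2 years: censored
    (3, False),   # alive at study end after 3 years: censored
]

n = len(subjects)
deaths = sum(1 for _, died in subjects if died)
total_follow_up = sum(years for years, _ in subjects)

proportion_died = deaths / n            # Option A: ignores time at risk
mean_follow_up = total_follow_up / n    # Option B: ignores who died
rate = deaths / total_follow_up         # Option C: deaths per person-year

print(f"Option A, proportion died: {proportion_died:.2f}")
print(f"Option B, mean follow-up:  {mean_follow_up:.1f} years")
print(f"Option C, incidence rate:  {rate:.3f} deaths per person-year")
```

Only Option C uses both pieces of information recorded for each subject: whether the event occurred and how long the subject was observed.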

Example: ART and HIV Transmission

Cohen et al. (2011) discussed early antiretroviral therapy among discordant couples in diverse global locations.

Importance of Time in Comparisons

Follow-up time affects comparisons across groups; ignoring it may skew results.
Demonstrated in lung cancer incidence rates between males and females in Pennsylvania:
- Females: $I_F = \frac{4,587}{6,351,391} \approx 0.00072$ cases/PY.
- Males: $I_M = \frac{5,692}{5,929,663} \approx 0.00096$ cases/PY.
- Incidence Rate Ratio (IRR):
  $IRR = \frac{I_F}{I_M} \approx 0.75$
The incidence rate among females was about 25% lower than among males.
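The comparison can be reproduced with a few lines of Python. This is a sketch using the counts quoted above; the variable names are illustrative, and the person-years again assume one year at risk per resident.

```python
female_cases, female_py = 4_587, 6_351_391
male_cases, male_py = 5_692, 5_929_663

rate_f = female_cases / female_py    # cases per person-year, females
rate_m = male_cases / male_py        # cases per person-year, males

irr = rate_f / rate_m                # females relative to males
percent_lower = (1 - irr) * 100

print(f"I_F = {rate_f:.5f}, I_M = {rate_m:.5f}")
print(f"IRR = {irr:.2f} ({percent_lower:.0f}% lower rate in females)")
```

Swapping the numerator and denominator gives the reciprocal ratio (males relative to females), so the direction of the comparison should always be stated alongside the IRR.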

Kaplan-Meier Curves for Time-to-Event Data

A graphical method to display time-to-event data effectively.
- $S(t)$ is the estimated proportion of subjects remaining event-free beyond time $t$.

Survival Curve Properties

Kaplan-Meier curves summarize survival data, accounting for censoring events and providing estimates of survival probabilities over time.
- Censored subjects contribute information up to their censoring time, so no subject's data is discarded.
Event Time Percentiles: Can be estimated via Kaplan-Meier curves based on empirical data.
- An alternative presentation, $1 - S(t)$, shows the cumulative probability of the event (e.g., death) by time $t$.
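To make the estimator concrete, here is a minimal pure-Python sketch of the standard product-limit (Kaplan-Meier) calculation. The function names `kaplan_meier` and `percentile_time` and the six-subject dataset are hypothetical, chosen only for illustration; they are not data from the lecture. Subjects censored at an event time are counted as still at risk at that time, which is the usual convention.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct observed event time.

    times  : follow-up time for each subject
    events : True if the event was observed, False if censored
    """
    if len(times) != len(events):
        raise ValueError("times and events must have the same length")
    event_times = sorted({t for t, e in zip(times, events) if e})
    survival = 1.0
    curve = []
    for t in event_times:
        # Everyone with follow-up of at least t is still at risk at t.
        at_risk = sum(1 for u in times if u >= t)
        d = sum(1 for u, e in zip(times, events) if e and u == t)
        survival *= 1 - d / at_risk
        curve.append((t, survival))
    return curve


def percentile_time(curve, p):
    """Earliest event time with S(t) <= 1 - p, or None if never reached."""
    for t, s in curve:
        if s <= 1 - p:
            return t
    return None


# Hypothetical follow-up times (years) and event indicators
times = [1, 2, 3, 4, 5, 6]
events = [True, False, True, True, False, True]

curve = kaplan_meier(times, events)
for t, s in curve:
    print(f"t={t}: S(t)={s:.3f}, 1-S(t)={1 - s:.3f}")
print("Estimated median event time:", percentile_time(curve, 0.5))
```

In practice a statistical package would also report confidence intervals for the curve; this sketch omits them to keep the product-limit step visible.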

Conclusion and Summary

These notes cover public health statistics for time-to-event analysis and their implications for interpreting health outcomes.
Using specific populations and events as examples, they emphasize choosing summary methods that account for time.

Repeated Examples of Different Contexts

Further reinforced by comparisons of HIV incidence rates amongst various gender groups and detailed analyses of other public health metrics reflecting the need for precision in statistical reporting for public and health policy.

Additional Notes

The importance of accounting for censoring and follow-up time, and of grouping data appropriately when analyzing incidence rates, is reinforced across multiple examples.

Final Thoughts

Integrating statistical analysis with real-world public health applications supports effective, evidence-based decisions.

Key Equations and Explanations

Incidence Rate ( $I$ ) for Lung Cancer in Pennsylvania:
$I = \frac{\text{Total Cases}}{\text{Total Person-Time at Risk}}$
- Explanation: This formula calculates the rate at which new cases of a disease (e.g., lung cancer) occur in a population over a specific period. It divides the total number of new cases by the sum of time each individual in the population was at risk, often expressed in person-years.
Rescaled Incidence Rate:
$I = \text{Calculated Incidence Rate} \times \text{Scaling Factor}$
- Explanation: This shows how an incidence rate can be rescaled for easier interpretation, for example, from cases per person-year to cases per 10,000 person-years.
Proportion of Subjects Who Died ( $\hat{p}$ ) (Option A for Summarizing Time-to-Event Data):
$\hat{p} = \frac{\text{Number of Events (Deaths)}}{\text{Total Number of Subjects}}$
- Explanation: This is a simple proportion representing the fraction of individuals who experienced an event (like death) within a study. It treats death as a binary outcome and does not account for the time individuals were at risk or observed.
Average Follow-up Time ( $\bar{x}$ ) (Option B for Summarizing Time-to-Event Data):
$\bar{x} = \frac{\sum \text{Individual Follow-up Times}}{\text{Total Number of Subjects}}$
- Explanation: This calculates the mean duration for which subjects were observed in a study. It averages the time participants spent in the study but can be misleading as it doesn't directly reflect the time until an event occurs or account for censoring.
Incidence Rate (for PBC Trial) (Option C for Summarizing Time-to-Event Data):
$I = \frac{\text{Number of Events (Deaths)}}{\text{Total Cumulative Follow-up Time}}$
- Explanation: This method calculates the incidence rate by dividing the total number of events (e.g., deaths) by the sum of all individual follow-up times (person-years), providing a rate of events per unit of observation time.
Female Lung Cancer Incidence Rate ( $I_F$ ) (Example):
$I_F = \frac{\text{Female Lung Cancer Cases}}{\text{Female Person-Years at Risk}}$
- Explanation: This represents the incidence rate of lung cancer specifically within the female population, calculated as the number of new female lung cancer cases divided by the total person-years females were at risk.
Male Lung Cancer Incidence Rate ( $I_M$ ) (Example):
$I_M = \frac{\text{Male Lung Cancer Cases}}{\text{Male Person-Years at Risk}}$
- Explanation: Similar to the female incidence rate, this calculates the incidence rate of lung cancer within the male population, using the number of new male cases over the total person-years males were at risk.
Incidence Rate Ratio ( $IRR$ ):
$IRR = \frac{\text{Incidence Rate in Group 1}}{\text{Incidence Rate in Group 2}} = \frac{I_1}{I_2}$
- Explanation: The IRR compares the incidence rates between two different groups (e.g., males and females). An IRR of $0.75$ means that the incidence rate in Group 1 is $75\%$ of the incidence rate in Group 2, that is, $25\%$ lower than in Group 2.