Ecological Fallacy and Aggregated Data: Case Study of Lyme Disease

This note explores the pitfalls of drawing conclusions from aggregated data, specifically highlighting the ecological fallacy and Simpson's paradox, where group-level relationships may not hold true for individuals or subgroups. Using Lyme disease as a case study, it demonstrates how seemingly related factors like biodiversity, restaurant density, and obesity can appear correlated at a state level but lack individual-level causal links. The discussion delves into the ecological aspects of Lyme disease, including the dilution effect, and emphasizes the caution needed when using aggregated public health data to inform theory and policy.

Key Concepts

  • Ecological fallacy: inferences about individuals drawn from group-level data can be incorrect.

  • Simpson’s paradox: relationships that appear in aggregated data can reverse or vanish in subgroups.

  • Aggregated data across large spatial scales can generate spurious correlations if mechanistic drivers are not measured.

  • Lyme disease case study illustrates how biodiversity, restaurant density, and obesity can appear related at the state level, but these relationships may be meaningless for individual-level risk.

  • Dilution effect: a hypothesized mechanism by which higher biodiversity reduces disease risk. Supporting evidence is not guaranteed, and apparent effects can be confounded by the spatial scale of analysis.

  • Spillover in vector-borne diseases involves pathogen transmission among wildlife, vectors, and humans; public health data usually come from notifiable cases and are aggregated, which can obscure the true dynamics.
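
The ecological fallacy described in the list above can be illustrated with a small simulation. This is a minimal sketch on synthetic data, not the case study's analysis: the unmeasured state-level driver `z`, the effect sizes, and the sample sizes are all assumptions chosen for illustration. Within every simulated state, individual exposure and outcome are independent by construction, yet the state averages are strongly correlated because both track `z`.

```python
import random
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def simulate(n_states=35, n_people=200, seed=0):
    """Return (correlation of state means, mean within-state correlation)."""
    rng = random.Random(seed)
    mean_x, mean_y, within_r = [], [], []
    for _ in range(n_states):
        # Hypothetical unmeasured state-level driver (an assumption).
        z = rng.random()
        mu_x = 10 * z + rng.gauss(0, 0.5)   # state-level exposure
        mu_y = 5 * z + rng.gauss(0, 0.25)   # state-level outcome
        # Individuals: exposure and outcome are drawn independently,
        # so there is no individual-level relationship inside a state.
        xs = [mu_x + rng.gauss(0, 1) for _ in range(n_people)]
        ys = [mu_y + rng.gauss(0, 1) for _ in range(n_people)]
        mean_x.append(sum(xs) / n_people)
        mean_y.append(sum(ys) / n_people)
        within_r.append(pearson(xs, ys))
    return pearson(mean_x, mean_y), sum(within_r) / n_states


state_r, mean_within_r = simulate()
print(f"correlation of state averages:    {state_r:.2f}")
print(f"mean within-state correlation:    {mean_within_r:.2f}")
```

Under these assumptions the aggregated correlation is close to 1 while the average within-state correlation is close to 0, which is the pattern the ecological fallacy warns about: the state-level relationship says nothing about individual-level risk.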

Background on Lyme disease and ecology

  • In the United States, Lyme disease is caused mainly by Borrelia burgdorferi sensu stricto and is transmitted primarily by the blacklegged tick, Ixodes scapularis.

  • Humans are a dead-end host; spillover occurs when infected ticks bite people.

  • Biodiversity may influence disease risk via the dilution effect: the relative abundance of competent versus incompetent reservoir hosts affects the prevalence of infection in ticks.

  • The case study uses biodiversity (mammal species richness), fried chicken restaurant density, and obesity to explore exposure and behavior patterns, not causal mechanisms.

  • The study emphasizes that correlations from aggregated data should be treated with caution when informing theory or policy.
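
The dilution-effect reasoning in the list above can be made concrete with a short calculation. This is a simplified sketch under stated assumptions, not the method used in the case study: tick infection prevalence is approximated as the average reservoir competence of the hosts that larval ticks feed on, weighted by how many larvae each host type feeds. The host categories and all numbers (`density`, `larvae_per_host`, `competence`) are hypothetical.

```python
def tick_infection_prevalence(hosts):
    """Expected fraction of fed larvae that become infected.

    `hosts` maps a host label to (density, larvae_per_host, competence),
    where competence is the probability that a larva feeding on that
    host acquires the pathogen. All values are illustrative assumptions.
    """
    fed = sum(d * b for d, b, _ in hosts.values())
    infected = sum(d * b * c for d, b, c in hosts.values())
    return infected / fed


# Low-diversity community: only a highly competent host is present.
low_diversity = {"competent host": (10, 10, 0.9)}

# Higher-diversity community: an incompetent host is added and
# absorbs half of the larval blood meals.
high_diversity = {
    "competent host": (10, 10, 0.9),
    "incompetent host": (10, 10, 0.1),
}

low = tick_infection_prevalence(low_diversity)
high = tick_infection_prevalence(high_diversity)
print(f"low diversity:  {low:.2f}")   # 0.90
print(f"high diversity: {high:.2f}")  # 0.50
```

In this toy setup, adding the incompetent host lowers infection prevalence from 0.90 to 0.50. Note that the number of infected fed larvae still rises (from 90 to 100) because the added host feeds more ticks overall, one reason a lower prevalence need not translate into lower human risk.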

Data and Methods

  • Data sources (35 US states where Lyme disease is commonly reported):
      - Lyme disease incidence (cases per 100,000 in