Study Notes on Statistical Imaginaries by danah boyd

Statistical Imaginaries: An Ode to Responsible Data Science by danah boyd

Introduction

  • Presented at the 2021 Microsoft Research Summit.

  • Data discussed: total resident population of the US as of April 1, 2020, reported as 331,449,281 by the US Census Bureau.

  • Concept of "data theater" introduced, signaling how statistics is structured as a performance for public consumption.

Understanding Data and Statistics

  • Root of data: Latin origin, meaning "the givens."

  • Statistics as a science of the state; originally "political arithmetic."

  • Official statistics: Designation for state-produced statistics, treated as unquestionable facts.

Precision vs. Accuracy

  • Statistics are expected to reflect exact knowledge, leading to public expectations of precision.

  • Census as enumeration: A count of the population is assumed to be exact, although it is only as good as the best methodologies available.

  • Example of communication by the Census Bureau: population figures are sometimes presented rounded (e.g., nearly 331.5 million), which acknowledges a degree of uncertainty.

Importance of Censuses

  • Censuses are foundational for various data analyses, informing GDP, employment rates, and health data (e.g., COVID-19 tracking).

  • Census data aid in political representation and funding distribution, being deeply entrenched in democratic processes.

  • Political nature of data: Censuses are not just accounting tools; they are politically charged and often contested.

  • UN Statistical Commission: Established in 1947 to formalize international standards that protect national statistics from political interference.

The Politics of Statistics

  • Statistics viewed through a lens of objectivity can obscure political implications.

  • Historical reference: statistical techniques can have troubling origins; regression methods, for example, are associated with eugenics.

  • Undercounting of Black populations has a long history; clerks at the US Census Bureau were already analyzing the data practices behind it in the 1910s.

  • As statistics professionalized, the government grew interested in using scientific advances to produce better data.

Legal and Political Challenges in Data Collection

  • Introduction of sampling in the Census to ease data collection faced legal challenges (e.g., constitutional arguments against sampling and imputation).

  • Imputation: A technique used to estimate missing data; challenged in court but upheld by the Supreme Court.
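
As a rough illustration of the idea behind imputation, the sketch below fills each missing household size with the most recently reported value, a simplified "hot-deck" style approach. The function name and the sample records are hypothetical, and this is not a description of the Census Bureau's actual procedure.

```python
from typing import List, Optional


def hot_deck_impute(sizes: List[Optional[int]]) -> List[Optional[int]]:
    """Fill each missing value (None) with the most recent reported value.

    Leading missing values stay None because no donor record exists yet.
    """
    filled: List[Optional[int]] = []
    donor: Optional[int] = None  # last reported value seen so far
    for value in sizes:
        if value is not None:
            donor = value
        filled.append(value if value is not None else donor)
    return filled


# Hypothetical household sizes along one street; None marks a nonresponse.
reported = [3, None, 2, None, None, 4]
imputed = hot_deck_impute(reported)
print(imputed)  # [3, 3, 2, 2, 2, 4]
print("total counted:", sum(v for v in imputed if v is not None))  # 16
```

The imputed total depends entirely on which donor records are chosen, which is one reason the technique is open to dispute.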

The Role of Data in Decision-Making

  • Data decisions involve inherent politics; choices about what data to collect reflect biases and ideologies.

  • Example: racial data collection in the US has served both as a tool of oppression and as a tool for advancing civil rights.

  • Contrasting international approaches to race data collection: France outlaws it, while Lebanon has faced long-standing challenges in conducting a census.

Trust and Privacy in Census Data

  • Public trust in government is crucial for participation in the census; privacy concerns have shaped census data collection practices since 1840.

  • Long-standing legal standards maintain confidentiality and guard against misuse of census data.

  • Increasing distrust complicates efforts to collect accurate data, especially for sensitive demographics.

Challenges of Statistical Representation

  • Data Imperfections: Acknowledging operational, social, and political reasons for undercounts.

  • Misunderstanding of how statistics 'speak' leads to misrepresentation of data limitations.

  • Decision-makers often prioritize a narrative of uncertainty-free data, disregarding inherent errors and biases.

The Art and Ethics of Data Communication

  • Applied demographers highlight the challenge of conveying uncertainty in data to decision-makers who desire clear facts.

  • Case study: Climate scientists face difficulties in communicating model uncertainties effectively amidst skepticism.

  • Tension exists between providing accurate data representation and the political need for undisputed facts.

The Illusion of Perfect Data

  • Expectation for perfect, precise data disregards the realities of statistical practice.

  • Statistical imaginary: A collective vision of data that shapes expectations; e.g., envisioned census as a perfect reflection of society versus its actual limitations.

  • Discussions surrounding uncertainty and error often become taboo, risking the integrity of statistical methodologies.

Differential Privacy as a Forward Step

  • Introduction of differential privacy techniques aims to support statistical confidentiality while publishing usable statistics.

  • Differential Privacy: A mathematical framework that quantifies information leakage about individual records.

  • It enables flexible systems in which the balance between privacy and data reliability can be customized.
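
To make the privacy and reliability trade-off concrete, here is a minimal sketch of the textbook Laplace mechanism applied to a single count. It is not the Census Bureau's production system; the function names, the block population, and the epsilon values are all hypothetical choices made for illustration.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Draw one Laplace(0, scale) sample as the difference of two Exp(1) draws."""
    return scale * (rng.expovariate(1.0) - rng.expovariate(1.0))


def private_count(true_count: int, epsilon: float, rng: random.Random) -> float:
    """Release a count with Laplace noise calibrated to the privacy parameter epsilon."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    sensitivity = 1.0  # adding or removing one person changes a count by at most 1
    return true_count + laplace_noise(sensitivity / epsilon, rng)


rng = random.Random(2020)  # fixed seed so the illustration is repeatable
block_population = 120     # hypothetical true count for one census block
for epsilon in (0.1, 1.0, 10.0):
    noisy = private_count(block_population, epsilon, rng)
    print(f"epsilon={epsilon}: published count is about {noisy:.1f}")
```

A smaller epsilon means stronger privacy but noisier published counts, while a larger epsilon does the reverse. Choosing the value is a policy decision as much as a mathematical one.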

Reactions to Differential Privacy

  • Backlash against modernized data practices comes from critics who want unaltered, fact-like data.

  • Opponents often do not see the relationship between mathematical practices and data integrity.

The Stakes of Public Trust in Data

  • Public health relies heavily on accurate census data; misplaced expectations threaten financial and governmental decisions.

  • Increasing politicization of statistics jeopardizes both trust in data and its utility in contexts such as public health, policymaking, and economic planning.

Engaging with Uncertainty

  • Engaging deeply with statistical uncertainty is increasingly pressing.

  • Need for balance: Counteracting the trend of ignoring uncertainty to promote responsible data practices.

  • Build tools that enhance data trustworthiness while educating users on data limitations.
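
One small example of such a tool, sketched under stated assumptions: a helper that never reports an estimate without its margin of error. The function name and the sample figures are hypothetical, and the default multiplier of 1.645 corresponds to a 90 percent confidence level under a normal approximation.

```python
def format_estimate(estimate: float, standard_error: float, z: float = 1.645) -> str:
    """Render an estimate together with its margin of error and implied range."""
    if standard_error < 0:
        raise ValueError("standard_error must be non-negative")
    margin = z * standard_error
    low, high = estimate - margin, estimate + margin
    return f"{estimate:,.0f} +/- {margin:,.0f} (range {low:,.0f} to {high:,.0f})"


# Hypothetical survey estimate of 12,400 people with a standard error of 200.
print(format_estimate(12_400, 200))  # 12,400 +/- 329 (range 12,071 to 12,729)
```

Putting the range next to the point estimate makes the uncertainty harder to drop when the number is passed on to decision-makers.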

Conclusion: Grounding Statistical Imaginaries in Practice

  • Effective data governance requires acknowledging the complexities of working with data and its inherent imperfections.

  • Moving forward involves a collective effort to reconcile the idealized visions of data with practical, responsible methodologies that accurately reflect societal truths.

References

  • Bouk, Dan. (2015). How Our Days Became Numbered: Risk and the Rise of the Statistical Individual. University of Chicago Press.

  • Daston, Lorraine & Galison, Peter. (1992). "The Image of Objectivity." Representations, 40, 81–128.

  • Dwork, Cynthia, Kohli, Nitin, & Mulligan, Deirdre. (2019). "Differential Privacy in Practice: Expose Your Epsilons!" Journal of Privacy and Confidentiality, 9(2).

  • Jasanoff, Sheila, & Kim, Sang-Hyun (Eds.). (2015). Dreamscapes of Modernity. The University of Chicago Press.

  • Porter, Theodore. (1995). Trust in Numbers: The Pursuit of Objectivity in Science and Public Life. Princeton University Press.

  • Starr, Paul. (1987). "The Sociology of Official Statistics." In W. Alonso & P. Starr (Eds.), The Politics of Numbers (pp. 7–57). Russell Sage Foundation.