Study Notes on Statistical Imaginaries by danah boyd
Statistical Imaginaries: An Ode to Responsible Data Science by danah boyd
Introduction
Presented at the 2021 Microsoft Research Summit.
Data discussed: total resident population of the US as of April 1, 2020, reported as 331,449,281 by the US Census Bureau.
Concept of "data theater" introduced, signaling how statistics is structured as a performance for public consumption.
Understanding Data and Statistics
Root of data: Latin origin, meaning "the givens."
Statistics as a science of the state; originally "political arithmetic."
Official statistics: Designation for state-produced statistics, treated as unquestionable facts.
Precision vs. Accuracy
Statistics are expected to reflect exact knowledge, leading to public expectations of precision.
Census as enumeration: A count of the population is assumed to be precise, although it is only as good as the best methodologies available.
Example of communication by the Census Bureau: population figures are sometimes presented rounded (e.g., nearly 331.5 million), which signals approximation rather than an exact count.
Importance of Censuses
Censuses are foundational for various data analyses, informing GDP, employment rates, and health data (e.g., COVID-19 tracking).
Census data aid in political representation and funding distribution, being deeply entrenched in democratic processes.
Political nature of data: Censuses are not just accounting tools; they are politically charged and often contested.
UN Statistical Commission: Established in 1947 to formalize international standards that protect national statistics from political interference.
The Politics of Statistics
Statistics viewed through a lens of objectivity can obscure political implications.
Historical reference: statistical techniques and methods can have troubling origins; for example, regression methods are associated with eugenics.
Historical undercounting of Black populations is tied to data practices that clerks at the US Census Bureau analyzed in the 1910s.
As statistics professionalized, the government grew interested in applying scientific advances to improve its data.
Legal and Political Challenges in Data Collection
Introduction of sampling in the Census to ease data collection faced legal challenges (e.g., Constitutional arguments against sampling and imputation).
Imputation: A technique used to estimate missing data; it was challenged in court but upheld by the Supreme Court.
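To make the idea of imputation concrete, here is a minimal sketch of one simple family of techniques, sequential hot-deck imputation, in which a missing value is filled with the most recently observed value from a nearby record. The function name, the list-of-household-sizes layout, and the sample values are hypothetical and chosen only for illustration; this is not presented as the Census Bureau's actual procedure.

```python
from typing import List, Optional


def hot_deck_impute(household_sizes: List[Optional[int]]) -> List[Optional[int]]:
    """Fill each missing value with the most recent observed value before it.

    Illustrative only: the record layout is an assumption, not a real census format.
    """
    filled: List[Optional[int]] = []
    donor: Optional[int] = None
    for size in household_sizes:
        if size is not None:
            donor = size  # this record becomes the donor for later gaps
            filled.append(size)
        else:
            filled.append(donor)  # stays None if no donor has been seen yet
    return filled


if __name__ == "__main__":
    reported = [3, None, 2, None, None, 5]  # None marks a non-responding household
    print(hot_deck_impute(reported))
```

For the sample list above, the filled result is [3, 3, 2, 2, 2, 5]. The point for these notes is that the published total then contains values no one reported, which is exactly why imputation became a legal and political question.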
The Role of Data in Decision-Making
Data decisions involve inherent politics; choices about what data to collect reflect biases and ideologies.
Example of racial data collection in the US as both a tool for oppression and civil rights.
Contrasting international approaches to race data collection: France outlaws collecting it, while Lebanon has faced historical challenges in conducting a census at all.
Trust and Privacy in Census Data
Public trust in government is crucial for participation in the census; privacy concerns since 1840 have influenced census data collection practices.
Long-standing legal standards exist to maintain confidentiality and to prevent misuse of census data.
Increasing distrust complicates efforts to collect accurate data, especially for sensitive demographics.
Challenges of Statistical Representation
Data Imperfections: Acknowledging operational, social, and political reasons for undercounts.
Misunderstanding of how statistics 'speak' leads to misrepresentation of data limitations.
Decision-makers often prioritize a narrative of uncertainty-free data, disregarding inherent errors and biases.
The Art and Ethics of Data Communication
Applied demographers highlight the challenge of conveying uncertainty in data to decision-makers who desire clear facts.
Case study: Climate scientists face difficulties in communicating model uncertainties effectively amidst skepticism.
Tension exists between providing accurate data representation and the political need for undisputed facts.
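One common way to convey uncertainty alongside a figure is to report a margin of error with the estimate. The sketch below assumes the estimate and its standard error are already known; the function name and the sample numbers are hypothetical, and the default multiplier of 1.645 corresponds to a 90 percent confidence level under a normal approximation.

```python
def describe_estimate(estimate: float, standard_error: float, z: float = 1.645) -> str:
    """Format an estimate together with its margin of error and plausible range.

    Assumes a normal approximation; z = 1.645 gives a 90 percent confidence level.
    """
    margin = z * standard_error
    low, high = estimate - margin, estimate + margin
    return (
        f"{estimate:,.0f} (plus or minus {margin:,.0f}; "
        f"plausible range {low:,.0f} to {high:,.0f})"
    )


if __name__ == "__main__":
    # Hypothetical estimate of 12,500 people with a standard error of 400.
    print(describe_estimate(12_500, 400))
```

With the hypothetical inputs shown, the sentence reads "12,500 (plus or minus 658; plausible range 11,842 to 13,158)". Whether a decision-maker will accept a range in place of a single number is the tension these notes describe.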
The Illusion of Perfect Data
Expectation for perfect, precise data disregards the realities of statistical practice.
Statistical imaginary: A collective vision of data that shapes expectations; e.g., the census envisioned as a perfect reflection of society, versus its actual limitations.
Discussions surrounding uncertainty and error often become taboo, risking the integrity of statistical methodologies.
Differential Privacy as a Forward Step
Introduction of differential privacy techniques aims to support statistical confidentiality while publishing usable statistics.
Differential Privacy: A mathematical framework that quantifies information leakage about individual records.
It enables flexible systems in which the balance between privacy and data reliability can be tuned.
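The privacy and reliability trade-off can be sketched with the Laplace mechanism, a textbook way to release a count under differential privacy. This is a minimal illustration, not the system any statistical agency actually deploys: the function names are hypothetical, the privacy parameter is written as epsilon by convention, and floating-point sampling like this is not suitable for production use.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Draw one sample from a Laplace(0, scale) distribution."""
    # The difference of two independent exponential draws with mean `scale`
    # follows a Laplace distribution centered at zero with that scale.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_count(true_count: int, epsilon: float, rng: random.Random) -> float:
    """Release a count with epsilon-differential privacy via the Laplace mechanism."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    sensitivity = 1.0  # adding or removing one person changes a count by at most 1
    return true_count + laplace_noise(sensitivity / epsilon, rng)


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed only so the demo is repeatable
    for epsilon in (0.1, 1.0, 10.0):
        # Smaller epsilon means stronger privacy and, on average, more noise.
        print(epsilon, round(noisy_count(1000, epsilon, rng), 1))
```

The single parameter epsilon is what makes the trade-off explicit: lowering it leaks less about any individual record but makes the published count less reliable, and raising it does the reverse.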
Reactions to Differential Privacy
Backlash against modernized data practices by critics who want unaltered, fact-like data.
Opponents often do not perceive relationships between mathematical practices and data integrity.
The Stakes of Public Trust in Data
Public health relies heavily on accurate census data; misplaced expectations threaten financial and governmental decisions.
Increased politicization of statistics risks jeopardizing both the trust in and the utility of data in contexts such as public health, policymaking, and economic planning.
Engaging with Uncertainty
Engaging deeply with statistical uncertainty is increasingly pressing.
Need for balance: Counteracting the trend of ignoring uncertainty to promote responsible data practices.
Build tools that enhance data trustworthiness while educating users on data limitations.
Conclusion: Grounding Statistical Imaginaries in Practice
Effective data governance requires acknowledging the complexities of working with data and its inherent imperfections.
Moving forward involves a collective effort to reconcile the idealized visions of data with practical, responsible methodologies that accurately reflect societal truths.