Vocabulary-focused flashcards covering the key theoretical frameworks and critiques of data scaling, institutional constraints, and the interpretive nature of data cleaning.
Data scarcity
The structural constraint on AI progress caused by the exhaustion of high-quality human cultural production and the internet's finite nature.
Synthetic data
A proposed workaround for data scarcity that involves using artificial datasets, which introduces risks of model collapse, reinforced bias, and recursive degradation.
Model collapse
The recursive degradation of AI performance caused by training models on synthetic data rather than authentic human data.
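The recursive degradation can be seen in a toy simulation (an illustrative sketch, not a method from the source): a "model" that knows only a mean and a standard deviation is repeatedly refit on small samples drawn from its own previous fit, so every generation after the first trains purely on synthetic data.

```python
import random
import statistics

random.seed(0)  # fixed seed so the illustrative run is reproducible

def collapse(generations=1000, sample_size=8):
    """Refit a Gaussian 'model' on its own synthetic output each generation."""
    mu, sigma = 0.0, 1.0          # generation 0: the authentic data distribution
    sigmas = [sigma]
    for _ in range(generations):
        # draw a small synthetic training set from the current model
        sample = [random.gauss(mu, sigma) for _ in range(sample_size)]
        # refit the model on synthetic data alone
        mu = statistics.mean(sample)
        sigma = statistics.stdev(sample)
        sigmas.append(sigma)
    return sigmas

sigmas = collapse()
```

In typical runs the fitted spread decays toward zero: with no fresh human data re-entering the loop, the model's diversity degrades generation over generation, which is the core mechanism the card describes.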
Data exhaustion
The point at which high-quality text and data sources available for AI training are fully utilized or privatized by paywalls and copyright.
Capta
A term used by Schöch (2013) to redefine data not as something naturally given, but as something captured through scholarly and institutional acts of selection.
Big data
Data prioritized for scale, correlation, and pattern, which often sacrifices context and specific meaning.
Smart data
Curation-heavy data that prioritizes context, interpretation, and meaning at the expense of large-scale pattern detection.
Disciplinary epistemologies
The field-specific norms, methods, and incentives that prevent the existence of a universal "data culture" and shape how knowledge is produced.
Knowledge infrastructures
The institutional systems, funding models, and credit mechanisms that determine which kinds of data are maintained, documented, and allowed to circulate.
Tidy data
A standard of data structure defined by Wickham (2014) in which the choice of what counts as a variable and what counts as an observation is a philosophical decision that encodes theory into datasets.
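A minimal sketch of the tidying step Wickham (2014) describes, using invented patient/treatment data (the names and values here are illustrative, not from the source): the "messy" layout hides the variable `treatment` inside column headers, and tidying makes it an explicit column, one observation per row.

```python
# Messy layout: the treatment variable is encoded in the column names,
# a structural (and theory-laden) choice rather than a neutral one.
messy = [
    {"patient": "John Smith",   "treatment_a": None, "treatment_b": 2},
    {"patient": "Jane Doe",     "treatment_a": 16,   "treatment_b": 11},
    {"patient": "Mary Johnson", "treatment_a": 3,    "treatment_b": 1},
]

def tidy(rows):
    """Melt wide treatment columns into (patient, treatment, result) rows."""
    out = []
    for row in rows:
        for col, value in row.items():
            if col == "patient":
                continue
            out.append({
                "patient": row["patient"],
                "treatment": col.removeprefix("treatment_"),  # Python 3.9+
                "result": value,
            })
    return out

tidied = tidy(messy)
# Each record is now one observation; deciding that "treatment" is a
# variable (and "result" its measure) is the interpretive act the card
# refers to.
```

The same reshaping is what `pandas.melt` does at scale; the point of the card is that the reshaping itself, not just the analysis afterward, embeds a theory of what the data are about.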
Epistemic encoding
The process by which human choices in data cleaning and structuring determine what relationships are visible and what questions can be meaningfully asked.
Interpretive labor
The act of cleaning and tidying data, which involves making subjective decisions that embed assumptions about the world into analytical structures.