Unit 2: What Data Becomes Under Scale, Institutions, and Power

Description and Tags

Vocabulary-focused flashcards covering the key theoretical frameworks and critiques of data scaling, institutional constraints, and the interpretive nature of data cleaning.

Last updated 7:19 PM on 5/2/26

12 Terms

1. Data scarcity

The structural constraint on AI progress caused by the exhaustion of high-quality human cultural production and the internet's finite nature.

2. Synthetic data

A proposed workaround for data scarcity that involves using artificial datasets, which introduces risks of model collapse, reinforced bias, and recursive degradation.

3. Model collapse

The recursive degradation of AI performance caused by training models on synthetic data rather than authentic human data.

4. Data exhaustion

The point at which high-quality text and data sources available for AI training are fully utilized or privatized by paywalls and copyright.

5. Capta

A term used by Schöch (2013) to redefine data not as something naturally given, but as something captured through scholarly and institutional acts of selection.

6. Big data

Data prioritized for scale, correlation, and pattern, which often sacrifices context and specific meaning.

7. Smart data

Curation-heavy data that prioritizes context, interpretation, and meaning at the expense of large-scale pattern detection.

8. Disciplinary epistemologies

The field-specific norms, methods, and incentives that prevent the existence of a universal "data culture" and shape how knowledge is produced.

9. Knowledge infrastructures

The institutional systems, funding models, and credit mechanisms that determine which kinds of data are maintained, documented, and allowed to circulate.

10. Tidy data

A standard of data structure defined by Wickham (2014), in which the act of defining what counts as a variable and what counts as an observation is a philosophical decision that encodes theory into datasets.
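Wickham's point can be made concrete with a minimal sketch (all data and field names here are invented for illustration): the same measurements arranged two ways, where deciding what is a "variable" and what is an "observation" is itself a modeling choice.

```python
# Hypothetical example: the same results in two shapes.
# "Wide" form: each treatment is its own column, i.e. treatments
# are encoded as variables.
wide = [
    {"person": "A", "treatment_x": 3, "treatment_y": 7},
    {"person": "B", "treatment_x": 5, "treatment_y": 2},
]

# "Tidy" (long) form: one row per observation. Deciding that the pair
# (person, treatment) identifies an observation and that "result" is
# the measured variable encodes a theory about the data's structure.
tidy = [
    {"person": row["person"], "treatment": t, "result": row[f"treatment_{t}"]}
    for row in wide
    for t in ("x", "y")
]

for record in tidy:
    print(record)
```

Neither shape is more "raw" than the other; the tidy form makes per-treatment comparisons easy to query, while the wide form foregrounds per-person profiles.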

11. Epistemic encoding

The process by which human choices in data cleaning and structuring determine what relationships are visible and what questions can be meaningfully asked.

12. Interpretive labor

The act of cleaning and tidying data, which involves making subjective decisions that embed assumptions about the world into analytical structures.
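A hypothetical sketch of the last two terms (the data and cleaning rules are invented): two defensible ways of handling missing values embed different assumptions about the world and yield different pictures of the same raw records.

```python
# Invented raw records with missing values (None).
raw = [1200, 1500, None, 900, None, 2000]

# Cleaning rule 1: drop missing records.
# Assumption encoded: missingness is uninformative noise.
dropped = [v for v in raw if v is not None]
mean_dropped = sum(dropped) / len(dropped)

# Cleaning rule 2: impute missing values as zero.
# Assumption encoded: absence of a value means "none reported".
imputed = [v if v is not None else 0 for v in raw]
mean_imputed = sum(imputed) / len(imputed)

print(mean_dropped)  # 1400.0
print(mean_imputed)  # 5600 / 6 ≈ 933.33
```

The subjective choice between the two rules determines which average, and therefore which relationship, becomes visible downstream.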