Data 101 - Midterm 2

30 Terms

1
P-value
Probability of observing results at least this extreme if the null hypothesis is true.
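A minimal sketch of the definition, assuming a hypothetical experiment of 60 heads in 100 flips and H₀ "the coin is fair". The function name and the numbers are illustrative, not from the course. "At least this extreme" is taken here to mean "at least as far from the expected count", which is one common two-sided convention.

```python
from math import comb

def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def two_sided_p_value(observed, n, p_null=0.5):
    """Sum the null probabilities of all counts at least as far from the
    expected count as the observed one (assumed two-sided convention)."""
    expected = n * p_null
    distance = abs(observed - expected)
    return sum(
        binom_pmf(k, n, p_null)
        for k in range(n + 1)
        if abs(k - expected) >= distance
    )

# Hypothetical data: 60 heads in 100 flips, H0: fair coin.
p_value = two_sided_p_value(60, 100)
print(f"p-value = {p_value:.4f}")
```

The result is a statement about the data given H₀, not the probability that H₀ is true.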

2
Prosecutor fallacy
Confusing P(Evidence|Innocent) with P(Innocent|Evidence).
3
Defense fallacy
Claiming evidence is meaningless because false positives exist.
4
Bayesian posterior formula
P(H|E) = (P(E|H)P(H)) / P(E).
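A worked sketch of the formula, which also shows why the prosecutor fallacy matters. All numbers are hypothetical: one true source among 1,000,000 people and a 1-in-10,000 false-match rate. P(E) is expanded with the law of total probability.

```python
def posterior(prior_h, p_e_given_h, p_e_given_not_h):
    """Return P(H|E) = P(E|H) P(H) / P(E), with
    P(E) = P(E|H) P(H) + P(E|not H) (1 - P(H))."""
    p_e = p_e_given_h * prior_h + p_e_given_not_h * (1 - prior_h)
    return p_e_given_h * prior_h / p_e

# Hypothetical case: H = "suspect is the source", E = "forensic match".
prior = 1 / 1_000_000          # assumed: 1 true source in 1,000,000 people
p_match_if_source = 1.0        # assumed: the true source always matches
p_match_if_innocent = 1 / 10_000  # assumed false-match rate

p_source_given_match = posterior(prior, p_match_if_source, p_match_if_innocent)
p_innocent_given_match = 1 - p_source_given_match

print(f"P(match | innocent) = {p_match_if_innocent:.4f}")
print(f"P(innocent | match) = {p_innocent_given_match:.4f}")
```

Under these assumptions P(match | innocent) is tiny, yet P(innocent | match) is about 0.99, so the two cannot be swapped.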
5
t-test
Used to compare means when the population variances are unknown and the sample size is moderate.
6
Permutation test
Used when parametric assumptions (such as normality) cannot be justified.
7
k-anonymity
Each record is indistinguishable from at least k-1 others on its quasi-identifiers.
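A small sketch of how the k of a table could be checked, assuming the quasi-identifiers are already generalized. The records, column names, and the helper `k_of` are hypothetical.

```python
from collections import Counter

def k_of(records, quasi_identifiers):
    """Return the largest k for which the table is k-anonymous:
    the size of the smallest group sharing the same quasi-identifier values."""
    if not records:
        return 0
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return min(groups.values())

# Hypothetical, already-generalized records.
records = [
    {"zip": "021**", "age": "20-29", "diagnosis": "flu"},
    {"zip": "021**", "age": "20-29", "diagnosis": "asthma"},
    {"zip": "021**", "age": "30-39", "diagnosis": "flu"},
    {"zip": "021**", "age": "30-39", "diagnosis": "flu"},
]

k = k_of(records, ["zip", "age"])
print(f"table is {k}-anonymous on (zip, age)")
```

Note that everyone in the 30-39 group has the same diagnosis, so knowing someone is in that group reveals it even though k = 2.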
8
Limitation of k-anonymity
Vulnerable to linkage attacks.
9
Differential privacy
Adding calibrated random noise to outputs so that no single individual's data can be inferred from them.
10
ε in differential privacy
Privacy budget; lower ε = stronger privacy.
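One standard way to see the effect of ε is the Laplace mechanism for a counting query, sketched below. This is an illustration under assumptions, not necessarily the mechanism used in the course: the true count of 100 is hypothetical, the query sensitivity is assumed to be 1, and the noise scale is sensitivity / ε.

```python
import random

def laplace_noise(scale, rng):
    """Draw Laplace(0, scale) noise as the difference of two
    independent exponentials, each with mean `scale`."""
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)

def private_count(true_count, epsilon, rng, sensitivity=1.0):
    """Laplace mechanism: add noise with scale sensitivity / epsilon."""
    return true_count + laplace_noise(sensitivity / epsilon, rng)

rng = random.Random(0)   # fixed seed so the sketch is repeatable
true_count = 100         # hypothetical query answer

mean_abs_error = {}
for epsilon in (0.1, 1.0, 10.0):
    errors = [
        abs(private_count(true_count, epsilon, rng) - true_count)
        for _ in range(2000)
    ]
    mean_abs_error[epsilon] = sum(errors) / len(errors)
    print(f"epsilon = {epsilon:>4}: mean |error| = {mean_abs_error[epsilon]:.2f}")
```

Smaller ε means more noise, so stronger privacy comes at the cost of less accurate answers.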
11
Disparate impact
Neutral rule disproportionately harms protected groups.
12
Disparate treatment
Explicit unequal treatment based on protected attribute.
13
Redundant encoding
A non-protected feature acts as a proxy (ZIP → race).
14
Individual fairness
Similar individuals should be treated similarly.
15
Conjunction fallacy
Judging a detailed scenario as more likely than the general one.
16
Survivorship bias
Only observing who survives, missing failures.
17
Ecological fallacy
Inferring individual traits from group averages.
18
Goodhart's Law
When a measure becomes a target, it stops being a good measure.
19
Normal approximation to the binomial
Valid when n is large and p is not near 0 or 1 (a common rule of thumb: np and n(1-p) are both at least 10).
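A quick check of the condition, comparing the exact binomial CDF with the normal approximation (with a continuity correction of 0.5). The two (n, p, k) cases are hypothetical and chosen so that one has p in the middle and the other has p near 0.

```python
from math import comb, sqrt
from statistics import NormalDist

def binom_cdf(k, n, p):
    """Exact P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))

def normal_approx_cdf(k, n, p):
    """Normal approximation with mean np, SD sqrt(np(1-p)),
    and a continuity correction of 0.5."""
    mu, sigma = n * p, sqrt(n * p * (1 - p))
    return NormalDist(mu, sigma).cdf(k + 0.5)

gaps = {}
for n, p, k in [(100, 0.5, 45), (100, 0.01, 0)]:
    exact = binom_cdf(k, n, p)
    approx = normal_approx_cdf(k, n, p)
    gaps[(n, p)] = abs(exact - approx)
    print(f"n={n}, p={p}: exact={exact:.4f}, normal={approx:.4f}")
```

With p = 0.5 the two values nearly coincide; with p = 0.01 the same n gives a visibly worse approximation.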
20
68-95-99.7 rule
For normal: ±1σ (68%), ±2σ (95%), ±3σ (99.7%).
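The three percentages are rounded values of the standard normal CDF, which can be confirmed with a short sketch using Python's `statistics.NormalDist`:

```python
from statistics import NormalDist

standard_normal = NormalDist()  # mean 0, SD 1

coverage = {}
for k in (1, 2, 3):
    # Probability of landing within k standard deviations of the mean.
    coverage[k] = standard_normal.cdf(k) - standard_normal.cdf(-k)
    print(f"within ±{k}σ: {coverage[k]:.4f}")
```

The values come out as roughly 0.6827, 0.9545, and 0.9973.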
21
Bonferroni correction
α_new = α / (#tests) controls familywise error.
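A minimal sketch of the correction, assuming five hypothetical p-values and α = 0.05. The familywise figure printed for the uncorrected case additionally assumes the tests are independent.

```python
def bonferroni(p_values, alpha=0.05):
    """Return the per-test threshold alpha / m and which tests pass it."""
    threshold = alpha / len(p_values)
    return threshold, [p <= threshold for p in p_values]

# Hypothetical p-values from five tests.
p_values = [0.001, 0.012, 0.03, 0.04, 0.20]
alpha = 0.05

threshold, rejected = bonferroni(p_values, alpha)
print(f"per-test threshold: {threshold:.3f}")
print(f"rejected: {rejected}")

# Chance of at least one false positive across m independent tests
# if every test used alpha with no correction.
m = len(p_values)
uncorrected_fwer = 1 - (1 - alpha) ** m
print(f"uncorrected familywise error (independent tests): {uncorrected_fwer:.3f}")
```

Four of the five p-values are below 0.05, but only one survives the corrected threshold of 0.01.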
22
Why a high-accuracy model can still be unfair
Because the underlying data may encode bias.
23
Truncated y-axis problem
Exaggerates small differences.
24
Permutation distribution
Distribution of statistics across label-shuffled datasets under H₀.
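A sketch of how such a distribution can be built and turned into a p-value. The two groups are hypothetical measurements, the test statistic (difference in means) is an assumed choice, and the +1 in the p-value is one common convention that counts the observed labeling itself.

```python
import random
from statistics import fmean

def permutation_test(group_a, group_b, n_shuffles=5000, seed=0):
    """Shuffle the pooled values, re-split them into two groups of the
    original sizes, and record the difference in means each time."""
    rng = random.Random(seed)
    observed = fmean(group_a) - fmean(group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)

    perm_stats = []
    for _ in range(n_shuffles):
        rng.shuffle(pooled)  # break any link between label and value
        perm_stats.append(fmean(pooled[:n_a]) - fmean(pooled[n_a:]))

    # Two-sided p-value: share of shuffles at least as extreme as observed.
    extreme = sum(abs(s) >= abs(observed) for s in perm_stats)
    p_value = (extreme + 1) / (n_shuffles + 1)
    return observed, perm_stats, p_value

# Hypothetical measurements for two groups.
group_a = [12.1, 11.8, 13.4, 12.9, 13.0, 12.6]
group_b = [10.9, 11.2, 11.5, 10.4, 11.9, 11.1]

observed, perm_stats, p_value = permutation_test(group_a, group_b)
print(f"observed difference = {observed:.3f}, p-value = {p_value:.4f}")
```

The list `perm_stats` is the permutation distribution; the p-value is read off by asking how often it is as extreme as the observed statistic.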
25
Simpson's paradox
A trend seen within subgroups reverses when the data are aggregated.
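A small numeric sketch of the reversal, using illustrative (successes, total) counts for two treatments, A and B, split into two subgroups. The counts and labels are chosen only to demonstrate the effect.

```python
# Illustrative counts: subgroup -> treatment -> (successes, total).
counts = {
    "small": {"A": (81, 87), "B": (234, 270)},
    "large": {"A": (192, 263), "B": (55, 80)},
}

# Success rate inside each subgroup.
subgroup_rates = {}
for subgroup, arms in counts.items():
    subgroup_rates[subgroup] = {arm: s / t for arm, (s, t) in arms.items()}
    a, b = subgroup_rates[subgroup]["A"], subgroup_rates[subgroup]["B"]
    print(f"{subgroup}: A = {a:.3f}, B = {b:.3f}")

# Success rate after pooling the subgroups.
overall = {}
for arm in ("A", "B"):
    successes = sum(counts[g][arm][0] for g in counts)
    total = sum(counts[g][arm][1] for g in counts)
    overall[arm] = successes / total
print(f"overall: A = {overall['A']:.3f}, B = {overall['B']:.3f}")
```

A has the higher rate in both subgroups, yet B has the higher rate overall, because the two treatments were applied to very different mixes of subgroups.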
26
Standard deviation
Spread around the mean.
27
Low prevalence false-positive issues
False positives may outnumber true positives.
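A short count-based sketch of why this happens. The prevalence, sensitivity, and false-positive rate below are hypothetical screening numbers.

```python
# Hypothetical screening scenario.
population = 100_000
prevalence = 0.001          # assumed: 1 in 1,000 has the condition
sensitivity = 0.99          # assumed P(test positive | condition)
false_positive_rate = 0.05  # assumed P(test positive | no condition)

sick = population * prevalence
healthy = population - sick

true_positives = sick * sensitivity
false_positives = healthy * false_positive_rate

# Share of positive results that are real.
ppv = true_positives / (true_positives + false_positives)

print(f"true positives:  {true_positives:.0f}")
print(f"false positives: {false_positives:.0f}")
print(f"P(condition | positive) = {ppv:.3f}")
```

Because the healthy group is so much larger, even a modest false-positive rate produces far more false alarms than real detections.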
28
Practical significance
Whether effect size matters in the real world.
29
Central Limit Theorem (CLT)
The distribution of sample means approaches a normal distribution as the sample size n increases.
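A simulation sketch of the statement, assuming a strongly right-skewed population (exponential with mean 1, an arbitrary choice). Skewness is used as a rough measure of how far the distribution of sample means is from the symmetric normal shape.

```python
import random

def skewness(values):
    """Population skewness: mean of cubed z-scores (0 for a symmetric shape)."""
    n = len(values)
    m = sum(values) / n
    sd = (sum((v - m) ** 2 for v in values) / n) ** 0.5
    return sum(((v - m) / sd) ** 3 for v in values) / n

rng = random.Random(0)  # fixed seed so the sketch is repeatable
skew_by_n = {}
for n in (1, 5, 50):
    # 5000 sample means, each from n draws of the skewed population.
    sample_means = [
        sum(rng.expovariate(1.0) for _ in range(n)) / n
        for _ in range(5000)
    ]
    skew_by_n[n] = skewness(sample_means)
    print(f"n = {n:>2}: skewness of sample means = {skew_by_n[n]:.2f}")
```

The skewness shrinks toward 0 as n grows, even though the population itself stays skewed.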
30
Proportional ink
The inked (shaded) area of a graphic should be proportional to the value it represents.