Data 101 - Midterm 2

P-value
Probability, assuming the null hypothesis is true, of getting results at least as extreme as those observed.

Prosecutor fallacy
Confusing P(Evidence | Innocent) with P(Innocent | Evidence).

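Defense fallacy
Dismissing evidence as meaningless because false positives exist (many innocent people could also match), even though the evidence still raises the probability of guilt.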

Bayesian posterior formula
P(H|E) = (P(E|H)P(H)) / P(E).
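
A minimal sketch of the formula, assuming P(E) is expanded with the law of total probability. The function name `posterior` and all the numbers are hypothetical, chosen to show how the prosecutor fallacy goes wrong: a tiny P(match | innocent) does not imply a tiny P(innocent | match).

```python
def posterior(prior: float, p_e_given_h: float, p_e_given_not_h: float) -> float:
    """Return P(H|E) via Bayes' rule.

    P(E) is expanded with the law of total probability:
    P(E) = P(E|H)P(H) + P(E|not H)P(not H).
    """
    p_e = p_e_given_h * prior + p_e_given_not_h * (1 - prior)
    return p_e_given_h * prior / p_e


# Hypothetical case: an innocent person matches the evidence with
# probability 1 in 10,000, and there are 100,000 possible suspects,
# exactly one of whom is guilty (prior = 1/100,000).
p_match_given_innocent = 1e-4
p_guilty = posterior(prior=1e-5, p_e_given_h=1.0, p_e_given_not_h=p_match_given_innocent)
print(f"P(match | innocent) = {p_match_given_innocent}")
print(f"P(guilty | match)   = {p_guilty:.3f}")
```

Here P(guilty | match) comes out to roughly 0.09, so a match still leaves the suspect more likely innocent than guilty.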

t-test
Used to compare means when the population variances are unknown (estimated from the sample) and the sample size is moderate.
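
A sketch of the two-sample t statistic in the unequal-variance (Welch) form, assuming two independent samples. The helper `welch_t` is a hypothetical name; turning the statistic into a p-value needs the t distribution's CDF, which the standard library lacks (SciPy's `scipy.stats.ttest_ind(a, b, equal_var=False)` runs the same test and returns a p-value).

```python
import math
import statistics


def welch_t(a: list[float], b: list[float]) -> tuple[float, float]:
    """Welch's two-sample t statistic and its approximate degrees of freedom.

    The population variances are unknown, so each is estimated by the
    sample variance (statistics.variance uses the n-1 denominator).
    """
    n_a, n_b = len(a), len(b)
    se2_a = statistics.variance(a) / n_a
    se2_b = statistics.variance(b) / n_b
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(se2_a + se2_b)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (se2_a + se2_b) ** 2 / (se2_a**2 / (n_a - 1) + se2_b**2 / (n_b - 1))
    return t, df


# Hypothetical measurements from two groups.
group_a = [5.1, 4.9, 5.6, 5.8, 6.0]
group_b = [4.2, 4.8, 4.4, 5.0, 4.6]
t_stat, dof = welch_t(group_a, group_b)
print(f"t = {t_stat:.2f}, df = {dof:.1f}")
```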

Permutation test
Used when no parametric assumptions (such as normality) can be made.
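
A minimal sketch of a two-sided permutation test for a difference in means, assuming the group labels are exchangeable under H₀. The function name, the 10,000 shuffles, and the +1 smoothing are illustrative choices, not a fixed recipe.

```python
import random
import statistics


def permutation_test(a, b, n_perm=10_000, seed=0):
    """Two-sided permutation test for a difference in means.

    Under H0 the group labels are exchangeable, so shuffling the pooled
    values and re-splitting them builds the permutation distribution of
    the test statistic.
    """
    rng = random.Random(seed)
    observed = statistics.mean(a) - statistics.mean(b)
    pooled = list(a) + list(b)
    n_a = len(a)
    count = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = statistics.mean(pooled[:n_a]) - statistics.mean(pooled[n_a:])
        # Count shuffles at least as extreme as the observed difference.
        if abs(diff) >= abs(observed):
            count += 1
    # Add 1 to numerator and denominator so the p-value is never exactly 0.
    return (count + 1) / (n_perm + 1)


print(permutation_test([10, 11, 12, 13, 14, 15], [0, 1, 2, 3, 4, 5]))
```

The returned value is the fraction of label-shuffled datasets whose statistic is at least as extreme as the observed one, which is exactly a p-value read off the permutation distribution.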

k-anonymity
Each record is indistinguishable from at least k-1 others on its quasi-identifiers.
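
A sketch of how k can be measured for a table, assuming records are dicts and the quasi-identifier columns are known in advance. The table, its column names, and the helper `k_anonymity` are all hypothetical.

```python
from collections import Counter


def k_anonymity(records, quasi_identifiers):
    """Return the largest k for which the table is k-anonymous.

    Records are grouped by their quasi-identifier values; the table is
    k-anonymous when every group (equivalence class) has at least k rows,
    i.e. each record is indistinguishable from at least k-1 others.
    """
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return min(groups.values())


# Hypothetical, already-generalized table.
table = [
    {"zip": "941**", "age": "20-29", "diagnosis": "flu"},
    {"zip": "941**", "age": "20-29", "diagnosis": "cold"},
    {"zip": "941**", "age": "30-39", "diagnosis": "flu"},
    {"zip": "941**", "age": "30-39", "diagnosis": "asthma"},
    {"zip": "941**", "age": "30-39", "diagnosis": "flu"},
]
print(k_anonymity(table, ["zip", "age"]))  # groups of size 2 and 3 -> k = 2
```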

Limitation of k-anonymity
Still vulnerable to linkage attacks that combine the released data with auxiliary information.

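Differential privacy
Adding calibrated random noise so the output barely changes whether or not any one individual's data is included, so individuals cannot be identified.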

ε in differential privacy
Privacy budget; lower ε means more noise and stronger privacy.
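
A sketch of the Laplace mechanism, one standard way to achieve ε-differential privacy, applied to a counting query. It assumes the query has sensitivity 1 (adding or removing one person changes the count by at most 1); `laplace_count` is a hypothetical helper, and Python's `random` module is used only for illustration, not as a vetted privacy library.

```python
import random


def laplace_count(true_count: int, epsilon: float, rng: random.Random) -> float:
    """Release a count with epsilon-differential privacy (Laplace mechanism).

    A counting query changes by at most 1 when one person is added or
    removed (sensitivity 1), so Laplace noise with scale 1/epsilon suffices.
    The difference of two independent Exponential(rate=epsilon) draws is
    Laplace(0, 1/epsilon).
    """
    noise = rng.expovariate(epsilon) - rng.expovariate(epsilon)
    return true_count + noise


rng = random.Random(42)
for eps in (0.1, 1.0, 10.0):
    releases = [laplace_count(100, eps, rng) for _ in range(5)]
    print(eps, [round(x, 1) for x in releases])
```

Smaller ε gives a larger noise scale (1/ε), so the released counts wander further from the true value of 100: stronger privacy at the cost of accuracy.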

Disparate impact
A facially neutral rule that disproportionately harms a protected group.
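
One common way to quantify disparate impact is to compare selection rates between groups. This sketch assumes binary 1/0 decisions and uses hypothetical outcome data; the helper names are illustrative. The "four-fifths rule" from US employment-selection guidance treats a ratio below 0.8 as a warning sign of adverse impact.

```python
def selection_rate(decisions):
    """Fraction of positive (e.g. 'hired') decisions, given 1/0 values."""
    return sum(decisions) / len(decisions)


def impact_ratio(protected, reference):
    """Ratio of the protected group's selection rate to the reference group's."""
    return selection_rate(protected) / selection_rate(reference)


# Hypothetical outcomes of a facially neutral screening rule.
group_a = [1] * 30 + [0] * 70   # 30% selected
group_b = [1] * 50 + [0] * 50   # 50% selected
ratio = impact_ratio(group_a, group_b)
print(f"impact ratio = {ratio:.2f}")  # below the 0.8 rule-of-thumb threshold
```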

Disparate treatment
Explicitly treating people unequally based on a protected attribute.

Redundant encoding
A non-protected feature acts as a proxy for a protected one (e.g., ZIP code → race).

Individual fairness
Similar individuals should be treated similarly.

Conjunction fallacy
Judging a detailed scenario (A and B) as more likely than the general one (A alone), even though P(A and B) ≤ P(A).

Survivorship bias
Drawing conclusions only from those who survived a selection process, missing the failures.

Ecological fallacy
Inferring individual traits from group averages.

Goodhart's Law
When a measure becomes a target, it stops being a good measure.

Normal approximation to the binomial
Valid when n is large and p is not near 0 or 1 (a common rule of thumb: np ≥ 10 and n(1-p) ≥ 10).
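
A sketch comparing the exact binomial CDF with its normal approximation, assuming a continuity correction of 0.5. The helper names and the two (n, p) settings are illustrative: with p = 0.5 the approximation is close, while with p = 0.01 (np = 1) it is noticeably off.

```python
import math
from statistics import NormalDist


def binom_cdf(k: int, n: int, p: float) -> float:
    """Exact P(X <= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def normal_approx_cdf(k: int, n: int, p: float) -> float:
    """Normal approximation N(np, np(1-p)) with a continuity correction."""
    approx = NormalDist(mu=n * p, sigma=math.sqrt(n * p * (1 - p)))
    return approx.cdf(k + 0.5)


for n, p in [(100, 0.5), (100, 0.01)]:
    k = int(n * p)  # evaluate at the mean
    print(n, p, round(binom_cdf(k, n, p), 4), round(normal_approx_cdf(k, n, p), 4))
```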

68-95-99.7 rule
For a normal distribution, about 68%, 95%, and 99.7% of values lie within ±1σ, ±2σ, and ±3σ of the mean.
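
The rule can be checked directly from the standard normal CDF; this short sketch uses the standard library's `statistics.NormalDist`.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0, sigma=1)
# Probability mass within ±k standard deviations of the mean.
coverage = {k: std_normal.cdf(k) - std_normal.cdf(-k) for k in (1, 2, 3)}
for k, c in coverage.items():
    print(f"within ±{k}σ: {c:.4f}")  # 0.6827, 0.9545, 0.9973
```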

Bonferroni correction
Test each of m hypotheses at α_new = α / m to control the familywise error rate at α.
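
A minimal sketch of the correction, assuming a list of p-values from m separate tests. The p-values are hypothetical and `bonferroni` is an illustrative helper name.

```python
def bonferroni(p_values: list[float], alpha: float = 0.05) -> list[bool]:
    """Reject H0_i only when p_i <= alpha / m, where m is the number of tests.

    By the union bound, the probability of at least one false rejection
    (the familywise error rate) is then at most alpha.
    """
    m = len(p_values)
    threshold = alpha / m
    return [p <= threshold for p in p_values]


# Hypothetical p-values from 5 separate tests.
pvals = [0.003, 0.012, 0.04, 0.2, 0.009]
print(bonferroni(pvals))  # threshold 0.05 / 5 = 0.01
```

Note that 0.012 and 0.04 would pass an uncorrected α = 0.05 but fail after the correction.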

Why a high-accuracy model can still be unfair
The underlying data may encode bias, so fitting the data accurately reproduces that bias.

Truncated y-axis problem
Starting the y-axis above zero exaggerates small differences.

Permutation distribution
Distribution of the test statistic across label-shuffled datasets, approximating its distribution under H₀.

Simpson's paradox
A trend that holds within every subgroup reverses when the data are aggregated.
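
A small illustration with hypothetical success counts: treatment A does better within each severity group, yet B looks better overall because A was given mostly to severe cases. The data and helper names are made up for this example.

```python
# Hypothetical counts: (successes, patients) per treatment and severity.
data = {
    "A": {"mild": (8, 10), "severe": (30, 90)},
    "B": {"mild": (70, 90), "severe": (2, 10)},
}


def rate(successes: int, total: int) -> float:
    return successes / total


def overall_rate(groups) -> float:
    """Success rate after pooling all subgroups together."""
    successes = sum(c[0] for c in groups.values())
    patients = sum(c[1] for c in groups.values())
    return successes / patients


for treatment, groups in data.items():
    by_group = {g: round(rate(*c), 3) for g, c in groups.items()}
    print(treatment, by_group, "overall:", round(overall_rate(groups), 3))
```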

Standard deviation
Typical spread of values around the mean; the square root of the variance.

False positives at low prevalence
When a condition is rare, false positives may outnumber true positives, even for an accurate test.
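
A worked count under hypothetical test characteristics (1% prevalence, 99% sensitivity, 5% false-positive rate) in a population of 10,000; `confusion_counts` is an illustrative helper.

```python
def confusion_counts(population: int, prevalence: float,
                     sensitivity: float, false_positive_rate: float):
    """Expected true and false positives when screening a whole population."""
    sick = population * prevalence
    healthy = population - sick
    true_pos = sick * sensitivity
    false_pos = healthy * false_positive_rate
    return true_pos, false_pos


tp, fp = confusion_counts(10_000, 0.01, 0.99, 0.05)
print(f"true positives ≈ {tp:.0f}, false positives ≈ {fp:.0f}")
print(f"P(condition | positive) ≈ {tp / (tp + fp):.2f}")
```

About 495 false positives swamp about 99 true positives, so only around 17% of positive results reflect the condition.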

Practical significance
Whether the effect size is large enough to matter in the real world, not merely statistically significant.

Central Limit Theorem (CLT)
The distribution of sample means approaches a normal distribution as n increases, whatever the population's shape (given finite variance).
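
A small simulation sketch, assuming a skewed Exponential(1) population (mean 1, SD 1). As n grows, the sample means' skewness shrinks toward 0 (the normal value) and their SD shrinks toward 1/√n. The sample sizes, repetition count, and helper names are illustrative.

```python
import random
import statistics


def sample_means(n: int, reps: int, rng: random.Random) -> list[float]:
    """Means of `reps` samples of size n from a skewed Exponential(1) population."""
    return [statistics.fmean(rng.expovariate(1.0) for _ in range(n)) for _ in range(reps)]


def skewness(xs: list[float]) -> float:
    """Sample skewness: average cubed z-score (0 for a symmetric distribution)."""
    m = statistics.fmean(xs)
    s = statistics.pstdev(xs)
    return statistics.fmean(((x - m) / s) ** 3 for x in xs)


rng = random.Random(0)
for n in (1, 5, 30):
    means = sample_means(n, 2000, rng)
    # CLT: SD of the mean should be close to 1/sqrt(n); skewness should fall toward 0.
    print(n, round(statistics.stdev(means), 3), round(skewness(means), 2))
```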

Proportional ink
The area or ink used to show a value should be proportional to its magnitude.