Sensitivity, Specificity, and the 2x2 Contingency Table
Key definitions
Sensitivity (true positive rate): the probability that the test is positive given the person is infected. Se = P(\text{Test} = + \,|\, \text{Infected})
Specificity (true negative rate): the probability that the test is negative given the person is not infected. Sp = P(\text{Test} = - \,|\, \text{Not Infected})
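As a minimal sketch of the two definitions, the snippet below computes Se and Sp from the four cells of a 2x2 table. The function names and the counts are hypothetical, chosen only for illustration; they are not taken from the notes.

```python
def sensitivity(tp: float, fn: float) -> float:
    """Se = P(Test = + | Infected) = TP / (TP + FN)."""
    return tp / (tp + fn)


def specificity(tn: float, fp: float) -> float:
    """Sp = P(Test = - | Not Infected) = TN / (TN + FP)."""
    return tn / (tn + fp)


# Hypothetical counts for illustration only (not from the notes).
tp, fn = 75, 25     # infected people: test positive / test negative
fp, tn = 36, 864    # non-infected people: test positive / test negative

se = sensitivity(tp, fn)
sp = specificity(tn, fp)
print(f"Se = {se:.2f}, Sp = {sp:.2f}")
```

Note that each rate is conditioned on the true infection status: the denominator is a row total of infected or non-infected people, never the number of positive or negative tests.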
Interpretation: among those who test positive, about 6.01% are actually infected.
Proportion infected among positives: same as PPV, ≈ 0.0601 (6.01%).
Proportion not infected among positives (the share of positive results that are false positives): \frac{FP}{N_+} = \frac{398.64}{424.14} \approx 0.9399 (≈ 93.99%).
Probability of infection among those who test negative: P(I|-) = \frac{FN}{N_-} = \frac{8.5}{9575.86} \approx 0.00089 (≈ 0.089%).
The probability that a person with a negative result is actually infected is very low; conversely, when prevalence is very low, a positive result is usually a false positive.
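The ratios above can be reproduced directly from the four counts quoted in these lines. The sketch below assumes only those counts; the variable names are illustrative.

```python
# Counts quoted in the lines above.
fp = 398.64        # test positive, not infected
n_pos = 424.14     # everyone who tests positive (N_+)
fn = 8.5           # test negative, infected
n_neg = 9575.86    # everyone who tests negative (N_-)

tp = n_pos - fp                    # true positives = N_+ minus false positives

ppv = tp / n_pos                   # P(I | +): infected among positives
false_pos_share = fp / n_pos       # not infected among positives
p_inf_given_neg = fn / n_neg       # P(I | -): infected among negatives

print(f"P(I | +)     = {ppv:.4f}")
print(f"P(not I | +) = {false_pos_share:.4f}")
print(f"P(I | -)     = {p_inf_given_neg:.5f}")
```

The first two values share the denominator N_+ and therefore sum to 1, which is a quick check that the table is consistent.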
Note on interpretation in exam context
Even with decent sensitivity and specificity, very low prevalence yields a low PPV; the test is not very useful for ruling in disease in a population with a low base rate.
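To see how strongly the base rate drives PPV, the sketch below applies Bayes' rule, PPV = p·Se / (p·Se + (1 − p)(1 − Sp)), over a few prevalence values. It uses Se = 0.75 and Sp = 0.96 as in the worked examples. The prevalences 0.001, 0.01 and 0.05 are illustrative assumptions; 0.275 is the base rate from Worked example 2.

```python
def ppv(prevalence: float, se: float, sp: float) -> float:
    """P(Infected | Test = +) via Bayes' rule."""
    true_pos = prevalence * se                 # P(Infected and +)
    false_pos = (1 - prevalence) * (1 - sp)    # P(Not infected and +)
    return true_pos / (true_pos + false_pos)


se, sp = 0.75, 0.96
# 0.001, 0.01, 0.05 are illustrative; 0.275 is from Worked example 2.
for p in (0.001, 0.01, 0.05, 0.275):
    print(f"prevalence = {p:.3f}  ->  PPV = {ppv(p, se, sp):.3f}")
```

With the test characteristics held fixed, PPV rises steadily as prevalence rises, which is the point of the note above.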
Worked example 2: South Africa vs USA comparison (higher prevalence)
South Africa (Eswatini-like scenario) base rate: p = 0.275\, (= 27.5\%)
Population: 10,000
Infected: N_I = 0.275 \times 10000 = 2750; Not infected: N_{\neg I} = 7250
Same test characteristics: Se = 0.75, \; Sp = 0.96
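Given the base rate, population size and test characteristics above, the expected 2x2 table can be filled in mechanically. The sketch below is one way to do it; the function name and the dictionary layout are assumptions made for illustration, and the computed values are arithmetic consequences of the inputs rather than figures quoted from the notes.

```python
def expected_table(n: int, prevalence: float, se: float, sp: float) -> dict:
    """Expected cell counts of the 2x2 table, left unrounded."""
    infected = n * prevalence
    not_infected = n - infected
    tp = infected * se          # infected, test positive
    fn = infected - tp          # infected, test negative
    tn = not_infected * sp      # not infected, test negative
    fp = not_infected - tn      # not infected, test positive
    return {"TP": tp, "FN": fn, "FP": fp, "TN": tn}


t = expected_table(10_000, 0.275, 0.75, 0.96)
n_pos = t["TP"] + t["FP"]
n_neg = t["FN"] + t["TN"]

ppv_high = t["TP"] / n_pos               # P(I | +) at 27.5% prevalence
p_inf_given_neg_high = t["FN"] / n_neg   # P(I | -) at 27.5% prevalence

for cell, count in t.items():
    print(f"{cell}: {count:.2f}")
print(f"P(I | +) = {ppv_high:.4f}")
print(f"P(I | -) = {p_inf_given_neg_high:.4f}")
```

Cell counts are kept as unrounded expected values, in line with the advice not to round intermediate results before the table is complete.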
Proportion of the unemployed who have no degree: P(NoDeg|Unemployed) = \frac{215.74}{338.86} \approx 0.6367
Proportion of the unemployed who are college graduates: P(College|Unemployed) = \frac{123.12}{338.86} \approx 0.3633
Overall unemployment rate in the year: P(Unemployed) = \frac{338.86}{10000} \approx 0.0339
Additional checks
Unemployed given NoDeg: P(Unemployed|NoDeg) = 0.0469 (given in data)
Unemployed given College: P(Unemployed|College) = 0.0228 (given in data)
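The unemployment figures above follow the same pattern as the diagnostic example: start from group sizes, apply the conditional rates, then reverse the conditioning. The sketch below assumes group sizes of 4,600 (NoDeg) and 5,400 (College). These are not stated in this section; they are back-computed as count divided by rate (215.74 / 0.0469 and 123.12 / 0.0228) and should be treated as an assumption.

```python
# Conditional unemployment rates given in the data.
p_unemp_given_nodeg = 0.0469
p_unemp_given_college = 0.0228

# ASSUMPTION: group sizes are back-computed from the counts and rates above,
# not quoted from the notes.
n_nodeg = 4600
n_college = 5400
n_total = n_nodeg + n_college

# Expected unemployed counts in each group.
unemp_nodeg = n_nodeg * p_unemp_given_nodeg
unemp_college = n_college * p_unemp_given_college
unemp_total = unemp_nodeg + unemp_college

# Law of total probability, then reversed conditionals.
p_unemp = unemp_total / n_total
p_nodeg_given_unemp = unemp_nodeg / unemp_total
p_college_given_unemp = unemp_college / unemp_total

print(f"P(Unemployed)           = {p_unemp:.4f}")
print(f"P(NoDeg | Unemployed)   = {p_nodeg_given_unemp:.4f}")
print(f"P(College | Unemployed) = {p_college_given_unemp:.4f}")
```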
Practical interpretation tips for exam problems
Do not round intermediate results before finishing the table
Always start with the base rate (prevalence) before applying sensitivity and specificity
When asked about predictive values, express results as probabilities (or percentages) and interpret in context
Be careful about what the numerator and denominator represent when forming probabilities from a contingency table (A ∩ B vs A given B, etc.)
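The last tip can be made concrete with a single cell of the table. In the sketch below, the same numerator (infected and test positive) is divided by three different denominators, giving three different probabilities. The counts are the ones quoted in the first worked example; the variable names are illustrative.

```python
# Counts from the first worked example.
n_pos = 424.14             # all positives (N_+)
n_neg = 9575.86            # all negatives (N_-)
fp = 398.64                # false positives
fn = 8.5                   # false negatives

n = n_pos + n_neg          # whole population
tp = n_pos - fp            # infected AND positive (the shared numerator)
n_infected = tp + fn       # all infected people

p_joint = tp / n                     # P(Infected ∩ +): out of everyone
p_inf_given_pos = tp / n_pos         # P(Infected | +): out of positives
p_pos_given_inf = tp / n_infected    # P(+ | Infected): out of infected

print(f"P(I and +) = {p_joint:.5f}")
print(f"P(I | +)   = {p_inf_given_pos:.4f}")
print(f"P(+ | I)   = {p_pos_given_inf:.4f}")
```

The numerator never changes; only the group being conditioned on does, and that choice is what separates a joint probability from a predictive value or a sensitivity.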
Optional exercise references mentioned in the material
Houston flights example (page 55 of notes) as a similar probability exercise
A follow-up exercise: a second, non-medical two-category probability example using a 10,000-person hypothetical
Brief note on the broader concepts discussed
Statistical model vs statistic: a model is a mathematical description of how the data are generated; a statistic is a summary value computed from a sample, often used to estimate a model parameter
Independence model: two categorical variables A and B are independent if knowing the value of A does not change the distribution of B, i.e. P(B|A) = P(B); e.g., weather vs day of week independence can be explored with simulations
The goal of these exercises is to develop intuition for how probabilities propagate through a model and how base rates influence decision-making in diagnostics and policy
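As a rough illustration of the independence model, the sketch below simulates weather and day of week so that rain is drawn without reference to the day. The rain probability of 0.3, the number of trials and the fixed seed are all hypothetical choices, not values from the notes. Under independence, each conditional proportion P(rain | day) should sit close to the overall P(rain).

```python
import random
from collections import Counter

random.seed(0)  # fixed seed so the run is reproducible

DAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]
P_RAIN = 0.3    # hypothetical rain probability, identical for every day
N = 70_000      # hypothetical number of simulated days

day_totals = Counter()
rain_totals = Counter()
for _ in range(N):
    day = random.choice(DAYS)
    rained = random.random() < P_RAIN   # drawn without looking at `day`
    day_totals[day] += 1
    rain_totals[day] += rained          # True counts as 1, False as 0

overall = sum(rain_totals.values()) / N
print(f"overall P(rain) = {overall:.3f}")
for day in DAYS:
    cond = rain_totals[day] / day_totals[day]
    print(f"P(rain | {day}) = {cond:.3f}")
```

Small differences between days are sampling noise; a model in which rain depended on the day would show conditional proportions that stay apart as N grows.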