Positive Predictive Value (PPV): PPV = TP / N+ = 25.5 / 424.14 ≈ 0.0601
Interpretation: among those who test positive, about 6.01% are actually infected.
Proportion infected among positives: same as PPV, ≈ 0.0601 (6.01%).
Proportion not infected among positives (false positives proportion): FP / N+ = 398.64 / 424.14 ≈ 0.9399 (≈ 93.99%).
Probability of infection among those who test negative: P(I∣−) = FN / N− = 8.5 / 9575.86 ≈ 0.00089 (≈ 0.089%).
The probability that a person with a negative result is actually infected is very low; conversely, a positive result is usually a false positive when prevalence is very low.
Note on interpretation in exam context
Even with decent sensitivity and specificity, very low prevalence yields a low PPV; the test is not very useful for ruling in disease in a population with a low base rate.
Worked example 2: South Africa vs USA comparison (higher prevalence)
South Africa (Eswatini-like scenario) base rate: p=0.275(=27.5%)
Population: 10,000
Infected: N_I = 0.275 × 10000 = 2750; Not infected: N_¬I = 7250
Same test characteristics: Se=0.75,Sp=0.96
Cell counts
TP: TP=0.75×2750=2062.5
FN: FN=0.25×2750=687.5
TN: TN=0.96×7250=6960
FP: FP=0.04×7250=290
Margins
Total positives: N+=TP+FP=2062.5+290=2352.5
Total negatives: N−=FN+TN=687.5+6960=7647.5
Conditional probabilities for positives
PPV: PPV = TP / N+ = 2062.5 / 2352.5 ≈ 0.8767
About 87.7% of positive tests are true positives in this higher-prevalence setting.
Proportion of positives that are not infected: FP / N+ = 290 / 2352.5 ≈ 0.1233 (≈ 12.33%).
Conditional probabilities for negatives
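Probability a negative is infected: P(I∣−) = FN / N− = 687.5 / 7647.5 ≈ 0.0899 (≈ 8.99%)
Not negligible: about 9% of people who test negative are infected, roughly 100 times the ≈ 0.089% seen in the low-prevalence setting.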
Takeaway from SA vs USA
Higher prevalence improves PPV substantially; the same test yields far more reliable positives in high-prevalence populations.
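One way to see why prevalence drives PPV is to substitute the cell-count formulas into PPV and cancel the population size. The block below is a sketch of that standard Bayes' rule form, using the same symbols p, Se, and Sp as these notes; the numeric check uses the high-prevalence values from this example.

```latex
\begin{aligned}
\mathrm{PPV}
  &= \frac{TP}{TP + FP}
   = \frac{Se \cdot p \cdot N}{Se \cdot p \cdot N + (1 - Sp)(1 - p)\,N}
   = \frac{Se \cdot p}{Se \cdot p + (1 - Sp)(1 - p)} \\[4pt]
\text{With } p = 0.275,\ Se = 0.75,\ Sp = 0.96:\quad
\mathrm{PPV}
  &= \frac{0.75 \times 0.275}{0.75 \times 0.275 + 0.04 \times 0.725}
   = \frac{0.20625}{0.23525} \approx 0.8767
\end{aligned}
```

As p shrinks, the false-positive term (1 − Sp)(1 − p) dominates the denominator, which is why the same test gives a much lower PPV when the base rate is very low.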
Summary formulas to remember (for any base rate p, Se, Sp)
True positives: TP=Se×(p×10000)
False negatives: FN=(1−Se)×(p×10000)
True negatives: TN=Sp×((1−p)×10000)
False positives: FP=(1−Sp)×((1−p)×10000)
Positive test count: N+=TP+FP
Negative test count: N−=FN+TN
Positive Predictive Value: PPV = TP / N+
Probability that a positive is infected: same as PPV
Probability that a negative is infected: P(I∣−) = FN / N−
Probability that a negative is not infected (NPV): NPV = TN / N−
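The summary formulas above can be collected into one small function. This is a minimal sketch, not a prescribed method: the function name diagnostic_table, its parameter names, and the dictionary keys are hypothetical choices made for illustration. The usage lines plug in the high-prevalence values (p = 0.275, Se = 0.75, Sp = 0.96).

```python
def diagnostic_table(p, se, sp, n=10_000):
    """Expected cell counts, margins, and predictive values for a
    hypothetical population of n people (names are illustrative)."""
    infected = p * n
    not_infected = (1 - p) * n

    tp = se * infected            # true positives
    fn = (1 - se) * infected      # false negatives
    tn = sp * not_infected        # true negatives
    fp = (1 - sp) * not_infected  # false positives

    n_pos = tp + fp               # everyone who tests positive
    n_neg = fn + tn               # everyone who tests negative

    return {
        "TP": tp, "FN": fn, "TN": tn, "FP": fp,
        "N+": n_pos, "N-": n_neg,
        "PPV": tp / n_pos,        # P(infected | positive)
        "P(I|-)": fn / n_neg,     # P(infected | negative)
        "NPV": tn / n_neg,        # P(not infected | negative)
    }


result = diagnostic_table(p=0.275, se=0.75, sp=0.96)
for name, value in result.items():
    print(f"{name}: {value:.4f}")
```

Keeping the counts unrounded inside the function and formatting only when printing follows the exam tip about not rounding intermediate results.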
Connecting to two-categorical-variable problems (the unemployment/degree example)
Given a population of 10,000 with two categories in one variable (degree) and a second variable (unemployment)
Base rates (marginals)
No college degree: P(NoDeg)=0.46 → counts: 4600
College degree: P(College)=0.54 → counts: 5400
Conditional probabilities for unemployment within each degree group
Among NoDeg, unemployed proportion: P(Unemployed∣NoDeg)=0.0469 → unemployed count: 0.0469×4600=215.74
Among College, unemployed proportion: P(Unemployed∣College)=0.0228 → unemployed count: 0.0228×5400=123.12
Total unemployed: 215.74 + 123.12 = 338.86
Share of the unemployed with no college degree: P(NoDeg∣Unemployed) = 215.74 / 338.86 ≈ 0.6367
Share of the unemployed with a college degree: P(College∣Unemployed) = 123.12 / 338.86 ≈ 0.3633
Overall unemployment rate in the year: P(Unemployed) = 338.86 / 10000 ≈ 0.0339
Additional checks
Unemployed given NoDeg: P(Unemployed∣NoDeg)=0.0469 (given in data)
Unemployed given College: P(Unemployed∣College)=0.0228 (given in data)
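The unemployment/degree table can be rebuilt the same way as the diagnostic tables: start from the marginals, apply the conditional rates within each group, then divide by the new margin. The sketch below uses the numbers given in this example; the variable names are illustrative assumptions, not part of the original material.

```python
n = 10_000  # hypothetical population size

# Marginal (base-rate) probabilities for the degree variable
p_degree = {"NoDeg": 0.46, "College": 0.54}

# Conditional unemployment rates within each degree group
p_unemp_given = {"NoDeg": 0.0469, "College": 0.0228}

# Expected unemployed counts per group: P(Unemployed | group) * P(group) * n
unemployed = {g: p_unemp_given[g] * p_degree[g] * n for g in p_degree}

# Column margin and overall unemployment rate
total_unemployed = sum(unemployed.values())
p_unemployed = total_unemployed / n

# Reverse conditionals: P(group | Unemployed)
p_group_given_unemp = {g: unemployed[g] / total_unemployed for g in unemployed}

print(f"Total unemployed: {total_unemployed:.2f}")
print(f"P(Unemployed): {p_unemployed:.4f}")
for g, prob in p_group_given_unemp.items():
    print(f"P({g} | Unemployed): {prob:.4f}")
```

Note the change of denominator: P(Unemployed∣NoDeg) divides by the 4600 people without a degree, while P(NoDeg∣Unemployed) divides by the 338.86 unemployed.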
Practical interpretation tips for exam problems
Do not round intermediate results before finishing the table
Always start with the base rate (prevalence) before applying sensitivity and specificity
When asked about predictive values, express results as probabilities (or percentages) and interpret in context
Be careful about what the numerator and denominator represent when forming probabilities from a contingency table (A ∩ B vs A given B, etc.)
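To illustrate the first tip, the short sketch below compares the PPV from the unrounded low-prevalence counts (TP = 25.5, FP = 398.64) with the PPV obtained after rounding those counts to whole people. The variable names are illustrative assumptions.

```python
# Expected counts from the low-prevalence example, kept unrounded
tp, fp = 25.5, 398.64
ppv_exact = tp / (tp + fp)

# Same calculation after rounding each cell to a whole number first
tp_rounded, fp_rounded = round(tp), round(fp)   # 26 and 399
ppv_rounded = tp_rounded / (tp_rounded + fp_rounded)

print(f"PPV from unrounded counts: {ppv_exact:.4f}")
print(f"PPV from rounded counts:   {ppv_rounded:.4f}")
print(f"Difference:                {ppv_rounded - ppv_exact:.4f}")
```

The gap is small here, but it appears in the third decimal place, which is enough to change a reported percentage.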
Optional exercise references mentioned in the material
Houston flights example (page 55 of notes) as a similar probability exercise
A follow-up exercise: a second, non-medical two-category probability example using a 10,000-person hypothetical
Brief note on the broader concepts discussed
Statistical model vs statistic: a model is a mathematical description of data generation; a statistic is a summary value computed from a sample to estimate a model parameter
Independence model: two categorical variables A and B are independent if the value of A does not affect the distribution of B; e.g., weather vs day of week independence can be explored with simulations
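A simulation along these lines might look like the sketch below. It is built on assumptions: the rain probability of 0.3, the sample size, and all names are hypothetical, and rain is drawn without looking at the day, so the two variables are independent by construction. Under independence, the rain rate within each day of the week should be close to the overall rain rate.

```python
import random

rng = random.Random(0)   # fixed seed so the run is repeatable

days = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]
p_rain = 0.3             # hypothetical overall chance of rain
n_days = 100_000         # number of simulated days

# counts[day] = [rainy days, total days]
counts = {d: [0, 0] for d in days}
for _ in range(n_days):
    day = rng.choice(days)
    rainy = rng.random() < p_rain   # does not depend on `day`
    counts[day][0] += int(rainy)
    counts[day][1] += 1

# Conditional rain rate for each day: P(rain | day)
rain_rate_by_day = {d: rainy / total for d, (rainy, total) in counts.items()}

for d, rate in rain_rate_by_day.items():
    print(f"P(rain | {d}) = {rate:.3f}")
```

If the conditional rates differed from each other by much more than sampling noise, that would be evidence against the independence model.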
The goal of these exercises is to develop intuition for how probabilities propagate through a model and how base rates influence decision-making in diagnostics and policy