Notes on Probabilistic Outcomes, Conditional Probability, Bayes' Theorem, and Contingency Tables
Probability Basics and Conditional Probability
Probabilistic outcomes organize how likely different results are when you have uncertainty. Key ideas include general probability, conditional probability, and how to combine outcomes across sequences of events.
Notation you’ll see:
$P(A)$: probability of event A.
$P(A|B)$: probability of A given that B has occurred (conditional probability).
$P(A\cap B)$: probability that both A and B occur (joint probability).
For a sequence of independent events, joint probability can be written as a product of individual probabilities.
Basic coin-toss example (two outcomes per toss):
A single fair coin has two outcomes: heads (H) or tails (T). So $P(H)=\frac{1}{2}$ and $P(T)=\frac{1}{2}$.
If you toss the coin twice, there are four equally likely sequences: HH, HT, TH, TT, each with probability $\frac{1}{4}$.
First toss outcome and second-toss conditional probabilities:
If the first toss is heads, the second toss can be H or T: $P(H_2|H_1)=\frac{1}{2}$ and $P(T_2|H_1)=\frac{1}{2}$.
If the first toss is tails, the second toss can be H or T: $P(H_2|T_1)=\frac{1}{2}$ and $P(T_2|T_1)=\frac{1}{2}$.
These conditional probabilities sum to 1 for a given first outcome: e.g., $P(H_2|H_1)+P(T_2|H_1)=1$, and similarly when conditioning on $T_1$.
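As a quick sketch (illustrative only), the two-toss sample space can be enumerated and the conditional probability computed as a ratio of a joint to a marginal:

```python
from itertools import product
from fractions import Fraction

# Enumerate the four equally likely two-toss sequences of a fair coin.
sequences = list(product("HT", repeat=2))
p_seq = Fraction(1, len(sequences))  # each sequence has probability 1/4

# Conditional probability as a ratio: P(H2 | H1) = P(H1 and H2) / P(H1).
p_h1_and_h2 = sum(p_seq for s in sequences if s == ("H", "H"))
p_h1 = sum(p_seq for s in sequences if s[0] == "H")
p_h2_given_h1 = p_h1_and_h2 / p_h1
print(p_h2_given_h1)  # 1/2
```

Using `Fraction` keeps the arithmetic exact, matching the fractions in the notes.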
Independence and conditional vs joint probability:
If events A and B are independent, then $P(A\cap B)=P(A)P(B)$.
In general (not assuming independence), $P(A\cap B)=P(A)P(B|A)$.
The joint probability for a sequence can be written as a product of conditional probabilities, e.g. for two flips:
$P(H_1\cap H_2)=P(H_1)\,P(H_2|H_1)$. For independent events, $P(H_1)P(H_2|H_1)=P(H_1)P(H_2)$, so the choice of which rule to use doesn’t change the result.
Joint probabilities for a sequence (examples):
Sequence HH: $P(H_1\cap H_2)=P(H_1)P(H_2|H_1)=\tfrac{1}{2}\cdot\tfrac{1}{2}=\tfrac{1}{4}$.
Sequence HT: $P(H_1\cap T_2)=P(H_1)P(T_2|H_1)=\tfrac{1}{2}\cdot\tfrac{1}{2}=\tfrac{1}{4}$.
Sequence TH: $P(T_1\cap H_2)=P(T_1)P(H_2|T_1)=\tfrac{1}{2}\cdot\tfrac{1}{2}=\tfrac{1}{4}$.
Sequence TT: $P(T_1\cap T_2)=P(T_1)P(T_2|T_1)=\tfrac{1}{2}\cdot\tfrac{1}{2}=\tfrac{1}{4}$.
The sum of all joint probabilities equals 1: $\tfrac{1}{4}+\tfrac{1}{4}+\tfrac{1}{4}+\tfrac{1}{4}=1$.
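The product rule for all four sequences, and the check that the joints sum to 1, can be sketched as:

```python
from itertools import product
from fractions import Fraction

p = {"H": Fraction(1, 2), "T": Fraction(1, 2)}  # fair coin

# Product rule: for independent tosses P(second | first) = P(second),
# so each sequence's joint probability is the product of per-toss probabilities.
joint = {seq: p[seq[0]] * p[seq[1]] for seq in product("HT", repeat=2)}

assert all(v == Fraction(1, 4) for v in joint.values())  # each sequence is 1/4
print(sum(joint.values()))  # 1
```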
Why conditional probabilities matter:
Conditional probabilities describe how likely outcomes are when you already know something about the past. They are often necessary in decision making when outcomes depend on prior events.
A middle branch (the conditional probability) in a tree diagram represents the probability of a second event given the first event.
The tree diagram helps organize thoughts and compute probabilities across sequences, including when there are different numbers of outcomes at each step (not just heads/tails, but also dice, cards, etc.).
A practical framework: Bayes’ theorem, prior and likelihood, updating beliefs
Bayes’ theorem provides a principled way to update the probability of a hypothesis after observing new evidence.
The general idea: start with a prior probability for a hypothesis, multiply by the likelihood of the new evidence under that hypothesis, and normalize by the total probability of the evidence under all hypotheses.
The airport-baggage example (illustrative numbers):
Let F be the event “bag contains a forbidden item.” Prior: $P(F)=0.05$ (5%), so $P(\neg F)=0.95$.
Alarm given a forbidden item: $P(Alarm|F)=0.98$ (true positive rate).
Alarm given no forbidden item: $P(Alarm|\neg F)=0.08$ (false positive rate).
We want the posterior $P(F|Alarm)$:
$P(F|Alarm)=\frac{P(Alarm|F)P(F)}{P(Alarm|F)P(F)+P(Alarm|\neg F)P(\neg F)}$.
Substituting numbers: $P(F|Alarm)=\frac{0.98\cdot 0.05}{0.98\cdot 0.05+0.08\cdot 0.95}=\frac{0.049}{0.125}=0.392$.
Interpretation: after the alarm sounds, about 39.2% of such alarms correspond to bags with a forbidden item (the posterior belief), up from the prior of 5%.
Bayes’ theorem key form (posterior):
$P(F|Alarm)=\frac{P(Alarm|F)\,P(F)}{P(Alarm|F)\,P(F)+P(Alarm|\neg F)\,P(\neg F)}$.
The idea of updating beliefs when new evidence arrives is called posterior updating; Bayes’ rule provides the exact mechanism.
Important notes:
The prior $P(F)$ encodes what you believed before seeing the alarm.
The likelihoods $P(Alarm|F)$ and $P(Alarm|\neg F)$ encode how informative the alarm is about the presence of a forbidden item.
As the evidence becomes more informative (the likelihoods under the two hypotheses move further apart), the posterior moves further from the prior.
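The update above is a one-liner to compute. A minimal sketch, using the notes' airport-baggage numbers (the function name and parameter names are illustrative choices):

```python
def posterior(prior, p_e_given_h, p_e_given_not_h):
    """Bayes' rule for a binary hypothesis: returns P(H | evidence)."""
    numerator = p_e_given_h * prior
    return numerator / (numerator + p_e_given_not_h * (1 - prior))

# Airport-baggage numbers from the notes: prior 5%, TP rate 98%, FP rate 8%.
p_f_given_alarm = posterior(prior=0.05, p_e_given_h=0.98, p_e_given_not_h=0.08)
print(round(p_f_given_alarm, 3))  # 0.392
```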
Contingency tables, joint/marginal/conditional probabilities
A contingency table is a two-way table that tabulates counts for two nominal (categorical) variables, for example, favorite winter sport and college type.
Key concepts:
Joint probability: the probability that both category A and category B occur. In a table with total N, a cell count c corresponds to joint probability $P(A=a, B=b)=\frac{c}{N}$.
Marginal (overall) probabilities: probabilities of a single variable when summing across the other variable (row totals or column totals divided by N).
Conditional probability: the probability of one variable given a fixed value of the other, e.g. $P(A=a|B=b)=\frac{P(A=a,\,B=b)}{P(B=b)}$.
How to read a contingency table:
For a fixed row (e.g., four-year college), conditional probabilities are taken by dividing the counts in that row by the row total (or by the relevant column total if conditioning on the other variable).
For a fixed column (e.g., skiing), conditional probabilities are taken by dividing by the column total.
Example outlines (data summarized, not copied exactly from the transcript):
Suppose a survey of 545 students asks for favorite winter sport and college type. The joint counts fill a table; from there you can compute:
$P(\text{Skiing})=\frac{\text{# skiing}}{545}$ (a marginal probability, obtained by summing the skiing counts across college types).
$P(\text{FourYear} | \text{IceSkating})=\frac{\text{# FourYear and IceSkating}}{\text{# IceSkating}}$ (a conditional probability).
$P(\text{IceSkating} | \text{FourYear})=\frac{\text{# FourYear and IceSkating}}{\text{# FourYear}}$ (the other direction).
Converting between formats:
The joint probabilities are the counts divided by the grand total (N).
Conditional probabilities are the joint divided by the relevant marginal (row or column total).
You can form a tree diagram from contingency table data by treating each level as a branch and deriving conditional probabilities from the appropriate marginals.
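The conversions above can be sketched directly. The counts here are hypothetical (chosen only so they total 545, like the survey in the notes), not the transcript's actual data:

```python
# Hypothetical counts: rows = college type, columns = favorite winter sport.
counts = {
    ("FourYear", "Skiing"): 120, ("FourYear", "IceSkating"): 80,
    ("TwoYear", "Skiing"): 200, ("TwoYear", "IceSkating"): 145,
}
N = sum(counts.values())  # grand total (545 here)

# Joint probability: cell count / N.
p_joint = {cell: c / N for cell, c in counts.items()}

# Marginal: sum the joints over the other variable.
p_four_year = sum(v for (college, _), v in p_joint.items() if college == "FourYear")

# Conditional: joint / marginal (equivalently, cell count / row total).
p_skate_given_four_year = p_joint[("FourYear", "IceSkating")] / p_four_year
print(round(p_skate_given_four_year, 3))  # 0.4
```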
Marginal vs conditional probabilities
Marginal probability refers to the probability of a single variable, ignoring the other (a sum across a row or column when normalizing by N).
Conditional probability isolates a row or a column and normalizes by that row/column total.
Worked outline: turning a contingency table into a tree diagram (practice outline)
Start with the grand total N.
Create branches for the first variable (e.g., University A vs University B) with probabilities equal to their marginal proportions.
For each branch, create sub-branches for the second variable’s categories with conditional probabilities given the first branch.
Compute joint probabilities as products of the marginal and conditional probabilities along each path.
Ensure that each set of branches from a node sums to 1 (probabilities along that node's outgoing branches).
This framework lets you answer questions about both joint and conditional probabilities, and it provides intuition for the equivalence between a tree and a contingency table.
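The outline above can be checked mechanically: path products from a tree must reproduce the table's joints, and the branches leaving each node must sum to 1. A sketch with made-up counts (universities and outcomes are illustrative labels):

```python
# Hypothetical counts for University A vs B by outcome (illustrative only).
counts = {("A", "pass"): 30, ("A", "fail"): 10, ("B", "pass"): 45, ("B", "fail"): 15}
N = sum(counts.values())

for u in sorted({uni for uni, _ in counts}):
    row_total = sum(c for (uu, _), c in counts.items() if uu == u)
    p_u = row_total / N                          # first-level branch (marginal)
    branch_sum = 0.0
    for (uu, outcome), c in counts.items():
        if uu == u:
            p_cond = c / row_total               # second-level branch (conditional)
            branch_sum += p_cond
            assert abs(p_u * p_cond - c / N) < 1e-9  # path product = table joint
    assert abs(branch_sum - 1.0) < 1e-9          # outgoing branches sum to 1
print("tree and table agree")
```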
Practical example: a headline-like comprehension problem using a two-university table (structure shown, not exact numbers)
Setup: counts of graduates by university (A, B) and by income bracket (<20k, 20–39k, 40k+).
Steps:
Compute marginal proportions for each university (e.g., $P(A)$, $P(B)$).
Compute conditional proportions in each university for each income bracket (e.g., $P(\text{<20k}|A)$, etc.).
Compute joint probabilities by multiplying marginals by the corresponding conditionals.
Check that the sum of all joint probabilities equals 1.
Note: you may also present the same information by swapping the conditioning order (e.g., condition on income bracket first). The joint probabilities stay the same; the conditional probabilities will look different depending on the conditioning variable, but the math is consistent.
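That consistency is easy to verify numerically. A sketch with a hypothetical joint distribution (the numbers below are invented for illustration and sum to 1):

```python
# Hypothetical joint distribution over (university, income bracket).
joint = {("A", "<20k"): 0.10, ("A", "20-39k"): 0.25, ("A", "40k+"): 0.15,
         ("B", "<20k"): 0.05, ("B", "20-39k"): 0.20, ("B", "40k+"): 0.25}

p_a = sum(v for (u, _), v in joint.items() if u == "A")       # marginal P(A)
p_low = sum(v for (_, b), v in joint.items() if b == "<20k")  # marginal P(<20k)

# Condition either way; both factorizations recover the same joint.
p_low_given_a = joint[("A", "<20k")] / p_a    # P(<20k | A)
p_a_given_low = joint[("A", "<20k")] / p_low  # P(A | <20k)
assert abs(p_low_given_a * p_a - p_a_given_low * p_low) < 1e-9  # both = 0.10
print("joint is the same under either conditioning order")
```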
Practice problems and key strategies highlighted in the session
Inclusion–exclusion for unions of events:
For two events A and B: $P(A\cup B)=P(A)+P(B)-P(A\cap B)$.
Example: If $P(A)=0.8$, $P(B)=0.6$, and $P(A\cap B)=0.5$, then $P(A\cup B)=0.8+0.6-0.5=0.9$. The complement probability is $1-0.9=0.1$.
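The example above, kept exact with fractions to avoid floating-point noise:

```python
from fractions import Fraction

def p_union(p_a, p_b, p_a_and_b):
    """Inclusion-exclusion for two events: P(A or B) = P(A) + P(B) - P(A and B)."""
    return p_a + p_b - p_a_and_b

p = p_union(Fraction(8, 10), Fraction(6, 10), Fraction(5, 10))
print(p)      # 9/10
print(1 - p)  # 1/10 (the complement)
```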
Sequential probability without replacement (dependent events):
If you have a batch with a certain defect rate, the probability of successive defects depends on prior outcomes.
Example with defectives without replacement (typical numbers): suppose 40 of 100 items are defective, so $P(D_1)=0.40$, and 39 defectives remain among the 99 items after the first defective is drawn. Then $P(D_2|D_1)=\frac{39}{99}$ and $P(D_1\cap D_2)=0.40\cdot\frac{39}{99}\approx 0.1576$. If instead sampling with replacement, $P(D_2|D_1)=P(D_2)=0.40$.
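The without-replacement arithmetic, assuming a batch of 100 items with 40 defectives (the setup implied by the numbers in the notes):

```python
from fractions import Fraction

# 100 items, 40 defective: P(D1) = 40/100.
p_d1 = Fraction(40, 100)
# After drawing one defective without replacement: 39 defectives among 99 items.
p_d2_given_d1 = Fraction(39, 99)
p_both = p_d1 * p_d2_given_d1
print(p_both, float(p_both))  # 26/165, approx 0.1576

# With replacement the draws are independent: P(D2 | D1) = P(D2) = 0.40.
print(p_d1 * p_d1)  # 4/25
```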
Probability of getting all correct on a multiple-choice test: if each question has 4 options and you guess, then for 10 questions
$P(\text{all correct})=\left(\frac{1}{4}\right)^{10}=\frac{1}{4^{10}}=\frac{1}{1{,}048{,}576}$.
General guidance on numerical representations in exams:
Use fractions when numbers are given as fractions; use decimals if provided as decimals.
Do not convert decimals to fractions unless instructed; decimals are fine to work with.
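The all-correct guessing probability above is a one-line exact computation:

```python
from fractions import Fraction

# Guessing on 10 questions, each with 4 options and exactly one correct answer;
# independent guesses multiply.
p_all_correct = Fraction(1, 4) ** 10
print(p_all_correct)  # 1/1048576
```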
Computation tips and exam strategy:
Keep expressions in exact fractions when possible; simplify where reasonable.
Expect a mix of problems: simple probability, conditional/probability trees, Bayes’ updates, and contingency-table interpretations.
Practice with a variety of example formats to build fluency with joint/conditional/marginal probabilities and transitions between representations.
Quick recap of essential formulas to memorize
Conditional probability: $P(A|B)=\frac{P(A\cap B)}{P(B)}$.
Joint probability for independent events: $P(A\cap B)=P(A)P(B)$. For dependent events: $P(A\cap B)=P(A)P(B|A)$.
Bayes’ theorem (posterior): $P(F|Alarm)=\frac{P(Alarm|F)\,P(F)}{P(Alarm|F)\,P(F)+P(Alarm|\neg F)\,P(\neg F)}$.
Inclusion–exclusion for two events: $P(A\cup B)=P(A)+P(B)-P(A\cap B)$.
Tree-diagram construction: joint probability along a path equals the product of conditional probabilities along that path.
Exam preparation takeaway
Practice constructing and reading both trees and contingency tables.
Be comfortable switching between joint, marginal, and conditional representations.
Use Bayes’ theorem to update beliefs when new evidence arrives (posterior probabilities).
Expect straightforward application problems (coin tosses, alarm examples), probability with multiple events, and simple contingency-table interpretations.