9/11: Probability and Contingency Tables — Quick Reference
Probability basics
- Probability is a number between 0 and 1 (0% to 100%), representing uncertainty about an event.
- A marginal probability refers to a single event (appears in the margins of a table or as P(A)).
- A joint probability refers to two events occurring together (A and B).
- A conditional probability is the probability of an event given that another event has occurred: P(B|A) = \dfrac{P(A,B)}{P(A)}.
- The conditional probability is not symmetric: P(B|A) \neq P(A|B) in general.
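A small worked example of the asymmetry, using a standard 52-card deck (an illustrative choice, not part of the notes' data). Let A = "the card is a face card (J, Q, or K)" and B = "the card is a King". Every King is a face card, so P(A,B) = 4/52, and the two conditionals come out very different:

```latex
P(B \mid A) = \dfrac{P(A,B)}{P(A)} = \dfrac{4/52}{12/52} = \dfrac{1}{3},
\qquad
P(A \mid B) = \dfrac{P(A,B)}{P(B)} = \dfrac{4/52}{4/52} = 1.
```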
Notation (overview)
- Joint: P(A,B)\;\text{or}\;P(A\cap B)
- Conditional: P(B|A)
- Marginal: P(A) (probability of a single event)
Sources of probability
- Careful counting (requires equally likely outcomes): define the sample space and count the outcomes in the event, e.g., for a fair deck of 52 cards, P(\text{diamond}) = \dfrac{13}{52} and P(\text{K or Q}) = \dfrac{8}{52}.
- Data-driven probabilities: actuarial data, census data, etc. (long-run frequencies from historical data).
- Probabilities from data summaries: use contingency tables to summarize categorical data and derive probabilities.
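The counting approach for the card example can be sketched in R. This is a minimal illustration, assuming a standard deck; the object names (ranks, suits, deck) are made up for the example and do not come from the notes.

```r
# Build a standard 52-card deck: every combination of 13 ranks and 4 suits.
ranks <- c("A", 2:10, "J", "Q", "K")
suits <- c("club", "diamond", "heart", "spade")
deck  <- expand.grid(rank = ranks, suit = suits, stringsAsFactors = FALSE)

# With equally likely outcomes, P(event) = (outcomes in event) / (all outcomes).
p_diamond <- sum(deck$suit == "diamond") / nrow(deck)       # 13 of 52 cards
p_k_or_q  <- sum(deck$rank %in% c("K", "Q")) / nrow(deck)   # 8 of 52 cards

print(c(diamond = p_diamond, king_or_queen = p_k_or_q))
```

The same pattern (count the favorable rows, divide by the number of rows) only gives a valid probability when each row is equally likely to be drawn.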
Contingency tables and data-derived probabilities
- A contingency table lists counts (or proportions) for combinations of two (or more) categorical variables.
- Rows/columns represent categories; cells hold counts of observations in each category pair.
- Marginal totals (row or column sums), divided by the grand total, give marginal probabilities.
- Example structure (ACLFest data): each row of the data frame is a band; the table crosses Lollapalooza (LaLa) vs. not against ACL vs. not.
- In the ACLFest example: total bands = 1,238. Cells include:
- LaLa and ACL: 77
- LaLa and not ACL: 361
- not LaLa and ACL: 81
- not LaLa and not ACL: 719 (the remainder: 1238 - 77 - 361 - 81)
- From counts to probabilities:
- Joint probability for a cell: count / total observations
- Marginal probability for a single variable: row or column total / total observations
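The conversion from counts to probabilities can be sketched directly from the cell counts quoted for the ACLFest example. This is an illustrative sketch: the matrix is typed in by hand rather than read from the real data, and the fourth cell is computed as the remainder so that the total is 1,238.

```r
# Cell counts from the ACLFest example; the fourth cell is the remainder.
total  <- 1238
counts <- matrix(
  c(77, 361,
    81, total - (77 + 361 + 81)),
  nrow = 2, byrow = TRUE,
  dimnames = list(LaLa = c("yes", "no"), ACL = c("yes", "no"))
)

joint  <- counts / sum(counts)           # joint: each cell / total observations
p_lala <- rowSums(counts) / sum(counts)  # marginal for LaLa: row totals / total
p_acl  <- colSums(counts) / sum(counts)  # marginal for ACL: column totals / total

print(round(joint, 3))
print(round(p_lala, 3))
print(round(p_acl, 3))
```

The joint proportions sum to 1 over the whole table, and each set of marginal proportions sums to 1 on its own.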
Conditional probabilities from contingency tables
- Example: probability that a band played ACL given that it played LaLa: P(ACL|LaLa) = \dfrac{P(LaLa, ACL)}{P(LaLa)} = \dfrac{77/1238}{(77+361)/1238} = \dfrac{77}{77+361} \approx 0.18 (18%).
- Joint probability, for comparison: P(ACL, LaLa) = \dfrac{77}{1238} \approx 0.062 (about 6%).
- Approach: compute counts first, then convert to probabilities; use software to automate.
- Core idea: the probability of two events both occurring equals: P(A,B) = P(A) \cdot P(B|A) = P(B) \cdot P(A|B).
- If you know a marginal probability and a conditional probability, you can recover the joint: P(A,B) = P(B) \cdot P(A|B). Conversely, if you know the joint and a marginal, you can recover the conditional: P(A|B) = \dfrac{P(A,B)}{P(B)}.
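As a check on the product rule, the ACLFest counts quoted in these notes can be plugged in: 77 bands played both festivals, 77 + 361 = 438 played LaLa, and 77 + 81 = 158 played ACL, out of 1,238. Both factorizations give the same joint probability:

```latex
\begin{aligned}
P(\text{LaLa}) \cdot P(\text{ACL} \mid \text{LaLa})
  &= \dfrac{438}{1238} \cdot \dfrac{77}{438} = \dfrac{77}{1238} \approx 0.062, \\
P(\text{ACL}) \cdot P(\text{LaLa} \mid \text{ACL})
  &= \dfrac{158}{1238} \cdot \dfrac{77}{158} = \dfrac{77}{1238} \approx 0.062.
\end{aligned}
```

The two conditionals themselves differ (77/438 is about 0.18, while 77/158 is about 0.49); only their products with the matching marginals agree.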
Using R for contingency tables (xtabs)
- xtabs: cross-tabulate to produce counts in a contingency table.
- Example: xtabs(~ LaLa + ACL, data = ACLFest)
- Proportions from xtabs:
- prop.table(xtabs(…)) gives joint proportions (sums to 1).
- prop.table(xtabs(…), margin = 1) gives proportions conditional on the row variable (each row sums to 1); margin = 2 conditions on the column variable (each column sums to 1).
- Practical workflow:
- Compute counts with xtabs.
- Convert to proportions with prop.table.
- Use margin to condition on a specific variable (margin = 1 for row-conditioning, margin = 2 for column-conditioning).
- Use round(x, digits) to format results for reporting.
- The results from xtabs are counts; proportions come from prop.table; you can multiply by 100 to get percentages.
- Pipe syntax (e.g., with the tidyverse) can streamline: xtabs(…) %>% prop.table(margin = …) %>% round(digits).
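The workflow above can be sketched end to end in base R. The real ACLFest data frame is not reproduced here, so the sketch rebuilds a hypothetical stand-in with one row per band from the four cell counts (719 being the remainder 1238 - 77 - 361 - 81); only the LaLa and ACL columns are assumed.

```r
# Hypothetical stand-in for the ACLFest data frame: one row per band, with
# 1 = played and 0 = did not play, rebuilt from the four cell counts.
ACLFest <- data.frame(
  LaLa = rep(c(1, 1, 0, 0), times = c(77, 361, 81, 719)),
  ACL  = rep(c(1, 0, 1, 0), times = c(77, 361, 81, 719))
)

counts <- xtabs(~ LaLa + ACL, data = ACLFest)  # counts; rows = LaLa, columns = ACL
joint  <- prop.table(counts)                   # all cells together sum to 1
by_row <- prop.table(counts, margin = 1)       # each row sums to 1: P(ACL | LaLa)
by_col <- prop.table(counts, margin = 2)       # each column sums to 1: P(LaLa | ACL)

print(counts)
print(round(by_row, 3))
print(round(100 * by_col, 1))                  # reported as percentages
```

Because LaLa is listed first in the formula, it becomes the row variable, so margin = 1 conditions on LaLa and margin = 2 conditions on ACL. Swapping the order in the formula swaps which margin corresponds to which variable.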
Practical example: ACLFest and Lollapalooza
- Data frame: ACLFest with binary indicators for festivals (1 = played, 0 = not).
- Variables of interest: Lollapalooza (LaLa) and ACL Fest (ACL).
- Joint distribution: counts in each cell (e.g., 77 bands played both).
- Conditional example: probability ACL given LaLa: P(ACL|LaLa) = \dfrac{77}{77+361} \approx 0.18.
- Total observations: 1,238 bands; the cross-tabulated counts are all that is needed to report conditional, marginal, and joint probabilities.
Subjective probability (brief)
- Not all probabilities come from long-run frequencies; subjective probabilities reflect personal judgments or markets.
- Prediction markets and odds can express subjective probabilities via willingness to pay for a payoff (e.g., betting on outcomes).
Quick practice question (conceptual)
- Given a contingency table, identify:
- The marginal probability of a given festival without conditioning on the other festival.
- The conditional probability P(B|A) for a specified A and B.
- The joint probability P(A,B) from the corresponding cell count.
Tips for last-minute study
- Remember the key distinctions: marginal vs joint vs conditional.
- Use xtabs to quickly form tables, then prop.table to get proportions.
- Use margin = 1 (rows) or margin = 2 (columns) to obtain conditional probabilities.
- Round results appropriately for reporting; consider reporting percentages by multiplying by 100.
- Keep a reference handout handy for common xtabs and prop.table patterns.