9/11: Probability and Contingency Tables — Quick Reference

Probability basics

  • Probability is a number between 0 and 1 (0% to 100%), representing uncertainty about an event.
  • A marginal probability refers to a single event (appears in the margins of a table or as P(A)).
  • A joint probability refers to two events occurring together (A and B).
  • A conditional probability is the probability of an event given that another event has occurred: P(B|A) = \dfrac{P(A,B)}{P(A)}, defined when P(A) > 0.
  • The conditional probability is not symmetric: P(B|A) \neq P(A|B) in general.

Notation (overview)

  • Joint: P(A,B)\;\text{or}\;P(A\cap B)
  • Conditional: P(B|A)
  • Marginal: P(A) (probability of a single event)

Sources of probability

  • Careful counting (requires equal likelihood of outcomes): sample space and counts, e.g.,
    • Fair deck of 52 cards: P(\text{diamond}) = \dfrac{13}{52}, P(\text{K or Q}) = \dfrac{8}{52}.
  • Data-driven probabilities: actuarial data, census data, etc. (long-run frequencies from historical data).
  • Probabilities from data summaries: use contingency tables to summarize categorical data and derive probabilities.

Contingency tables and data-derived probabilities

  • A contingency table lists counts (or proportions) for combinations of two (or more) categorical variables.
  • Rows/columns represent categories; cells hold counts of observations in each category pair.
  • Marginal sums (row or column totals) divided by the grand total give marginal probabilities.
  • Example structure (ACLFest data): Lollapalooza (LaLa) vs not; ACL vs not. In the underlying data frame, each band is a row.
  • In the ACLFest example: total bands = 1,238. Cells include:
    • LaLa and ACL: 77
    • LaLa and not ACL: 361
    • not LaLa and ACL: 81
    • not LaLa and not ACL: 719 (the remainder: 1238 - 77 - 361 - 81)
  • From counts to probabilities:
    • Joint probability for a cell: count / total observations
    • Marginal probability for a single variable: row or column total / total observations
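
As a rough sketch of the counts-to-probabilities step, the four ACLFest cell counts can be typed into a matrix by hand. The fourth cell, 719, is not stated directly in these notes; it is the remainder 1238 - 77 - 361 - 81. The object names are illustrative only.

```r
# Rows: LaLa (0 = did not play, 1 = played); columns: ACL (0 / 1).
# 719 is derived as the remainder so that the four cells sum to 1238.
counts <- matrix(
  c(719, 81,
    361, 77),
  nrow = 2, byrow = TRUE,
  dimnames = list(LaLa = c("0", "1"), ACL = c("0", "1"))
)

total <- sum(counts)                # total number of bands

joint  <- counts / total            # joint probability for every cell
p_lala <- rowSums(counts) / total   # marginal distribution of LaLa
p_acl  <- colSums(counts) / total   # marginal distribution of ACL

print(round(joint, 3))
print(round(p_lala, 3))
print(round(p_acl, 3))
```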

Conditional probabilities from contingency tables

  • Example: probability that a band played ACL given that it played LaLa: P(ACL|LaLa) = \dfrac{P(LaLa, ACL)}{P(LaLa)} = \dfrac{77/1238}{(77+361)/1238} = \dfrac{77}{77+361} \approx 0.18 (18%).
  • Joint probability: P(ACL, LaLa) = \dfrac{77}{1238} \approx 0.062 (about 6%).
  • Approach: compute counts first, then convert to probabilities; use software to automate.
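
A small sketch of the "counts first" approach, which also shows that conditioning is not symmetric. It uses only the three cell counts quoted in these notes; the variable names are made up for illustration.

```r
# Cell counts from the ACLFest example.
n_both      <- 77    # played LaLa and ACL
n_lala_only <- 361   # played LaLa, not ACL
n_acl_only  <- 81    # played ACL, not LaLa

# Condition on LaLa: restrict to bands that played LaLa, then ask about ACL.
p_acl_given_lala <- n_both / (n_both + n_lala_only)   # 77 / 438

# Condition on ACL instead: the denominator changes, so the answer changes.
p_lala_given_acl <- n_both / (n_both + n_acl_only)    # 77 / 158

print(round(c(acl_given_lala = p_acl_given_lala,
              lala_given_acl = p_lala_given_acl), 2))
```

The numerator is the same joint count in both cases; only the group being conditioned on differs.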

The multiplication rule (intuition and formulas)

  • Core idea: the probability that two events both occur is P(A,B) = P(A) \cdot P(B|A) = P(B) \cdot P(A|B).
  • If you know a marginal probability and a conditional probability, you can recover the joint: P(A,B) = P(B) \cdot P(A|B). This is the definition P(A|B) = \dfrac{P(A,B)}{P(B)} rearranged.
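
As a check on the rule, here is a worked version with the ACLFest counts, where 438 = 77 + 361 bands played LaLa and 158 = 77 + 81 bands played ACL. Both factorizations should give the same joint probability:

```latex
\begin{align*}
P(\text{LaLa}) \cdot P(\text{ACL} \mid \text{LaLa})
  &= \frac{438}{1238} \cdot \frac{77}{438} = \frac{77}{1238} \approx 0.062, \\
P(\text{ACL}) \cdot P(\text{LaLa} \mid \text{ACL})
  &= \frac{158}{1238} \cdot \frac{77}{158} = \frac{77}{1238} \approx 0.062.
\end{align*}
```

In each line the marginal count cancels, leaving the cell count over the total.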

Using R for contingency tables (xtabs)

  • xtabs: cross-tabulate to produce counts in a contingency table.
    • Example: xtabs(~ LaLa + ACL, data = ACLFest)
  • Proportions from xtabs:
    • prop.table(xtabs(…)) gives joint proportions (sums to 1).
    • prop.table(xtabs(…), margin = 1) gives proportions conditional on the row variable (each row sums to 1); margin = 2 conditions on the column variable (each column sums to 1).
  • Practical workflow:
    • Compute counts with xtabs.
    • Convert to proportions with prop.table.
    • Use margin to condition on a specific variable (margin = 1 for row-conditioning, margin = 2 for column-conditioning).
    • Use round(x, digits) to format results for reporting.
  • The results from xtabs are counts; proportions come from prop.table; you can multiply by 100 to get percentages.
  • Pipe syntax (e.g., with the tidyverse) can streamline: xtabs(…) %>% prop.table(margin = …) %>% round(digits).
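
The workflow can be sketched end to end in base R. The real ACLFest data frame is not reproduced in these notes, so the block below builds a stand-in with the same four cell counts (719 is the derived remainder). Treat the stand-in as an assumption: the actual data may have more columns or different coding.

```r
# Stand-in for ACLFest, rebuilt from the cell counts: one row per band,
# 1 = played the festival, 0 = did not.
ACLFest <- data.frame(
  LaLa = rep(c(1, 1, 0, 0), times = c(77, 361, 81, 719)),
  ACL  = rep(c(1, 0, 1, 0), times = c(77, 361, 81, 719))
)

counts <- xtabs(~ LaLa + ACL, data = ACLFest)   # 2 x 2 table of counts

joint  <- prop.table(counts)                    # all four cells sum to 1
by_row <- prop.table(counts, margin = 1)        # each row sums to 1: P(ACL | LaLa)
by_col <- prop.table(counts, margin = 2)        # each column sums to 1: P(LaLa | ACL)

print(counts)
print(round(by_row, 2))
print(round(100 * joint, 1))                    # joint proportions as percentages
```

Because LaLa is listed first in the formula, it becomes the row variable, so margin = 1 conditions on LaLa and margin = 2 conditions on ACL.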

Practical example: ACLFest and Lollapalooza

  • Data frame: ACLFest with binary indicators for festivals (1 = played, 0 = not).
  • Variables of interest: Lollapalooza (LaLa) and ACL Fest (ACL).
  • Joint distribution: counts in each cell (e.g., 77 bands played both).
  • Conditional example: probability ACL given LaLa: P(ACL|LaLa) = \dfrac{77}{77+361} \approx 0.18.
  • Total observations: 1,238 bands. The cross-tabulated counts are enough to report conditional, marginal, and joint probabilities.

Subjective probability (brief)

  • Not all probabilities come from long-run frequencies; subjective probabilities reflect personal judgment or the collective judgment of a market.
  • Prediction markets and odds can express subjective probabilities via willingness to pay for a payoff (e.g., betting on outcomes).
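
One common way to read a probability off a price or off betting odds is sketched below. The numbers are hypothetical, the function names are made up for this example, and the conversion ignores fees and attitudes toward risk, so it is a rough reading rather than an exact one.

```r
# Hypothetical contract: pays `payoff` dollars if the event happens, 0 otherwise.
# Being willing to pay `price` suggests a subjective probability near price / payoff.
implied_prob <- function(price, payoff = 1) price / payoff

# Odds of "a to b against" an event correspond to probability b / (a + b).
odds_against_to_prob <- function(a, b) b / (a + b)

print(implied_prob(0.30))           # 30 cents for a $1 payoff
print(odds_against_to_prob(3, 1))   # 3-to-1 against
```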

Quick practice question (conceptual)

  • Given a contingency table, identify:
    • The marginal probability of a given festival without conditioning on the other festival.
    • The conditional probability P(B|A) for a specified A and B.
    • The joint probability P(A,B) from the corresponding cell count.

Tips for last-minute study

  • Remember the key distinctions: marginal vs joint vs conditional.
  • Use xtabs to quickly form tables, then prop.table to get proportions.
  • Use margin = 1 (rows) or margin = 2 (columns) to obtain conditional probabilities.
  • Round results appropriately for reporting; consider reporting percentages by multiplying by 100.
  • Keep a reference handout handy for common xtabs and prop.table patterns.