Notes on Independence in Contingency Tables and Simulation-based Inference

The Independence Model for Day of Week and Precipitation

Problem setup: two variables—day of the week (Weekday vs Weekend) and precipitation (Yes/No). If the variables are independent, the probability of precipitation is the same across the day categories.
Contingency table (theoretical under independence):
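- Total days: N = 700; weekdays: N_{\text{wkday}} = 500; weekends: N_{\text{wknd}} = 200.
- Overall probability of precipitation: p = \frac{384}{700} \approx 0.549, rounded to 0.55 in the example; under independence this same value applies to both day types.
- Expected counts under independence (2x2 table), computed as row total times column total divided by N:
- E_{\text{wkday},\text{Yes}} = \frac{N_{\text{wkday}} \times N_{\text{Yes}}}{N} = \frac{500 \times 384}{700} \approx 274.3 (275 with the rounded p = 0.55)
- E_{\text{wkday},\text{No}} = \frac{N_{\text{wkday}} \times N_{\text{No}}}{N} = \frac{500 \times 316}{700} \approx 225.7 (225 with the rounded p = 0.55)
- E_{\text{wknd},\text{Yes}} = \frac{N_{\text{wknd}} \times N_{\text{Yes}}}{N} = \frac{200 \times 384}{700} \approx 109.7 (approximately 110)
- E_{\text{wknd},\text{No}} = \frac{N_{\text{wknd}} \times N_{\text{No}}}{N} = \frac{200 \times 316}{700} \approx 90.3 (approximately 90)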
Observed data from Lansing Airport (700-day sample):
- Weekdays with precipitation: 265; Weekdays without: 235
- Weekends with precipitation: 119; Weekends without: 81
- Totals: Weekdays 500, Weekends 200, Precipitation Yes 265+119=384, Precipitation No 316
Compare observed vs. theoretical:
- Weekday Yes difference: 265 - 275 = -10
- Weekend Yes difference: 119 - 110 = +9
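- Total precipitation observed: 384 vs. theoretical total: 385 (since 275+110=385); the one-day gap arises because the theoretical counts use the rounded p = 0.55 rather than 384/700.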
Probabilities to compare:
- Theoretical: P(\text{Yes} \mid \text{Weekday}) = \frac{275}{500} = 0.55, P(\text{Yes} \mid \text{Weekend}) = \frac{110}{200} = 0.55
- Observed: \frac{265}{500} = 0.53 for Weekday, \frac{119}{200} = 0.595 for Weekend
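The expected counts and conditional proportions above can be reproduced from the table margins alone. The following is a minimal Python sketch; the function name `expected_counts` and the dictionary layout are illustrative choices, not part of the original example. It uses the exact margin 384/700 rather than the rounded 0.55, so the weekday expectation comes out near 274.3 instead of 275.

```python
# Observed 2x2 table from the 700-day Lansing sample (counts taken from the notes).
observed = {
    ("Weekday", "Yes"): 265, ("Weekday", "No"): 235,
    ("Weekend", "Yes"): 119, ("Weekend", "No"): 81,
}


def expected_counts(table):
    """Expected cell counts under independence: row total * column total / N."""
    rows = sorted({r for r, _ in table})
    cols = sorted({c for _, c in table})
    n = sum(table.values())
    row_tot = {r: sum(table[(r, c)] for c in cols) for r in rows}
    col_tot = {c: sum(table[(r, c)] for r in rows) for c in cols}
    return {(r, c): row_tot[r] * col_tot[c] / n for r in rows for c in cols}


expected = expected_counts(observed)
for cell, count in observed.items():
    diff = count - expected[cell]
    print(f"{cell}: observed={count}, expected={expected[cell]:.1f}, diff={diff:+.1f}")

# Observed conditional proportions of precipitation within each day type.
p_weekday = observed[("Weekday", "Yes")] / 500
p_weekend = observed[("Weekend", "Yes")] / 200
print(f"P(Yes | Weekday) = {p_weekday:.3f}, P(Yes | Weekend) = {p_weekend:.3f}")
```

Because the expected counts are built from the observed margins, they always sum to the same row and column totals as the data; only the split inside each row changes.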
Key question: Are these observed differences due to natural variation or do they indicate dependence between day type and precipitation?
Conceptual model and visualization:
- Mosaic plots help assess independence: under independence, the heights of the bars colored for precipitation "Yes" are the same across day categories.
- If the observed mosaic shows a difference in the proportions, this hints at a potential association, but we need inference to decide if it’s due to chance.
Population vs. sample: the observed table is a sample from the population; we infer about whether the two variables are independent in the population from this sample.
Why this matters: a difference seen in one sample may be due to randomness alone. A larger or repeated sample might or might not show the same pattern, so a single table cannot settle the question of independence without an assessment of sampling variability.
Connection to broader ideas:
- Independence testing for two categorical variables in 2x2 tables.
- The idea that even with independence, there will be sampling variability around the expected counts.
- The concept that a model (independence) is a simplification that can be tested against data.

The Mosaic Plot and How to Interpret It

Definition: A mosaic plot displays a contingency table by partitioning a rectangle into tiles whose areas are proportional to cell probabilities or counts.
Under independence: the tile splits (heights or widths) that represent conditional probabilities are the same in every category of the other variable.
In the precipitation example: the height of the blue bar for "Yes" precipitation should be the same whether the day is Weekday or Weekend if independence holds.
If the observed plot shows different heights, that suggests dependence, but we still need a formal test to assess significance.
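To make the tile geometry concrete, here is a small Python sketch that computes the width and height of each tile for a mosaic drawn on a unit square. The helper `mosaic_tiles` is hypothetical (it is not from any plotting library) and it assumes day type is laid out along the horizontal axis, with precipitation splitting each column vertically.

```python
def mosaic_tiles(table, columns, outcomes):
    """Return (column, outcome, width, height) for each tile of a unit-square mosaic.

    width  = share of all observations that fall in the column (e.g. Weekday)
    height = conditional proportion of the outcome within that column
    """
    n = sum(table.values())
    tiles = []
    for col in columns:
        col_total = sum(table[(col, out)] for out in outcomes)
        for out in outcomes:
            tiles.append((col, out, col_total / n, table[(col, out)] / col_total))
    return tiles


# Observed counts from the Lansing example in these notes.
observed = {
    ("Weekday", "Yes"): 265, ("Weekday", "No"): 235,
    ("Weekend", "Yes"): 119, ("Weekend", "No"): 81,
}

tiles = mosaic_tiles(observed, ["Weekday", "Weekend"], ["Yes", "No"])
for col, out, width, height in tiles:
    # Tile area = width * height = joint proportion of that cell.
    print(f"{col:8s} {out:3s} width={width:.3f} height={height:.3f} area={width * height:.3f}")
```

Under independence the "Yes" heights would match in both columns; here they are 0.53 and 0.595, which is the visual gap the formal test has to evaluate.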

Real Data: Lansing Airport 700-Day Sample and Independence Assessment

Data: 700 consecutive days; 500 Weekdays, 200 Weekends.
Observed counts: Weekdays Yes = 265; Weekends Yes = 119; Weekdays No = 235; Weekends No = 81.
Observed weekend precipitation probability: \frac{119}{200} = 0.595, observed weekday probability: \frac{265}{500} = 0.53.
The independence model predicts constant precipitation probability across day types: p = 0.55; hence the expected weekend probability under independence is 0.55, which is close but not identical to the observed 0.595.
Question raised: Are these small differences just due to natural variation, or do they indicate a real association between day of the week and precipitation?
Concept of sampling variability: different samples of 700 days would yield somewhat different counts; a single sample does not prove independence or dependence.
Next step: use simulations to assess how extreme the observed difference is under the independence model.

The Role of Simulation in Testing Independence

Purpose: to determine whether the observed difference could plausibly arise by chance under the independence model.
Approach (conceptual): generate many simulated contingency tables under independence (i.e., with the same row/column totals but with precipitation status allocated independently of day type), and see how often the simulated difference in proportions is as large as the observed difference.
Mosaic-plot intuition: the observed difference in the real data is compared to the distribution of differences produced by the independence model.
Key idea: the null hypothesis is that the two variables are independent; the alternative is that they are not.
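The simulation idea described in this section can be sketched in Python for the Lansing table. This is only an illustration under stated assumptions: the function name `simulate_diffs`, the choice of 2000 simulations, the fixed seed, and the two-sided comparison (absolute difference) are all choices made here, not details given in the notes.

```python
import random


def simulate_diffs(n_weekday, n_weekend, n_yes, n_sims, seed=0):
    """Simulate weekend-minus-weekday differences in precipitation rate.

    Row and column totals stay fixed; the 'Yes' labels are shuffled across
    all days so that precipitation is unrelated to day type by construction.
    """
    rng = random.Random(seed)
    n = n_weekday + n_weekend
    labels = [1] * n_yes + [0] * (n - n_yes)  # 1 = precipitation, 0 = none
    diffs = []
    for _ in range(n_sims):
        rng.shuffle(labels)
        yes_weekday = sum(labels[:n_weekday])  # first block plays the weekdays
        yes_weekend = n_yes - yes_weekday
        diffs.append(yes_weekend / n_weekend - yes_weekday / n_weekday)
    return diffs


observed_diff = 119 / 200 - 265 / 500  # observed weekend rate minus weekday rate
diffs = simulate_diffs(500, 200, 384, n_sims=2000)

# Two-sided p-value: share of simulated differences at least as far from zero
# as the observed one (small tolerance guards against floating-point ties).
p_value = sum(abs(d) >= abs(observed_diff) - 1e-12 for d in diffs) / len(diffs)
print(f"observed difference = {observed_diff:.3f}, simulated p-value = {p_value:.3f}")
```

The simulated differences center near zero because the shuffling enforces independence; the p-value reports how often chance alone produces a gap as large as the one in the real data.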

Memory Experiment: Sleep vs. Caffeine and Memory Performance

Experimental design: 24 students randomly assigned to two groups of 12 each.
- Group 1 (Sleep): group slept for 1.5 hours.
- Group 2 (Caffeine): group stayed awake and received a caffeine pill.
Outcome: whether each student scored above 60% on a memory test.
- Sleep group: 7 of 12 scored >60%
- Caffeine group: 3 of 12 scored >60%
Observed difference in rates:
- Sleep: \frac{7}{12} \approx 0.583
- Caffeine: \frac{3}{12} = 0.25
- Difference: \frac{7}{12} - \frac{3}{12} = \frac{4}{12} = 0.333… \approx 0.33
The example reports the difference as about 0.33, which suggests stronger memory performance with sleep than with caffeine.
Goal: assess whether this observed difference could occur by chance if assignment to sleep vs caffeine has no effect on memory performance.
Simulation setup (permutation test framework):
- Keep the total number of “successes” (scores >60%) fixed at 10 out of 24, regardless of group.
- Randomly reassign who is in Sleep vs Caffeine groups (12 in each) and recalculate the difference in success rates for each simulated dataset.
- Repeat for many simulations (e.g., 150 simulations in the example).
- Build the null distribution of the difference in proportions under the independence model.
- Compute the p-value as the proportion of simulated differences that are as extreme or more extreme than the observed difference (0.33).
Visualization idea: use colored sticks to represent group assignments and outcomes in the simulation, illustrating how the permutation distributes under the null.
Interpretation: if the p-value is small, we reject the independence model (the claim that group assignment has no effect on memory performance) at the chosen significance level; if it is not small, the observed difference is compatible with chance.
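The permutation steps above can be sketched in Python as follows. Several details are assumptions made for illustration: the names `simulated_p_value` and `exact_p_value`, the fixed seed, and the one-sided comparison (counting only differences of at least 0.33 in favor of sleep), since the notes do not say whether the test is one- or two-sided. The exact calculation uses the hypergeometric distribution, which is not part of the original example and is included only as a cross-check on the simulation.

```python
import random
from math import comb

N_PER_GROUP = 12          # students per group
TOTAL_SUCCESSES = 10      # students scoring above 60%, fixed under the null
OBSERVED_DIFF = 7 / 12 - 3 / 12


def simulated_p_value(n_sims=150, seed=1):
    """Share of random re-assignments with a difference >= the observed one."""
    rng = random.Random(seed)
    # One "card" per student: 1 = scored above 60%, 0 = did not.
    cards = [1] * TOTAL_SUCCESSES + [0] * (2 * N_PER_GROUP - TOTAL_SUCCESSES)
    extreme = 0
    for _ in range(n_sims):
        rng.shuffle(cards)
        sleep = sum(cards[:N_PER_GROUP])      # first 12 cards form the Sleep group
        caffeine = TOTAL_SUCCESSES - sleep    # the rest form the Caffeine group
        diff = (sleep - caffeine) / N_PER_GROUP
        if diff >= OBSERVED_DIFF - 1e-12:     # tolerance for floating-point ties
            extreme += 1
    return extreme / n_sims


def exact_p_value():
    """One-sided hypergeometric probability of 7 or more successes in Sleep."""
    total = comb(2 * N_PER_GROUP, N_PER_GROUP)
    favorable = sum(
        comb(TOTAL_SUCCESSES, k) * comb(2 * N_PER_GROUP - TOTAL_SUCCESSES, N_PER_GROUP - k)
        for k in range(7, TOTAL_SUCCESSES + 1)
    )
    return favorable / total


print(f"simulated p-value (150 shuffles): {simulated_p_value():.3f}")
print(f"exact one-sided p-value: {exact_p_value():.4f}")
```

With only 150 shuffles, as in the example, the simulated p-value is a rough estimate; increasing `n_sims` brings it closer to the exact value.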

Hypotheses and Testing Procedure (General Framework)

State the hypotheses:
- Null hypothesis (H0): The two variables are independent (no association). In experiments, this often translates to “treatment has no effect.”
- Alternative hypothesis (Ha): The two variables are not independent (there is an association).
Use the sample to compute an observed statistic that measures the strength of association (e.g., difference in proportions for a 2x2 table).
Generate the null distribution under the independence model via simulations or permutations.
Compare the observed statistic to the null distribution to obtain a p-value:
- p\text{-value} = \frac{\text{number of simulations with statistic as extreme or more extreme than observed}}{\text{total number of simulations}}
Decision rule: if the p-value is below the chosen significance level (e.g., \alpha=0.05), reject H0; otherwise, do not reject H0.
Important caveats:
- The null is a model; it is a simplifying assumption that can be false in the population.
- Differences in sample can arise due to natural variation even when the null is true.
- Real-world factors (confounding variables) can produce apparent associations even when the primary relationship is absent.

Extensions and Real-World Cautions

Confounding and context: the example given compares past grades across different course types (e.g., advanced course vs English students in Peru) and reports past grades 25.5% higher for one group; it warns that observed differences may reflect underlying factors other than the treatment or primary variable.
Practical implications: When evaluating effects (e.g., caffeine vs sleep on memory, or survey participation influenced by incentives), randomization and permutation-based inference help separate true effects from random fluctuations.
Measurement and sampling concerns:
- The samples we study are samples of the population; repeated samples can yield different results due to random variation.
- An independence model is not proof; it is a model whose fit is assessed by how likely the observed data (or something more extreme) would be if the model were true.
Real-world experimental design considerations mentioned in the transcript:
- Studies move from observational ideas (contingency tables) toward controlled experiments with randomization to infer causal effects.
- Examples include caffeine and memory, surveys with gift cards to increase participation, and other related questions where independence testing and inference play a key role.
Summary practical takeaway:
- To assess independence between two categorical variables, compare observed contingency tables with the expected counts under independence, often visualized via mosaic plots.
- When obvious deviations exist, use simulation-based or permutation-based tests to quantify the likelihood that such deviations occur under independence.
- Always consider alternative explanations such as sampling variability, confounding factors, and real-world context before concluding a true association.