Joint and Conditional Frequencies in Categorical Data (Optometry Example)
Key concepts from the transcript
Data visualization vs categorization
- Categorical data can be plotted with either frequency (counts) or relative frequency (proportions) on the value axis. Choose whichever matches what you want to emphasize: counts or proportions.
- Joint frequency data are often presented in a contingency table or plotted as a chart where each cell corresponds to a combination of categories.
Joint frequency vs relative frequency
- Joint frequency: the count of observations in a given combination of categories, denoted by n_{ij} for category i in one variable and category j in another.
- Relative frequency: the proportion of the total observations that fall into a given cell, denoted by p_{ij} = \frac{n_{ij}}{N}, where N is the total number of observations.
- In the example, a total N = 64 is referenced, with counts distributed across categories (e.g., males vs females, nearsighted vs farsighted).
Conditioning on a category (conditional distribution)
- When you condition on a category, that category becomes the denominator.
- The idea is to examine subgroups within a larger category: divide the cell frequency by the marginal total for the conditioning category.
- Formula for conditional probability (in frequency terms):
P(A\mid B) = \frac{n_{AB}}{n_B}
where n_{AB} is the joint frequency of A and B, and n_B is the marginal total for B.
- This produces a distribution within the conditioned category (e.g., within nearsighted individuals, what is the breakdown by gender?).
Notation and setup for a 2×2 example
- Variables (example from optometry shop data):
- Vision status: Nearsighted vs Farsighted
- Gender: Male vs Female
- Joint frequencies (example counts that align with the transcript’s framing of N = 64, M = 32, F = 32):
- n_{11} = 25 (Nearsighted & Male)
- n_{12} = 15 (Nearsighted & Female)
- n_{21} = 7 (Farsighted & Male)
- n_{22} = 17 (Farsighted & Female)
- Total observations: N = 64
- Marginal totals:
- Nearsighted total: n_{\text{Nea}} = n_{11} + n_{12} = 40
- Farsighted total: n_{\text{Far}} = n_{21} + n_{22} = 24
- Male total: n_{\text{Male}} = n_{11} + n_{21} = 32
- Female total: n_{\text{Female}} = n_{12} + n_{22} = 32
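The setup above can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed method: the nested-dictionary layout and the names `counts`, `row_totals`, and `col_totals` are assumptions made for this sketch, and the counts are the example values listed above.

```python
# Joint frequencies n_ij from the example:
# outer keys = vision status (rows), inner keys = gender (columns).
counts = {
    "Nearsighted": {"Male": 25, "Female": 15},
    "Farsighted": {"Male": 7, "Female": 17},
}

# Row marginals: total for each vision status.
row_totals = {vision: sum(by_gender.values()) for vision, by_gender in counts.items()}

# Column marginals: total for each gender.
col_totals = {}
for by_gender in counts.values():
    for gender, n in by_gender.items():
        col_totals[gender] = col_totals.get(gender, 0) + n

# Grand total N is the sum of either set of marginals.
N = sum(row_totals.values())

print(row_totals)  # {'Nearsighted': 40, 'Farsighted': 24}
print(col_totals)  # {'Male': 32, 'Female': 32}
print(N)           # 64
```

Summing the row marginals and the column marginals should give the same N; if they differ, a cell count was entered incorrectly.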
Calculations with the 2×2 example
- Joint frequencies (already given above):
- Nearsighted & Male: n_{11} = 25
- Nearsighted & Female: n_{12} = 15
- Farsighted & Male: n_{21} = 7
- Farsighted & Female: n_{22} = 17
- Relative frequencies (proportions of the total N):
- p_{11} = \frac{n_{11}}{N} = \frac{25}{64}
- p_{12} = \frac{n_{12}}{N} = \frac{15}{64}
- p_{21} = \frac{n_{21}}{N} = \frac{7}{64}
- p_{22} = \frac{n_{22}}{N} = \frac{17}{64}
- Marginal totals (as above): Nearsighted = 40, Farsighted = 24, Male = 32, Female = 32
- Conditional distributions (examples):
- Within Nearsighted:
- P(Male\mid Nearsighted) = \frac{n_{11}}{n_{\text{Nea}}} = \frac{25}{40} = 0.625
- P(Female\mid Nearsighted) = \frac{n_{12}}{n_{\text{Nea}}} = \frac{15}{40} = 0.375
- Within Male:
- P(Nearsighted\mid Male) = \frac{n_{11}}{n_{\text{Male}}} = \frac{25}{32} = 0.78125
- P(Farsighted\mid Male) = \frac{n_{21}}{n_{\text{Male}}} = \frac{7}{32} = 0.21875
- Within Female:
- P(Nearsighted\mid Female) = \frac{n_{12}}{n_{\text{Female}}} = \frac{15}{32} = 0.46875
- P(Farsighted\mid Female) = \frac{n_{22}}{n_{\text{Female}}} = \frac{17}{32} = 0.53125
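The relative and conditional calculations above differ only in their denominator, which the following Python sketch tries to make explicit. The helper `conditional` and its `position` argument are hypothetical names introduced for illustration; `fractions.Fraction` from the standard library is used so the results stay exact.

```python
from fractions import Fraction

# Example joint frequencies n_ij, keyed by (vision status, gender).
counts = {
    ("Nearsighted", "Male"): 25,
    ("Nearsighted", "Female"): 15,
    ("Farsighted", "Male"): 7,
    ("Farsighted", "Female"): 17,
}
N = sum(counts.values())  # grand total, 64

# Relative frequencies p_ij = n_ij / N: the denominator is everyone.
relative = {cell: Fraction(n, N) for cell, n in counts.items()}


def conditional(counts, position, level):
    """Distribution of the other variable within one category.

    position 0 conditions on vision status, 1 on gender
    (a convention assumed for this sketch only).
    """
    subgroup = {cell: n for cell, n in counts.items() if cell[position] == level}
    denominator = sum(subgroup.values())  # marginal total of the conditioning category
    other = 1 - position
    return {cell[other]: Fraction(n, denominator) for cell, n in subgroup.items()}


within_near = conditional(counts, 0, "Nearsighted")
within_male = conditional(counts, 1, "Male")

print(float(relative[("Nearsighted", "Male")]))  # 0.390625
print(float(within_near["Male"]))                # 0.625
print(float(within_male["Nearsighted"]))         # 0.78125
```

The same cell count (25) gives three different proportions depending on whether the denominator is N, the nearsighted total, or the male total. Each conditional distribution sums to 1 within its own subgroup.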
Practice problem approach (as described in the transcript)
- Problem: At an optometry shop, data are collected from customers. What percent of all customers are farsighted?
- Steps:
- Identify the total number of farsighted customers (n_{\text{Far}}) and the total number of customers (N).
- Compute the percentage: \text{Percent farsighted} = \frac{n_{\text{Far}}}{N} \times 100\%
- If you condition on nearsighted people (i.e., focus only on the nearsighted group):
- Use the conditional formula to determine subcategory counts or proportions within the nearsighted group:
- Example within Nearsighted: P(A\mid Nearsighted) = \frac{n_{A,\text{Nea}}}{n_{\text{Nea}}}
- The denominator becomes the marginal total for the conditioning category (in this case, the total nearsighted individuals).
- The same framework applies to any pair of categorical variables (e.g., vision status × gender, or vision status × age group).
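The steps above can be sketched as a short calculation. The counts below are the illustrative values from the 2×2 example in these notes, not necessarily the data used in the original practice problem, and `percent` is a hypothetical helper name.

```python
def percent(part, whole):
    """Return part/whole expressed as a percentage."""
    if whole == 0:
        raise ValueError("denominator must be non-zero")
    return 100 * part / whole


# Illustrative counts from the 2x2 example in these notes.
n_near_male, n_near_female = 25, 15
n_far_male, n_far_female = 7, 17

n_near = n_near_male + n_near_female  # marginal total, nearsighted
n_far = n_far_male + n_far_female     # marginal total, farsighted
N = n_near + n_far                    # all customers

# Percent of ALL customers who are farsighted: denominator is N.
pct_farsighted = percent(n_far, N)

# Percent male AMONG nearsighted customers: denominator is n_near.
pct_male_given_near = percent(n_near_male, n_near)

print(pct_farsighted)       # 37.5
print(pct_male_given_near)  # 62.5
```

With these example counts, 24 of 64 customers are farsighted (37.5%), while 25 of the 40 nearsighted customers are male (62.5%).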
Connections to foundational principles
- These concepts rely on basic probability and counting: joint, marginal, and conditional distributions.
- If P(A\mid B) = P(A) for every category B of the other variable, then A is independent of that variable. In real data the sample proportions will rarely match exactly, so compare them and judge whether the gap is large.
- Base-rate awareness: conditional probabilities depend on the conditioning category; misinterpreting conditioning can lead to base-rate fallacies.
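As a rough illustration of the independence comparison, the sketch below sets P(Nearsighted) against P(Nearsighted | gender) using the example counts from these notes. It is a descriptive check only, not a formal statistical test, and the variable names are assumptions made for this example.

```python
from fractions import Fraction

# Example joint frequencies, keyed by (vision status, gender).
counts = {
    ("Nearsighted", "Male"): 25,
    ("Nearsighted", "Female"): 15,
    ("Farsighted", "Male"): 7,
    ("Farsighted", "Female"): 17,
}
genders = ("Male", "Female")
N = sum(counts.values())

# Unconditional proportion: P(Nearsighted) = n_Nea / N.
n_near = sum(counts[("Nearsighted", g)] for g in genders)
p_near = Fraction(n_near, N)

# Conditional proportions: P(Nearsighted | gender) = n_{Nea,gender} / n_gender.
p_near_given = {}
for g in genders:
    n_gender = counts[("Nearsighted", g)] + counts[("Farsighted", g)]
    p_near_given[g] = Fraction(counts[("Nearsighted", g)], n_gender)

# Exact equality is the textbook condition; sample data rarely meet it.
looks_independent = all(p == p_near for p in p_near_given.values())

print(float(p_near))                  # 0.625
print(float(p_near_given["Male"]))    # 0.78125
print(float(p_near_given["Female"]))  # 0.46875
print(looks_independent)              # False
```

Here the overall nearsighted rate (0.625) sits between the male rate (0.78125) and the female rate (0.46875), so in this sample vision status and gender do not look independent.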
Practical and ethical considerations
- When presenting joint/conditional frequencies or proportions, clearly label what the denominator represents to avoid misinterpretation.
- Always report both absolute counts (n_{ij}) and proportions (p_{ij}) for transparency.
- Consider the context (e.g., an optometry shop) to avoid misrepresenting subgroups or under/over-emphasizing certain categories.
Summary of takeaways
- Joint frequency shows how many observations fall into each combination of categories: n_{ij}.
- Relative frequency shows the proportion of the total in each cell: p_{ij} = n_{ij}/N.
- Conditioning on a category uses that category as the denominator: P(A\mid B) = n_{AB}/n_B.
- Marginal totals (n_{i•}, n_{•j}) are essential for computing both conditional distributions and overall proportions.
- A practical example (N = 64, Nearsighted vs Farsighted × Male vs Female) yields concrete numbers to compute joint, marginal, and conditional frequencies and probabilities.