Joint and Conditional Frequencies in Categorical Data (Optometry Example)

Key concepts from the transcript

Data visualization: frequency vs relative frequency
- You can plot categorical data with either frequency (counts) or relative frequency (proportions) on the y-axis, depending on whether you want to emphasize counts or proportions. The axes can also be swapped, with the categories on the y-axis and the bars running horizontally.
- Joint frequency data are often presented in a contingency table, where each cell corresponds to one combination of categories, or plotted as a grouped or stacked bar chart with one bar (or bar segment) per combination.
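A minimal sketch of the axis choice, assuming Python 3.9+ and using the example counts from the 2×2 optometry table in these notes. It draws horizontal text bars (categories on the vertical axis) once with counts and once with relative frequencies; the helper `text_bars` is hypothetical. Because every relative frequency is a count divided by the same $N$, the bars keep the same shape and only the value labels change.

```python
# Example joint counts (vision status & gender) from the 2x2 table.
COUNTS = {
    "Nearsighted & Male": 25,
    "Nearsighted & Female": 15,
    "Farsighted & Male": 7,
    "Farsighted & Female": 17,
}


def text_bars(values: dict, scale: float, fmt: str) -> list:
    """Hypothetical helper: one horizontal bar per category.

    Bar length is round(value * scale); the value is printed after the bar.
    """
    width = max(len(label) for label in values)
    lines = []
    for label, value in values.items():
        bar = "#" * round(value * scale)
        lines.append(f"{label:<{width}} | {bar} {value:{fmt}}")
    return lines


N = sum(COUNTS.values())                           # total observations
REL = {k: v / N for k, v in COUNTS.items()}        # relative frequencies

count_chart = text_bars(COUNTS, scale=1, fmt="d")  # counts on the value axis
rel_chart = text_bars(REL, scale=N, fmt=".3f")     # proportions, same bar lengths

print("\n".join(count_chart))
print()
print("\n".join(rel_chart))
```

Scaling the proportion chart by $N$ is only a display choice so the two charts line up; in a real plotting library you would simply relabel the value axis.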
Joint frequency vs relative frequency
- Joint frequency: the count of observations in a given combination of categories, denoted by $n_{ij}$ for category $i$ of one variable and category $j$ of the other.
- Relative frequency: the proportion of all observations that fall into a given cell, denoted by $p_{ij} = \frac{n_{ij}}{N}$, where $N$ is the total number of observations.
- In the example, the total is $N = 64$, with counts distributed across categories (e.g., males vs females, nearsighted vs farsighted).
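The definitions above can be sketched in Python (standard library only). The 2×2 counts are the example values used throughout these notes; `fractions.Fraction` keeps each $p_{ij}$ exact, so you can confirm that the relative frequencies sum to 1.

```python
from fractions import Fraction

# Joint frequencies n_ij: rows i = vision status, columns j = gender.
ROWS = ["Nearsighted", "Farsighted"]
COLS = ["Male", "Female"]
n = [[25, 15],
     [7, 17]]

N = sum(sum(row) for row in n)  # total number of observations

# Joint relative frequencies p_ij = n_ij / N, stored as exact fractions.
p = [[Fraction(n_ij, N) for n_ij in row] for row in n]

for i, vision in enumerate(ROWS):
    for j, gender in enumerate(COLS):
        print(f"{vision} & {gender}: n = {n[i][j]}, p = {p[i][j]} ({float(p[i][j]):.4f})")

# Every observation lands in exactly one cell, so the p_ij sum to 1.
assert sum(sum(row) for row in p) == 1
```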
Conditioning on a category (conditional distribution)
- When you condition on a category, that category becomes the denominator.
- The idea is to examine subgroups within a larger category: divide the cell frequency by the marginal total for the conditioning category.
- Formula for conditional probability (in frequency terms):
  $P(A\mid B) = \frac{n_{AB}}{n_B}$
  where $n_{AB}$ is the joint frequency of $A$ and $B$, and $n_B$ is the marginal total for $B$.
- This produces a distribution within the conditioned category (e.g., within nearsighted individuals, what is the breakdown by gender?).
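The formula $P(A\mid B) = n_{AB}/n_B$ can be sketched as a small Python function (standard library only). The function `conditional` and the dictionary layout are hypothetical choices for illustration; the counts are the example values from the 2×2 table in these notes.

```python
from fractions import Fraction

# Example joint counts keyed by (vision status, gender).
JOINT = {
    ("Nearsighted", "Male"): 25,
    ("Nearsighted", "Female"): 15,
    ("Farsighted", "Male"): 7,
    ("Farsighted", "Female"): 17,
}


def conditional(joint, given, axis):
    """Distribution of the other variable within the category `given`.

    axis=0 conditions on a vision-status category (first key element);
    axis=1 conditions on a gender category (second key element).
    """
    # Keep only the cells that belong to the conditioning category.
    cells = {key: count for key, count in joint.items() if key[axis] == given}
    n_given = sum(cells.values())  # marginal total of the conditioning category
    # Divide each joint count by that marginal total; label by the other variable.
    return {key[1 - axis]: Fraction(count, n_given) for key, count in cells.items()}


within_near = conditional(JOINT, "Nearsighted", axis=0)
within_male = conditional(JOINT, "Male", axis=1)
print(within_near)  # {'Male': Fraction(5, 8), 'Female': Fraction(3, 8)}
print(within_male)  # {'Nearsighted': Fraction(25, 32), 'Farsighted': Fraction(7, 32)}
```

Filtering to the conditioning category first and summing what is left is exactly the "that category becomes the denominator" step.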
Notation and setup for a 2×2 example
- Variables (example from optometry shop data):
  - Vision status: Nearsighted vs Farsighted
  - Gender: Male vs Female
- Joint frequencies (example counts consistent with the transcript's framing of $N = 64$, $M = 32$, $F = 32$):
  - $n_{11} = 25$: Nearsighted & Male
  - $n_{12} = 15$: Nearsighted & Female
  - $n_{21} = 7$: Farsighted & Male
  - $n_{22} = 17$: Farsighted & Female
- Total observations: $N = 64$
- Marginal totals:
  - Nearsighted total: $n_{\text{Nea}} = n_{11} + n_{12} = 40$
  - Farsighted total: $n_{\text{Far}} = n_{21} + n_{22} = 24$
  - Male total: $n_{\text{Male}} = n_{11} + n_{21} = 32$
  - Female total: $n_{\text{Female}} = n_{12} + n_{22} = 32$
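A minimal sketch of the marginal totals as row and column sums of the 2×2 table (Python, standard library only). In the notation $n_{i\bullet}$ / $n_{\bullet j}$, row sums are $n_{i\bullet}$ and column sums are $n_{\bullet j}$; the variable names are illustrative.

```python
# Example 2x2 joint counts: rows = vision status, columns = gender.
ROWS = ["Nearsighted", "Farsighted"]
COLS = ["Male", "Female"]
n = [[25, 15],   # n_11, n_12
     [7, 17]]    # n_21, n_22

# Row marginals n_{i.}: sum across each row.
row_totals = {r: sum(n[i]) for i, r in enumerate(ROWS)}
# Column marginals n_{.j}: sum down each column.
col_totals = {c: sum(row[j] for row in n) for j, c in enumerate(COLS)}

N = sum(row_totals.values())
# Both sets of marginals must add up to the same grand total N.
assert N == sum(col_totals.values())

print(row_totals)  # {'Nearsighted': 40, 'Farsighted': 24}
print(col_totals)  # {'Male': 32, 'Female': 32}
```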
Calculations with the 2×2 example
- Joint frequencies (as given above): $n_{11} = 25$, $n_{12} = 15$, $n_{21} = 7$, $n_{22} = 17$
- Relative frequencies (proportions of the total $N$):
  - $p_{11} = \frac{n_{11}}{N} = \frac{25}{64} = 0.390625$
  - $p_{12} = \frac{n_{12}}{N} = \frac{15}{64} = 0.234375$
  - $p_{21} = \frac{n_{21}}{N} = \frac{7}{64} = 0.109375$
  - $p_{22} = \frac{n_{22}}{N} = \frac{17}{64} = 0.265625$
- Marginal totals (as above): Nearsighted = 40, Farsighted = 24, Male = 32, Female = 32
- Conditional distributions (examples):
  - Within Nearsighted:
    - $P(\text{Male}\mid\text{Nearsighted}) = \frac{n_{11}}{n_{\text{Nea}}} = \frac{25}{40} = 0.625$
    - $P(\text{Female}\mid\text{Nearsighted}) = \frac{n_{12}}{n_{\text{Nea}}} = \frac{15}{40} = 0.375$
  - Within Male:
    - $P(\text{Nearsighted}\mid\text{Male}) = \frac{n_{11}}{n_{\text{Male}}} = \frac{25}{32} = 0.78125$
    - $P(\text{Farsighted}\mid\text{Male}) = \frac{n_{21}}{n_{\text{Male}}} = \frac{7}{32} = 0.21875$
  - Within Female:
    - $P(\text{Nearsighted}\mid\text{Female}) = \frac{n_{12}}{n_{\text{Female}}} = \frac{15}{32} = 0.46875$
    - $P(\text{Farsighted}\mid\text{Female}) = \frac{n_{22}}{n_{\text{Female}}} = \frac{17}{32} = 0.53125$
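Two quick checks on the numbers above, written out as worked equations. Dividing the numerator and denominator of a conditional probability by $N$ shows it can equally be computed from the relative frequencies, because $N$ cancels; and each conditional distribution must sum to 1.

$$
P(\text{Male}\mid\text{Nearsighted}) = \frac{p_{11}}{p_{11} + p_{12}} = \frac{25/64}{40/64} = \frac{25}{40} = 0.625
$$

$$
\frac{25}{40} + \frac{15}{40} = 1, \qquad \frac{25}{32} + \frac{7}{32} = 1, \qquad \frac{15}{32} + \frac{17}{32} = 1
$$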
Practice problem approach (as described in the transcript)
- Problem: At an optometry shop, data are collected from customers. What percent of all customers are farsighted?
- Steps:
  - Identify the total number of farsighted customers ($n_{\text{Far}}$) and the total number of customers ($N$).
  - Compute the percentage: $\text{Percent farsighted} = \frac{n_{\text{Far}}}{N} \times 100\%$
- If you condition on nearsighted customers (i.e., focus only on the nearsighted group):
  - Use the conditional formula to find the proportion of each subcategory within the nearsighted group, e.g., $P(A\mid\text{Nearsighted}) = \frac{n_{A,\text{Nea}}}{n_{\text{Nea}}}$.
  - The denominator becomes the marginal total for the conditioning category (here, the total number of nearsighted customers) rather than $N$.
- The same framework applies to any pair of categorical variables (e.g., vision status × gender, or vision status × age group).
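The practice-problem steps can be sketched in Python using the example counts from the 2×2 table in these notes (an assumption: the transcript's own problem data are not reproduced here). The sketch also contrasts dividing the Nearsighted & Male count by $N$ with dividing it by the nearsighted total.

```python
# Example counts from the 2x2 table (n_11, n_12, n_21, n_22).
n_near_male, n_near_female = 25, 15
n_far_male, n_far_female = 7, 17

n_near = n_near_male + n_near_female   # n_Nea = 40
n_far = n_far_male + n_far_female      # n_Far = 24
N = n_near + n_far                     # 64 customers in total

# Unconditional: the denominator is every customer.
percent_farsighted = n_far / N * 100

# Same cell, two denominators: share of all customers vs. share of nearsighted.
percent_near_male_of_all = n_near_male / N * 100
percent_male_given_near = n_near_male / n_near * 100

print(f"Farsighted: {percent_farsighted}% of all customers")
print(f"Nearsighted & Male: {percent_near_male_of_all}% of all customers")
print(f"Male: {percent_male_given_near}% of nearsighted customers")
```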
Connections to foundational principles
- These concepts rely on basic probability and counting: joint, marginal, and conditional distributions.
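- If $P(A) = P(A\mid B)$ for every category $A$ of one variable and every category $B$ of the other, the two variables are independent. In sample data the two sides rarely match exactly, so in practice a formal test (e.g., the chi-square test of independence) is used to judge whether the difference is larger than chance alone would produce.
- Base-rate awareness: a conditional probability depends on which category supplies the denominator. Ignoring the base rate of the conditioning category, or treating $P(A\mid B)$ as if it were $P(B\mid A)$, can lead to base-rate fallacies; for example, $P(\text{Nearsighted}\mid\text{Male}) = 0.78125$ is not the same as $P(\text{Male}\mid\text{Nearsighted}) = 0.625$.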
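A minimal sketch of the independence check on the example table (Python, standard library; the function names are hypothetical). Exact fractions avoid floating-point noise in the equality test. Here $P(\text{Nearsighted}) = 40/64 = 5/8$ but $P(\text{Nearsighted}\mid\text{Male}) = 25/32$, so vision status and gender are not independent in this table.

```python
from fractions import Fraction

# Example joint counts keyed by (vision status, gender).
n = {("Near", "Male"): 25, ("Near", "Female"): 15,
     ("Far", "Male"): 7, ("Far", "Female"): 17}
N = sum(n.values())
VISION = ("Near", "Far")
GENDER = ("Male", "Female")


def p_vision(v):
    """Marginal P(vision = v) = n_v / N."""
    return Fraction(sum(n[(v, g)] for g in GENDER), N)


def p_vision_given_gender(v, g):
    """Conditional P(vision = v | gender = g) = n_vg / n_g."""
    n_g = sum(n[(w, g)] for w in VISION)
    return Fraction(n[(v, g)], n_g)


def independent_in_table():
    """True only if P(vision) == P(vision | gender) for every pair of categories."""
    return all(p_vision(v) == p_vision_given_gender(v, g)
               for v in VISION for g in GENDER)


print(p_vision("Near"), p_vision_given_gender("Near", "Male"))  # 5/8 25/32
print(independent_in_table())                                   # False
```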
Practical and ethical considerations
- When presenting joint/conditional frequencies or proportions, clearly label what the denominator represents to avoid misinterpretation.
- Always report both absolute counts ($n_{ij}$) and proportions ($p_{ij}$) for transparency.
- Consider the context: data from a single optometry shop describe its customers, not necessarily the wider population, so avoid overgeneralizing or under- or over-emphasizing particular subgroups.
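One way to follow these guidelines, sketched in Python: a hypothetical `report_cell` helper prints a count together with a percentage and names the denominator explicitly. The same Nearsighted & Male count of 25 (from the example table) reads very differently depending on whether the base is all customers, nearsighted customers, or male customers.

```python
def report_cell(label, count, denom, denom_label):
    """Hypothetical helper: count plus a percentage with its denominator named."""
    pct = 100 * count / denom
    return f"{label}: {count} ({pct:.1f}% of {denom_label}, n = {denom})"


# One cell of the example table, reported against three different denominators.
lines = [
    report_cell("Nearsighted & Male", 25, 64, "all customers"),
    report_cell("Nearsighted & Male", 25, 40, "nearsighted customers"),
    report_cell("Nearsighted & Male", 25, 32, "male customers"),
]
print("\n".join(lines))
```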
Summary of takeaways
- Joint frequency shows how many observations fall into each combination of categories: $n_{ij}$.
- Relative frequency shows the proportion of the total in each cell: $p_{ij} = n_{ij}/N$.
- Conditioning on a category uses that category's total as the denominator: $P(A\mid B) = n_{AB}/n_B$.
- Marginal totals ($n_{i\bullet}$, $n_{\bullet j}$) are essential for computing both conditional distributions and overall proportions.
- A practical example (N = 64, Nearsighted vs Farsighted × Male vs Female) yields concrete numbers to compute joint, marginal, and conditional frequencies and probabilities.