Joint and Conditional Frequencies in Categorical Data (Optometry Example)

Key concepts from the transcript

  • Data visualization vs categorization

    • You can plot categorical data with either frequency (counts) or relative frequency (proportions) on the y-axis. The shape of the chart is the same either way; only the axis scale changes, so choose the one that matches what you want to emphasize.
    • Joint frequency data are often presented in a contingency table or plotted as a chart where each cell corresponds to a combination of categories.
  • Joint frequency vs relative frequency

    • Joint frequency: the count of observations in a given combination of categories, denoted by n_{ij} for category i in one variable and category j in another.
    • Relative frequency: the proportion of the total observations that fall into a given cell, denoted by p_{ij} = \frac{n_{ij}}{N}, where N is the total number of observations.
    • In the example, a total N = 64 is referenced, with counts distributed across categories (e.g., males vs females, nearsighted vs farsighted).
  • Conditioning on a category (conditional distribution)

    • When you condition on a category, that category becomes the denominator.
    • The idea is to examine subgroups within a larger category: divide the cell frequency by the marginal total for the conditioning category.
    • Formula for conditional probability (in frequency terms):
      P(A\mid B) = \frac{n_{AB}}{n_B}
      where n_{AB} is the joint frequency of A and B, and n_B is the marginal total for B.
    • This produces a distribution within the conditioned category (e.g., within nearsighted individuals, what is the breakdown by gender?).
  • Notation and setup for a 2×2 example

    • Variables (example from optometry shop data):
    • Vision status: Nearsighted vs Farsighted
    • Gender: Male vs Female
    • Joint frequencies (example counts that align with the transcript’s framing of N = 64, M = 32, F = 32):
    • n_{11} = 25 (Nearsighted & Male)
    • n_{12} = 15 (Nearsighted & Female)
    • n_{21} = 7 (Farsighted & Male)
    • n_{22} = 17 (Farsighted & Female)
    • Total observations: N = 64
    • Marginal totals:
    • Nearsighted total: n_{\text{Nea}} = n_{11} + n_{12} = 40
    • Farsighted total: n_{\text{Far}} = n_{21} + n_{22} = 24
    • Male total: n_{\text{Male}} = n_{11} + n_{21} = 32
    • Female total: n_{\text{Female}} = n_{12} + n_{22} = 32
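
The marginal totals above can be reproduced with a short Python sketch. The dictionary layout and the helper name `marginal_totals` are illustrative choices, not part of the transcript; the counts are the example counts listed in these notes.

```python
# Joint counts from the 2x2 example, keyed by (vision status, gender).
counts = {
    ("Nearsighted", "Male"): 25,
    ("Nearsighted", "Female"): 15,
    ("Farsighted", "Male"): 7,
    ("Farsighted", "Female"): 17,
}


def marginal_totals(counts, axis):
    """Sum joint counts over the other variable.

    axis=0 groups by vision status; axis=1 groups by gender.
    (Hypothetical helper, written for this example only.)
    """
    totals = {}
    for key, n in counts.items():
        totals[key[axis]] = totals.get(key[axis], 0) + n
    return totals


vision_totals = marginal_totals(counts, axis=0)
gender_totals = marginal_totals(counts, axis=1)
N = sum(counts.values())

print(vision_totals)  # {'Nearsighted': 40, 'Farsighted': 24}
print(gender_totals)  # {'Male': 32, 'Female': 32}
print(N)              # 64
```

Both sets of marginal totals add up to the same grand total N, which is a quick sanity check on any contingency table.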
  • Calculations with the 2×2 example

    • Joint frequencies (already given above):
    • Nearsighted & Male: n_{11} = 25
    • Nearsighted & Female: n_{12} = 15
    • Farsighted & Male: n_{21} = 7
    • Farsighted & Female: n_{22} = 17
    • Relative frequencies (proportions of the total N):
    • p_{11} = \frac{n_{11}}{N} = \frac{25}{64}
    • p_{12} = \frac{n_{12}}{N} = \frac{15}{64}
    • p_{21} = \frac{n_{21}}{N} = \frac{7}{64}
    • p_{22} = \frac{n_{22}}{N} = \frac{17}{64}
    • Marginal totals (as above): Nearsighted = 40, Farsighted = 24, Male = 32, Female = 32
    • Conditional distributions (examples):
    • Within Nearsighted:
      • P(Male\mid Nearsighted) = \frac{n_{11}}{n_{\text{Nea}}} = \frac{25}{40} = 0.625
      • P(Female\mid Nearsighted) = \frac{n_{12}}{n_{\text{Nea}}} = \frac{15}{40} = 0.375
    • Within Male:
      • P(Nearsighted\mid Male) = \frac{n_{11}}{n_{\text{Male}}} = \frac{25}{32} = 0.78125
      • P(Farsighted\mid Male) = \frac{n_{21}}{n_{\text{Male}}} = \frac{7}{32} = 0.21875
    • Within Female:
      • P(Nearsighted\mid Female) = \frac{n_{12}}{n_{\text{Female}}} = \frac{15}{32} = 0.46875
      • P(Farsighted\mid Female) = \frac{n_{22}}{n_{\text{Female}}} = \frac{17}{32} = 0.53125
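
As a check on the arithmetic in this section, here is a minimal Python sketch that computes the relative and conditional frequencies from the example counts. The function name `conditional_distribution` and the dictionary layout are assumptions made for illustration; `fractions.Fraction` from the standard library keeps the results exact.

```python
from fractions import Fraction

# Joint counts from the 2x2 example, keyed by (vision status, gender).
counts = {
    ("Nearsighted", "Male"): 25,
    ("Nearsighted", "Female"): 15,
    ("Farsighted", "Male"): 7,
    ("Farsighted", "Female"): 17,
}
N = sum(counts.values())

# Relative frequency: each cell divided by the grand total N.
relative = {cell: Fraction(n, N) for cell, n in counts.items()}


def conditional_distribution(counts, given_axis, given_value):
    """Distribution of the other variable within one conditioning category.

    given_axis=0 conditions on vision status; given_axis=1 on gender.
    The denominator is the marginal total of the conditioning category.
    (Hypothetical helper, written for this example only.)
    """
    other_axis = 1 - given_axis
    subgroup = {
        key[other_axis]: n
        for key, n in counts.items()
        if key[given_axis] == given_value
    }
    denominator = sum(subgroup.values())
    return {cat: Fraction(n, denominator) for cat, n in subgroup.items()}


within_near = conditional_distribution(counts, 0, "Nearsighted")
within_male = conditional_distribution(counts, 1, "Male")

print(float(relative[("Nearsighted", "Male")]))  # 0.390625
print(float(within_near["Male"]))                # 0.625
print(float(within_male["Nearsighted"]))         # 0.78125
```

Each conditional distribution sums to 1 within its own subgroup, while the relative frequencies sum to 1 across the whole table.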
  • Practice problem approach (as described in the transcript)

    • Problem: At an optometry shop, data are collected from customers. What percent of all customers are farsighted?
    • Steps:
    • Identify the total number of farsighted customers (n_{\text{Far}}) and the total number of customers (N).
    • Compute the percentage: \text{Percent farsighted} = \frac{n_{\text{Far}}}{N} \times 100\%
    • If you condition on nearsighted people (i.e., focus only on the nearsighted group):
    • Use the conditional formula to determine subcategory counts or proportions within the nearsighted group:
      • Example within Nearsighted: P(A\mid Nearsighted) = \frac{n_{A,\text{Nea}}}{n_{\text{Nea}}}
    • The denominator becomes the marginal total for the conditioning category (in this case, the total nearsighted individuals).
    • The same framework applies to any pair of categorical variables (e.g., vision status × gender, or vision status × age group).
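
The practice-problem steps can be sketched in Python as follows. The numbers are the illustrative counts from the 2×2 example in these notes, which may differ from the data in the original problem, and `percent_of_total` is a hypothetical helper name.

```python
def percent_of_total(part, whole):
    """Return part / whole as a percentage; the whole is the denominator."""
    if whole <= 0:
        raise ValueError("the denominator must be a positive count")
    return 100.0 * part / whole


# Example counts from these notes (assumed, not the original problem data).
n_far = 7 + 17     # farsighted males + farsighted females
n_near = 25 + 15   # nearsighted males + nearsighted females
N = n_far + n_near

# Percent of ALL customers who are farsighted: denominator is N.
pct_far = percent_of_total(n_far, N)

# Conditioning on nearsighted: denominator becomes the nearsighted total.
pct_male_given_near = percent_of_total(25, n_near)

print(f"{pct_far:.1f}%")              # 37.5%
print(f"{pct_male_given_near:.1f}%")  # 62.5%
```

The only thing that changes between the two calculations is the denominator, which is the point of conditioning.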
  • Connections to foundational principles

    • These concepts rely on basic probability and counting: joint, marginal, and conditional distributions.
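    • If P(A\mid B) = P(A) for every category B of the other variable, then A is independent of that variable. This is a basic check you can apply to real data. In the example, P(Nearsighted) = 40/64 = 0.625 but P(Nearsighted\mid Male) = 25/32 = 0.78125, so vision status and gender are not independent in this sample.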
    • Base-rate awareness: conditional probabilities depend on the conditioning category; misinterpreting conditioning can lead to base-rate fallacies.
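
The independence check can be illustrated with a small Python sketch that compares the overall proportion of nearsighted customers with the proportion within each gender. It uses the example counts from these notes, and the variable names are illustrative. Exact equality is a descriptive check on this sample only; real data rarely match exactly even when the variables are unrelated.

```python
from fractions import Fraction

# Joint counts from the 2x2 example, keyed by (vision status, gender).
counts = {
    ("Nearsighted", "Male"): 25,
    ("Nearsighted", "Female"): 15,
    ("Farsighted", "Male"): 7,
    ("Farsighted", "Female"): 17,
}
N = sum(counts.values())

# Base rate: P(Nearsighted) over all customers.
n_near = sum(n for (vision, _), n in counts.items() if vision == "Nearsighted")
p_near = Fraction(n_near, N)

# Conditional rates: P(Nearsighted | gender), one per gender.
p_near_given = {}
for gender in ("Male", "Female"):
    n_gender = sum(n for (_, g), n in counts.items() if g == gender)
    p_near_given[gender] = Fraction(counts[("Nearsighted", gender)], n_gender)

# Independent (in this sample) only if every conditional rate equals the base rate.
looks_independent = all(p == p_near for p in p_near_given.values())

print(float(p_near))                  # 0.625
print(float(p_near_given["Male"]))    # 0.78125
print(float(p_near_given["Female"]))  # 0.46875
print(looks_independent)              # False
```

Here the base rate (0.625) sits between the two conditional rates, which is why quoting a conditional rate as if it were the overall rate would mislead.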
  • Practical and ethical considerations

    • When presenting joint/conditional frequencies or proportions, clearly label what the denominator represents to avoid misinterpretation.
    • Always report both absolute counts (n_{ij}) and proportions (p_{ij}) for transparency.
    • Consider the context (e.g., an optometry shop) to avoid misrepresenting subgroups or under/over-emphasizing certain categories.
  • Summary of takeaways

    • Joint frequency shows how many observations fall into each combination of categories: n_{ij}.
    • Relative frequency shows the proportion of the total in each cell: p_{ij} = n_{ij}/N.
    • Conditioning on a category uses that category as the denominator: P(A\mid B) = n_{AB}/n_B.
    • Marginal totals (n_{i•}, n_{•j}) are essential for computing both conditional distributions and overall proportions.
    • A practical example (N = 64, Nearsighted vs Farsighted × Male vs Female) yields concrete numbers to compute joint, marginal, and conditional frequencies and probabilities.