Lecture 9.1 - Chapter 8: Two Categorical Variables

Lecture Overview

  • Course Information

    • Sociology/Anthropology 10B

    • Professor David Schaefer, University of California, Irvine

    • Protected content (© 2023)

9.1 Overview of Association Between Two Categorical Variables

  • Reading: Chapter 8, pages 215 – 233, 238 – 239

Key Concepts in Association Analysis

  • Association:

    • Exists if conditional distributions of one variable differ across categories of another variable.

    • Means that knowing a case's category on one variable changes the likely distribution of the other variable.

  • Response vs. Explanatory Variables:

    • Focus on response (dependent) variable distributions across explanatory (independent) categories.

Conditional Distribution Examples

  • Example 1: Opinion on Abortion by Political Party

    • Democrats more likely to say yes, Republicans more likely to say no.

  • Example 2: Lack of Association

    • The distribution of opinion on abortion is the same across genders, so the two variables are statistically independent.

Contingency Tables

  • Purpose:

    • Summarize joint distributions and conditional percentages.

  • Calculating Percentages:

    • Focus on response variable distributions within explanatory categories.

Chi-Square Statistic

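  • Purpose: Tests whether two categorical variables are associated by measuring how far the observed counts fall from the counts expected under independence.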

  • Statistical Independence:

    • Null Hypothesis: Variables are independent.

    • Alternative Hypothesis: Variables are associated.

    • Rejecting the Null: The data provide evidence of an association.

Conditional Distributions and Responses

  • Conditional distributions of education level by sex.

    • The contingency table cross-classifies sex and educational attainment:

      • Males: 736 with less than a degree; 316 with a degree.

      • Females: 900 with less than a degree; 396 with a degree.

  • Marginal distributions are the row and column totals: 1,052 males and 1,296 females; 1,636 with less than a degree and 712 with a degree (n = 2,348).
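The conditional and marginal distributions for this table can be reproduced with a short script. This is a minimal sketch rather than anything shown in the lecture; the names `counts`, `conditional_percentages`, and `marginal_totals` are illustrative assumptions.

```python
# Observed counts from the lecture's education-by-sex table.
# Rows are the explanatory variable (sex); columns are the response (education).
counts = {
    "Male": {"Less than degree": 736, "Degree": 316},
    "Female": {"Less than degree": 900, "Degree": 396},
}

def conditional_percentages(table):
    """Percentage distribution of the response within each explanatory category."""
    result = {}
    for group, row in table.items():
        row_total = sum(row.values())
        result[group] = {level: 100 * n / row_total for level, n in row.items()}
    return result

def marginal_totals(table):
    """Row totals, column totals, and the overall sample size."""
    row_totals = {group: sum(row.values()) for group, row in table.items()}
    col_totals = {}
    for row in table.values():
        for level, n in row.items():
            col_totals[level] = col_totals.get(level, 0) + n
    return row_totals, col_totals, sum(row_totals.values())

cond = conditional_percentages(counts)
rows, cols, n = marginal_totals(counts)
for group, dist in cond.items():
    print(group, {level: round(p, 1) for level, p in dist.items()})
print(rows, cols, n)
```

About 70% of males and about 69% of females have less than a degree. Conditional distributions this similar are what statistical independence looks like in a sample.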

Chi-Square Test Criteria

  • Assumptions:

    • Data comprises two categorical variables.

    • The data come from a random sample.

    • Each cell in contingency table has an expected frequency (fe) > 5.

  • Hypotheses:

    • Null: Education and sex are independent.

    • Alternative: Education and sex are not independent.

  • Test Statistic:

    • Chi-square calculation: χ² = Σ (fo - fe)² / fe, summed over all cells of the table.
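The test statistic can be worked through for the education-by-sex counts. The sketch below assumes the standard expected-count formula under independence (row total × column total / n) and the usual degrees of freedom, (rows - 1) × (columns - 1); neither is spelled out in these notes. The function names are illustrative.

```python
# Observed counts (fo) from the lecture's education-by-sex table.
observed = [
    [736, 316],  # Males: less than a degree, degree
    [900, 396],  # Females: less than a degree, degree
]

def expected_counts(obs):
    """Expected counts (fe) under independence: row total * column total / n."""
    row_totals = [sum(row) for row in obs]
    col_totals = [sum(col) for col in zip(*obs)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

def chi_square(obs):
    """Sum of (fo - fe)^2 / fe over all cells, plus the degrees of freedom."""
    exp = expected_counts(obs)
    stat = sum(
        (fo - fe) ** 2 / fe
        for obs_row, exp_row in zip(obs, exp)
        for fo, fe in zip(obs_row, exp_row)
    )
    df = (len(obs) - 1) * (len(obs[0]) - 1)
    return stat, df

expected = expected_counts(observed)
stat, df = chi_square(observed)
print([[round(fe, 1) for fe in row] for row in expected])
print(round(stat, 3), df)
```

Every expected count is far above 5, so the sample-size assumption holds. The statistic comes out at roughly 0.07 with df = 1, meaning the observed counts sit very close to what independence predicts.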

Residual Analysis

  • Residual: Difference between observed (fo) and expected (fe) counts.

    • A positive residual means more cases were observed in the cell than independence predicts; a negative residual means fewer.

  • Standardized Residuals:

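    • Divide each residual by its standard error so that cells with different expected counts can be compared. Values beyond about ±2 or ±3 flag cells that deviate from independence by more than chance would explain.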
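Standardized residuals can be computed cell by cell. The notes do not give the formula, so this sketch assumes the common textbook definition, z = (fo - fe) / sqrt(fe × (1 - row proportion) × (1 - column proportion)), applied to the lecture's education-by-sex counts. The function name is illustrative.

```python
import math

# Observed counts (fo): rows are males, females; columns are
# less than a degree, degree.
observed = [[736, 316], [900, 396]]

def standardized_residuals(obs):
    """Standardized residual for each cell (assumed textbook formula)."""
    row_totals = [sum(row) for row in obs]
    col_totals = [sum(col) for col in zip(*obs)]
    n = sum(row_totals)
    result = []
    for i, row in enumerate(obs):
        out_row = []
        for j, fo in enumerate(row):
            fe = row_totals[i] * col_totals[j] / n
            # Standard error of (fo - fe) when the variables are independent.
            se = math.sqrt(fe * (1 - row_totals[i] / n) * (1 - col_totals[j] / n))
            out_row.append((fo - fe) / se)
        result.append(out_row)
    return result

z = standardized_residuals(observed)
for row in z:
    print([round(value, 2) for value in row])
```

All four residuals are about 0.27 in absolute value, well inside ±2, so no cell departs noticeably from independence.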

Example Analysis

  • Political view and spending opinion survey:

    • Political Views: Ranged from extremely liberal to extremely conservative.

    • Spending: Categories from "spend much more" to "spend much less."

    • The conditional distributions are then displayed as bar charts.

Interpretation of Chi-Square Results

  • Underlying Meaning of P-value:

    • A low P-value (< 0.05) is strong evidence against the null hypothesis of independence.

    • The association is then called statistically significant. A small P-value does not by itself mean the association is strong.
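For a 2 × 2 table (df = 1), the P-value can be obtained with the standard library alone. This is a sketch under that assumption: it relies on the fact that a chi-square variable with 1 degree of freedom is the square of a standard normal variable, and it does not apply to larger tables. The function names and the 0.05 default are illustrative; the input 0.0736 is the statistic computed from the lecture's education-by-sex counts.

```python
import math

def p_value_df1(chi_sq):
    """Right-tail probability of a chi-square statistic with 1 degree of freedom."""
    # With df = 1, chi-square is the square of a standard normal Z,
    # so P(chi-square > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(chi_sq / 2))

def decide(p_value, alpha=0.05):
    """Compare the P-value with the significance level alpha."""
    if p_value < alpha:
        return "reject independence"
    return "do not reject independence"

p = p_value_df1(0.0736)  # statistic from the education-by-sex table
print(round(p, 2), decide(p))
```

The P-value is close to 0.79, far above 0.05, so the data give no grounds to reject independence between education and sex.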

Recap of Key Findings

  • In the analyzed data, education and sex show no statistically significant association.

  • Conditional distributions, the chi-square test, and residuals together show whether an association exists and which cells account for it.