Lecture 9.1 (Chapter 8): Two Categorical Variables
Lecture Overview
Course Information
Sociology/Anthropology 10B
Professor David Schaefer, University of California, Irvine
Protected content (© 2023)
9.1 Overview of Association Between Two Categorical Variables
Reading: Chapter 8, pages 215 – 233, 238 – 239
Key Concepts in Association Analysis
Association:
Exists if conditional distributions of one variable differ across categories of another variable.
Means that knowing a case's category on one variable changes the expected distribution of categories on the other.
Response vs. Explanatory Variables:
Compare the distribution of the response (dependent) variable across categories of the explanatory (independent) variable.
Conditional Distribution Examples
Example 1: Opinion on Abortion by Political Party
Democrats more likely to say yes, Republicans more likely to say no.
Example 2: Lack of Association
The distribution of opinion on abortion is the same for men and women, indicating statistical independence.
Contingency Tables
Purpose:
Summarize joint distributions and conditional percentages.
Calculating Percentages:
Compute percentages of the response variable within each category of the explanatory variable, so the percentages for each explanatory category sum to 100%.
Chi-Square Statistic
Purpose: Measures how far the observed counts fall from the counts expected if the variables were independent.
Statistical Independence:
Null Hypothesis: Variables are independent.
Alternative Hypothesis: Variables are associated.
Rejecting the Null: The data provide evidence of an association.
Conditional Distributions and Responses
Conditional distributions of education level by sex.
Contingency table shows frequencies of educational attainment by sex:
Males: 736 with less than a degree; 316 with a degree.
Females: 900 with less than a degree; 396 with a degree.
Marginal distributions are the row totals and column totals of the table.
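The table above can be turned into conditional and marginal distributions with a few lines of Python. This is a minimal sketch using only the counts given in these notes; the variable names (observed, row_totals, conditional_pct) are illustrative choices, not part of the lecture.

```python
# Observed counts from the education-by-sex table in these notes.
# Rows are the explanatory variable (sex); columns are the response (education).
observed = {
    "Male": {"Less than degree": 736, "Degree": 316},
    "Female": {"Less than degree": 900, "Degree": 396},
}

# Marginal distributions: row totals (sex) and column totals (education).
row_totals = {sex: sum(counts.values()) for sex, counts in observed.items()}
col_totals = {}
for counts in observed.values():
    for level, n in counts.items():
        col_totals[level] = col_totals.get(level, 0) + n
grand_total = sum(row_totals.values())

# Conditional distribution of education within each sex (row percentages).
conditional_pct = {
    sex: {level: 100 * n / row_totals[sex] for level, n in counts.items()}
    for sex, counts in observed.items()
}

for sex, pcts in conditional_pct.items():
    print(sex, {level: round(p, 1) for level, p in pcts.items()})
```

About 30.0% of males and 30.6% of females hold a degree, so the two conditional distributions are nearly identical.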
Chi-Square Test Criteria
Assumptions:
Data comprises two categorical variables.
Data come from a random sample.
Each cell in the contingency table has an expected frequency (fe) of at least 5 (fe ≥ 5).
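The expected-frequency assumption can be checked directly. The sketch below assumes the usual formula for a cell's expected count under independence, fe = (row total × column total) / n, and applies it to the education-by-sex counts from these notes; min_expected is an illustrative name for the threshold of 5.

```python
# Observed counts: rows = [Male, Female], columns = [Less than degree, Degree].
observed = [[736, 316], [900, 396]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Expected count under independence: (row total * column total) / n.
expected = [[r * c / n for c in col_totals] for r in row_totals]

# Check the assumption against the conventional threshold of 5.
min_expected = 5
smallest_fe = min(min(row) for row in expected)
assumption_met = smallest_fe >= min_expected

for row in expected:
    print([round(fe, 1) for fe in row])
print("Smallest fe:", round(smallest_fe, 1), "| assumption met:", assumption_met)
```

Here the smallest expected count is roughly 319, far above the threshold, so the assumption holds for this table.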
Hypotheses:
Null: Education and sex are independent.
Alternative: Education and sex are not independent.
Test Statistic:
Chi-square calculation: χ² = Σ (fo − fe)² / fe, summed over all cells of the table.
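As a worked instance of the test statistic, the sketch below computes chi-square for the education-by-sex table. It assumes expected counts of (row total × column total) / n and degrees of freedom of (rows − 1) × (columns − 1); the variable names are illustrative.

```python
# Observed counts: rows = [Male, Female], columns = [Less than degree, Degree].
observed = [[736, 316], [900, 396]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Expected counts under the null hypothesis of independence.
expected = [[r * c / n for c in col_totals] for r in row_totals]

# Chi-square: sum of (fo - fe)^2 / fe over every cell.
chi_square = sum(
    (fo - fe) ** 2 / fe
    for obs_row, exp_row in zip(observed, expected)
    for fo, fe in zip(obs_row, exp_row)
)

# Degrees of freedom: (number of rows - 1) * (number of columns - 1).
df = (len(observed) - 1) * (len(observed[0]) - 1)

print("chi-square =", round(chi_square, 3), "with df =", df)
```

The statistic comes out near 0.07 with df = 1, a very small value: the observed counts sit almost exactly where independence predicts.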
Residual Analysis
Residual: Difference between observed (fo) and expected (fe) counts.
A positive residual means more cases were observed than expected under independence; a negative residual means fewer.
Standardized Residuals:
Divide each residual by its standard error, which allows comparison across cells of different sizes. Values beyond about ±2 or ±3 flag cells that deviate from independence by more than chance would predict.
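A sketch of the residual calculations for the education-by-sex table follows. It assumes the common standard-error formula sqrt(fe × (1 − row proportion) × (1 − column proportion)), which these notes do not spell out; treat it as an illustration rather than the lecture's exact procedure.

```python
import math

# Observed counts: rows = [Male, Female], columns = [Less than degree, Degree].
observed = [[736, 316], [900, 396]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

raw_resid = []
std_resid = []
for i, row in enumerate(observed):
    raw_row, std_row = [], []
    for j, fo in enumerate(row):
        fe = row_totals[i] * col_totals[j] / n
        # Assumed standard error of (fo - fe) under independence.
        se = math.sqrt(fe * (1 - row_totals[i] / n) * (1 - col_totals[j] / n))
        raw_row.append(fo - fe)
        std_row.append((fo - fe) / se)
    raw_resid.append(raw_row)
    std_resid.append(std_row)

for raw_row, std_row in zip(raw_resid, std_resid):
    print([round(r, 2) for r in raw_row], [round(z, 2) for z in std_row])
```

Every standardized residual here is about ±0.27, well inside ±2, so no cell departs notably from what independence predicts.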
Example Analysis
Political view and spending opinion survey:
Political Views: Ranged from extremely liberal to extremely conservative.
Spending: Categories from "spend much more" to "spend much less."
Bar charts of the conditional distributions are used to visualize the relationship.
Interpretation of Chi-Square Results
Underlying Meaning of P-value:
A small P-value (conventionally < 0.05) indicates strong evidence against the null hypothesis of independence.
It signals a statistically significant association between the variables; it does not measure how strong that association is.
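To connect the P-value to the education-by-sex example, the sketch below computes the right-tail probability for a chi-square statistic with df = 1. It relies on the identity that a chi-square variable with one degree of freedom is a squared standard normal, so P = erfc(sqrt(χ² / 2)); this shortcut applies only when df = 1, and larger tables would need a chi-square table or a statistics library. The 0.05 cutoff (alpha) is the conventional level mentioned above.

```python
import math

# Observed counts: rows = [Male, Female], columns = [Less than degree, Degree].
observed = [[736, 316], [900, 396]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Chi-square statistic: sum of (fo - fe)^2 / fe over all cells.
chi_square = 0.0
for i, row in enumerate(observed):
    for j, fo in enumerate(row):
        fe = row_totals[i] * col_totals[j] / n
        chi_square += (fo - fe) ** 2 / fe

# Right-tail P-value, valid only for df = 1 (a 2x2 table).
p_value = math.erfc(math.sqrt(chi_square / 2))

alpha = 0.05
reject_null = p_value < alpha

print("chi-square =", round(chi_square, 3))
print("P-value =", round(p_value, 3), "| reject null:", reject_null)
```

The P-value is about 0.79, far above 0.05, so the null hypothesis of independence between education and sex is not rejected.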
Recap of Key Findings
Education and sex do not exhibit a significant association according to analyzed data.
The conditional distributions of education are nearly identical for males and females: about 30% of each group holds a degree.