Five Key Points about Race and Ethnicity in Empirical Analysis

Racial and ethnic categorizations are problematic but necessary for consistent comparative analysis over time.
- Most data use the federal government’s classifications (OMB) to enable generalizability across time and sources.
- Time-invariant or consistent categorization is important to trace changes in groups across datasets and periods.
Average or median measures by race are common but not always informative enough.
- Descriptive analysis (e.g., mean unemployment rate by race) informs policy but may hide important variation.
- Distributional analyses (medians, 90th percentile, etc.) provide more information about economic well-being and reflect heterogeneity within groups.
Regression analysis adds structure but requires careful interpretation.
- Regression controls for other factors and isolates associations between race and outcomes, but the race coefficient’s meaning depends on included controls and unobserved factors.
- The coefficient on race in a regression is typically a group-average effect conditional on the other variables, not necessarily a direct measure of discrimination against individuals.
How we interpret the race variable shapes policy design.
- Is race a proxy for individual characteristics, or a marker of group-specific structural processes affecting outcomes?
- Different interpretations lead to different policy implications and remedies.
A promising approach is stratification economics, which emphasizes historical and structural context rather than purely individual deficits.
- This lens connects race to group-level processes and policy history, highlighting how resource division and policy design contribute to gaps.

Introduction: Why Researchers, Policymakers, and Advocates Care about the Race Variable

The essay targets researchers seeking more equitable interpretation of race/ethnicity in quantitative analysis, plus policymakers and advocates using research for practice.
It outlines problems posed by race definitions in federal data, how race shows up in quantitative analyses, and the limits of interpreting race coefficients.
The goal is to contextualize findings to inform policy that addresses racial disparities more effectively.

Racial and Ethnic Categorizations: Problematic but Necessary for Cross-Time Analysis

The Office of Management and Budget (OMB) sets the race and ethnicity classifications used by federal agencies, including the Census Bureau.
- Race: OMB requires data on a minimum of five groups: White; Black or African American; Asian; Native American, American Indian, or Alaska Native; and Native Hawaiian or Other Pacific Islander. A sixth category, “some other race,” is permitted in certain contexts.
- Ethnicity categories: Hispanic and non-Hispanic.
- Respondents may report more than one race (multiracial reporting has been allowed since the 2000 Census).
Self-identification of race has largely replaced surveyor-determined race in the census, but the categories remain lumped, overlapping, and time-varying.
Why keep these categories? To enable cross-time comparability and generalizable claims across data sources, despite their flaws.
Historical changes in categories over time (illustrative timeline):
- 1850: only White and nonwhite categories; nonwhites often treated as Black by default.
- 1860: Native American added.
- 1870: Chinese added.
- 1890: Japanese added.
- 1920: Filipino, Korean, and Hindu added.
- 1930: Mexican added.
- 2000: Asian or Pacific Islander split into Asian and Native Hawaiian or Other Pacific Islander; multi-racial reporting allowed.
Practical implication: Using these standard categories supports comparability, though they obscure within-group heterogeneity (see examples below).
A note on data collection: race/ethnicity classifications come from federal definitions and are frequently echoed in state/local data and administrative data (e.g., unemployment insurance systems).
Example of within-race heterogeneity: the Asian category contains diverse subgroups with distinct histories and outcomes (e.g., Hmong Americans vs. South Asian Americans), which can have very different poverty rates and geographic concentrations.

Why Averages Can Be Misleading: Within-Group Heterogeneity and Geographic Effects

Averages (means) by race are common descriptors but hide important variation within groups.
- Example: Among Asians, Hmong Americans in Minneapolis–St. Paul have poverty rates about twice the national Asian average and much higher than some other Asian subgroups.
- Heterogeneity exists within racial categories, driven by historical context, immigration status, geography, and local policy environments.
Geographic concentration can make a group appear wealthier or poorer due to location-based price levels, housing markets, and local economic conditions.
- To compare fairly, analysts should hold exogenous geographic factors constant or explicitly model geography as a covariate.
Within-group heterogeneity matters for policy: lumping diverse subgroups into a single category can mislead the design of interventions intended to address disparities.
Example: Racial wealth gaps can persist even after accounting for education, which indicates that education alone does not close them; even among the highly educated, White and Black wealth levels can differ substantially (e.g., Hamilton et al. 2018).
Distributional measures can be more informative than the mean when assessing inequality (e.g., comparing medians or tails of the distribution by race).
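
The contrast between means and distributional measures can be sketched in a few lines of Python. The group labels, the wealth figures, and the helper name summarize_by_group are all hypothetical, chosen so that two groups share a mean while differing sharply at the median; this is an illustration, not data from the essay.

```python
from statistics import mean, median, quantiles


def summarize_by_group(records, group_key, value_key):
    """Return mean, median, and 10th/90th percentiles of value_key per group."""
    groups = {}
    for row in records:
        groups.setdefault(row[group_key], []).append(row[value_key])

    summary = {}
    for name, values in groups.items():
        # n=10 yields nine cut points: the 10th, 20th, ..., 90th percentiles.
        deciles = quantiles(values, n=10, method="inclusive")
        summary[name] = {
            "mean": mean(values),
            "median": median(values),
            "p10": deciles[0],
            "p90": deciles[-1],
        }
    return summary


# Made-up wealth figures (thousands of dollars), for illustration only.
records = (
    [{"group": "A", "wealth": w} for w in [10, 20, 30, 40, 50, 60, 70, 80, 90, 1000]]
    + [{"group": "B", "wealth": w} for w in [100, 110, 120, 130, 140, 150, 160, 170, 180, 190]]
)

summary = summarize_by_group(records, "group", "wealth")
for name, stats in summary.items():
    print(name, stats)
```

In this made-up sample both groups have a mean of 145, but group A's median is 55 against 145 for group B, because a single large value pulls A's mean upward. A comparison of means alone would report no gap at all.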

Descriptive vs Regression Analysis: What the Race Coefficient Really Represents

Descriptive analysis (average/median by race) describes populations but does not identify causal mechanisms or the factors that drive disparities.
Regression analysis introduces controls and can reveal associations between race and outcomes while adjusting for other factors (education, experience, etc.).
- The race coefficient in a regression is conditional on the included controls; unobserved factors may still bias the estimate.
- Important nuance: The coefficient on race in a standard regression is often interpreted as an average effect for the group, holding other controls constant, rather than a direct measure of discrimination.
The problem of omitted variables:
- Unobserved factors (e.g., noncognitive skills, cognitive skills, school quality, neighborhood effects, cumulative disadvantage) may be correlated with race and influence outcomes, biasing the race coefficient.
- If such factors are relevant and correlated with race, the race coefficient may partly capture these omitted influences.
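
A small numeric sketch can make the omitted-variable point concrete. Everything below is assumed for illustration: the data are made up, school quality stands in for any omitted factor, and the outcome is built so that the group indicator has no direct effect. The sketch relies on a standard result: with a single 0/1 regressor, the OLS slope equals the difference in group means.

```python
from statistics import mean

# Hypothetical data-generating process: the outcome depends on school
# quality only; the group indicator has no direct effect.
TRUE_GROUP_EFFECT = 0.0
QUALITY_EFFECT = 2.0

# (group indicator, school quality) pairs. Quality is lower on average
# for group 1. All values are made up.
people = [(0, 4.0), (0, 5.0), (0, 6.0), (1, 2.0), (1, 3.0), (1, 4.0)]
outcomes = [10.0 + TRUE_GROUP_EFFECT * g + QUALITY_EFFECT * q for g, q in people]


def mean_gap(values, groups):
    """Mean of values where group == 1 minus mean where group == 0."""
    ones = [v for v, g in zip(values, groups) if g == 1]
    zeros = [v for v, g in zip(values, groups) if g == 0]
    return mean(ones) - mean(zeros)


groups = [g for g, _ in people]
quality = [q for _, q in people]

# Coefficient from regressing the outcome on the group indicator alone
# (school quality omitted): the difference in group mean outcomes.
short_coef = mean_gap(outcomes, groups)

# Omitted-variable bias: the omitted factor's effect times its gap by group.
implied_bias = QUALITY_EFFECT * mean_gap(quality, groups)

print(short_coef, TRUE_GROUP_EFFECT + implied_bias)
```

Both printed values are -4.0: the regression that omits school quality assigns the whole gap to the group indicator, even though the indicator plays no direct role in how these outcomes were generated.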
The traditional expectation that the race coefficient (b1 in the regression model below) would be zero once all relevant inputs are controlled for is challenged by both empirical evidence and theory, because of unobserved heterogeneity and structural factors.
The regression framework can be extended to examine how the impact of other variables (e.g., education) may differ by race, i.e., interaction effects.

Regression Specifications that Highlight Race Differences: Interactions and Slopes

Standard linear regression model (example):
- y = a + b1 x1 + b2 x2 + ... + bn xn + e, where e is the error term.
- Here, x1 could be a race indicator (e.g., Black = 1, others = 0) and x2,…,xn are other controls (education, experience, etc.).
- In this specification, b1 is the average difference in y for the race group holding other factors constant (an intercept shift).
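
As a worked illustration of the intercept shift, assume purely hypothetical values a = 5, b1 = -3, and b2 = 2, with x2 measured as years of education and no other controls. The predicted outcome for each group at a given education level is then:

```latex
\begin{aligned}
\hat{y}(x_1 = 0,\, x_2) &= 5 + 2\,x_2 \\
\hat{y}(x_1 = 1,\, x_2) &= 5 - 3 + 2\,x_2 = 2 + 2\,x_2 \\
\hat{y}(x_1 = 1,\, x_2) - \hat{y}(x_1 = 0,\, x_2) &= -3 = b_1 \quad \text{for every value of } x_2
\end{aligned}
```

Under this specification the two groups' prediction lines are parallel: the gap equals b1 at 10 years of education and at 16 alike.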
Interacted (slopes) model to capture differential return to education by race:
- y = a + b1 x1 + b2 x2 + ... + bn xn + c1 (x1 × m) + e
- Here x1 is the race indicator and m is a moderator (e.g., education level, which also enters as one of the controls x2, ..., xn). The term c1 (x1 × m) allows the effect of the moderator to vary by race.
Interpretation of coefficients in an interaction model:
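- b1: the intercept difference for the race group, controlling for the other variables. With the interaction included, this is the gap when the moderator equals zero. The remaining b’s are the effects of the controls for the reference group.
- The c’s (here c1): slope differences, that is, how the effect of the moderator (e.g., education) on the outcome changes by race.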
Example interpretation for wages and education:
- If education is positively related to wages and race is negatively related to wages, we might be interested in the wage gradient by race: how much does each additional year of education increase wages for a Black worker compared to a White worker?
- This is captured by the interaction term c1 (Race × Education).
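
The interacted specification can be sketched in plain Python. The coefficient values, the tiny noise-free sample, and the helper name ols are all assumptions made for illustration; the helper solves the normal equations directly so that the example has no dependencies.

```python
def ols(X, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y.

    X is a list of rows; include a leading 1.0 in each row for the intercept.
    """
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    M = [
        [sum(row[i] * row[j] for row in X) for j in range(k)]
        + [sum(row[i] * yi for row, yi in zip(X, y))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(k):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [m - factor * p for m, p in zip(M[r], M[col])]
    return [M[i][k] / M[i][i] for i in range(k)]


# Hypothetical coefficients used to generate noise-free wages:
# wage = 5 - 1*black + 2*educ - 0.5*(black * educ)
a, b1, b2, c1 = 5.0, -1.0, 2.0, -0.5

sample = [(black, educ) for black in (0, 1) for educ in (10, 12, 14, 16)]
wages = [a + b1 * black + b2 * educ + c1 * black * educ for black, educ in sample]

# Design matrix columns: intercept, race indicator, education, race x education.
X = [[1.0, black, educ, black * educ] for black, educ in sample]
a_hat, b1_hat, b2_hat, c1_hat = ols(X, wages)

return_reference = b2_hat           # wage gain per year of education when x1 = 0
return_indicated = b2_hat + c1_hat  # wage gain per year of education when x1 = 1

print(round(return_reference, 6), round(return_indicated, 6))
```

Because the data were generated without noise, the fit recovers the assumed coefficients: a return of about 2.0 per year of education for the reference group and about 1.5 for the group with x1 = 1. With real survey data one would normally use a statistics package that also reports standard errors.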
Traditional interpretation of race coefficients and the move away from simple discrimination claims:
- In classic models, one might expect the race coefficient to be near zero after controlling for productivity-related factors, implying little residual discrimination.
- However, empirical evidence rarely supports b1 ≈ 0, and many factors related to wages are unobserved (noncognitive skills, cognitive skills, neighborhood effects, school quality, etc.).
Why race coefficients are not definitive evidence of discrimination in modern analyses:
- Omitted factors that co-vary with race can drive the observed gaps if not included in the model.
- The interpretation depends on what is included as controls and how the model is specified.
The approach in economics has shifted away from treating the race coefficient as direct evidence of discrimination because:
- It is difficult to measure all relevant factors, and omitted factors could be correlated with race.
- It is challenging to separate group-level processes from individual-level attributes in a single-equation regression.

Interpreting the Coefficient: Discrimination, Omitted Factors, and Group Processes

If a model excludes relevant factors (e.g., quality of schooling, neighborhood characteristics, access to resources), the race coefficient may capture these omitted influences rather than pure discrimination.
To better understand the sources of gaps, researchers can decompose the race effect into a component tied to systemic, race-related processes and a component that is not directly tied to them.
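
One widely used way to formalize a decomposition of this kind is the Oaxaca-Blinder approach, named here as an illustration rather than as the essay's own method. Assume separate regressions with intercepts are fit for two groups A and B (hypothetical labels), with mean characteristics \bar{X}_g and estimated coefficient vectors \hat{\beta}_g. The gap in mean outcomes then splits into two parts:

```latex
\bar{y}_A - \bar{y}_B
  = \underbrace{(\bar{X}_A - \bar{X}_B)^{\top} \hat{\beta}_A}_{\text{differences in measured characteristics}}
  + \underbrace{\bar{X}_B^{\top} (\hat{\beta}_A - \hat{\beta}_B)}_{\text{differences in coefficients}}
```

The second term is often labeled "unexplained." Consistent with the cautions in this section, it should not be read automatically as discrimination, and the first term can itself reflect structural processes that shaped the measured characteristics.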
A key challenge: when using individual-level regressions, attributing group effects to group-level processes requires careful justification that omitted factors would be correlated with race in the same way across individuals.
The critique of interpreting race effects as solely due to individual deficits: it overlooks structural and policy-driven factors that shape group outcomes over time.
Practical implication for policy design: policies should address the structural and group-level drivers of disparities (e.g., education quality, regional investment), rather than assuming the gap is purely due to individual shortfalls.

Stratification Economics: A Group-Specific Approach to Race and Outcomes

Stratification economics (pioneered by William A. Darity Jr., Darrick Hamilton, James B. Stewart, and others) reframes race-based outcomes as the product of group-specific processes.
Core idea: racial gaps in outcomes arise from how resources and opportunities have been historically allocated and how policy structures continue to shape those allocations.
This approach explicitly ties gap dynamics to policy history and current practices that differentially affect groups, rather than attributing gaps to deficits of individuals within groups.
Implications of stratification economics:
- Race is meaningful as a marker of group-specific resource division and policy architecture, not just an attribute of individuals.
- The race variable matters because it captures how historical and current policies shape group outcomes.
This framework helps address the omitted-variable problem by recognizing that the omitted factors are themselves systemic, tied to policy and history, rather than simply random unobservables.
The broader goal: integrate empirical analysis with theoretical development to model outcomes as group-specific processes and to design policy that addresses structural factors.

How Race Definitions and Constructions Affect Policy and Analysis

The race variable should be treated with caution in policy-relevant analysis:
- Define and construct the variable with awareness of what it represents (individual trait vs. group-process marker).
- Consider both aggregation (group averages) and distributional aspects (within-group heterogeneity, medians, and tail outcomes).
The central questions for researchers and policymakers:
- What do we want to measure: average group gaps, distributional gaps, or effects of specific mediators like education quality?
- How should policy target the drivers of disparities: individual skills, geographic factors, or systemic policy design that affects entire groups?
The framing matters: thinking of outcomes by race as a group-specific process (stratification) rather than as the average of individuals can shift policy priorities toward addressing structural barriers.

Practical Implications for Policy Design

If race differences reflect group-specific processes shaped by policy, then policy interventions should target those processes (e.g., improving school quality in under-resourced districts, reducing geographic inequality, ensuring equitable access to high-opportunity neighborhoods).
If the gap is driven by unobserved, correlated factors, policies should focus on measuring and mitigating those factors (e.g., improving data collection on noncognitive skills, neighborhood conditions, and school funding mechanisms).
Policies should consider both descriptive gaps and distributional patterns (e.g., median vs mean, concentration of outcomes at the tails) to understand who is most affected and where interventions will have the largest impact.
The ethical and practical implications of interpreting race in research include avoiding blaming individuals of a given race for group-level disparities and recognizing the role of structural drivers and policy history in shaping outcomes.

Additional Readings and Resources (as Suggested by the Author)

Articles:
- Cook, L. D., T. D. Logan, and J. M. Parman. 2014. Distinctively Black Names in the American Past. Explorations in Economic History 53: 64–82.
- Gullickson, A. 2019. The Racial Identification of Young Adults in a Racially Complex Society. Emerging Adulthood 7(2): 150–161.
- Gullickson, A., and A. Morning. 2011. Choosing Race: Multiracial Ancestry and Identification. Social Science Research 40(2): 498–512.
- Ward, Z. 2021. Intergenerational Mobility in American History: Accounting for Race and Measurement Error. NBER Working Paper no. w29256.
Books:
- Dietz, T., and L. Kalof. 2010. Introduction to Social Statistics: The Logic of Statistical Reasoning. Wiley-Blackwell.
- Frankfort-Nachmias, C., A. Leon-Guerrero, and G. Davis. 2020. Social Statistics for a Diverse Society, 9th ed. SAGE.
- Roberts, D. 2011. Fatal Invention: How Science, Politics, and Big Business Re-Create Race in the Twenty-First Century. New Press/ORIM.
Subject matter experts involved in this work:
- William A. Darity Jr. • Duke University
- Aaron Gullickson • University of Oregon
- Hedwig Lee • Washington University in St. Louis
- Dorothy Roberts • University of Pennsylvania
References cited in the text include Budiman (2021), Darity et al. (2015), Hamilton et al. (2018), and Minnesota Historical Society (2014).

Summary Takeaways

Race is a necessary but flawed construct for empirical analysis; consistent measurement over time is essential for comparability.
Descriptive analyses are informative but limited; regression analyses offer deeper insights yet require careful interpretation due to potential omitted variables and the meaning of race coefficients.
Interactions between race and other factors (e.g., education) reveal how returns to inputs can differ by race, highlighting the importance of studying slope differences, not just intercept shifts.
The traditional aim of attributing unexplained gaps to discrimination is increasingly viewed as insufficient; a stratification economics lens emphasizes structural, policy-driven group processes as drivers of disparities.
Policy design should be informed by a nuanced interpretation of race in empirical analyses, with attention to heterogeneity, geography, and the historical policy context to create targeted, effective interventions.