SOC 3223 Lecture 5: Demography Data Sources and IPUMS/PRB/ACS (Vocabulary Flashcards)
Course and Lecture Details
- SOC 3223: Population Dynamics and Demographic Techniques
- Lecture 5
- Date: September 8, 2025
- Instructor: Rogelio Sáenz
Outline
- Recap
- Quiz 1 taking place today
- Required readings
- DEMOGRAPHY MATTERS TODAY
- Specific data sources
- American Community Survey (ACS)
- Student Hometown Demographic and Socioeconomic Data
- Population Reference Bureau (PRB)
- Integrated Public Use Microdata Series (IPUMS)
- Recap
Recap (Lecture context)
- Review of the preceding lecture on Demographic Data
- DEMOGRAPHY MATTERS TODAY: “Allentown, PA, a former industrial town reborn” (case/illustrative example)
- Sources of demographic data include:
- National censuses
- Registration systems
- Population registers
- Vital statistics
- Surveys
Don’t Forget
Required Readings
- "How will we measure the accuracy of the 2020 census?" PRB resource (link):
https://www.prb.org/resources/how-will-we-measure-the-accuracy-of-the-2020-census/
- “How Abbott cost Texas a House seat.” (link via Express News):
https://www.expressnews.com/opinion/commentary/article/Commentary-How-Abbott-cost-Texas-a-House-seat-16159960.php
Two decades, no headcount: How lack of actual demography is distorting Nigeria's growth (Lecture theme)
- Article: Philip Ibitoye, September 7, 2025
- Includes a map-like regional frame (focus on demographic undercount in Nigeria)
- Illustrates how lack of up-to-date demographic headcounts distorts growth assessments
- The figure shows political-administrative divisions and population concentration (context for data quality/coverage issues)
DEMOGRAPHY MATTERS TODAY (NYT summary from the reading)
- Article:
- "Is the Jobs Data Still Reliable? Yes, at Least for Now." by Ben Casselman, Sept. 5, 2025
- Context: following a weak jobs report, the President fired the head of the Bureau of Labor Statistics (BLS) and named a loyalist to run the agency
- Key questions: Can this month's numbers be trusted? What safeguards exist?
- Bottom line from economists and government statistics experts:
- Yes, the data remains reliable for now, but with caveats that are always present in statistical data
- Key actors and terms:
- Erika McEntarfer (former BLS head) and E.J. Antoni (conservative economist potential appointee)
- Deputy commissioner William J. Wiatrowski (acting commissioner)
- Erica Groshen (former BLS head under Obama) described Wiatrowski as a committed "B.L.S. lifer" with safeguards in place
- Aaron Sojourner (economist) notes safeguards; data collection is highly automated and decentralized
- Data process and caveats:
- The monthly payroll figure relies on company-reported data through automated systems
- The commissioner does not access final numbers until after finalization; political influence is constrained by workflow
- Preliminary estimates (like the upcoming August figure) will be revised as additional data arrives
- Revisions can be substantial (e.g., in a recent example, the May and June figures were revised downward by a combined 258,000 jobs)
- Historical revisions: Some downward revisions have occurred, sometimes signaling momentum changes, but not definitive signals of recession
- Experts caution: a potential risk if political actors undermine credibility over time; nonetheless, most analysts maintain data reliability in the short run
- Overall stance:
- The raw numbers are considered reliable for now, but it remains prudent to interpret with an understanding of revisions, seasonal patterns, and data collection realities
SPECIFIC DATA SOURCES
- The session introduces major data sources used in population and demographic analysis:
- American Community Survey (ACS)
- IPUMS (Integrated Public Use Microdata Series)
- PRB (Population Reference Bureau) World Population Data Sheet
- Hometowns dataset (demographic/socioeconomic profile for SOC 3223 students)
- Nigeria demographic data discussion (context for data gaps)
- ACS background and evolution:
- The U.S. Census Bureau redesigned the long-form questionnaire (transition from decennial long form to ACS)
- Historically, decennial census used a short form plus a long form; long form was replaced by ACS
- 1% Long-Form-equivalent sample (pre-ACS long form)
- Moving picture rather than a single decennial snapshot:
- Since 2005, the ACS has provided an updated picture of the U.S. population every year
- Geography and units of analysis:
- Geographic units include: blocks, block groups, census tracts, places (cities), counties, states
- Period estimates:
- 1-year estimates for nation, states, and large geographies (populations of 65,000+)
- 5-year estimates for smaller geographies (< 65,000)
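The 65,000 threshold above can be expressed as a small lookup. This is a minimal sketch, not a Census Bureau tool: the function name is hypothetical, and it assumes that 5-year estimates are published for areas of every size while 1-year estimates require a population of 65,000 or more.

```python
# Population threshold for ACS 1-year estimates, as stated in the notes.
ONE_YEAR_MIN_POPULATION = 65_000


def available_acs_estimates(population: int) -> list[str]:
    """Return the ACS period estimates published for an area of this size."""
    if population < 0:
        raise ValueError("population must be non-negative")
    # Assumption: 5-year estimates cover every geography;
    # 1-year estimates need a population of 65,000 or more.
    products = ["5-year"]
    if population >= ONE_YEAR_MIN_POPULATION:
        products.insert(0, "1-year")
    return products


print(available_acs_estimates(16_449))      # a small city
print(available_acs_estimates(1_500_000))   # hypothetical large city
```

For a small place, only the 5-year product is returned, which is why small hometowns are profiled with 5-year data.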
Navigating Through ACS Data
- Access point and tools:
- Data source: data.census.gov
- The system displays official U.S. Census Bureau data with map/table interfaces
- Example: Mercedes city, Texas data profile (illustrative of a city-level ACS profile)
- Shows how to read ACS tables (e.g., P1, DP03, S1501, B01001, etc.) and geographic profiles
- On-screen example content (from slides):
- Includes narrative summaries and table/table-entry examples for a specific city (Mercedes, TX) including population, employment, income, education, health, etc.
- Key ACS table ideas mentioned:
- B01001: Sex by age
- DP02: Selected social characteristics (households, families, education, etc.)
- DP03: Selected economic characteristics
- S1501: Educational attainment
- Income/poverty and housing tables (B25002, etc.)
- Data navigation notes:
- The ACS 5-year estimates are frequently used to analyze smaller geographies; 1-year estimates are used for larger geographies
- The data density and accuracy depend on geographic level and sample size
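Besides the data.census.gov interface, the same tables can be requested programmatically. The sketch below only builds a request URL and does not call the network. The endpoint pattern, the 2023 vintage, and the helper name `build_acs_query` are assumptions for illustration; `B01001_001E` is the total-population estimate from table B01001, and 48 is the state FIPS code for Texas.

```python
from urllib.parse import urlencode

# Assumed endpoint: the Census Bureau's public API for ACS 5-year detailed tables.
BASE_URL = "https://api.census.gov/data/{year}/acs/acs5"


def build_acs_query(year: int, variables: list[str], state_fips: str) -> str:
    """Build a request URL for all places (cities/towns) in one state."""
    params = {
        "get": ",".join(["NAME", *variables]),
        "for": "place:*",
        "in": f"state:{state_fips}",
    }
    # Keep ',', ':' and '*' readable instead of percent-encoding them.
    return BASE_URL.format(year=year) + "?" + urlencode(params, safe=",:*")


# Total population (B01001_001E) for every place in Texas (state FIPS 48).
url = build_acs_query(2023, ["B01001_001E"], "48")
print(url)
```

Opening the printed URL in a browser should return one row per Texas place, which can then be filtered by name.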
ACS Data Tables and Hometowns Data (Mercedes city example and beyond)
- Mercedes city, Texas (illustrative ACS 5-year estimates):
- Population: e.g., total around 16,449 (ACS detail), with breakdowns across sex, age, race/ethnicity, education, income, and housing
- Examples of variables (from slide excerpts):
- B01001: Sex by Age (breakdowns by race categories such as White, Black, Asian, etc.)
- B01002: Median Age by Sex
- B19113: Median Family Income in the past 12 months (in 2023 inflation-adjusted dollars)
- S1501: Educational attainment (bachelor's degree or higher)
- DP03: Economic characteristics (employment, industry, etc.)
- Geography snapshot components: Nation, State, County, Place, ZIP Code Tabulation Areas (ZCTAs), etc.
- Hometowns dataset (SOC 3223 student hometowns, 2019–2023 response set):
- Hometowns listed include: Alice, TX; Alief, TX; Austin, TX; Baytown, TX; Boerne, TX; Bulverde, TX; Canyon Lake, TX; Chicago, IL; Corpus Christi, TX; Daphne, AL; Donna, TX; Eagle Pass, TX; El Paso, TX; Harlingen, TX; Houston, TX; Katy, TX; Kerrville, TX; La Palma, CA; Lake Jackson, TX; Laredo, TX; McAllen, TX; Piedras Negras, Coahuila, Mexico; San Antonio, TX; Sugar Land, TX; Yanbu, Saudi Arabia
- Each row contains multiple columns of demographic and socioeconomic indicators, including:
- Population and population shares by race/ethnicity (e.g., pctafram, pctaian, pctasn, pctlat, pctmltrace, pctsor, etc.)
- Nativity and place-of-birth indicators (pborninst)
- Educational attainment (pctbach+, fpbach+, mpbach+)
- Median age (mdnage, fmdnage, mmdnage)
- Sex ratio (sexratio)
- Ownership and housing characteristics (pownhome, mdvalhome)
- Health insurance coverage (p<19noins, p1964noins, p65+noins)
- Family/poverty measures (mdfaminc, pfampov, fp18+pov, mp18+pov, fpampov)
- Some caveats in the dataset:
- Data for Alief, Texas are unavailable (reason unknown)
- Some data available for 2023 and drawn from multiple sources (e.g., Point2 Homes, Statistical Atlas)
- Piedras Negras (Mexico) and Yanbu (Saudi Arabia) datasets are limited and approximate (2020/2022 data, with external sources cited)
- Definitions (examples from the slides):
- pop = Population
- pctafram = Pct. of population African American
- pborninst = Pct. of population born in state of residence
- pimmnat = Pct. of immigrants who have naturalized citizenship
- mdnage = Median age; fmdnage = Female median age; mmdnage = Male median age
- B01001 = Sex by Age; B01002 = Median Age by Sex; B19113 = Median Family Income
- p<19noins = Pct. of persons 0-18 years without insurance; p1964noins = Pct. of persons 19-64 years without insurance; p65+noins = Pct. of persons 65+ without insurance
- Data interpretation cautions:
- Some towns have missing data or limited data availability; not all variables exist for all geographies
- The Hometowns table combines ACS data with alternate sources for non-U.S. locations (e.g., Piedras Negras, Yanbu)
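The table lists a sexratio column without spelling out the formula. A minimal sketch follows, assuming the conventional demographic definition (males per 100 females) and returning no value when counts are missing, as happens for some hometowns. The counts and place names are hypothetical and are not taken from the course table.

```python
from typing import Optional


def sex_ratio(males: Optional[int], females: Optional[int]) -> Optional[float]:
    """Males per 100 females; None when a count is missing or females is 0."""
    if males is None or females is None or females == 0:
        return None
    return round(100 * males / females, 1)


# Hypothetical counts for illustration only; not taken from the course table.
rows = [
    {"place": "Town A", "males": 8_000, "females": 8_400},
    {"place": "Town B", "males": None, "females": None},  # data unavailable
]

for row in rows:
    ratio = sex_ratio(row["males"], row["females"])
    label = "n/a" if ratio is None else f"{ratio:.1f}"
    print(f'{row["place"]}: sex ratio = {label}')
```

A value below 100 means women outnumber men; reporting "n/a" keeps a missing row from being read as zero.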
Population Reference Bureau (PRB) — World Population Data Sheet (2024)
- Contents overview:
- Special Focus sections by region: Africa; Americas; Asia; Europe; Oceania
- Nursing and midwifery personnel per 10,000 population (shown as a regional distribution figure)
- GLOBAL TOTAL FERTILITY RATE; Infant mortality; Life expectancy at birth; Urban population percentage; GNI per capita (ppp)
- Notes & Sources; Definitions; Data tables across regions and countries
- Key highlights from the 2024 sheet:
- Global message: investments in Primary Health Care (PHC) can significantly improve health outcomes
- Approximately 50% of the world’s population lacks access to good PHC
- PHC as a platform for integrated health service delivery across the life cycle (pregnancy care, childhood immunizations, care for noncommunicable diseases, etc.)
- Scaling up quality PHC with workforce resources could prevent up to 60 million deaths by 2030, potentially increasing global life expectancy by 3.7 years
- Shortages of skilled health professionals contribute to overworked staff and reduced quality of care
- Health workforce data (illustrative figure):
- Number of Nursing and Midwifery Personnel per 10,000 Population, with categories:
- <15
- 15–35
- 35.1–60
- 60.1–120
- 120.1–525
- Data not available
- Note: Data reflect most recent available years between 2018 and 2022
- Regional and country data structure (examples):
- World totals and regional groupings; country-level values include births, deaths, fertility, life expectancy, urbanization, GNI per capita, health care coverage, and PHC-related metrics
- Regions include: NORTHERN AFRICA; AFRICA; NORTHERN AMERICA; AMERICAS; LATIN AMERICA AND THE CARIBBEAN; CENTRAL AMERICA; WESTERN ASIA; ASIA; OCEANIA; EUROPE; NORTHERN EUROPE; etc.
- Notes on data presentation:
- The sheet presents many variables with definitions and units (e.g., births per 1,000; deaths per 1,000; life expectancy in years; urban population percentage; GNI per capita, PPP)
- Data sources and notes accompany each region and country; some values are labeled as estimates with confidence intervals (not shown in detail in slides)
- United States and Canada (sample takeaway from the North American section):
- United States population mid-2024 roughly 336.6 million
- Births around 11 per 1,000; Deaths around 9 per 1,000
- Life expectancy at birth in the upper 70s to around 80 years (approximate trend in the sheet)
- GNI per capita (PPP) around 60,700 (US figure in the sheet)
- Oceania snapshot (Australia, New Zealand, etc.):
- Australia population around 27.3 million; life expectancy in the low 80s
- Health workforce and PHC indicators vary by country
- General interpretive notes:
- The PRB data sheet is a global reference tool for comparing population health, demographic indicators, and PHC strength across regions and over time
- It emphasizes policy relevance: investments in primary health care and health workforce expansion can yield large longevity and poverty-reduction benefits
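The crude birth and death rates quoted above for the United States (about 11 and 9 per 1,000) can be combined into a rate of natural increase. This is a minimal sketch of the standard calculation; the function name is hypothetical, and the result ignores migration, so it is not the total growth rate.

```python
def rate_of_natural_increase(births_per_1000: float, deaths_per_1000: float) -> float:
    """Return natural increase as a percentage of the population per year."""
    # (births - deaths) is per 1,000 people; dividing by 10 expresses it per 100.
    return (births_per_1000 - deaths_per_1000) / 10


us_rni = rate_of_natural_increase(11, 9)
print(f"U.S. rate of natural increase: about {us_rni:.1f}% per year")
```

The same function can be applied to any row of the data sheet to compare natural increase across countries.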
Primary Health Care and Health Workforce (PRB, World Population Data Sheet 2024)
- Central claim: Access to and quality of Primary Health Care (PHC) are foundational to improved population health outcomes
- Core statements:
- About 50% of the world’s population lacks access to good PHC
- PHC is a platform for comprehensive, continuous care across the life cycle (pregnancy care, immunizations, noncommunicable diseases, etc.)
- Scaling up PHC in low- and middle-income countries could prevent as many as 60 million deaths by 2030, increasing average life expectancy by 3.7 years
- Resource constraints include shortages of trained health professionals, leading to overworked staff and service quality concerns
- Health workforce data (per 10,000 population):
- Trends show varying levels of nursing/midwifery personnel across regions; data are the most recent available (2018–2022)
- Regional highlights (examples):
- Africa; Northern Africa; Americas; Europe; Asia; Oceania – each with country-level and regional aggregates for PHC-related indicators
- Implication for students and researchers:
- Use PRB data to assess health-system strength, PHC capacity, and potential policy impacts on population health and equity
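The legend bins of the PRB workforce figure (<15, 15–35, 35.1–60, 60.1–120, 120.1–525, data not available) can be reproduced when working with country values. A minimal sketch, assuming each bin includes its upper bound and that any value above 120 belongs in the top bin; the function name and the sample densities are hypothetical.

```python
from typing import Optional


def workforce_category(per_10k: Optional[float]) -> str:
    """Map nursing/midwifery personnel per 10,000 people to a legend bin."""
    if per_10k is None:
        return "Data not available"
    if per_10k < 0:
        raise ValueError("density cannot be negative")
    if per_10k < 15:
        return "<15"
    if per_10k <= 35:
        return "15–35"
    if per_10k <= 60:
        return "35.1–60"
    if per_10k <= 120:
        return "60.1–120"
    return "120.1–525"  # top bin of the legend


# Hypothetical densities for illustration; not values from the data sheet.
for value in (8.2, 35.0, 47.5, None):
    print(value, "->", workforce_category(value))
```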
Integrated Public Use Microdata Series (IPUMS)
- What IPUMS is:
- Integrated Public Use Microdata Series (IPUMS) provides harmonized census and survey microdata from many countries to enable cross-time and cross-country analysis
- Major IPUMS data families:
- IPUMS USA: U.S. Census and American Community Survey microdata from 1850 to present
- IPUMS CPS: Current Population Survey microdata including basic monthly surveys and supplements from 1962 to present
- IPUMS INTERNATIONAL: World’s largest collection of census microdata covering 100+ countries (contemporary and historical)
- IPUMS GLOBAL HEALTH: Health survey data from DHS, MICS, and PMA surveys (harmonized)
- IPUMS NHGIS: U.S. Census summary tables and GIS data from 1790 to present
- IPUMS IHGIS: Summary tables and GIS data from population, housing, and agricultural censuses around the world
- IPUMS TIME USE: Time-use data from 1930 to present
- IPUMS HEALTH SURVEYS: U.S. health survey data (NHIS since 1963; MEPS since 1996)
- Access and support:
- IPUMS site emphasizes free access to population data and tools to analyze change over time and across geographies
- Data are updated regularly and documented (with user guides, sample descriptions, question inventories, etc.)
- Practical uses:
- Merge census microdata across decades, study demographic change, compare regions, conduct multivariate analyses, and build custom extracts
- Motto and reminder:
- USE IT FOR GOOD — NEVER FOR EVIL
- IPUMS USA/SDA interface basics:
- The SDA (Survey Documentation and Analysis) environment allows row, column, filter, and control specifications to build tables and analyses
- Weights: perwt (person weight) is used to ensure representativeness in analyses
- Steps to build a table in SDA:
- Select a dataset (e.g., ACS 2001–2023 or 1850–2023 for US data)
- Choose a row variable (e.g., year, race, education, etc.)
- Choose a column variable (e.g., sex, age group, etc.)
- Add controls/filters (e.g., geography, year, region)
- Choose output mode (append vs. replace) and data display options
- Output options:
- Frequencies, cross-tabulations, means, correlations, regressions, etc.
- Ability to download CSV, or view results interactively
- Important guidance:
- For multi-year ACS data, include a year variable in analyses to avoid mixing years
- The 2020 1-year ACS PUMS file has special considerations; avoid direct comparisons with multi-year samples without guidance
- COVID-19 and related data collection changes affect interpretability of some ACS periods; consult guidance when necessary
- Resources:
- IPUMS online data analysis system (SDA) tutorials and help sections
- Video tutorials and user forums available on the IPUMS site
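The SDA steps above (row variable, column variable, filter, person weight) amount to a weighted cross-tabulation. The sketch below mimics that logic in plain Python; it is not the SDA implementation. The records and the function name are hypothetical, and perwt is treated as the number of people each sampled record represents.

```python
from collections import defaultdict

# Hypothetical person records; perwt is the number of people each row represents.
records = [
    {"year": 2022, "sex": "Female", "perwt": 120},
    {"year": 2022, "sex": "Male", "perwt": 95},
    {"year": 2023, "sex": "Female", "perwt": 110},
    {"year": 2023, "sex": "Male", "perwt": 105},
    {"year": 2023, "sex": "Male", "perwt": 80},
]


def weighted_crosstab(rows, row_var, col_var, weight="perwt", keep=None):
    """Sum weights in each (row, column) cell, after an optional filter."""
    table = defaultdict(float)
    for rec in rows:
        if keep is not None and not keep(rec):
            continue  # filter step: drop records outside the selection
        table[(rec[row_var], rec[col_var])] += rec[weight]
    return dict(table)


# Row = year, column = sex, so separate survey years are never pooled.
table = weighted_crosstab(records, "year", "sex")
for (year, sex), total in sorted(table.items()):
    print(year, sex, total)

# Same table restricted to one year, mimicking an SDA selection filter.
only_2023 = weighted_crosstab(records, "year", "sex", keep=lambda r: r["year"] == 2023)
```

Using year as the row variable illustrates the guidance on multi-year files: each year's weighted total is reported separately instead of being added together.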
Practical notes on IPUMS data and examples
- Examples of data types accessible via IPUMS:
- Decennial census microdata (1850–2010, at ten-year intervals; harmonized across census years)
- ACS microdata (2000–present; single-year and multi-year samples)
- CPS microdata for monthly employment and related indicators
- NHGIS and IHGIS provide aggregated tables and GIS data for geographies and time periods
- Data integration and harmonization:
- IPUMS harmonizes variable names and codes across decades/countries to enable cross-time comparisons
- Documentation and user guides explain variable codes and harmonization schemes
- Access considerations:
- Some features require a free IPUMS account (register to create and download data extracts)
- Abacus (IPUMS Abacus) is an online tabulation tool for quick tables; custom datasets for download are built through the data extract system
- Ethical guidance:
- IPUMS emphasizes responsible use of microdata and privacy-respecting practices
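The harmonization idea described above can be pictured as a crosswalk from year-specific codes to one shared coding scheme. This is a minimal sketch only: the years, source codes, labels, and function name are all hypothetical and do not reproduce any actual IPUMS variable.

```python
# Hypothetical source codings for marital status in two census years.
SOURCE_TO_HARMONIZED = {
    1980: {"M": 1, "S": 2, "W": 3},   # letter codes in the older file
    2010: {"1": 1, "2": 3, "3": 2},   # numeric codes, in a different order
}
HARMONIZED_LABELS = {1: "Married", 2: "Never married", 3: "Widowed"}


def harmonize(year: int, source_code: str) -> int:
    """Translate a year-specific code into the shared harmonized code."""
    try:
        return SOURCE_TO_HARMONIZED[year][source_code]
    except KeyError:
        raise ValueError(f"no crosswalk entry for {source_code!r} in {year}") from None


# The same category gets the same harmonized code regardless of the source year.
print(HARMONIZED_LABELS[harmonize(1980, "S")])
print(HARMONIZED_LABELS[harmonize(2010, "3")])
```

Once every year is mapped to the shared codes, records from different censuses can be compared in a single table.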
Recap and Next Steps
- Recap of required readings and data sources:
- PRB article on census accuracy and measurement; Abbott-related Texas seat piece
- NYT business/economics piece on job data reliability and BLS governance context
- ACS as a primary data source for U.S. demographic characteristics (with detailed city profiles and hometown-level data in the course materials)
- IPUMS as a key suite of harmonized microdata resources for cross-time analysis
- PRB World Population Data Sheet (2024) as a global health/population reference, including PHC focus and regional country data
- In the next lecture, we will move to the topic of “Understanding Population Growth” and continue to build a framework for measuring and interpreting demographic change, data quality, and measurement challenges
Key Concepts and Takeaways (synthesis)
- Demographic data come from multiple sources: censuses, registration systems, vital statistics, surveys, and population registers
- ACS provides a moving, yearly picture of the U.S. population; since 2005 it has replaced the once-a-decade long-form census questionnaire
- The 1-year vs. 5-year ACS estimates balance timeliness and precision by geography
- IPUMS harmonizes and provides easy access to vast microdata across countries and time for comparative demographic analysis
- PRB’s World Population Data Sheet highlights global health context, PHC investment benefits, and regional/country-level indicators
- The reliability of official data (like the U.S. jobs data) can be high, but political context, revisions, and data-collection dynamics require cautious interpretation
- Hometowns data in this course illustrate how ACS data can be used to profile local demography and socioeconomic status, while also noting data gaps and reliability issues for certain locations
- Ethical and practical considerations matter when using demographic data for policy, forecasting, or public communication
Next Lecture Preview
- Focus: Understanding Population Growth (measurement, drivers, and interpretation of growth rates), including methodological notes on data quality, measurement error, and how growth relates to policy and social outcomes
Quiz Reminder
- Quiz 1 is today. Be prepared to apply concepts from ACS, data reliability, and interpretation of demographic indicators to short-answer and data-interpretation questions
References and URLs (for quick access)
- PRB: How will we measure the accuracy of the 2020 census?
- URL: https://www.prb.org/resources/how-will-we-measure-the-accuracy-of-the-2020-census/
- Express News: How Abbott cost Texas a House seat
- URL: https://www.expressnews.com/opinion/commentary/article/Commentary-How-Abbott-cost-Texas-a-House-seat-16159960.php
- NYT article: Is the Jobs Data Still Reliable? Yes, at Least for Now.
- URL: https://www.nytimes.com/2025/09/05/business/jobs-data-reliability.html?smid=url-share
- ACS data access and Mercedes, TX example
- Data access: https://data.census.gov
- IPUMS data portal