IDIS100 Week 02 Notes — Idiographic vs Nomothetic; Causality; Unit of Analysis; Time; Two Logical Systems

Research Purposes: Exploration, Description, Explanation

  • Exploration: develop initial understanding of a phenomenon, break new ground, gain new insights, pave the way for future research

  • Example: focus groups or in-depth interviews before designing and running a national survey

  • Description: provide precise measurement and description of population or phenomenon

  • Examples: Official statistics (e.g., median household income), political polls, descriptive statistics of a sample in journal articles

  • Explanation: answer the question of why; what causes some outcome or phenomenon

  • Two kinds of explanations:

    • Idiographic explanation

    • Nomothetic explanation

  • Relationship among purposes: not mutually exclusive; a study may have multiple purposes

  • For what purpose? Exploration, Description, Explanation (and their connections)

  • Time, causality, unit of analysis, and logic will interact with these purposes

Idiographic vs Nomothetic

  • Nomothetic causality aims at general laws or patterns across cases

  • Idiographic causality focuses on a comprehensive explanation of a single case or a few cases

  • Unit of Analysis, Time, and Logical Systems align with the two approaches

  • Recap: Idiographic vs Nomothetic are two logical approaches to understanding phenomena and they can be complementary in mixed designs

Causality (Nomothetic) and Criteria

  • Three Criteria for Nomothetic Causality:

    • Correlation

    • Time Order

    • Nonspuriousness

  • Definition of correlation: IV (independent variable) and DV (dependent variable) must be related

  • Time order: IV should precede DV in time

  • Nonspuriousness: the IV-DV relationship is not due to a third variable (confounder)

  • Spurious relationships: correlation does not imply causation; third variables may explain the association

  • Confounder: a variable that is a common cause of both IV and DV; accounting for confounders can alter or eliminate the IV-DV relationship

  • If, after controlling for a confounder, the IV-DV relationship disappears, the relationship is spurious; if it remains, it may be nonspurious or partially confounded

Correlation: Examples

  • Positive correlation example: income increases with education

    • Represented as
      $\rho_{IV,DV} > 0$, where IV = education, DV = income

  • Negative correlation example: health deteriorates with age

    • Represented as
      $\rho_{IV,DV} < 0$, where IV = age, DV = health outcomes

  • Clinical trial illustration (not causal by itself):

    • In a randomized medicine study with a placebo, distribution of treatment by gender may appear related, highlighting the need to separate correlation from causal interpretation
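
The sign of a correlation can be checked with a few lines of Python. This is a minimal sketch: the numbers are made-up toy values (not real survey data) and the helper name `pearson_r` is an assumption for illustration.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical toy data (NOT real survey values).
education_years = [10, 12, 14, 16, 18]
income_thousands = [28, 35, 41, 55, 62]  # rises with education
age_years = [25, 35, 45, 55, 65]
health_score = [90, 85, 78, 70, 61]      # falls with age

r_edu_income = pearson_r(education_years, income_thousands)
r_age_health = pearson_r(age_years, health_score)

# Positive r for education/income, negative r for age/health.
print(r_edu_income > 0, r_age_health < 0)
```

A nonzero coefficient here only satisfies the first criterion (correlation); it says nothing yet about time order or nonspuriousness.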

Time Order and Longitudinal Strategies

  • Time order criterion: IV must occur before DV in time

  • Example: education → income (time order satisfied if education precedes income)

  • Example of reverse causality: marriage and happiness

    • Does marriage cause happiness, or are happier people more likely to marry?

  • Solution: use longitudinal data to establish time order

  • Longitudinal design examples in slides:

    • IV = marital status at time 1, DV = happiness at time 2

    • Alternatively: IV = happiness at time 1, DV = marital status at time 2
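
The two longitudinal setups above can be sketched as pairing each person's IV at time 1 with the same person's DV at time 2. The records, field names, and the helper `lagged_pairs` below are hypothetical, chosen only to illustrate the idea.

```python
# Hypothetical panel records: one row per person per wave (values are made up).
records = [
    {"person": "A", "wave": 1, "married": 0, "happiness": 6},
    {"person": "A", "wave": 2, "married": 1, "happiness": 8},
    {"person": "B", "wave": 1, "married": 1, "happiness": 7},
    {"person": "B", "wave": 2, "married": 1, "happiness": 7},
]

def lagged_pairs(rows, iv, dv):
    """Pair each person's IV at wave 1 with the same person's DV at wave 2."""
    wave1 = {r["person"]: r[iv] for r in rows if r["wave"] == 1}
    wave2 = {r["person"]: r[dv] for r in rows if r["wave"] == 2}
    # Keep only people observed in both waves.
    return {p: (wave1[p], wave2[p]) for p in wave1 if p in wave2}

# IV = marital status at time 1, DV = happiness at time 2
marriage_then_happiness = lagged_pairs(records, "married", "happiness")
# IV = happiness at time 1, DV = marital status at time 2
happiness_then_marriage = lagged_pairs(records, "happiness", "married")

print(marriage_then_happiness)
print(happiness_then_marriage)
```

Because the IV is always measured at the earlier wave, each pairing satisfies time order by construction; which direction fits the data better is then an empirical question.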

Nonspuriousness and Confounding (Detailed)

  • Nonspuriousness: a genuine association between IV and DV not explained by other variables

  • Confounder: a third variable that is related to both IV and DV

  • How confounders arise: e.g., age can influence both smoking and mortality risk, creating a spurious association if not controlled

  • Common ways to address confounding:

    • Experiments (randomization) [Week 4]

    • Regressions: controlling for confounders in statistical models

    • Stratification: analysing subgroups where the confounder is held roughly constant

Confounders: Exercise and Interpretation

  • Example exercise setup:

    • Variables: own education (IV), own income (DV), parental education (confounder)

    • Question: Which is IV? which is DV? which is confounder?

  • Illustration: If parental education is not accounted for, observed association between own education and income may be confounded

  • After accounting for parental education:

    • The association between own education and income may weaken but still exist

    • The relationship is then partially confounded but not entirely spurious

  • In regression analyses or other controls, confounding bias is removed for the controlled variable; if no unaccounted confounders remain, the association is nonspurious

Example: Smoking and Mortality (Multiple Age Groups)

  • Table: Risk of death in a 20-year period among women in Whickham, England, by smoking status at start (1972-74)

    • Vital status: Dead / Alive / Total

    • Smoker: Dead 139, Alive 443, Total 582; Non-smoker: Dead 230, Alive 502, Total 732

    • Risk (dead/total): Smokers = $\frac{139}{582} \approx 0.239$ (≈ 0.24), Non-smokers = $\frac{230}{732} \approx 0.314$ (≈ 0.31)

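  • Note: In this crude (unstratified) table, smokers show a lower 20-year risk of death than non-smokers (0.24 vs 0.31). This does not mean smoking is protective: the age-specific tables that follow show a higher risk for smokers within every age band, so the crude comparison is confounded by age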

  • Table: Ages 18-44 (Smoker vs Non-smoker)

    • Dead: 15 vs 12; Alive: 270 vs 327; Total: 285 vs 339

    • Risk: Smokers = $\frac{15}{285} \approx 0.053$, Non-smokers = $\frac{12}{339} \approx 0.035$

    • Interpretation: relatively small absolute differences in this age band; other factors may contribute

  • Table: Ages 45-64

    • Dead: 80 vs 53; Alive: 167 vs 147; Total: 247 vs 200

    • Risk: Smokers = $\frac{80}{247} \approx 0.324$, Non-smokers = $\frac{53}{200} = 0.265$

  • Table: Ages 65+

    • Dead: 44 vs 165; Alive: 6 vs 28; Total: 50 vs 193

    • Risk: Smokers = $\frac{44}{50} = 0.88$, Non-smokers = $\frac{165}{193} \approx 0.855$

  • Age as a potential confounder can be seen in multi-way tables

  • Example: Mortality risk by smoking status and age category (age as a confounder)

    • For each age group, compare risk for smokers vs non-smokers within that same age band

    • After stratifying by age, the confounding effect of age on the smoking-mortality association is mitigated
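
The crude and age-stratified risks can be recomputed from the counts in the Whickham tables above. This is a minimal sketch; the dictionary layout and function names are assumptions for illustration, while the counts are the ones given in the tables.

```python
# (dead, total) counts from the Whickham tables, by age group and smoking status.
counts = {
    "18-44": {"smoker": (15, 285), "non_smoker": (12, 339)},
    "45-64": {"smoker": (80, 247), "non_smoker": (53, 200)},
    "65+":   {"smoker": (44, 50),  "non_smoker": (165, 193)},
}
statuses = ("smoker", "non_smoker")

def risk(dead, total):
    """Risk = deaths divided by group size."""
    return dead / total

def crude_risk(status):
    """Risk ignoring age: pool deaths and totals across all age groups."""
    dead = sum(counts[age][status][0] for age in counts)
    total = sum(counts[age][status][1] for age in counts)
    return risk(dead, total)

crude = {s: crude_risk(s) for s in statuses}
stratified = {
    age: {s: risk(*counts[age][s]) for s in statuses} for age in counts
}

print("crude:", {s: round(r, 3) for s, r in crude.items()})
for age, risks in stratified.items():
    print(age, {s: round(r, 3) for s, r in risks.items()})
```

The pooled comparison gives smokers the lower risk (about 0.24 vs 0.31), while every within-age comparison gives smokers the higher risk, which is the signature of confounding by age.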

Age as a Confounder: Stratification by Age

  • Table layout shows mortality risk within each age group by smoking status

  • Key takeaway: stratifying by age can block the confounding path Age → Smoking and Age → Mortality

  • How stratification helps:

    • Within each age group, smoking status is compared at the same age, removing age as a confounder

    • Pr(Death|Smoker, Age group) vs. Pr(Death|Non-Smokers, Age group)

What Is a Confounder? Clarifications

  • A confounder is related to both IV and DV

  • If a variable is only related to IV or only to DV, it is not a confounder in causal inference

  • Notation example: If C is related to both IV and DV, while C1 is related only to IV and C2 only to DV, then only C is a confounder

  • Practical implication: identify potential confounders through prior research and theory

Dealing with Confounders: How to Identify and Address

  • How to know potential confounders:

    • Look to prior research and theory

  • Methods to address confounding:

    • Experiments (randomization)

    • Regression controls: “Controlling for …”

    • Stratification by confounders (e.g., stratify by age)

    • Subgroup analyses where confounding is minimized

Why Stratify by Age? A Worked Example

  • Pr(Death|Smoker, Age 65+) vs. Pr(Death|Non-Smokers, Age 65+)

  • Pr(Death|Smoker, Age 45-64) vs. Pr(Death|Non-Smokers, Age 45-64)

  • Pr(Death|Smoker, Age 18-44) vs. Pr(Death|Non-Smokers, Age 18-44)

  • By comparing within the same age groups, age is held constant and cannot confound the smoking-mortality relationship
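
As a sketch of why the crude comparison can disagree with the within-age comparisons: by the law of total probability, each crude risk is a weighted average of the age-specific risks, with weights equal to the age distribution inside that smoking group. The counts are those in the Whickham tables; the index $a$ for age group is introduced here for illustration.

```latex
\[
\begin{aligned}
\Pr(\text{Death} \mid \text{Smoker})
  &= \sum_{a} \Pr(\text{Death} \mid \text{Smoker}, a)\,\Pr(a \mid \text{Smoker}) \\
  &= \tfrac{15}{285}\cdot\tfrac{285}{582}
   + \tfrac{80}{247}\cdot\tfrac{247}{582}
   + \tfrac{44}{50}\cdot\tfrac{50}{582}
   = \tfrac{139}{582} \approx 0.239 \\
\Pr(\text{Death} \mid \text{Non-smoker})
  &= \sum_{a} \Pr(\text{Death} \mid \text{Non-smoker}, a)\,\Pr(a \mid \text{Non-smoker}) \\
  &= \tfrac{12}{339}\cdot\tfrac{339}{732}
   + \tfrac{53}{200}\cdot\tfrac{200}{732}
   + \tfrac{165}{193}\cdot\tfrac{193}{732}
   = \tfrac{230}{732} \approx 0.314
\end{aligned}
\]
```

Only 50 of 582 smokers (about 9%) are in the 65+ group, against 193 of 732 non-smokers (about 26%). The non-smokers' crude risk therefore gives far more weight to the highest-risk age band, which is how the crude comparison can point the opposite way from every age-specific comparison.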

False Criteria for Nomothetic Causality

  • “A nomothetic explanation is probabilistic and usually incomplete.” (true concept)

  • False criteria include:

    • Complete Causation (not required): X can be a cause of Y without being the only cause

    • Example: Education is one of several causes of income

    • No Exceptional Cases (not required): exceptions do not disprove that X causes Y

    • Example: Some highly educated people may have low income

    • Apply in the Majority of Cases (not required): X can cause Y even if the effect appears in only a minority of cases

    • Example: Smoking is a major cause of lung cancer, though only a minority of smokers develop the disease

What is the Unit of Analysis? (UoA)

  • Definition: Who/what are you studying?

  • Examples: Individuals (students, men, women, children)

  • Groups: families, households, couples

  • Organizations: corporations, universities

  • Social interactions: text messages

  • Social Artifacts: books, paintings, music

  • Geographical areas: cities, countries

  • Multi-level UoA: e.g., students and school; children and families

  • Visual: imagine a spreadsheet where each row is an observation (the UoA)
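
The spreadsheet picture can be sketched in Python: the same data can be arranged with one row per student or, after aggregation, one row per school, and the unit of analysis is whatever a row represents. All names and values below are hypothetical.

```python
from collections import defaultdict

# Hypothetical rows where the unit of analysis is the individual student.
students = [
    {"student": "s1", "school": "North", "score": 70},
    {"student": "s2", "school": "North", "score": 80},
    {"student": "s3", "school": "South", "score": 90},
]

def aggregate_to_schools(rows):
    """Build a new table whose unit of analysis is the school."""
    scores = defaultdict(list)
    for row in rows:
        scores[row["school"]].append(row["score"])
    return [
        {"school": school, "n_students": len(vals), "mean_score": sum(vals) / len(vals)}
        for school, vals in sorted(scores.items())
    ]

schools = aggregate_to_schools(students)

print(len(students), "student-level rows")
print(len(schools), "school-level rows")
```

Conclusions drawn from the `schools` table are about schools; conclusions about individual students need the `students` table.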

Unit of Analysis: Concrete Examples

  • Example: In a pre-course survey, the unit of analysis is a student

    • Data example: StudentName, major, year

    • Proportion of students by year: year 2, 87%; year 3+, 13%

  • Exercise: For three examples, discuss what the unit of analysis is and how to tell

What Is the Unit of Analysis? (Revisited with Data Tables)

  • Revisit risk of death tables (Smoker vs Non-smoker across age groups) to identify UoA in each table

  • Always align the UoA with the research question and data collection method

  • Be mindful of inconsistencies: from whom data are gathered vs. from whom conclusions are drawn

Ecological Fallacy and Reductionist Fallacy

  • Ecological Fallacy (group-level to individual-level):

    • Example 1: New York has a high crime rate; inferring that Joan, an individual living in New York, stole a watch

    • Example 2: Regions with higher average income have better health outcomes; inferring an individual with high income is healthier

  • Reductionist Fallacy (individual-level to group-level):

    • Example 1: Individuals with low SES are more likely to divorce; conclude that more developed countries have lower divorce rates

    • Example 2: One person from Wisconsin is liberal; concluding that Wisconsin as a whole is liberal

  • Reductionism: reducing complex social phenomena to too few explanations (economic, psychological, etc.)

  • Exercise: Ecological Fallacy or Reductionist Fallacy? (wooclap activity)
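
A toy illustration of why group-level patterns cannot be read off onto individuals. The data are invented so that regions with higher average income have higher average health scores, while inside each region the individual-level relationship runs the other way. The sign of the covariance is used as a stand-in for the sign of the correlation; all names and values are hypothetical.

```python
def covariance(xs, ys):
    """Population covariance; its sign matches the sign of the correlation."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n

def mean(values):
    return sum(values) / len(values)

# Hypothetical individuals: (income, health score) pairs grouped by region.
regions = {
    "A": [(1, 3), (2, 2), (3, 1)],
    "B": [(4, 6), (5, 5), (6, 4)],
    "C": [(7, 9), (8, 8), (9, 7)],
}

# Group level: one (mean income, mean health) point per region.
region_income = [mean([inc for inc, _ in people]) for people in regions.values()]
region_health = [mean([h for _, h in people]) for people in regions.values()]
between_regions = covariance(region_income, region_health)

# Individual level: covariance among people inside each region.
within_regions = {
    name: covariance([inc for inc, _ in people], [h for _, h in people])
    for name, people in regions.items()
}

print("between regions:", between_regions)  # positive
print("within regions:", within_regions)    # negative in every region
```

Inferring from the positive region-level association that a given high-income individual is healthier than a neighbour would be an ecological fallacy in this toy dataset.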

For How Long? Time Dimensions in Research

  • Cross-Sectional vs Longitudinal studies

    • Cross-Sectional: data collected at one point in time

    • Longitudinal: data collected at multiple points in time

  • Narrow definitions of longitudinal cover only panel studies; broader definitions also count trend and cohort studies as longitudinal

  • Key distinction: whether data are collected on the same individuals over time or not

Cross-Sectional vs Trend Studies (Repeated Cross-Sectional)

  • Cross-Sectional Study: collect data from a population one time

  • Trend Study: collect data from a population multiple times, not necessarily the same respondents

  • Repeated Cross-Sections: same population or same sampling frame across time (e.g., Census years 2000, 2010, 2020)

  • Each census is a cross-sectional study; comparing censuses is a trend study

  • Example: European Values Study (EVS) and World Values Survey (WVS): cross-national, repeated cross-sectional longitudinal programs

Cohort Studies vs Panel Studies

  • Cohort study: collect data from the same cohort over time; may draw different samples from the same cohort at different waves

    • Birth cohorts: e.g., people born in 1997–2012 (Gen Z)

    • Marriage cohorts: e.g., people married in 1997–2012

  • Panel study: data from the same sample across waves; may follow one or multiple cohorts

    • Example: Wisconsin Longitudinal Study (WLS) – follows graduates from Wisconsin high schools in 1957 across years

    • Health and Retirement Study (HRS) – follows multiple cohorts across time
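
The key distinction (same respondents over time or not) can be sketched as a check on respondent IDs across waves. This is a deliberately simplified sketch: the function name and return labels are assumptions, and ID overlap alone cannot separate a trend study from a cohort study, since both draw fresh samples and differ only in whether sampling is restricted to one fixed cohort.

```python
def classify_design(waves):
    """Classify a study from a list of waves, each a set of respondent IDs.

    Simplifying assumption: a panel re-interviews the wave-1 sample, so every
    later wave is a subset of the first (subsets allow for attrition).
    """
    if not waves:
        raise ValueError("at least one wave is required")
    if len(waves) == 1:
        return "cross-sectional"
    first = waves[0]
    if all(wave <= first for wave in waves[1:]):
        return "panel"
    return "trend or cohort (repeated cross-sections)"

print(classify_design([{1, 2, 3}]))
print(classify_design([{1, 2, 3}, {1, 2}]))
print(classify_design([{1, 2, 3}, {4, 5, 6}]))
```

To tell a cohort study from a trend study, one would also need to know how each wave was sampled, for example whether every wave was drawn only from people born in the same years.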

Exercise: Studying Election (Matching Design to Question)

  • Task: For each question, identify the best match: cross-sectional, trend, cohort, or panel

  • Example questions (as given in slides):

    • 1) What is the party a person would vote for if the election were held today? (Cross-sectional snapshot)

    • 2) How has each party’s chance of winning changed over time? (Trend over time)

    • 3) Which party did Baby Boomers traditionally support, and which party are they more likely to vote for now? (Cohort analysis by birth cohort)

    • 4) For those who watched the campaign events, did their views change afterward? (Panel/longitudinal follow-up)

Two Logical Systems: Theory Testing vs Building

  • Distinction between building theory (inductive) and testing theory (deductive)

  • Deduction: From Theory to Hypothesis to Observations (Quantitative emphasis)

    • Hypothesis: a clear statement of expected relationships between two or more variables

    • Common in quantitative research

  • Induction: From Observations to Theory (Qualitative emphasis)

    • Field research to develop theories through observations

    • In some quantitative methods, techniques like Latent Class Analysis are inductive

  • Wheel of Science (integrates deduction and induction in research practice)

Deduction: From Theory to Hypothesis

  • Process: Theory → Hypothesis → Observations

  • Hypothesis: explicit, testable statement about relationships between variables

  • Emphasis: testing theoretical predictions with data

Induction: From Observations to Theory

  • Process: Observe patterns in social life → identify regularities → develop generalized explanations

  • Emphasis: theory-building from empirical observation

  • Latent Class Analysis cited as an inductive method in quantitative work

Wheel of Science

  • Concept that research often involves a cycle of theory and data collection, with both deductive and inductive steps

Today’s Summary and Takeaways

  • Research may have multiple purposes: exploration, description, explanation

  • Different approaches to causality: nomothetic (general laws) vs idiographic (case-focused explanations)

  • Different units of analysis: individuals, groups, organizations, social artifacts, geographical areas, etc.

  • Time dimensions: cross-sectional vs longitudinal (including trend, cohort, and panel variants)

  • Two logical systems: induction and deduction (and their use in theory-building and theory-testing)

Next Week Preview

  • Conceptualization and Operationalization (Chapter 5)

  • Prepare to define and operationalize key concepts for measurement in research

Equations, Concepts, and Notation References (LaTeX)

  • Correlation concept:

    • $\rho_{IV,DV} \neq 0$ indicates a correlation between $IV$ and $DV$.

  • Time order (temporal precedence):

    • If a variable $IV$ occurs before $DV$ in time, we have $t_{IV} < t_{DV}$.

  • Risk calculation (example from tables):

    • For smoking vs. mortality in Whickham 20-year table:

    • $\text{Risk}_{20\text{yr}}(\text{Smoker}) = \frac{139}{582} \approx 0.239$; $\text{Risk}_{20\text{yr}}(\text{Non-smoker}) = \frac{230}{732} \approx 0.314$.

  • Age-group risks (example 18-44):

    • $\text{Risk}_{18\text{-}44}(\text{Smoker}) = \frac{15}{285} \approx 0.0526$; $\text{Risk}_{18\text{-}44}(\text{Non-smoker}) = \frac{12}{339} \approx 0.0354$.

  • Age-stratified risks (65+):

    • $\text{Risk}_{65+}(\text{Smoker}) = \frac{44}{50} = 0.88$; $\text{Risk}_{65+}(\text{Non-smoker}) = \frac{165}{193} \approx 0.855$.

  • False criteria for nomothetic causality (conceptual): these are not accepted criteria, as nomothetic explanations are probabilistic and usually incomplete