IDIS100 Week 02 Notes — Idiographic vs Nomothetic; Causality; Unit of Analysis; Time; Two Logical Systems

Research Purposes: Exploration, Description, Explanation

Exploration: develop initial understanding of a phenomenon, break new ground, gain new insights, pave the way for future research
Example: focus groups or in-depth interviews before designing and running a national survey
Description: provide precise measurement and description of population or phenomenon
Examples: Official statistics (e.g., median household income), political polls, descriptive statistics of a sample in journal articles
Explanation: answer the question of why; what causes some outcome or phenomenon
Two kinds of explanations:
- Idiographic explanation
- Nomothetic explanation
Relationship among purposes: not mutually exclusive; a study may have multiple purposes
For what purpose? Exploration, Description, Explanation (and their connections)
Time, causality, unit of analysis, and logic will interact with these purposes

Idiographic vs Nomothetic

Nomothetic causality aims at general laws or patterns across cases
Idiographic causality focuses on a comprehensive explanation of a single case or a few cases
Unit of Analysis, Time, and Logical Systems align with the two approaches
Recap: Idiographic vs Nomothetic are two logical approaches to understanding phenomena and they can be complementary in mixed designs

Causality (Nomothetic) and Criteria

Three Criteria for Nomothetic Causality:
- Correlation
- Time Order
- Nonspuriousness
Definition of correlation: IV (independent variable) and DV (dependent variable) must be related
Time order: IV should precede DV in time
Nonspuriousness: the IV-DV relationship is not due to a third variable (confounder)
Spurious relationships: correlation does not imply causation; third variables may explain the association
Confounder: a variable that is a common cause of both IV and DV; accounting for confounders can alter or eliminate the IV-DV relationship
If, after controlling for a confounder, the IV-DV relationship disappears, the relationship is spurious; if it remains, it may be nonspurious or partially confounded

Correlation: Examples

Positive correlation example: income increases with education
- Represented as $\rho_{IV,DV} > 0$, where IV = education, DV = income
Negative correlation example: health deteriorates with age
- Represented as $\rho_{IV,DV} < 0$, where IV = age, DV = health outcomes
Clinical trial illustration (not causal by itself):
- In a randomized medicine study with a placebo, distribution of treatment by gender may appear related, highlighting the need to separate correlation from causal interpretation
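
For reference, the correlation symbol in the examples above is conventionally the Pearson correlation coefficient. A sketch of its standard definition, writing $X$ for the IV and $Y$ for the DV (the symbols $X$, $Y$, and $\sigma$ are notation assumed here, not taken from the slides):

```latex
\rho_{X,Y} = \frac{\operatorname{Cov}(X, Y)}{\sigma_X \, \sigma_Y},
\qquad -1 \le \rho_{X,Y} \le 1
```

A positive value means the two variables tend to rise together, a negative value means one tends to fall as the other rises, and a value of 0 means no linear association.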

Time Order and Longitudinal Strategies

Time order criterion: IV must occur before DV in time
Example: education → income (time order satisfied if education precedes income)
Example of reverse causality: marriage and happiness
- Does marriage cause happiness, or are happier people more likely to marry?
Solution: use longitudinal data to establish time order
Longitudinal design examples in slides:
- IV = marital status at time 1, DV = happiness at time 2
- Alternatively: IV = happiness at time 1, DV = marital status at time 2

Nonspuriousness and Confounding (Detailed)

Nonspuriousness: a genuine association between IV and DV not explained by other variables
Confounder: a third variable that is related to both IV and DV
How confounders arise: e.g., age can influence both smoking and mortality risk, creating a spurious association if not controlled
Common ways to address confounding:
- Experiments (randomization) [Week 4]
- Regressions: controlling for confounders in statistical models
- Stratification: analysing subgroups where the confounder is held roughly constant

Confounders: Exercise and Interpretation

Example exercise setup:
- Variables: own education (IV), own income (DV), parental education (confounder)
- Question: Which is IV? which is DV? which is confounder?
Illustration: If parental education is not accounted for, observed association between own education and income may be confounded
After accounting for parental education:
- The association between own education and income may weaken but still exist
- The relationship is partially confounded but not entirely spurious
In regression analyses or other controls, confounding bias is removed for the controlled variable; if no unaccounted confounders remain, the association is nonspurious

Example: Smoking and Mortality (Multiple Age Groups)

Table: Risk of death in a 20-year period among women in Whickham, England, by smoking status at start (1972-74)
- Vital status: Dead / Alive / Total
- Smoker: Dead 139, Alive 443, Total 582; Non-smoker: Dead 230, Alive 502, Total 732
- Risk (dead/total): Smokers = $\frac{139}{582} \approx 0.239$ (≈0.24), Non-smokers = $\frac{230}{732} \approx 0.314$ (≈0.31)
Note: In this crude (unstratified) table, smokers show a lower 20-year risk of death than non-smokers; this counterintuitive association should not be read causally until potential confounders such as age are taken into account
Table: Ages 18-44 (Smoker vs Non-smoker)
- Dead: 15 vs 12; Alive: 270 vs 327; Total: 285 vs 339
- Risk: Smokers = $\frac{15}{285} \approx 0.053$, Non-smokers = $\frac{12}{339} \approx 0.035$
- Interpretation: relatively small absolute differences in this age band; other factors may contribute
Table: Ages 45-64
- Dead: 80 vs 53; Alive: 167 vs 147; Total: 247 vs 200
- Risk: Smokers = $\frac{80}{247} \approx 0.324$, Non-smokers = $\frac{53}{200} = 0.265$
Table: Ages 65+
- Dead: 44 vs 165; Alive: 6 vs 28; Total: 50 vs 193
- Risk: Smokers = $\frac{44}{50} = 0.88$, Non-smokers = $\frac{165}{193} \approx 0.855$
Age as a potential confounder can be seen in multi-way tables
Example: Mortality risk by smoking status and age category (age as a confounder)
- For each age group, compare risk for smokers vs non-smokers within that same age band
- After stratifying by age, the confounding effect of age on the smoking-mortality association is mitigated
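
The crude and age-specific comparisons described above can be reproduced directly from the table counts. The following is a minimal sketch, not part of the slides; the function names (`risk`, `crude_risk`, `stratified_risks`) are hypothetical, and the only data used are the (dead, total) counts from the Whickham tables.

```python
# (dead, total) counts from the Whickham tables, by age group and smoking status.
counts = {
    "18-44": {"smoker": (15, 285), "non_smoker": (12, 339)},
    "45-64": {"smoker": (80, 247), "non_smoker": (53, 200)},
    "65+":   {"smoker": (44, 50),  "non_smoker": (165, 193)},
}


def risk(dead, total):
    """Risk of death = number dead / number of people in the group."""
    return dead / total


def crude_risk(status):
    """Risk for one smoking status, pooling all age groups (age ignored)."""
    dead = sum(group[status][0] for group in counts.values())
    total = sum(group[status][1] for group in counts.values())
    return risk(dead, total)


def stratified_risks(status):
    """Risk for one smoking status within each age group (age held constant)."""
    return {age: risk(*group[status]) for age, group in counts.items()}


print(f"crude: smoker={crude_risk('smoker'):.3f}, "
      f"non-smoker={crude_risk('non_smoker'):.3f}")
smokers = stratified_risks("smoker")
non_smokers = stratified_risks("non_smoker")
for age in counts:
    print(f"{age}: smoker={smokers[age]:.3f}, non-smoker={non_smokers[age]:.3f}")
```

In the pooled comparison smokers have the lower risk, but within every age group smokers have the higher risk, which is the pattern the stratified tables are meant to expose.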

Age as a Confounder: Stratification by Age

Table layout shows mortality risk within each age group by smoking status
Key takeaway: stratifying by age can block the confounding path Age → Smoking and Age → Mortality
How stratification helps:
- Within each age group, smoking status is compared at the same age, removing age as a confounder
- Pr(Death|Smoker, Age group) vs. Pr(Death|Non-Smokers, Age group)

What Is a Confounder? Clarifications

A confounder is related to both IV and DV
If a variable is only related to IV or only to DV, it is not a confounder in causal inference
Notation example: If C is related to both IV and DV, but C1 and C2 are not, then only C is a confounder
Practical implication: identify potential confounders through prior research and theory
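
To check whether age meets this definition in the Whickham data, one can look at both links separately: age versus smoking status (the IV) and age versus death (the DV). A minimal sketch under that reading, with hypothetical function names and only the table counts as input:

```python
# (dead, total) counts from the Whickham tables, by age group and smoking status.
counts = {
    "18-44": {"smoker": (15, 285), "non_smoker": (12, 339)},
    "45-64": {"smoker": (80, 247), "non_smoker": (53, 200)},
    "65+":   {"smoker": (44, 50),  "non_smoker": (165, 193)},
}


def share_in_age_group(status, age):
    """Link to the IV: fraction of women with this smoking status in the age group."""
    total_all_ages = sum(group[status][1] for group in counts.values())
    return counts[age][status][1] / total_all_ages


def risk_by_age(age):
    """Link to the DV: risk of death in the age group, pooling smoking statuses."""
    dead = sum(d for d, _ in counts[age].values())
    total = sum(t for _, t in counts[age].values())
    return dead / total


for age in counts:
    print(f"{age}: share of smokers={share_in_age_group('smoker', age):.3f}, "
          f"share of non-smokers={share_in_age_group('non_smoker', age):.3f}, "
          f"risk of death={risk_by_age(age):.3f}")
```

Non-smokers are much more concentrated in the 65+ group than smokers, and risk of death rises steeply with age, so age is related to both the IV and the DV in this sample.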

Dealing with Confounders: How to Identify and Address

How to know potential confounders:
- Look to prior research and theory
Methods to address confounding:
- Experiments (randomization)
- Regression controls: “Controlling for …”
- Stratification by confounders (e.g., stratify by age)
- Subgroup analyses where confounding is minimized
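
One way to turn a stratified comparison into a single adjusted number is direct standardization: apply each group's age-specific risks to a common age distribution. This technique is not named in the slides; it is shown here only as an illustration of stratifying by age, using the Whickham counts given elsewhere in these notes. The function names and the choice of the combined sample as the standard population are assumptions.

```python
# (dead, total) counts from the Whickham tables, by age group and smoking status.
counts = {
    "18-44": {"smoker": (15, 285), "non_smoker": (12, 339)},
    "45-64": {"smoker": (80, 247), "non_smoker": (53, 200)},
    "65+":   {"smoker": (44, 50),  "non_smoker": (165, 193)},
}


def age_weights():
    """Share of the whole sample (smokers plus non-smokers) in each age group."""
    totals = {age: sum(t for _, t in group.values()) for age, group in counts.items()}
    n = sum(totals.values())
    return {age: t / n for age, t in totals.items()}


def standardized_risk(status):
    """Weighted average of age-specific risks, using the same weights for both groups."""
    weights = age_weights()
    return sum(
        weights[age] * counts[age][status][0] / counts[age][status][1]
        for age in counts
    )


print(f"age-standardized risk, smokers:     {standardized_risk('smoker'):.3f}")
print(f"age-standardized risk, non-smokers: {standardized_risk('non_smoker'):.3f}")
```

Because both groups are given the same age distribution, age can no longer account for the difference, and smokers come out with the higher standardized risk.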

Why Stratify by Age? A Worked Example

Pr(Death|Smoker, Age 65+) vs. Pr(Death|Non-Smokers, Age 65+)
Pr(Death|Smoker, Age 45-64) vs. Pr(Death|Non-Smokers, Age 45-64)
Pr(Death|Smoker, Age 18-44) vs. Pr(Death|Non-Smokers, Age 18-44)
By comparing within the same age groups, age is held constant and cannot confound the smoking-mortality relationship
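
Why the pooled comparison can disagree with the within-age comparisons follows from the law of total probability: each group's crude risk is a weighted average of its age-specific risks, with weights given by that group's own age distribution. A sketch using the Whickham counts, where $S$ (smoking status) and $a$ (age group) are notation assumed here:

```latex
\Pr(\text{Death} \mid S) = \sum_{a} \Pr(\text{Death} \mid S, \text{Age} = a)\,\Pr(\text{Age} = a \mid S)

\Pr(\text{Death} \mid \text{Smoker})
  = \tfrac{15}{285}\cdot\tfrac{285}{582} + \tfrac{80}{247}\cdot\tfrac{247}{582} + \tfrac{44}{50}\cdot\tfrac{50}{582}
  = \tfrac{139}{582} \approx 0.239

\Pr(\text{Death} \mid \text{Non-smoker})
  = \tfrac{12}{339}\cdot\tfrac{339}{732} + \tfrac{53}{200}\cdot\tfrac{200}{732} + \tfrac{165}{193}\cdot\tfrac{193}{732}
  = \tfrac{230}{732} \approx 0.314
```

Non-smokers place far more weight on the 65+ group (193/732 ≈ 0.26, versus 50/582 ≈ 0.09 for smokers), where risk is highest. Comparing within an age group removes these unequal weights.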

False Criteria for Nomothetic Causality

“A nomothetic explanation is probabilistic and usually incomplete.” (true concept)
False criteria include:
- Complete Causation: not required; X can be a cause of Y without being the only cause (e.g., education is one of several causes of income)
- No Exceptional Cases: not required; exceptions do not disprove that X causes Y (e.g., some highly educated people have low income)
- Apply in the Majority of Cases: not required; a causal relationship can hold even if it applies to only a minority of cases (e.g., smoking is a major cause of lung cancer, though only a minority of smokers develop the disease)

What is the Unit of Analysis? (UoA)

Definition: Who/what are you studying?
Examples: Individuals (students, men, women, children)
Groups: families, households, couples
Organizations: corporations, universities
Social interactions: text messages
Social Artifacts: books, paintings, music
Geographical areas: cities, countries
Multi-level UoA: e.g., students and school; children and families
Visual: imagine a spreadsheet where each row is an observation (the UoA)

Unit of Analysis: Concrete Examples

Example: In a pre-course survey, the unit of analysis is a student
- Data example: StudentName, major, year
- Proportion of students by year: 2? 87%, 3+? 13%
Exercise: For three examples, discuss what the unit of analysis is and how to tell

What Is the Unit of Analysis? (Revisited with Data Tables)

Revisit risk of death tables (Smoker vs Non-smoker across age groups) to identify UoA in each table
Always align the UoA with the research question and data collection method
Be mindful of inconsistencies: from whom data are gathered vs. from whom conclusions are drawn
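
The risk-of-death tables report counts, but each count is a number of individual women, so the unit of analysis is the individual. One way to see this is to expand the table back into the spreadsheet form, one row per woman. A minimal sketch; the column names (`age_group`, `smoking`, `died`) and the function name are hypothetical.

```python
# (dead, alive) counts from the Whickham tables, by age group and smoking status.
counts = {
    ("18-44", "smoker"): (15, 270),
    ("18-44", "non_smoker"): (12, 327),
    ("45-64", "smoker"): (80, 167),
    ("45-64", "non_smoker"): (53, 147),
    ("65+", "smoker"): (44, 6),
    ("65+", "non_smoker"): (165, 28),
}


def expand_to_rows(table):
    """Turn aggregated counts into one record per person (the unit of analysis)."""
    rows = []
    for (age_group, smoking), (dead, alive) in table.items():
        for died, n in ((True, dead), (False, alive)):
            rows.extend(
                {"age_group": age_group, "smoking": smoking, "died": died}
                for _ in range(n)
            )
    return rows


rows = expand_to_rows(counts)
print(len(rows), "rows, one per woman")
print(rows[0])
```

Every row describes one woman, and the tables are just summaries of these rows; conclusions drawn from them are therefore about individuals, not about age groups as collective units.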

Ecological Fallacy and Reductionist Fallacy

Ecological Fallacy (group-level to individual-level):
- Example 1: New York has a high crime rate; inferring that Joan, an individual living in New York, stole a watch
- Example 2: Regions with higher average income have better health outcomes; inferring an individual with high income is healthier
Reductionist Fallacy (individual-level to group-level):
- Example 1: Individuals with low SES are more likely to divorce; concluding that more developed countries have lower divorce rates
- Example 2: A person from Wisconsin is liberal; concluding that Wisconsin as a whole is liberal
Reductionism: reducing complex social phenomena to too few explanations (economic, psychological, etc.)
Exercise: Ecological Fallacy or Reductionist Fallacy? (wooclap activity)

For How Long? Time Dimensions in Research

Cross-Sectional vs Longitudinal studies
- Cross-Sectional: data collected at one point in time
- Longitudinal: data collected at multiple points in time
Some definitions of longitudinal include only panel studies; broader definitions also count trend and cohort studies as longitudinal
Key distinction: whether data are collected on the same individuals over time or not

Cross-Sectional vs Trend Studies (Repeated Cross-Sectional)

Cross-Sectional Study: collect data from a population one time
Trend Study: collect data from a population multiple times, not necessarily the same respondents
Repeated Cross-Sections: same population or same sampling frame across time (e.g., Census years 2000, 2010, 2020)
Each census is a cross-sectional study; comparing censuses is a trend study
Example: European Values Study (EVS) and World Values Survey (WVS): cross-national, repeated cross-sectional longitudinal programs

Cohort Studies vs Panel Studies

Cohort study: collect data from the same cohort over time; may draw different samples from the same cohort at different waves
- Birth cohorts: e.g., people born in 1997–2012 (Gen Z)
- Marriage cohorts: e.g., people married in 1997–2012
Panel study: data from the same sample across waves; may follow one or multiple cohorts
- Example: Wisconsin Longitudinal Study (WLS) – follows graduates from Wisconsin high schools in 1957 across years
- Health and Retirement Study (HRS) – follows multiple cohorts across time
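
The four designs can be told apart with a few yes/no questions about how the data were collected. The function below is a simplified sketch of that decision rule as these notes describe it; the function name and its parameters are hypothetical, and real studies may not fit the categories this cleanly.

```python
def classify_design(waves, same_respondents=False, single_cohort=False):
    """Classify a study design from how its data were collected.

    waves: number of time points at which data were collected
    same_respondents: True if the same sample is followed across waves
    single_cohort: True if every wave samples from the same cohort
                   (e.g., a birth cohort), possibly with different people
    """
    if waves < 1:
        raise ValueError("a study needs at least one wave of data collection")
    if waves == 1:
        return "cross-sectional"
    if same_respondents:
        # A panel may follow one or several cohorts; the same sample is what matters.
        return "panel"
    if single_cohort:
        return "cohort"
    return "trend"


# Examples mentioned in these notes
print(classify_design(waves=1))                         # a single census
print(classify_design(waves=3))                         # censuses 2000, 2010, 2020
print(classify_design(waves=3, single_cohort=True))     # repeated samples of one birth cohort
print(classify_design(waves=5, same_respondents=True))  # same sample followed over time
```

The order of the checks reflects the key distinction in the notes: whether the same individuals are followed comes first, and only then whether the waves are restricted to one cohort.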

Exercise: Studying Election (Matching Design to Question)

Task: For each question, identify the best match: cross-sectional, trend, cohort, or panel
Example questions (as given in slides):
- 1) What is the party a person would vote for if the election were held today? (Cross-sectional snapshot)
- 2) How has each party’s chance of winning changed over time? (Trend over time)
- 3) Which party did Baby Boomers traditionally support, and which party are they more likely to vote for now? (Cohort analysis by birth cohort)
- 4) For those who watched the campaign events, did their views change afterward? (Panel/longitudinal follow-up)

Two Logical Systems: Theory Testing vs Building

Distinction between building theory (inductive) and testing theory (deductive)
Deduction: From Theory to Hypothesis to Observations (Quantitative emphasis)
- Hypothesis: a clear statement of expected relationships between two or more variables
- Common in quantitative research
Induction: From Observations to Theory (Qualitative emphasis)
- Field research to develop theories through observations
- In some quantitative methods, techniques like Latent Class Analysis are inductive
Wheel of Science (integrates deduction and induction in research practice)

Deduction: From Theory to Hypothesis

Process: Theory → Hypothesis → Observations
Hypothesis: explicit, testable statement about relationships between variables
Emphasis: testing theoretical predictions with data

Induction: From Observations to Theory

Process: Observe patterns in social life → identify regularities → develop generalized explanations
Emphasis: theory-building from empirical observation
Latent Class Analysis cited as an inductive method in quantitative work

Wheel of Science

Concept that research often involves a cycle of theory and data collection, with both deductive and inductive steps

Today’s Summary and Takeaways

Research may have multiple purposes: exploration, description, explanation
Different approaches to causality: nomothetic (general laws) vs idiographic (case-focused explanations)
Different units of analysis: individuals, groups, organizations, social artifacts, geographical areas, etc.
Time dimensions: cross-sectional vs longitudinal (including trend, cohort, and panel variants)
Two logical systems: induction and deduction (and their use in theory-building and theory-testing)

Next Week Preview

Conceptualization and Operationalization (Chapter 5)
Prepare to define and operationalize key concepts for measurement in research

Equations, Concepts, and Notation References (LaTeX)

Correlation concept:
- $\rho_{IV,DV} \neq 0$ indicates a correlation between $IV$ and $DV$.
Time order (temporal precedence):
- If a variable $IV$ occurs before $DV$ in time, we have $t_{IV} < t_{DV}$.
Risk calculation (example from tables):
- For smoking vs. mortality in Whickham 20-year table:
- $\text{Risk}_{20\text{yr}}(\text{Smoker}) = \frac{139}{582} \approx 0.239, \quad \text{Risk}_{20\text{yr}}(\text{Non-smoker}) = \frac{230}{732} \approx 0.314.$
Age-group risks (example 18-44):
- $\text{Risk}_{18\text{-}44}(\text{Smoker}) = \frac{15}{285} \approx 0.0526, \quad \text{Risk}_{18\text{-}44}(\text{Non-smoker}) = \frac{12}{339} \approx 0.0354.$
Age-stratified risks (65+):
- $\text{Risk}_{65+}(\text{Smoker}) = \frac{44}{50} = 0.88, \quad \text{Risk}_{65+}(\text{Non-smoker}) = \frac{165}{193} \approx 0.855.$
False criteria for nomothetic causality (conceptual): these are not accepted criteria, as nomothetic explanations are probabilistic and usually incomplete