Week 7: Causality
WEEK 6: Causality I 11/07
First half of the course
Using models to estimate the relationship between X and Y ⇒ is this relationship possibly "causal"?
How do we know if X causes Y? ⇒ are the estimates causal? BIG QUESTION: causal inference
Example: Do peace-keepers reduce conflict in countries emerging from civil war or are they ineffective?
“Why questions” typically causal
Causality: X causes Y if an intervention that changes the value of X produces a probabilistic change in Y
Intervention: X is being changed or altered
Probabilistic: Y should change on average but not in every instance
Example:
Aspirin causes a reduction in fever symptoms
Intervention: someone takes aspirin, in their food, etc
Outcome: Taking aspirin doesn’t work 100% of the time but in general…
The problem with causality
Not obvious: positive correlation between education & earnings
People who are already wealthy ⇒ get more education ⇒ earn more ⇒ cycle repeats (self-selection)
Not obvious: voter ID & turnout
Comparing different states with different laws, so the states are not comparable and the relationship might not exist
Spurious correlation: correlation is common in nature; correlation alone does not establish causation
Can’t directly observe a change in X causing a change in Y, all we can see is correlations between X and Y
Therefore: simulate data where we build in the causal effect ourselves, so correlation = causation by construction
Using rnorm(), which draws random numbers from a normal distribution
Example:
rnorm(n = 4, mean = -10, sd = 100)
n = number of draws, mean = average, sd = standard deviation
Faking (simulating) treatment
Example:
library(tibble)  # tibble() comes from the tibble package
fake_election = tibble(party_share = rnorm(n = 500, mean = 50, sd = 5))
fake_election
Making the outcome
Let’s say parties raise ~20k on average +/- 4k
Example:
fake_election = tibble(party_share = rnorm(n = 500, mean = 50, sd = 5),
funding = rnorm(n = 500, mean = 20000, sd = 4000))
fake_election
For every percentage point of vote share, $2,000 more funding; this is the causal effect of the treatment variable (vote share) on the outcome (funding)
Example:
fake_election = tibble(party_share = rnorm(n = 500, mean = 50, sd = 5),
funding = rnorm(n = 500, mean = 20000, sd = 4000) + 2000 * party_share)
fake_election
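A quick sanity check on this last step (my addition, not from the lecture): since we wrote the data generating process ourselves, regressing funding on party_share should recover a slope near the true causal effect of 2,000. The seed is an assumption added for reproducibility.

```r
library(tibble)

# Rebuild the fake election data with a seed so the result is reproducible
set.seed(42)
fake_election <- tibble(
  party_share = rnorm(n = 500, mean = 50, sd = 5),
  funding     = rnorm(n = 500, mean = 20000, sd = 4000) + 2000 * party_share
)

# We know the true effect is 2000 because we built it into the data
fit <- lm(funding ~ party_share, data = fake_election)
coef(fit)["party_share"]  # close to 2000
```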
Fundamental Problem w Causality
In theory: take a country, observe it both as a democracy and as a non-democracy ⇒ compare the wars it is involved in under each & take the average ⇒ see if democracy = less war
Compare the possible scenarios for each country (Canada, China, USA) ⇒ the potential outcomes framework
We can only observe each country in one scenario at a given time; we can observe how many wars the USA is involved in as a democracy, but we have no idea for the USA as a non-democracy because it has never been one…
We only observe one potential outcome for each unit we are interested in… the unobserved values are missing data ⇒ causal inference is a missing data problem
What would have happened = the counterfactual ⇒ make as good a guess as possible about what Y would have been for each unit (USA, China) under the other treatment status (0 or 1)
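The fundamental problem above can be sketched in code (country names from the lecture; war counts and treatment statuses are made-up numbers): each unit has two potential outcomes, but we only ever observe the one matching its actual treatment status.

```r
library(tibble)
library(dplyr)

countries <- tibble(
  country   = c("USA", "Canada", "China"),
  democracy = c(1, 1, 0),   # observed treatment status (hypothetical coding)
  Y1        = c(2, 1, 3),   # potential outcome: wars if a democracy
  Y0        = c(5, 4, 6)    # potential outcome: wars if an autocracy
)

# We only see Y1 for democracies and Y0 for autocracies;
# the other column is the missing counterfactual
observed <- mutate(countries,
                   wars_observed = if_else(democracy == 1, Y1, Y0))
observed
```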
Comparing Apples and Oranges
Why not just compare the number of wars for countries where democracy = 0 vs. democracy = 1? If democracies fight fewer wars on average, conclude that democracy reduces war
This implicitly assumes autocracies and democracies are good counterfactuals for each other, i.e., that if the USA and China swapped regimes they would otherwise look the same ⇒ no
Why experiments work
Randomly expose some participants to a treatment while others are exposed to nothing (or a placebo)
Can't observe the same person both seeing and not seeing the ad ⇒ but since assignment was random, the two groups are good counterfactuals for one another
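A simulation of why this works (the effect size, sample size, and variable names are my assumptions): randomly assign the ad, build in a true effect of +10 support points, and the simple difference in group means recovers it.

```r
library(tibble)
library(dplyr)

set.seed(123)
n <- 1000
experiment <- tibble(
  saw_ad  = sample(c(0, 1), size = n, replace = TRUE),  # random assignment
  support = rnorm(n, mean = 50, sd = 10) + 10 * saw_ad  # true effect = +10
)

# Because assignment is random, the groups are good counterfactuals,
# so comparing group means estimates the causal effect
experiment |>
  group_by(saw_ad) |>
  summarize(avg_support = mean(support))
```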
War ⇒ casualties ⇒ support (casualties are an indirect path from war to support)
Does adjusting for casualties block the entire effect of war on support? ⇒ Some of the effect, but not all; war may also affect support directly (maybe: the act of declaring war itself affects public support). Adjusting for casualties blocks only the indirect path
Causal Diagram
A causal model describes how the data came to be (the data generating process)
Model tells us how to ID a causal effect
Directed Acyclic Graphs (DAGs)
A modeling tool for thinking about causality
Nodes (points) = variables
Edges (arrows) = direction of causality
Identification
DAGs help figure out how to estimate the effect of one variable on another
Need to adjust/control for the other variables ⇒ this process is "identification": identifying the effect of X on Y
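A minimal base-R sketch of identification (the variable names and effect sizes are mine): Z causes both X and Y, and X also causes Y with a true effect of 3. A naive regression of Y on X is biased by the backdoor path through Z; adjusting for Z identifies the effect.

```r
set.seed(99)
n <- 2000
z <- rnorm(n)                       # confounder
x <- 2 * z + rnorm(n)               # Z -> X
y <- 3 * x + 4 * z + rnorm(n)       # X -> Y (true effect 3) and Z -> Y

coef(lm(y ~ x))["x"]      # biased: picks up Z's effect through the backdoor
coef(lm(y ~ x + z))["x"]  # adjusted for Z: close to the true effect of 3
```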
Example: Waffle House causing divorce? NO, but there is a correlation… what is driving the relationship?
A lurking variable: some other factor that matters
Lurking Variable
Example: The South! It has many Waffle House locations & may also affect divorce rates, but waffles do not cause divorce!
The DAG shows how a correlation between Waffle Houses and divorce arises even without any causal link ⇒ the South confounds the relationship between waffles & divorce
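This confounding can be simulated (my numbers, hypothetical variable names): "south" raises both Waffle House counts and divorce rates, waffles have no effect on divorce, and yet the two end up correlated.

```r
set.seed(7)
n <- 500
south   <- rbinom(n, size = 1, prob = 0.5)          # lurking variable
waffles <- rnorm(n, mean = 10, sd = 2) + 5 * south  # south -> more Waffle Houses
divorce <- rnorm(n, mean = 20, sd = 3) + 4 * south  # south -> more divorce

# Positive correlation despite no causal arrow from waffles to divorce
cor(waffles, divorce)
```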