Week 7: Causality

WEEK 6: Causality I 11/07

First half of the course

  • Using models to estimate the relationship between X and Y ⇒ is this relationship possibly “causal”?

    • How do we know if X causes Y? ⇒ are the estimates causal? BIG QUESTION: causal inference

    • Example: Do peace-keepers reduce conflict in countries emerging from civil war or are they ineffective?

    • “Why” questions are typically causal 

  • Causality: X causes Y if an intervention that changes the value of X produces a probabilistic change in Y

    • Intervention: X is being changed or altered

    • Probabilistic: Y should change on average but not in every instance

Example: 

Aspirin causes a reduction in fever symptoms

  • Intervention: someone takes aspirin (as a pill, in their food, etc.)

  • Outcome: taking aspirin doesn’t reduce fever 100% of the time, but it does on average

The problem with causality 

  • Not obvious: positive correlation between education & earnings

    • People who are already wealthy get more education ⇒ cycle repeats (they self-select into education), so the correlation may not be causal

  • Not obvious: voter ID & turnout 

    • Comparing different states with different laws ⇒ the states aren’t comparable, so the relationship might not be causal (or might not exist at all)

  • Spurious correlation: correlation is common in nature, so correlation alone is weak evidence of causation

    • We can’t directly observe a change in X causing a change in Y; all we can see is the correlation between X and Y

  • Therefore: simulate data where we set the causal effect ourselves ⇒ we know the truth, so we can see when correlation does (and doesn’t) reflect causation

    • Using rnorm(), which draws random numbers from a normal distribution

Example:

rnorm(n = 4, mean = -10, sd = 100)

n = number of draws, mean = average, sd = standard deviation

Faking (simulating) treatment

Example: 

fake_election = tibble(party_share = rnorm(n = 500, mean = 50, sd = 5))

fake_election

Making the outcome

  • Let’s say parties raise ~20k on average +/- 4k

Example: 

fake_election = tibble(party_share = rnorm(n = 500, mean = 50, sd = 5),
                       funding = rnorm(n = 500, mean = 20000, sd = 4000))

fake_election

For every percentage point of vote share, a party gets $2,000 more in funding; this is the causal effect of the treatment variable (vote share) on the outcome (funding)

Example:

fake_election = tibble(party_share = rnorm(n = 500, mean = 50, sd = 5),
                       funding = rnorm(n = 500, mean = 20000, sd = 4000) + 2000 * party_share)

fake_election
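Because we simulated the data ourselves, we know the true causal effect is exactly 2000, so we can check that a regression recovers it. A minimal sketch (the set.seed() call is added for reproducibility, and base-R data.frame stands in for tibble() so it runs without extra packages):

```r
# We built the data, so the true effect of party_share on funding is 2000;
# a regression of funding on party_share should recover roughly that number.
set.seed(42)  # reproducibility (added; not in the course code)
party_share = rnorm(n = 500, mean = 50, sd = 5)
funding = rnorm(n = 500, mean = 20000, sd = 4000) + 2000 * party_share
fake_election = data.frame(party_share, funding)

fit = lm(funding ~ party_share, data = fake_election)
coef(fit)["party_share"]  # close to the true effect of 2000
```

This is the payoff of simulation: the estimate can be compared against a truth we chose ourselves.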


Fundamental Problem w Causality

In theory: take a country, observe it both as a democracy and as a non-democracy ⇒ compare the wars it’s involved in & take the average ⇒ see if democracy = less war 

  • Compare the two possible scenarios for each country (Canada, China, USA as democracy vs. non-democracy) ⇒ the potential outcomes framework

  • We can only observe each country in one scenario at a given time; we can observe how many wars the USA is involved in as a democracy, but we have no idea what it would do as a non-democracy, because the USA has never not been a democracy… 

We only observe one potential outcome for each unit we are interested in; the other potential outcome is unobservable ⇒ causal inference is a missing data problem

  • What would have happened = the counterfactual ⇒ make as good a guess as possible about what Y would have been under the unobserved condition (e.g., Y(0) vs. Y(1) for the USA or China)
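The missing-data idea can be sketched with a toy potential-outcomes table (all countries' numbers here are hypothetical): each unit has two potential outcomes, but we only ever observe the one matching what actually happened.

```r
# Potential-outcomes sketch (hypothetical war counts): each country has a
# Y(1) (wars if democracy) and a Y(0) (wars if non-democracy), but we only
# observe the one matching its actual regime type.
po = data.frame(
  country        = c("USA", "China", "Canada"),
  democracy      = c(1, 0, 1),
  wars_if_dem    = c(2, NA, 1),   # Y(1): observed only when democracy == 1
  wars_if_nondem = c(NA, 4, NA)   # Y(0): observed only when democracy == 0
)
# The observed outcome picks whichever potential outcome matches reality;
# the other cell is forever missing -- the fundamental problem of causality.
po$wars_observed = ifelse(po$democracy == 1, po$wars_if_dem, po$wars_if_nondem)
po
```

Every NA in the table is a counterfactual we would need to guess at.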

Comparing Apples and Oranges

Why not just compare the number of wars for countries where democracy = 0 vs. democracy = 1? If democracies fight less, conclude that democracy causes less war…

  • This implicitly assumes autocracies and democracies are good counterfactuals for each other ⇒ e.g., that if the USA and China swapped regime types they would otherwise look the same ⇒ no!

Why experiments work

  • Randomly expose some participants to a treatment while others are exposed to nothing (or a placebo)

    • We can’t observe the same person both seeing and not seeing the ad ⇒ but since assignment was random, the two groups are good counterfactuals of one another 
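A sketch of why randomization works, with made-up numbers: assignment is a coin flip, so the simple difference in group means recovers the true effect we built into the simulation.

```r
# Randomized experiment sketch (all numbers hypothetical): treatment is a
# coin flip, so treated and control groups are comparable, and the
# difference in means estimates the true causal effect of 5.
set.seed(7)
n = 1000
treated = rbinom(n, size = 1, prob = 0.5)             # coin-flip assignment
support = rnorm(n, mean = 50, sd = 10) + 5 * treated  # true effect = 5
effect_estimate = mean(support[treated == 1]) - mean(support[treated == 0])
effect_estimate  # close to the true effect of 5
```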


The effect of war on support can flow through multiple paths:

  1. Direct: War ⇒ support (e.g., the act of declaring war itself)

  2. Indirect: War ⇒ casualties ⇒ support 

Does adjusting for casualties block the entire effect of war on support? ⇒ Only some of it, not all: adjusting for casualties blocks the indirect path, but the direct effect (maybe: the act of declaring war on public support?) remains
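The two paths can be sketched with hypothetical effect sizes: adjusting for casualties removes the indirect path and leaves only the direct one.

```r
# Mediation sketch (hypothetical numbers): war lowers support directly (-4)
# and indirectly through casualties (war raises casualties by 2, each
# casualty unit lowers support by 3, so indirect = -6; total = -10).
set.seed(1)
n = 2000
war        = rbinom(n, size = 1, prob = 0.5)
casualties = rnorm(n, mean = 0, sd = 1) + 2 * war          # war -> casualties
support    = rnorm(n, mean = 50, sd = 5) - 3 * casualties - 4 * war
coef(lm(support ~ war))["war"]               # total effect: about -10
coef(lm(support ~ war + casualties))["war"]  # direct effect only: about -4
```

Adjusting for the mediator (casualties) is how the regression "blocks" the indirect path.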


Causal Diagram

A causal model describes how the data came to be (the data-generating process) 

Model tells us how to ID a causal effect

Directed Acyclic Graphs (DAGs)

A modeling tool for thinking about causality:

  • Nodes (points) = variables

  • Edges (arrows) = direction of causality

Identification 

  • DAGs help figure out how to estimate the effect of one variable on another

  • Need to adjust/control for the other variables ⇒ this process is “identification”: identifying the effect of X on Y

Example: Do Waffle Houses cause divorce? NO, but there is a correlation… so what is driving the relationship? 

  • A lurking variable: another factor that matters and drives both

Lurking Variable

Example: The South! It has many Waffle House locations and also factors that affect divorce rates, but waffles do not cause divorce!

The DAG shows how a correlation between waffles & divorce arises even without any causal link ⇒ the South confounds the relationship between waffles & divorce
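The confounding story can be simulated (all numbers made up): the South raises both Waffle House density and divorce rates, waffles have zero true effect, and yet the two are correlated; adjusting for the South makes the spurious relationship disappear.

```r
# Confounding sketch (hypothetical numbers): "south" causes both Waffle
# House density and divorce rates; waffles have NO effect on divorce, yet
# the two correlate because they share a common cause.
set.seed(123)
n = 500
south   = rbinom(n, size = 1, prob = 0.3)
waffles = rnorm(n, mean = 5, sd = 2) + 10 * south   # south -> waffles
divorce = rnorm(n, mean = 20, sd = 3) + 5 * south   # south -> divorce
cor(waffles, divorce)                            # positive, despite no causal link
coef(lm(divorce ~ waffles + south))["waffles"]   # near 0 once we adjust for south
```

Adjusting for the confounder (the South) is exactly the kind of adjustment the DAG tells us identification requires.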