Session 10 - additional

Absolutely. Let’s go slide by slide for Session 10 additional, and I’ll explain the concepts deeply but clearly. This file is mainly about Difference-in-Differences (DiD) and the parallel trends assumption using cholera and minimum wage examples.

Slide 1: Session 10 Additional

This is an extension of Session 10. The main goal is to understand:

How can we estimate causal effects when we cannot run a perfect randomized experiment?

The tool introduced here is:

Difference-in-Differences, also called DiD, DD, or DnD.

This method is especially useful when:

one group gets a treatment,
another group does not,
and we observe both groups before and after treatment.

Slide 2: Designing an experiment for cholera

The question is:

How would you design an experiment to find the effect of clean drinking water on contracting cholera?

In a perfect randomized experiment, we would randomly assign households to:

clean drinking water,
contaminated drinking water.

But ethically, we cannot randomly give people unsafe water.

So we need a natural experiment or quasi-experiment.

A quasi-experiment means:

The treatment was not randomly assigned by researchers, but some real-world event created treatment and control groups.

In this case, different water companies served different households. One company improved its water source, while another did not. That creates a possible comparison.

Slide 3: Cholera cases in London

The table shows cholera incidence per 10,000 households:

Company	1849	1854
Southwark and Vauxhall	135	147
Lambeth	85	19

Southwark and Vauxhall had:

135 cases in 1849,
147 cases in 1854.

Lambeth had:

85 cases in 1849,
19 cases in 1854.

The big observation:

Lambeth’s cholera cases dropped a lot from 85 to 19.

Why?

Possibly because Lambeth moved to a cleaner water source.

But we need to be careful before saying:

“Clean water caused the reduction.”

Because many other things could have changed between 1849 and 1854.

Slide 4: Bad comparison — 147 − 19

The slide shows:

Effect of clean water = 147 − 19

This compares:

Southwark and Vauxhall in 1854: 147 cases
Lambeth in 1854: 19 cases

So the difference is:

147 − 19 = 128 fewer cases

At first, we may think clean water reduced cholera by 128 cases per 10,000 households.

But this is a bad comparison.

Why?

Because Lambeth and Southwark may have already been different before clean water.

In 1849:

Southwark = 135
Lambeth = 85

So Lambeth already had fewer cholera cases before the change.

That means the 1854 difference includes:

effect of clean water,
plus pre-existing differences between the two groups.

Simple analogy:

If School A already scores higher than School B before a new teaching method, we cannot compare only after scores and say the new method caused the difference.

Slide 5: Cross-sectional comparison problem

The slide writes:

Lambeth outcome = L + D
Southwark and Vauxhall outcome = SV
Measured effect (Lambeth − Southwark and Vauxhall, after only) = D + (L − SV)

Let’s unpack this.

Here:

D = actual effect of clean water.
L − SV = original difference between Lambeth and Southwark.

So if we compare Lambeth and Southwark only after treatment, the observed difference includes two things:

Observed difference = treatment effect + baseline group difference

That is the core problem.

We want only:

D

But we accidentally get:

D + baseline difference

This is selection bias or baseline imbalance.

Simple example:

Suppose:

Lambeth neighborhoods were richer,
had better sanitation,
had less crowding.

Then Lambeth may have had fewer cholera cases even without clean water.

So the after-only comparison misstates the water effect: part of the 128-case gap in 1854 is the 50-case gap (135 − 85) that already existed in 1849.

Slide 6: Before-after comparison problem

Now the slide compares Lambeth before and after.

It says:

Before: Y = L
After: Y = L + (T + D)

So the before-after change is:

T + D

Here:

D = effect of clean water.
T = time effect.

The time effect means something else changed over time between 1849 and 1854.

Examples:

better sanitation,
better public health knowledge,
weather changes,
migration,
disease spread patterns,
general improvements in London.

So if Lambeth cases drop from 85 to 19, that drop may be due to:

clean water,
plus general time changes.

Simple analogy:

If your grades improve after buying a laptop, you cannot say the laptop caused everything. Maybe the second exam was easier, or you studied more.

So before-after comparison is also not enough.

Slide 7: Difference-in-Differences idea

This slide combines both comparisons.

It shows four cells:

Group	Before	After
Lambeth	L	L + T + D
Southwark	SV	SV + T

Now calculate changes:

For Lambeth:

After − Before = (L + T + D) − L = T + D

For Southwark:

After − Before = (SV + T) − SV = T

Now subtract:

(T + D) − T = D

So DiD removes the time effect and isolates treatment effect.

That is the beauty of DiD.

Formula:

$\text{DiD} = (\text{Lambeth}_{\text{after}} - \text{Lambeth}_{\text{before}}) - (\text{SV}_{\text{after}} - \text{SV}_{\text{before}})$

Using numbers:

Lambeth change:

19 − 85 = −66

Southwark change:

147 − 135 = +12

DiD:

−66 − 12 = −78

Interpretation:

If Southwark and Vauxhall's change is a fair stand-in for what Lambeth would have experienced anyway, clean water reduced cholera by about 78 cases per 10,000 households.

Notice this is different from the bad after-only comparison of 128.

Why?

Because DiD adjusts for:

Lambeth already being different,
general time trends.
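
The three comparisons from Slides 4 to 7 can be reproduced directly from the table on Slide 3. The following is a minimal sketch in Python; the dictionary layout and variable names are illustrative choices, not something taken from the slides.

```python
# Cholera incidence per 10,000 households, copied from the Slide 3 table.
rates = {
    "Lambeth": {1849: 85, 1854: 19},                   # treated: moved to cleaner water
    "Southwark and Vauxhall": {1849: 135, 1854: 147},  # control: no change
}

treated = rates["Lambeth"]
control = rates["Southwark and Vauxhall"]

# After-only (cross-sectional) comparison: mixes D with the baseline gap.
after_only = treated[1854] - control[1854]

# Before-after comparison for Lambeth alone: mixes D with the time effect T.
before_after = treated[1854] - treated[1849]

# The control group's change is the estimate of the common time effect T.
control_change = control[1854] - control[1849]

# Difference-in-differences: treated change minus control change.
did = before_after - control_change

# Baseline gap that the after-only comparison wrongly attributes to clean water.
baseline_gap = treated[1849] - control[1849]

print("after-only:", after_only)
print("before-after:", before_after)
print("DiD:", did)
print("after-only equals DiD + baseline gap:", after_only == did + baseline_gap)
```

The last line checks the Slide 5 decomposition with the actual numbers: the after-only gap equals the DiD estimate plus the gap that already existed in 1849.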

Slide 8: Assumption behind DiD

The slide says:

T is the same for both Lambeth and Southwark.

This is called the:

Parallel trends assumption

Meaning:

Without clean water, Lambeth and Southwark would have followed the same trend over time.

In our notation:

Lambeth would have changed by T.
Southwark changed by T.

So Southwark’s change is used as the counterfactual for Lambeth.

Counterfactual means:

What would have happened to Lambeth if it had not received clean water?

We cannot observe that directly, so we use Southwark as a proxy.

Simple analogy:

Imagine two runners:

Runner A gets new shoes.
Runner B does not.

If both runners were improving at the same rate before the shoes, then Runner B’s later improvement helps estimate what Runner A would have done without the new shoes.

But if Runner A was already improving faster before the shoes, the comparison is unfair.

Slide 9: Understanding parallel trends

This slide introduces the most important concept:

Parallel trends.

Parallel trends does not mean the two groups have the same level.

They can start at different levels.

What matters is:

Their trends would have moved similarly without treatment.

Example:

Good parallel trends:

Time	Treated	Control
Before 1	80	100
Before 2	90	110

Both increased by 10.

They are at different levels, but trends are parallel.

Violated parallel trends:

Time	Treated	Control
Before 1	80	100
Before 2	100	105

Treated increased by 20, control increased by 5.

Now the treated group was already moving differently.

That makes DiD unreliable.

Key sentence:

DiD can handle different starting levels, but not different underlying trends.
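
The two tables above can be checked mechanically by comparing period-to-period changes rather than levels. This is a small sketch; the helper names `changes` and `trend_gap` are made up for illustration.

```python
def changes(series):
    """Return the change between each pair of consecutive periods."""
    return [later - earlier for earlier, later in zip(series, series[1:])]


def trend_gap(treated, control):
    """Treated change minus control change per step; all zeros means parallel."""
    return [t - c for t, c in zip(changes(treated), changes(control))]


# First table: levels differ (80 vs 100) but both groups rise by 10.
parallel = trend_gap([80, 90], [100, 110])

# Second table: treated rises by 20 while control rises by 5.
not_parallel = trend_gap([80, 100], [100, 105])

print("gap in pre-period changes (first table):", parallel)
print("gap in pre-period changes (second table):", not_parallel)
```

A gap of zero in every pre-period step is what parallel pre-trends look like in the data; a nonzero gap means the groups were already diverging before treatment.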

Slide 10: Minimum wage and employment

This slide gives another DiD example:

What is the effect of a minimum wage increase on employment?

The theory is ambiguous.

One view:

Higher wages may reduce employment because businesses hire fewer workers.

Another view:

Higher wages may attract more workers and increase labor force participation.

So the effect is not obvious theoretically.

That is why it becomes an empirical question.

Empirical question means:

We need data to find the answer.

This is important because causal analytics is not about assuming the answer. It is about designing a credible comparison.

Slide 11: Card and Krueger minimum wage study

This slide describes a famous DiD study.

New Jersey increased its minimum wage:

from $4.25 to $5.05,
effective April 1, 1992.

Pennsylvania did not change its minimum wage:

stayed at $4.25.

Researchers surveyed fast-food restaurants:

before: February 1992,
after: November 1992.

Treatment group:

New Jersey restaurants.

Control group:

Pennsylvania restaurants.

Outcome:

employment.

DiD question:

Did employment change more in New Jersey than in Pennsylvania after the minimum wage increase?

The logic:

Pennsylvania gives us the time trend that would have happened without the minimum wage change.

New Jersey gives us the treated group.

Formula:

$\text{DiD} = (\text{NJ}_{\text{after}} - \text{NJ}_{\text{before}}) - (\text{PA}_{\text{after}} - \text{PA}_{\text{before}})$

Slide 12: Estimating DiD using regression

The slide asks:

Which coefficient gives the DiD estimate?

Variables:

NJ = 1 for New Jersey, 0 for Pennsylvania.
D = 1 for post-period, 0 for pre-period. (Here D is a time dummy; it is not the treatment effect D used in the cholera slides.)

Regression:

$Y = \alpha + \beta D + \gamma NJ + \delta (D \times NJ) + \epsilon$

Here:

Y = employment.
D = after period.
NJ = treated group.
$D \times NJ$ = treated group after treatment.

The DiD estimate is:

$\delta$

Why?

Because $D \times NJ$ equals 1 only for the treated group after treatment.

So it captures the extra change in New Jersey after the minimum wage change, beyond:

general time change,
permanent NJ vs PA difference.

Interpretation:

$\delta$ is the estimated causal effect, assuming parallel trends.
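
To see that the interaction coefficient really is the DiD of group means, the regression can be fitted on a tiny dataset. The sketch below uses only the Python standard library and solves the OLS normal equations by hand; the employment numbers are hypothetical (they are not the Card and Krueger data), and the helpers `solve`, `ols`, and `mean` are written here purely for illustration.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def ols(X, y):
    """Ordinary least squares via the normal equations (X'X) b = X'y."""
    k = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    return solve(xtx, xty)


def mean(values):
    return sum(values) / len(values)


# Hypothetical restaurant-level observations: (NJ, D, employment).
data = [
    (0, 0, 22.0), (0, 0, 24.0),  # PA, pre
    (0, 1, 20.0), (0, 1, 22.0),  # PA, post
    (1, 0, 19.0), (1, 0, 21.0),  # NJ, pre
    (1, 1, 20.0), (1, 1, 22.0),  # NJ, post
]

# Design matrix columns: intercept, D, NJ, D x NJ.
X = [[1.0, float(d), float(nj), float(d * nj)] for nj, d, _ in data]
y = [emp for _, _, emp in data]
alpha, beta, gamma, delta = ols(X, y)


def cell(nj, d):
    """Mean employment in one group-period cell."""
    return mean([emp for g, t, emp in data if g == nj and t == d])


# DiD computed directly from the four cell means.
did_from_means = (cell(1, 1) - cell(1, 0)) - (cell(0, 1) - cell(0, 0))

print("delta from regression:", round(delta, 6))
print("DiD from cell means:  ", did_from_means)
```

In practice one would normally use a statistics library for this; the hand-rolled solver is only there to keep the example self-contained. The point is that with two groups and two periods, the regression and the table of means give the same number.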

Slide 13: Parallel trends in regression

This slide visually explains the regression.

The outcomes are:

PA pre:

$\alpha$

PA post:

$\alpha + \beta$

NJ pre:

$\alpha + \gamma$

NJ post:

$\alpha + \gamma + \beta + \delta$

Here:

$\alpha$ = baseline outcome for PA before.
$\beta$ = general time effect from before to after (the coefficient on D in the regression).
$\gamma$ = baseline difference between NJ and PA.
$\delta$ = DiD treatment effect.

The key idea on the slide:

OLS estimates the counterfactual using the slope of the untreated group.

That means:

To estimate what would have happened to NJ without treatment, OLS assumes NJ would have followed PA’s trend.

So if PA employment went down, OLS assumes NJ would also have gone down similarly without the minimum wage change.

Then it compares:

actual NJ post-treatment outcome,
predicted NJ post-treatment outcome without treatment.

The vertical gap is $\delta$, the treatment effect.

Very important:

The entire DiD logic depends on whether PA is a good counterfactual for NJ.
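
The counterfactual construction described on this slide can be written out step by step. This is a sketch with hypothetical group averages (not figures from the actual study); the function name `did_via_counterfactual` is invented for illustration.

```python
def did_via_counterfactual(treated_pre, treated_post, control_pre, control_post):
    """Build the treated group's counterfactual from the control group's trend."""
    # Step 1: the control group's change stands in for the common time effect.
    control_trend = control_post - control_pre
    # Step 2: predicted treated outcome if treatment had never happened.
    counterfactual_post = treated_pre + control_trend
    # Step 3: the vertical gap between actual and predicted is the DiD estimate.
    effect = treated_post - counterfactual_post
    return counterfactual_post, effect


# Hypothetical averages: NJ goes 20 -> 21, PA goes 23 -> 21.
counterfactual, effect = did_via_counterfactual(
    treated_pre=20.0, treated_post=21.0, control_pre=23.0, control_post=21.0
)

print("NJ counterfactual (no treatment):", counterfactual)
print("estimated effect:", effect)
```

In this made-up example NJ employment barely moves, yet the estimated effect is positive, because the counterfactual says NJ would have fallen along with PA. The estimate is only as good as that borrowed trend.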

Slide 14: What if treatment and control slopes are not the same?

This slide shows what happens when the trends are not parallel.

If the treatment group naturally follows a different slope than control, then DiD becomes biased.

The graph shows:

observed PA trend,
observed NJ trend,
counterfactual NJ trend.

The problem:

OLS uses PA’s trend to construct NJ’s counterfactual.

But if NJ would not have followed PA’s trend anyway, the estimated effect is wrong.

Simple analogy:

Suppose:

one student is improving rapidly,
another student is declining.

If the first student gets tutoring, you cannot use the declining student as the counterfactual.

Why?

Because even without tutoring, the first student may have improved more.

So DiD would exaggerate the tutoring effect.
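
The size of the exaggeration can be made concrete. In the sketch below every number is hypothetical: a true tutoring effect is fixed in advance, each student is given a different underlying trend, and the DiD estimate is compared with the truth.

```python
def did(treated_pre, treated_post, control_pre, control_post):
    """Difference-in-differences from four group-period averages."""
    return (treated_post - treated_pre) - (control_post - control_pre)


true_effect = 5.0      # assumed real gain from tutoring
treated_trend = 10.0   # the improving student's gain even without tutoring
control_trend = -4.0   # the declining student's change

treated_pre, control_pre = 60.0, 70.0
treated_post = treated_pre + treated_trend + true_effect
control_post = control_pre + control_trend

estimate = did(treated_pre, treated_post, control_pre, control_post)
bias = estimate - true_effect

print("true effect:", true_effect)
print("DiD estimate:", estimate)
print("bias:", bias)
print("difference in underlying trends:", treated_trend - control_trend)
```

The bias equals the difference between the two groups' underlying trends. When the trends are parallel that difference is zero, which is why the assumption matters.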

Important question from the slide:

Why should we believe the slopes would be the same for the treatment and control groups in the post-period?

Answer:

We cannot prove it directly, because we do not observe the untreated version of the treated group after treatment.

But we can look at pre-treatment trends.

Slide 15: How to check parallel trends

This slide says:

If you have panel data, use it to check whether pre-trends are similar.

Panel data means:

Same units observed over multiple time periods.

For example:

same households over many weeks,
same restaurants before and after,
same regions across years.

If treatment and control had similar trends before treatment, then it is more believable that they would have continued similarly after treatment.

But important warning:

Similar pre-trends do not prove parallel post-trends.

It only makes the assumption more credible.

Why?

Because something new could happen exactly at treatment time that affects one group differently.

Example:

NJ minimum wage changes,
but at the same time NJ has a local economic shock.

Then DiD may confuse the shock with the policy effect.

Final takeaway:

Parallel trends is an assumption, not something we can fully prove.

We can only support it using evidence.
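
One common way to gather that supporting evidence is a placebo check: compute a "fake" DiD between consecutive pre-treatment periods, where the true effect must be zero. The sketch below assumes a panel with three pre-periods and one post-period; all numbers are hypothetical.

```python
def did(t_before, t_after, c_before, c_after):
    """Difference-in-differences from four group-period averages."""
    return (t_after - t_before) - (c_after - c_before)


# Hypothetical group averages; only the last period is after treatment.
periods = ["pre1", "pre2", "pre3", "post"]
treated = [50.0, 52.0, 54.0, 61.0]
control = [40.0, 42.0, 44.0, 46.0]

# Placebo DiDs: pretend treatment happened between each pair of pre-periods.
placebos = [
    did(treated[i], treated[i + 1], control[i], control[i + 1])
    for i in range(len(periods) - 2)
]

# Actual DiD: last pre-period versus the post-period.
actual = did(treated[-2], treated[-1], control[-2], control[-1])

print("placebo DiDs (should be near zero):", placebos)
print("actual DiD:", actual)
```

Placebo estimates close to zero make parallel trends more believable. They still cannot rule out a shock that hits one group at exactly the moment of treatment.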

Final summary of the entire file

This file teaches Difference-in-Differences.

The problem:

Simple after-only comparison is biased because groups may differ.
Simple before-after comparison is biased because time effects may exist.

DiD solution:

$\text{DiD} = (\text{Treated}_{\text{after}} - \text{Treated}_{\text{before}}) - (\text{Control}_{\text{after}} - \text{Control}_{\text{before}})$

It removes:

baseline group differences,
common time effects.

But DiD depends on the parallel trends assumption:

Without treatment, treated and control groups would have followed the same trend.

Regression version:

$Y = \alpha + \beta\,\text{Post} + \gamma\,\text{Treated} + \delta(\text{Post} \times \text{Treated}) + \epsilon$

The treatment effect is:

$\delta$

Most important sentence to remember:

DiD compares changes, not levels, and uses the control group’s trend as the counterfactual for the treated group.