Chapter 15

Instrumental Variables Estimation and Two Stage Least Squares


Slide 1 — Chapter Topic: Instrumental Variables and 2SLS

This chapter is about what to do when OLS fails because of endogeneity.

The standard regression model is:
y = \beta_0 + \beta_1 x + u

OLS works well only if:
Cov(x,u) = 0

That means the explanatory variable (x) must be unrelated to the error term.

But in many real-world economics problems:
Cov(x,u) \neq 0

This is endogeneity.

When endogeneity exists, OLS is biased and inconsistent.


Simple intuition

OLS is like trying to measure the pure effect of (x) on (y).

But if (x) is mixed with hidden factors inside the error term, OLS cannot separate:

  • the true effect of (x)

  • the effect of hidden omitted variables

  • reverse causality

  • measurement error

Instrumental variables help by finding a source of variation in (x) that is “clean.”


Slide 2 — Instrumental Variables Method

The slide says endogeneity is common in social sciences and economics. Important personal variables are often unobserved, and these unobserved variables may be correlated with observed explanatory variables. Measurement error can also create endogeneity.


1. What is endogeneity?

Endogeneity means an explanatory variable is correlated with the error term.
Cov(x,u) \neq 0

If this happens, OLS does not estimate the true causal effect.


Why does this happen?

There are three major reasons.


A. Omitted variable bias

Suppose we estimate:
Wage = \beta_0 + \beta_1 Education + u

But ability is omitted.

True model:
Wage = \beta_0 + \beta_1 Education + \beta_2 Ability + v

If ability is not included, it goes into the error term:
u = \beta_2 Ability + v

If ability is correlated with education:
Cov(Education,u) \neq 0

OLS is biased.


Intuition

People with more ability may get more education and earn higher wages.

OLS may attribute the wage increase to education, when some of it is actually ability.

So OLS overstates the return to education.
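
A minimal simulation sketch of this omitted-ability story, assuming a made-up data-generating process (the coefficients 2.0, 0.08, and 0.10 and the variable names are illustrative, not from the slides). It compares the regression that omits ability with the one that includes it:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Hypothetical data-generating process; every number here is illustrative.
ability = rng.normal(size=n)
education = 12 + 2.0 * ability + rng.normal(size=n)  # ability raises schooling
beta1, beta2 = 0.08, 0.10
wage = 1.0 + beta1 * education + beta2 * ability + rng.normal(size=n)

# Short regression: ability is omitted and ends up in the error term.
short_slope = np.cov(education, wage)[0, 1] / np.var(education, ddof=1)

# Long regression: ability is included as a regressor.
X = np.column_stack([np.ones(n), education, ability])
long_slope = np.linalg.lstsq(X, wage, rcond=None)[0][1]

# Omitted-variable formula: beta1 + beta2 * Cov(educ, ability) / Var(educ)
predicted_short = beta1 + beta2 * 2.0 / (2.0**2 + 1.0)

print(f"short regression slope: {short_slope:.3f}")
print(f"formula prediction:     {predicted_short:.3f}")
print(f"long regression slope:  {long_slope:.3f}")
```

In this setup the short regression lands near the formula value rather than near the true 0.08, which is the upward bias described above.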


B. Measurement error

Suppose true income affects consumption:
Consumption = \beta_0 + \beta_1 Income^* + u

But we observe income with error:
Income = Income^* + measurement\ error

Substituting observed (Income) for (Income^*) pushes the measurement error into the regression error, so observed income is correlated with that error.

Under classical measurement error (error unrelated to true income and to (u)), this biases the OLS slope toward zero. This is called attenuation bias.
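
The attenuation effect can be sketched with simulated data. The numbers below (true slope 0.6, equal variances for true income and the measurement error) are assumptions chosen only to make the shrinkage easy to see:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000

# Hypothetical data; all parameters are illustrative assumptions.
income_true = rng.normal(loc=50, scale=10, size=n)   # unobserved Income*
noise = rng.normal(scale=10, size=n)                 # classical measurement error
income_obs = income_true + noise                     # what the researcher sees
beta1 = 0.6
consumption = 5 + beta1 * income_true + rng.normal(scale=5, size=n)

def ols_slope(x, y):
    """Simple-regression slope: sample Cov(x, y) / sample Var(x)."""
    return np.cov(x, y)[0, 1] / np.var(x, ddof=1)

slope_true = ols_slope(income_true, consumption)
slope_obs = ols_slope(income_obs, consumption)

# Attenuation factor: Var(Income*) / (Var(Income*) + Var(error)) = 100 / 200
expected = beta1 * 100 / (100 + 100)

print(f"slope using true income:     {slope_true:.3f}")
print(f"slope using observed income: {slope_obs:.3f}")
print(f"attenuation prediction:      {expected:.3f}")
```

With equal variances, the slope on observed income is roughly half the true slope.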


Analogy

Imagine trying to measure how speed affects travel time, but your speedometer is inaccurate.

Your explanatory variable is noisy.

OLS no longer sees the true relationship clearly.


C. Simultaneity / reverse causality

Example:
Crime = \beta_0 + \beta_1 Police + u

Does more police reduce crime?

Problem:

  • more police may reduce crime

  • but high crime may cause cities to hire more police

So causality runs both ways.
Police \leftrightarrow Crime

Then police is endogenous.


2. Previous solutions mentioned in the slide

The slide says earlier solutions included:

Proxy variables

Use a proxy for omitted variable.

Example:

If ability is omitted in wage equation, use IQ score as proxy.

But proxies are imperfect.


Fixed effects

Use panel data to remove time-constant omitted variables.

Example:

If ability is constant over time, FE can remove it.

But FE only works if:

  1. panel data are available

  2. endogeneity is time-constant

  3. regressors vary over time

If education does not change much over time, FE cannot estimate its effect well.


3. Why IV is important

Instrumental Variables is one of the most famous methods for dealing with endogeneity.

The idea is:

Find a variable (z) that moves (x), but affects (y) only through (x).

That variable (z) is called an instrument.


Simple analogy for IV

Suppose you want to know whether coffee causes productivity.

Problem:

People drink more coffee when they are tired.

Tiredness affects productivity.

So coffee is endogenous.

A possible instrument might be:

random variation in distance to a coffee shop

If people closer to coffee shops drink more coffee, but distance to coffee shop does not directly affect productivity except through coffee, then it may be an instrument.

The instrument creates “clean” variation in coffee drinking.


Slide 3 — Motivation: Omitted Variables in a Simple Regression Model

The slide defines an instrumental variable and gives the wage-education example.


1. Simple model

Suppose:
y = \beta_0 + \beta_1 x + u

where (x) is endogenous:
Cov(x,u) \neq 0

OLS fails.

We need a variable (z), the instrument.


2. Conditions for an instrumental variable

The slide gives three conditions:

Condition 1: It does not appear in the regression

The instrument (z) should not directly belong in the outcome equation.

Meaning:
y = \beta_0 + \beta_1 x + u

does not include (z) as a direct explanatory variable.


Intuition

(z) should affect (y) only through (x).

If (z) directly affects (y), then it belongs in the equation and cannot be excluded.

This is called the exclusion restriction.


Condition 2: It is correlated with the endogenous variable


Cov(z,x) \neq 0

This is called instrument relevance.

The instrument must actually move (x).

If (z) has no relationship with (x), it cannot help.


Analogy

If (x) is a car and you want to move it, (z) must be connected to the engine.

A weak or disconnected instrument cannot move (x).


Condition 3: It is uncorrelated with the error term


Cov(z,u) = 0

This is called instrument exogeneity.

The instrument must not be related to hidden factors affecting (y).

This is the hardest condition to prove.


3. Wage and education example

Regression:
Wage = \beta_0 + \beta_1 Education + u

Problem:

Ability is in (u).

If ability is correlated with education:
Cov(Education,u) \neq 0

OLS is biased.

We need an instrument (z) for education.

A good (z) must:

  1. affect education

  2. not directly affect wage

  3. not be correlated with ability or other hidden wage determinants


4. Core intuition

The instrument isolates the part of education that is not chosen because of ability.

Instead of using all variation in education, IV uses only the part of education predicted by (z).

That is the clean part.


Slide 4 — Reconsidering OLS Consistency

The slide gives a simple consistency proof for OLS under exogeneity and says OLS is consistent basically if and only if exogeneity holds.


1. OLS slope formula

In simple regression:
\hat\beta_1 = \frac{\sum (x_i-\bar x)(y_i-\bar y)}{\sum (x_i-\bar x)^2}

In population terms:
\hat\beta_1 \rightarrow \frac{Cov(x,y)}{Var(x)}

Now substitute:
y = \beta_0 + \beta_1 x + u

Then:
Cov(x,y) = Cov(x, \beta_0 + \beta_1 x + u)
Cov(x,y) = \beta_1 Var(x) + Cov(x,u)

So:
\frac{Cov(x,y)}{Var(x)} = \beta_1 + \frac{Cov(x,u)}{Var(x)}

Therefore:
plim(\hat\beta_1) = \beta_1 + \frac{Cov(x,u)}{Var(x)}


2. What this means

OLS is consistent only if:
Cov(x,u) = 0

If:
Cov(x,u) \neq 0

OLS converges to the wrong value.

The bias/inconsistency term is:
\frac{Cov(x,u)}{Var(x)}


3. Deep intuition

OLS measures how (y) moves when (x) moves.

But if (x) also moves with hidden factors in (u), then OLS cannot know whether (y) changed because of:

  • (x)

  • the hidden factor

OLS mixes both.


Example

True effect of education on wage:
\beta_1 = 0.08

But if education is positively correlated with ability, and ability raises wage:
Cov(Education,u)>0

Then:
plim(\hat\beta_1)>0.08

OLS overestimates.


4. Why the Law of Large Numbers matters

The slide mentions sample variances and covariances converge to theoretical counterparts as (n \to \infty).

This means with large samples:
sample\ covariance \rightarrow population\ covariance

So if the population covariance between (x) and (u) is not zero, large sample size will not fix the problem.
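
A small sketch of this point, assuming an illustrative process in which Cov(x,u) = 0.5 and Var(x) = 1.25 (these numbers are not from the slides). As n grows, the OLS slope settles on \beta_1 + Cov(x,u)/Var(x), not on \beta_1:

```python
import numpy as np

rng = np.random.default_rng(2)
beta0, beta1 = 1.0, 0.08

# In this hypothetical setup x = e + 0.5*u with Var(e) = Var(u) = 1,
# so Cov(x,u) = 0.5 and Var(x) = 1 + 0.25 = 1.25.
plim_ols = beta1 + 0.5 / 1.25

estimates = {}
for n in (1_000, 100_000, 1_000_000):
    u = rng.normal(size=n)
    x = rng.normal(size=n) + 0.5 * u      # x is correlated with the error
    y = beta0 + beta1 * x + u
    estimates[n] = np.cov(x, y)[0, 1] / np.var(x, ddof=1)
    print(f"n = {n:>9,}: OLS slope = {estimates[n]:.4f}")

print(f"true beta1 = {beta1}, predicted limit = {plim_ols:.2f}")
```

More data only makes the estimate more precise around the wrong value.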


Important exam point

Endogeneity is not a small sample problem.

Even with infinite data, OLS is wrong if:
Cov(x,u) \neq 0

That is why we need IV.


Slide 5 — Assume Existence of an Instrumental Variable (z)

This slide introduces the IV estimator mathematically.


1. We have endogenous (x)

Model:
y = \beta_0 + \beta_1 x + u

OLS fails because:
Cov(x,u) \neq 0

Now suppose we have an instrument (z) such that:
Cov(z,u) = 0

and
Cov(z,x) \neq 0


2. Use covariance with (z)

Start with:
y = \beta_0 + \beta_1 x + u

Take covariance with (z):
Cov(z,y) = Cov(z, \beta_0 + \beta_1 x + u)
Cov(z,y) = \beta_1 Cov(z,x) + Cov(z,u)

Because Cov(z,u) = 0:
Cov(z,y) = \beta_1 Cov(z,x)

So:
\beta_1 = \frac{Cov(z,y)}{Cov(z,x)}

This is the IV estimand.
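
The sample analogue of this ratio can be sketched as follows, assuming a hypothetical process in which (z) is generated independently of (u) and shifts (x) (the coefficients 0.8 and 0.5 are illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 200_000
beta1 = 0.08

# Hypothetical data: z is independent of u (exogenous) and moves x (relevant).
z = rng.normal(size=n)
u = rng.normal(size=n)
x = 0.8 * z + 0.5 * u + rng.normal(size=n)   # x is endogenous through u
y = 1.0 + beta1 * x + u

ols = np.cov(x, y)[0, 1] / np.var(x, ddof=1)    # Cov(x,y) / Var(x)
iv = np.cov(z, y)[0, 1] / np.cov(z, x)[0, 1]    # Cov(z,y) / Cov(z,x)

print(f"true beta1: {beta1}")
print(f"OLS slope:  {ols:.3f}")
print(f"IV slope:   {iv:.3f}")
```

Here the IV ratio recovers a value close to the true slope while OLS does not, but only because the simulated (z) satisfies both conditions by construction.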


3. Compare OLS vs IV

OLS uses:
\frac{Cov(x,y)}{Var(x)}

IV uses:
\frac{Cov(z,y)}{Cov(z,x)}

OLS asks:

How does (y) move with (x)?

IV asks:

How does (y) move with the part of (x) moved by (z)?


4. Intuition

If (z) is clean, then changes in (x) caused by (z) are also clean.

So IV estimates the effect of (x) using only variation in (x) that comes from (z).


5. Analogy

Imagine (x) is muddy water.

OLS drinks the whole glass.

IV uses a filter (z) to extract only the clean part of (x).

The clean part is variation in (x) unrelated to the error term.


6. Key danger

If (z) is correlated with (u), IV also fails.

Bad instrument = bad estimate.

IV is only as good as the instrument.


Slide 6 — Example: Father’s Education as an IV for Education

The slide discusses father’s education as an instrument for education in a wage equation.


1. Wage equation

We want:
Wage_i = \beta_0 + \beta_1 Education_i + u_i

But education is endogenous because of ability.

Ability affects both:

  • education

  • wage


2. Proposed instrument


z = Father's\ Education

This should affect the child’s education.

Why?

Parents with more education may:

  • value education more

  • provide academic support

  • know how to navigate schooling

  • have resources

  • encourage college attendance

So relevance is likely:
Cov(FatherEducation, Education)>0


3. Exclusion restriction

For father’s education to be valid, it must not directly affect the child’s wage except through the child’s education.

That means:
Cov(FatherEducation,u) = 0

But this is questionable.


4. Why it may fail

Father’s education may be related to:

  • family income

  • social networks

  • neighborhood quality

  • child ability

  • inherited traits

  • school quality

  • cultural capital

These factors can directly affect wage.

Then father’s education is correlated with the wage error term.

So it may not be exogenous.


5. Important lesson

An instrument can be relevant but still invalid.

Father’s education likely predicts education, but may not satisfy exclusion.


6. Exam phrasing

Father’s education is a plausible instrument for education because it is correlated with the respondent’s education. However, its validity is questionable because it may be correlated with unobserved family background or ability that directly affects wages.


Slide 7 — Other IVs for Education

The slide lists three instruments used in the literature:

  1. number of siblings

  2. college proximity at age 16

  3. month of birth

Each is evaluated using IV conditions.


A. Number of siblings as IV

Idea:
z = number\ of\ siblings

For education:
Education = f(siblings)


Relevance

More siblings may reduce educational attainment because family resources are spread across more children.

So:
Cov(siblings, education) \neq 0

Usually expected negative.


Exclusion restriction

Number of siblings should not directly affect wage except through education.

This is debatable.


Possible violation

Number of siblings may be correlated with:

  • family income

  • parental preferences

  • religion

  • culture

  • birth order

  • neighborhood

  • parental ability

These may affect wages directly.

So validity is not guaranteed.


B. College proximity at age 16

Idea:
z = distance\ to\ college

or whether person lived near a college at age 16.


Relevance

If a person lived near a college, attending college was easier and cheaper.

So proximity increases education.
Cov(proximity, education) \neq 0


Exclusion restriction

Living near a college at age 16 should not directly affect adult wages except through education.

This may be more plausible than father’s education, but still questionable.


Possible violation

College proximity may be correlated with:

  • urban location

  • better labor markets

  • better schools

  • higher-income neighborhoods

  • parental choices

These can directly affect wages.

So researchers often control for region, urban status, family background, etc.


C. Month of birth

This is a famous instrument.

Idea:

Because of compulsory schooling laws, students born in different months may be allowed to leave school at different completed education levels.


Relevance

Month of birth affects years of schooling through school entry and dropout laws.

So:
Cov(month\ of\ birth, education) \neq 0


Exclusion restriction

Month of birth should not directly affect wage except through education.

This seems plausible because birth month is close to random.


Possible concerns

Even month of birth may not be perfect.

It may be related to:

  • seasonality in parental planning

  • health at birth

  • school starting age effects

  • relative age in class

But it is often considered more credible than family background instruments.


Slide 8 — Properties of IV with a Poor Instrumental Variable

The slide says IV may be much more inconsistent than OLS if the instrument is not completely exogenous and only weakly related to (x).

This is a crucial warning.


1. Two ways an instrument can be poor

Weak instrument


Cov(z,x)

is small.

The instrument barely predicts (x).


Invalid instrument


Cov(z,u) \neq 0

The instrument is correlated with the error term.


2. Why weak instruments are dangerous

Recall:
\beta_{IV} = \frac{Cov(z,y)}{Cov(z,x)}

If:
Cov(z,x)

is very small, then the denominator is tiny.

Tiny denominator means estimates become unstable.

Small violations of exogeneity can create huge bias.


3. Analogy

A weak instrument is like a weak flashlight in a dark room.

It gives just enough light to make you think you see something, but not enough to trust what you see.


4. Why IV can be worse than OLS

OLS may be biased because:
Cov(x,u) \neq 0

But IV can be even worse if:

  1. (z) is weakly related to (x), and

  2. (z) is even slightly related to (u)

Because the IV estimator divides by:
Cov(z,x)

If that is tiny, the bias gets magnified.
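
The magnification can be written out by keeping the Cov(z,u) term in the IV derivation instead of setting it to zero. The sketch below uses \sigma_u and \sigma_x for the standard deviations of (u) and (x), which is notation added here for the comparison:

```latex
\begin{aligned}
plim(\hat\beta_{1,IV}) &= \beta_1 + \frac{Cov(z,u)}{Cov(z,x)}
  = \beta_1 + \frac{Corr(z,u)}{Corr(z,x)} \cdot \frac{\sigma_u}{\sigma_x} \\
plim(\hat\beta_{1,OLS}) &= \beta_1 + \frac{Cov(x,u)}{Var(x)}
  = \beta_1 + Corr(x,u) \cdot \frac{\sigma_u}{\sigma_x}
\end{aligned}
```

So IV is more inconsistent than OLS whenever |Corr(z,u)/Corr(z,x)| exceeds |Corr(x,u)|. As an illustrative case with made-up numbers, Corr(z,u) = 0.05 and Corr(z,x) = 0.02 give a ratio of 2.5, which is larger than any value Corr(x,u) can take.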


5. Deep intuition

IV throws away a lot of variation in (x).

It uses only the part of (x) explained by (z).

If (z) explains very little of (x), then IV is relying on a tiny amount of variation.

That makes estimates noisy and fragile.


6. Example

Suppose college proximity barely affects education.

Then using college proximity as IV gives little useful variation.

If college proximity is also slightly related to local labor markets, IV estimate can be badly biased.
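
A simulation sketch of this situation, assuming a hypothetical instrument with a weak first stage (Cov(z,x) = 0.05) and a slight exogeneity violation (Cov(z,u) = 0.05). All numbers are illustrative:

```python
import numpy as np

rng = np.random.default_rng(5)
n = 1_000_000
beta1 = 0.08

z = rng.normal(size=n)
w = rng.normal(size=n)
u = 0.05 * z + w                              # slight violation: Cov(z,u) = 0.05
x = 0.05 * z + 0.3 * w + rng.normal(size=n)   # weak first stage: Cov(z,x) = 0.05
y = 1.0 + beta1 * x + u

ols = np.cov(x, y)[0, 1] / np.var(x, ddof=1)
iv = np.cov(z, y)[0, 1] / np.cov(z, x)[0, 1]

print(f"true beta1: {beta1}")
print(f"OLS slope:  {ols:.3f}  (error {ols - beta1:+.3f})")
print(f"IV slope:   {iv:.3f}  (error {iv - beta1:+.3f})")
```

In this setup the IV error is several times the OLS error, because the small Cov(z,u) is divided by an equally small Cov(z,x).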


7. Exam takeaway

A valid IV must be both:

  1. relevant

  2. exogenous

Weak or invalid instruments can make IV worse than OLS.


First Half Big Picture

Slides 1–8 teach the motivation for IV.

OLS fails when:
Cov(x,u) \neq 0

IV solves this by using a variable (z) that:

  1. affects (x)

  2. does not directly affect (y)

  3. is uncorrelated with (u)

The simple IV estimand is:
\beta_{IV} = \frac{Cov(z,y)}{Cov(z,x)}

But IV can be dangerous if the instrument is weak or not truly exogenous.


High-Yield Exam Statements

  1. Endogeneity means (Cov(x,u)\neq 0).

  2. OLS is inconsistent under endogeneity.

  3. An instrument must be relevant and exogenous.

  4. Relevance means (Cov(z,x)\neq 0).

  5. Exogeneity means (Cov(z,u)=0).

  6. Exclusion restriction means (z) affects (y) only through (x).

  7. Weak instruments can make IV estimates unstable and badly biased.

  8. IV estimates the causal effect using only variation in (x) induced by (z).


Deep Notes for Slides 9–16


Slide 9: IV in Multiple Regression

1. What problem are we solving?

In normal OLS, we estimate:
y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + u

OLS works well only if:
Cov(x_1,u) = 0

That means (x_1) should not be related to unobserved factors inside the error term.

But if:
Cov(x_1,u) \neq 0

then (x_1) is endogenous.

That means OLS is biased and inconsistent.


Simple meaning of endogeneity

Endogeneity means:

The explanatory variable is mixed up with hidden factors that also affect the dependent variable.

Example:

You want to estimate:
wage = \beta_0 + \beta_1 education + u

But ability is not observed.

Ability affects wage.

Ability may also affect education.

So ability is inside (u), and education is correlated with (u).

That creates endogeneity.


2. Why multiple regression makes IV more complicated

In simple regression, we may only have:
y = \beta_0 + \beta_1 x + u

But in multiple regression, we have:
y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + u

Some variables may be fine.

Some may be endogenous.

Example:
wage = \beta_0 + \beta_1 education + \beta_2 experience + \beta_3 gender + u

Here:

  • education may be endogenous

  • experience may be exogenous

  • gender may be exogenous

So IV is mainly needed for the problematic variable.


3. Conditions for a valid instrument

The slide gives 3 conditions.

Suppose (x_1) is endogenous.

We need an instrument (z).

A valid instrument must satisfy:


Condition 1: Instrument does not appear in the regression equation

This means (z) is not directly included as an explanatory variable in the main model.

Main model:
wage = \beta_0 + \beta_1 education + \beta_2 experience + u

If (z = nearc4), meaning living near a college at age 16, then (nearc4) is not supposed to directly determine wage.

It is only used to explain education.


Intuition

The instrument is like a helper variable.

It helps us isolate the clean part of education.

But it should not directly belong in the wage equation.


Condition 2: Instrument is uncorrelated with the error term


Cov(z,u) = 0

This is called instrument exogeneity.

This is the most important and hardest condition.

It means:

The instrument should not be related to omitted factors affecting (y).

Example:

If (z = nearc4), we need:
Cov(nearc4, ability) = 0

Why?

Because ability is inside the error term.

If living near college is related to family background, parental income, neighborhood quality, or ability, then it may not be exogenous.


Very important exam point

You usually cannot fully prove instrument exogeneity using data.

You must defend it using logic, theory, or institutional background.


Condition 3: Instrument is partially correlated with endogenous explanatory variable


Cov(z,x_1) \neq 0

This is called instrument relevance.

In multiple regression, we say partially correlated because the instrument must explain (x_1) after controlling for other exogenous variables.

Example:
education = \pi_0 + \pi_1 nearc4 + \pi_2 experience + \pi_3 gender + v

For (nearc4) to be relevant:
\pi_1 \neq 0

It must help predict education even after controlling for the other variables.


Big analogy

Imagine education is dirty water.

It contains two parts:

  1. Clean part: variation caused by valid external factors

  2. Dirty part: variation caused by ability, family background, motivation, etc.

OLS uses the whole dirty water.

IV tries to filter the water.

The instrument is the filter.

But the filter must satisfy two things:

  • It should actually filter education

  • It should not add new dirt

That is:

  • relevance

  • exogeneity


Exam-ready answer

A valid instrument must satisfy two main requirements:

First, it must be relevant, meaning it is correlated with the endogenous explanatory variable after controlling for other exogenous variables. Second, it must be exogenous, meaning it is uncorrelated with the structural error term. It should affect the dependent variable only through the endogenous regressor, not directly.


Slide 10: Computing IV Estimates in Multiple Regression

This slide is about how IV works mathematically when we have more than one regressor.


1. The setup

Suppose the structural equation is:
y_1 = \beta_0 + \beta_1 y_2 + \beta_2 z_1 + u_1

Here:

  • (y_1) = dependent variable

  • (y_2) = endogenous explanatory variable

  • (z_1) = exogenous control variable

  • (u_1) = error term

We need an instrument for (y_2).

Let the instrument be (z_2).

So:

  • (z_1) appears in the regression as a normal control

  • (z_2) is excluded from the regression and used as an instrument


2. Why not just run OLS?

OLS estimates:
y_1 = \beta_0 + \beta_1 y_2 + \beta_2 z_1 + u_1

But if:
Cov(y_2,u_1) \neq 0

then OLS cannot separate the true effect of (y_2) from the hidden omitted factor.


Example

Suppose:
wage = \beta_0 + \beta_1 education + \beta_2 experience + u

The error term includes ability.

If ability affects both education and wage, then education is endogenous.

OLS may wrongly attribute the effect of ability to education.

So OLS coefficient on education may be biased upward.


3. What IV does conceptually

IV does not use all variation in (y_2).

It only uses the variation in (y_2) that comes from the instrument.

In words:

IV asks: among people whose education changed because of the instrument, how did wages change?

That is why IV is often interpreted as using a cleaner source of variation.


4. The key idea in multiple regression

The instrument must explain the endogenous variable after accounting for the other controls.

So the first-stage relationship is:
y_2 = \pi_0 + \pi_1 z_1 + \pi_2 z_2 + v_2

If (z_2) is a good instrument, then:
\pi_2 \neq 0

Meaning (z_2) predicts (y_2) after controlling for (z_1).


5. What “excluded instrument” means

An excluded instrument is a variable that:

  • is included in the first stage

  • is excluded from the second stage/main structural equation

Example:

Main equation:
wage = \beta_0 + \beta_1 education + \beta_2 experience + u

First stage:
education = \pi_0 + \pi_1 nearc4 + \pi_2 experience + v

Here:

  • (experience) is an included exogenous variable

  • (nearc4) is an excluded instrument


Common exam trap

Students often think every (z) is an instrument.

But in Wooldridge notation, (z) can mean exogenous variables generally.

Some (z)'s are included controls.

Some are excluded instruments.

For IV, the special instrument is the excluded variable that helps predict the endogenous variable.


Slide 11: Two Stage Least Squares

This is one of the most important slides.

2SLS is the practical method used for IV estimation in multiple regression.


1. Why is it called Two Stage Least Squares?

Because we run two regressions:

  1. First-stage regression

  2. Second-stage regression

Both stages use least squares.


2. First stage

Suppose the structural equation is:
y_1 = \beta_0 + \beta_1 y_2 + \beta_2 z_1 + u_1

where (y_2) is endogenous.

First stage:
y_2 = \pi_0 + \pi_1 z_1 + \pi_2 z_2 + v_2

Here we regress the endogenous variable (y_2) on:

  • all exogenous variables in the model

  • all instruments

Then we get predicted values:
\hat{y}_2


What does (\hat{y}_2) mean?

(\hat{y}_2) is the predicted part of (y_2) explained only by exogenous information.

It is the “clean” part of (y_2).


Example

Main equation:
lwage = \beta_0 + \beta_1 educ + \beta_2 exper + u

Education is endogenous.

Instrument: near college.

First stage:
educ = \pi_0 + \pi_1 nearc4 + \pi_2 exper + v

Then get:
\widehat{educ}

This predicted education is the part of education explained by college proximity and experience.


3. Second stage

Now replace actual (y_2) with predicted (\hat{y}_2):
y_1 = \beta_0 + \beta_1 \hat{y}_2 + \beta_2 z_1 + error

Example:
lwage = \beta_0 + \beta_1 \widehat{educ} + \beta_2 exper + error

Now the coefficient on (\widehat{educ}) is the 2SLS estimate of the effect of education.
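
The two stages can be sketched by hand with least squares. The data below are simulated under assumed coefficients (true return 0.08, a valid binary instrument called nearc4 here only to mirror the example); this is an illustration of the mechanics, not the real Card data, and the standard errors from a hand-run second stage should not be reported:

```python
import numpy as np

def ols(X, y):
    """Least-squares coefficients for y on the columns of X."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

rng = np.random.default_rng(4)
n = 100_000

# Hypothetical data-generating process (all numbers are assumptions).
ability = rng.normal(size=n)                        # unobserved
exper = rng.uniform(0, 20, size=n)                  # exogenous control
nearc4 = rng.integers(0, 2, size=n).astype(float)   # instrument, independent of ability
educ = 12 + 1.0 * nearc4 + 0.1 * exper + ability + rng.normal(size=n)
lwage = 1 + 0.08 * educ + 0.03 * exper + 0.2 * ability + rng.normal(scale=0.3, size=n)

const = np.ones(n)

# Stage 1: regress educ on the instrument and all exogenous controls.
Z = np.column_stack([const, nearc4, exper])
educ_hat = Z @ ols(Z, educ)

# Stage 2: regress lwage on predicted educ and the controls.
b_2sls = ols(np.column_stack([const, educ_hat, exper]), lwage)

# Plain OLS on actual educ, for comparison.
b_ols = ols(np.column_stack([const, educ, exper]), lwage)

print(f"OLS  coefficient on educ: {b_ols[1]:.3f}")
print(f"2SLS coefficient on educ: {b_2sls[1]:.3f}")
```

In this simulation OLS picks up part of the ability effect, while the two-stage estimate stays near the assumed 0.08.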


4. Why do we not use actual education in stage 2?

Because actual education contains both:
educ = clean\ part + endogenous\ part

The endogenous part may be related to ability, motivation, family background, etc.

But predicted education from the first stage uses only exogenous variation.

So it removes the contaminated part.


5. Very simple analogy

Suppose you want to know whether studying causes higher grades.

But study hours are endogenous because motivated students both study more and score higher.

So study hours are contaminated by motivation.

Now suppose library distance affects study hours but not grades directly.

People who live closer to the library study more.

So you use library distance as an instrument.

First stage:
studyhours = f(librarydistance, controls)

Second stage:
grades = f(predicted\ studyhours, controls)

Now you are using the study hours caused by library distance, not the study hours caused by motivation.


6. Important warning

In practice, you should not manually run stage 1 and stage 2 using normal OLS and report the second-stage standard errors.

Why?

Because standard OLS software in the second stage does not know that (\hat{y}_2) was estimated.

So standard errors will be wrong.

Use IV/2SLS commands in software.


Exam-ready answer

2SLS works in two steps. First, the endogenous explanatory variable is regressed on all exogenous variables and instruments. This gives predicted values that contain only exogenous variation. Second, the dependent variable is regressed on the predicted endogenous variable and other exogenous controls. This produces an IV estimate that removes the part of the endogenous regressor correlated with the error term.


Slide 12: Why Does 2SLS Work?

This slide explains the intuition.


1. Main idea

2SLS works because the second-stage variable (\hat{y}_2) is constructed only from exogenous information.

So unlike (y_2), it should not be correlated with the error term.


2. Actual endogenous variable

Actual (y_2) is problematic because:
y_2 = part\ explained\ by\ instruments + part\ related\ to\ error

The second part is the problem.

OLS uses both parts.

2SLS uses only the first part.


3. What does “purged” mean?

The slide says (y_2) is purged of its endogenous part.

Purged means cleaned.

So:
y_2 = \hat{y}_2 + \hat{v}_2

where:

  • (\hat{y}_2) = predicted clean component

  • (\hat{v}_2) = leftover component

If endogeneity comes from the leftover component, 2SLS removes it by using only (\hat{y}_2).


4. Example with education and wage

Actual education is affected by many things:
education = college\ proximity + parental\ income + ability + motivation + luck

Some of these are dangerous:

  • ability

  • motivation

  • parental support

These may also affect wage.

So OLS confuses the effect of education with these hidden factors.

But if college proximity is a valid instrument, then:
\widehat{education}

uses only the part of education explained by college proximity and controls.

That gives a cleaner estimate.


5. Deep intuition

OLS asks:

Do people with more education earn more?

But this is not necessarily causal.

Because educated people may be different in many other ways.

IV asks:

Do people who got more education because of the instrument earn more?

That is closer to causal.


6. Important IV interpretation

IV estimates the effect for the group whose treatment/explanatory variable was affected by the instrument.

In causal language, this is often called a local effect.

Example:

College proximity may affect education mainly for people who are on the margin of attending college.

So IV estimates the return to education for people whose schooling decision changed because they lived near a college.


7. Exam trap

2SLS does not magically solve endogeneity.

It only works if the instrument is valid.

If the instrument is correlated with the error term, IV can be even worse than OLS.


Slide 13: Properties of 2SLS

This slide gives important technical properties.


Property 1: Second-stage OLS standard errors are wrong

Suppose you manually do this:

Stage 1:
educ = \pi_0 + \pi_1 nearc4 + \pi_2 exper + v

Get:
\widehat{educ}

Stage 2:
lwage = \beta_0 + \beta_1 \widehat{educ} + \beta_2 exper + error

If you run normal OLS in stage 2, the coefficient estimate is the same as the 2SLS estimate.

But the standard error is wrong.


Why?

Because (\widehat{educ}) is generated from a previous regression.

The stage-2 residuals are computed with (\widehat{educ}) instead of actual (educ), so they do not estimate the variance of the structural error (u).

Normal OLS standard errors ignore that.

So use built-in IV regression commands.

Examples:

In R:

library(AER)  # assumed source of ivreg()
ivreg(lwage ~ educ + exper | nearc4 + exper, data = df)  # df: placeholder name for your data frame

In Stata:

ivregress 2sls lwage exper (educ = nearc4)
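
To see why the hand-run second stage gives the wrong standard errors, the sketch below computes both versions on simulated data (the data-generating process and its coefficients are assumptions for illustration). It uses the usual homoskedastic 2SLS variance formula, where the error variance is estimated from residuals built with actual educ:

```python
import numpy as np

rng = np.random.default_rng(6)
n = 100_000

# Hypothetical data; the structural error is u = 0.2*ability + noise, Var(u) = 0.13.
ability = rng.normal(size=n)
exper = rng.uniform(0, 20, size=n)
nearc4 = rng.integers(0, 2, size=n).astype(float)
educ = 12 + 1.0 * nearc4 + 0.1 * exper + ability + rng.normal(size=n)
lwage = 1 + 0.08 * educ + 0.03 * exper + 0.2 * ability + rng.normal(scale=0.3, size=n)

const = np.ones(n)
Z = np.column_stack([const, nearc4, exper])          # instrument + exogenous controls
X = np.column_stack([const, educ, exper])            # structural regressors
educ_hat = Z @ np.linalg.lstsq(Z, educ, rcond=None)[0]
X_hat = np.column_stack([const, educ_hat, exper])    # second-stage regressors

b = np.linalg.lstsq(X_hat, lwage, rcond=None)[0]     # 2SLS coefficients
k = X.shape[1]
XtX_inv = np.linalg.inv(X_hat.T @ X_hat)

# Naive: residuals use educ_hat, as a plain second-stage OLS would.
sigma2_naive = np.sum((lwage - X_hat @ b) ** 2) / (n - k)
# 2SLS: residuals use actual educ with the same coefficients.
sigma2_2sls = np.sum((lwage - X @ b) ** 2) / (n - k)

se_naive = np.sqrt(sigma2_naive * np.diag(XtX_inv))
se_2sls = np.sqrt(sigma2_2sls * np.diag(XtX_inv))

print(f"error variance, naive: {sigma2_naive:.3f}, 2SLS: {sigma2_2sls:.3f}")
print(f"SE on educ,     naive: {se_naive[1]:.4f}, 2SLS: {se_2sls[1]:.4f}")
```

In this particular setup the naive standard error is too large; in other settings it can be too small, which is why dedicated IV routines are the safer choice.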

Property 2: One endogenous variable + one instrument

If there is:

  • one endogenous variable

  • one instrument

then:
2SLS = IV

They give the same estimate.
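
This equivalence can be checked numerically. The sketch below uses simulated data (an assumption made for illustration) with one endogenous regressor and one instrument, and compares the covariance-ratio IV formula with a hand-run 2SLS slope:

```python
import numpy as np

rng = np.random.default_rng(7)
n = 5_000

# Hypothetical data with one endogenous x and one instrument z.
z = rng.normal(size=n)
u = rng.normal(size=n)
x = 0.8 * z + 0.5 * u + rng.normal(size=n)
y = 1.0 + 0.08 * x + u

# IV as a ratio of sample covariances.
iv_ratio = np.cov(z, y)[0, 1] / np.cov(z, x)[0, 1]

# 2SLS: first stage x on (1, z), then y on (1, x_hat).
Z = np.column_stack([np.ones(n), z])
x_hat = Z @ np.linalg.lstsq(Z, x, rcond=None)[0]
X_hat = np.column_stack([np.ones(n), x_hat])
slope_2sls = np.linalg.lstsq(X_hat, y, rcond=None)[0][1]

print(f"IV ratio:   {iv_ratio:.6f}")
print(f"2SLS slope: {slope_2sls:.6f}")
```

The two numbers agree up to floating-point rounding, whatever the sample.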


Property 3: More than one endogenous variable

2SLS can handle more than one endogenous variable.

Example:
sales = \beta_0 + \beta_1 price + \beta_2 advertising + u

Both price and advertising may be endogenous.

Then we need instruments for both.


Identification rule

This is very exam-important.

If:
# instruments < # endogenous variables

The model is underidentified.

You cannot estimate the causal effects.


If:
# instruments = # endogenous variables

The model is exactly identified.

You can estimate it, but cannot test overidentifying restrictions.


If:
# instruments > # endogenous variables

The model is overidentified.

You can estimate it and test whether instruments as a group seem valid.
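
The counting rule can be written as a small helper. The function name is hypothetical, and the count refers to excluded instruments. This order condition is necessary but not sufficient: the instruments must also actually predict the endogenous variables.

```python
def identification_status(n_excluded_instruments: int, n_endogenous: int) -> str:
    """Classify a model by the order condition for identification."""
    if n_excluded_instruments < n_endogenous:
        return "underidentified"
    if n_excluded_instruments == n_endogenous:
        return "exactly identified"
    return "overidentified"

# One endogenous regressor (educ) with one, two, or zero excluded instruments.
print(identification_status(1, 1))  # exactly identified
print(identification_status(2, 1))  # overidentified
print(identification_status(1, 2))  # underidentified
```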


Example

Suppose endogenous variables:

  1. education

  2. job training

You need at least 2 excluded instruments.

Possible instruments:

  1. distance to college

  2. eligibility for training subsidy

If you only have one instrument, you cannot separately identify both effects.


Analogy

Imagine you have two unknowns:
x + y = 10

You cannot solve for both x and y with one equation.

You need at least two independent equations.

Similarly, if you have two endogenous variables, you need at least two instruments.


Slide 14: Example — Wage Equation Using Two Instruments

This slide applies 2SLS to a wage equation.


1. Why wage equations are common in IV

Education and wage is a classic IV example.

The causal question:

What is the return to one additional year of education?

OLS model:
\[ lwage = \beta_0 + \beta_1 educ + \beta_2 exper + \beta_3 exper^2 + u \]

Here:

  • (lwage) = log wage

  • (educ) = years of education

  • (exper) = work experience

  • (exper^2) = experience squared, which allows a nonlinear experience effect


2. Why education may be endogenous

Education may be correlated with:

  • ability

  • motivation

  • family background

  • school quality

  • ambition

  • parental support

These factors also affect wage.

So they are hidden inside (u).

Therefore:
\[ Cov(educ, u) \neq 0 \]

OLS is likely biased.


3. Using instruments

Suppose we use two instruments:

  • father’s education

  • mother’s education

or:

  • near college

  • number of siblings

The first stage could be:
\[ educ = \pi_0 + \pi_1 motheduc + \pi_2 fatheduc + \pi_3 exper + \pi_4 exper^2 + v \]

Then get:
\[ \widehat{educ} \]

Second stage:
\[ lwage = \beta_0 + \beta_1 \widehat{educ} + \beta_2 exper + \beta_3 exper^2 + error \]
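
The two stages can be sketched on simulated data in Python (standard library only). Everything here is an assumption made for illustration: the data are simulated rather than real, the true return to education is set to 0.08, ability is the omitted confounder, and parental education is valid by construction because it is drawn independently of ability. The helpers `solve` and `ols` are hand-written for this sketch.

```python
import random

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    k = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, k):
            f = M[r][col] / M[col][col]
            for c in range(col, k + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * k
    for i in reversed(range(k)):
        x[i] = (M[i][k] - sum(M[i][j] * x[j] for j in range(i + 1, k))) / M[i][i]
    return x

def ols(y, X):
    """OLS coefficients from the normal equations (X'X) b = X'y."""
    k = len(X[0])
    XtX = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    Xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(k)]
    return solve(XtX, Xty)

random.seed(1)
n = 5000
motheduc = [12 + 2 * random.gauss(0, 1) for _ in range(n)]
fatheduc = [12 + 2 * random.gauss(0, 1) for _ in range(n)]
exper = [random.uniform(0, 20) for _ in range(n)]
ability = [random.gauss(0, 1) for _ in range(n)]  # unobserved in practice
educ = [4 + 0.3 * m + 0.3 * f - 0.1 * e + a + random.gauss(0, 1)
        for m, f, e, a in zip(motheduc, fatheduc, exper, ability)]
lwage = [0.5 + 0.08 * ed + 0.04 * e - 0.001 * e * e + 0.1 * a + random.gauss(0, 0.3)
         for ed, e, a in zip(educ, exper, ability)]

# Stage 1: educ on both instruments plus the exogenous controls.
X1 = [[1.0, m, f, e, e * e] for m, f, e in zip(motheduc, fatheduc, exper)]
pi = ols(educ, X1)
educ_hat = [sum(p * v for p, v in zip(pi, row)) for row in X1]

# Stage 2: lwage on fitted education plus the same controls.
X2 = [[1.0, eh, e, e * e] for eh, e in zip(educ_hat, exper)]
b_2sls = ols(lwage, X2)

# OLS for comparison: uses actual educ, so it picks up omitted ability.
X_ols = [[1.0, ed, e, e * e] for ed, e in zip(educ, exper)]
b_ols = ols(lwage, X_ols)

print(f"OLS return to education:  {b_ols[1]:.3f}")
print(f"2SLS return to education: {b_2sls[1]:.3f}  (true value 0.08)")
```

This reproduces only the point estimates. The standard errors from a manual second stage are not valid, so in applied work a built-in IV command should be used instead.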


4. How to interpret the coefficient in a log wage model

If the dependent variable is log wage:
\[ lwage = \beta_0 + \beta_1 educ + \dots \]

Then \(100 \cdot \beta_1\) is approximately the percentage change in wage from one more year of education.

Example:
\[ \hat{\beta}_1 = 0.08 \]

Interpretation:

One more year of education increases wage by about 8%, holding other variables constant.

If:
\[ \hat{\beta}_1 = 0.12 \]

Interpretation:

One more year of education increases wage by about 12%.
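
The "about" matters: the percentage reading is an approximation that works well for small coefficients. The exact change implied by a log model is 100 * (exp(beta) - 1). A quick check for the two values above, with illustrative function names:

```python
import math

def approx_pct_change(beta: float) -> float:
    """Approximate percentage change: 100 * beta."""
    return 100.0 * beta

def exact_pct_change(beta: float) -> float:
    """Exact percentage change implied by a log-level model."""
    return 100.0 * (math.exp(beta) - 1.0)

for beta in (0.08, 0.12):
    print(f"beta = {beta}: approx {approx_pct_change(beta):.2f}%, "
          f"exact {exact_pct_change(beta):.2f}%")
```

The gap between the two grows with the size of the coefficient, so the approximation is safest for returns in the single digits.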


5. Why use two instruments instead of one?

Using more instruments can help predict education better.

A stronger first stage can improve precision.

But more instruments are not always better.

Every instrument must be valid.

A bad instrument can contaminate the estimate.


6. First-stage importance

In the first stage, we check whether the instruments significantly predict education.

If the instruments are weak, 2SLS becomes unreliable.

A common rule of thumb:

The first-stage F-statistic for the joint significance of the excluded instruments should be greater than 10.

If the F-statistic is very small, the instruments are weak.
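
For reference, the first-stage F-statistic is the usual F-test for excluding the instruments from the first-stage regression. The sketch below assumes homoskedastic errors. The restricted model drops the excluded instruments, the unrestricted model keeps them, \(q\) is the number of excluded instruments, and \(k\) is the number of regressors in the unrestricted first stage.

```latex
F = \frac{(SSR_{r} - SSR_{ur}) / q}{SSR_{ur} / (n - k - 1)}
```

In the wage example with motheduc and fatheduc as instruments, \(q = 2\) and \(k = 4\) (motheduc, fatheduc, exper, exper^2).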


7. Exam-ready interpretation

If the 2SLS estimate of the return to education is larger than the OLS estimate, you can say:

The IV estimate suggests a larger return to education than OLS. This may happen if OLS suffers from attenuation bias due to measurement error in education, or if the instrument identifies returns for a subgroup with higher marginal returns.

If the 2SLS estimate is smaller than the OLS estimate:

This may suggest OLS was upward biased due to omitted ability or family background.


Slide 15: IV for Measurement Error and Statistical Properties

This slide has several important ideas.


Part A: Measurement Error

1. What is measurement error?

Measurement error means the variable we observe is not the true variable.

True model:
\[ y = \beta_0 + \beta_1 x^* + u \]

But we do not observe (x^*).

Instead we observe:
\[ x = x^* + e \]

where:

  • (x^*) = true variable

  • (x) = measured variable

  • (e) = measurement error


2. Why measurement error causes endogeneity

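If we regress \(y\) on observed \(x\), substituting \(x^* = x - e\) into the true model puts the measurement error into the error term, which becomes \(u - \beta_1 e\).

Because observed \(x\) also contains \(e\), the regressor is correlated with this new error term.

That violates OLS exogeneity.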


3. Classical measurement error

Under classical measurement error (the error \(e\) is uncorrelated with the true value \(x^*\) and with \(u\)), the OLS coefficient is biased toward zero.

This is called:

attenuation bias

Meaning the estimated effect is too small in absolute value.
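
The size of the attenuation can be worked out explicitly. The sketch below assumes classical errors-in-variables: \(e\) is uncorrelated with \(x^*\) and with \(u\). The symbols \(\sigma^2_{x^*}\) and \(\sigma^2_e\) denote the variances of the true variable and of the measurement error.

```latex
\begin{aligned}
y &= \beta_0 + \beta_1 x + (u - \beta_1 e),
\qquad Cov(x,\, u - \beta_1 e) = -\beta_1 \sigma^2_e \\[4pt]
\text{plim}\, \hat{\beta}_{1,OLS}
&= \beta_1 + \frac{Cov(x,\, u - \beta_1 e)}{Var(x)}
 = \beta_1 - \beta_1 \frac{\sigma^2_e}{\sigma^2_{x^*} + \sigma^2_e}
 = \beta_1 \cdot \frac{\sigma^2_{x^*}}{\sigma^2_{x^*} + \sigma^2_e}
\end{aligned}
```

The multiplying factor lies between 0 and 1, so the estimate shrinks toward zero, and it shrinks more as the noise variance grows relative to the variance of the true variable.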


Example

You want to estimate effect of income on consumption.

True income is hard to observe.

Survey income may contain mistakes.

Some people round income.

Some misreport.

Some forget bonuses.

Observed income is noisy.

OLS using noisy income may underestimate the true effect of income on consumption.


4. How IV helps measurement error

If you have another measurement of the same variable, you can use it as an instrument.

Example:

  • (x) = self-reported income

  • (z) = tax-record income

Use tax-record income as an instrument for self-reported income.

The second measure must be:

  • correlated with true income

  • not correlated with the measurement error in the first measure

  • not correlated with the structural error \(u\)
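
A small simulation illustrates the idea (Python, standard library only). All values are assumptions chosen for illustration: the true slope is 2, the true variable and both measurement errors have variance 1, and the two errors are independent. With these numbers OLS on the noisy measure should land near 1, while IV using the second measure should land near 2.

```python
import random

def mean(v):
    return sum(v) / len(v)

def cov(a, b):
    """Sample covariance of two equal-length lists."""
    ma, mb = mean(a), mean(b)
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / (len(a) - 1)

random.seed(2)
n = 20000
x_true = [random.gauss(0, 1) for _ in range(n)]       # never observed in practice
x1 = [xt + random.gauss(0, 1) for xt in x_true]       # first noisy measure (regressor)
x2 = [xt + random.gauss(0, 1) for xt in x_true]       # second noisy measure (instrument)
y = [1.0 + 2.0 * xt + random.gauss(0, 1) for xt in x_true]

# OLS of y on the noisy measure: attenuated toward zero.
ols_slope = cov(x1, y) / cov(x1, x1)

# IV using the second measure as the instrument for the first.
iv_slope = cov(x2, y) / cov(x2, x1)

print(f"OLS slope on noisy x1: {ols_slope:.3f}  (attenuated, true value 2)")
print(f"IV slope using x2:     {iv_slope:.3f}")
```

The fix depends on the two measurement errors being unrelated. If both measures share the same mistake, the second measure is not a valid instrument.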


5. Example

Suppose true education is hard to measure.

One dataset has self-reported years of education.

Another has school records.

If self-reported education has errors, school records can instrument for it.


Part B: Statistical Properties of 2SLS/IV

1. Consistency

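2SLS is consistent if the instruments are valid, that is, relevant and exogenous.

Consistency means:
\[ \hat{\beta}_{IV} \xrightarrow{p} \beta \]

that is, the estimator converges in probability to the true value as the sample size becomes very large.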

In simple language:

With enough data and valid instruments, IV gets close to the true causal effect.


2. Asymptotic normality

2SLS is asymptotically normal.

That means in large samples, we can use:

  • t-tests

  • confidence intervals

  • p-values

But this is mostly large-sample theory.

In small samples, IV may behave poorly, especially with weak instruments.


Part C: IV is usually less precise than OLS

This is very important.

IV removes the inconsistency caused by endogeneity but usually increases variance.

So compared to OLS, IV often has:

  • larger standard errors

  • wider confidence intervals

  • less statistical significance


Why?

OLS uses all variation in (x).

IV uses only variation in (x) explained by instruments.

That is a smaller amount of variation.
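
In the simple regression case with homoskedastic errors, the comparison can be made precise. Here \(SST_x\) is the total sample variation in \(x\), and \(R^2_{x,z}\) is the R-squared from regressing \(x\) on \(z\).

```latex
\widehat{Var}(\hat{\beta}_{1,OLS}) \approx \frac{\hat{\sigma}^2}{SST_x},
\qquad
\widehat{Var}(\hat{\beta}_{1,IV}) \approx \frac{\hat{\sigma}^2}{SST_x \cdot R^2_{x,z}}
```

Since \(R^2_{x,z} \le 1\), the IV variance is never smaller than the OLS variance in this setting, and it becomes very large when the instrument explains little of \(x\).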


Analogy

OLS uses the entire signal, but it may be contaminated.

IV uses only the clean signal, but the clean signal may be weak.

So IV is cleaner but noisier.


Part D: Heteroskedasticity and serial correlation

The slide says corrections are analogous to OLS.

That means if errors are heteroskedastic, use robust standard errors.

If panel data have serial correlation within units, use clustered standard errors; for time series, use HAC (for example, Newey-West) standard errors.
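
As an illustration of the analogy, here is a sketch of the heteroskedasticity-robust variance for the simplest case: one endogenous regressor, one instrument, and no other controls. Replacing \(z\) with \(x\) gives the familiar robust OLS formula.

```latex
\widehat{Var}(\hat{\beta}_{1,IV})
= \frac{\sum_{i} (z_i - \bar{z})^2\, \hat{u}_i^2}
       {\left[ \sum_{i} (z_i - \bar{z})(x_i - \bar{x}) \right]^2},
\qquad
\hat{u}_i = y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i
```

The residuals use the actual \(x_i\), not the first-stage fitted values.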


Part E: IV extends to panel and time series

IV is not only for cross-sectional data.

It can be used in:

  • panel data

  • time series

  • fixed effects models

  • difference-in-differences with instruments

Example:

Panel model:
\[ y_{it} = \beta_1 x_{it} + a_i + u_{it} \]

If \(x_{it}\) is endogenous, we can combine fixed effects with IV.
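
One way to combine the two, sketched under the assumption that a time-varying instrument \(z_{it}\) is available: first remove the fixed effect \(a_i\) with the within transformation, then apply IV to the demeaned data. The double-dot notation marks deviations from each unit's time average.

```latex
\begin{aligned}
\ddot{y}_{it} &= y_{it} - \bar{y}_i, \qquad
\ddot{x}_{it} = x_{it} - \bar{x}_i, \qquad
\ddot{z}_{it} = z_{it} - \bar{z}_i \\[4pt]
\ddot{y}_{it} &= \beta_1 \ddot{x}_{it} + \ddot{u}_{it},
\qquad
\hat{\beta}_{1,FE\text{-}IV}
= \frac{\sum_i \sum_t \ddot{z}_{it}\, \ddot{y}_{it}}
       {\sum_i \sum_t \ddot{z}_{it}\, \ddot{x}_{it}}
\end{aligned}
```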


Slide 16: Testing for Endogeneity

This is an exam favorite.


1. Why test for endogeneity?

IV is useful, but it is costly.

It gives larger standard errors.

If OLS is valid, OLS is usually better.

So we ask:

Do we actually need IV?


2. Null and alternative hypotheses

The endogeneity test usually tests:

Null hypothesis:
\[ H_0: x \text{ is exogenous} \]

Meaning:
\[ Cov(x,u) = 0 \]

OLS is consistent.


Alternative hypothesis:
\[ H_1: x \text{ is endogenous} \]

Meaning:
\[ Cov(x,u) \neq 0 \]

OLS is inconsistent, so IV is needed.


3. Durbin-Wu-Hausman test intuition

The test compares OLS and IV.

If OLS and IV are similar, maybe OLS is fine.

If OLS and IV differ by more than sampling error can explain, OLS may be biased.
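
For a single coefficient, the comparison is often summarized by a Hausman-type statistic. This sketch assumes homoskedastic errors, so that OLS is efficient under the null hypothesis of exogeneity.

```latex
H = \frac{\left( \hat{\beta}_{IV} - \hat{\beta}_{OLS} \right)^2}
         {\widehat{Var}(\hat{\beta}_{IV}) - \widehat{Var}(\hat{\beta}_{OLS})}
\;\overset{a}{\sim}\; \chi^2_1 \quad \text{under } H_0
```

A large value of \(H\) means the gap between the two estimates is too big to be explained by sampling noise.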


Analogy

Imagine two thermometers.

OLS thermometer is accurate only if there is no endogeneity.

IV thermometer is accurate if the instrument is valid.

If both thermometers show similar temperatures, no big concern.

If they show very different temperatures, something is wrong with OLS.


4. Regression-based endogeneity test

This is very important for exams.

Suppose model:
\[ y_1 = \beta_0 + \beta_1 y_2 + \beta_2 z_1 + u_1 \]

where \(y_2\) may be endogenous, \(z_1\) is an exogenous control, and \(z_2\) is the excluded instrument.

First stage:
\[ y_2 = \pi_0 + \pi_1 z_1 + \pi_2 z_2 + v_2 \]

Get residuals:
\[ \hat{v}_2 \]

Then include the residual in the original equation:
\[ y_1 = \beta_0 + \beta_1 y_2 + \beta_2 z_1 + \delta \hat{v}_2 + error \]

Test:
\[ H_0: \delta = 0 \]

If \(\delta\) is statistically significant, there is evidence that \(y_2\) is endogenous.
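
The steps above can be sketched in Python (standard library only) on simulated data. The setup is an assumption for illustration: y2 is endogenous by construction because v2 and u1 share a common shock, the true coefficient on y2 is 2, and the true value of delta is 0.5. The helpers `invert` and `ols` are hand-written, and the standard errors assume homoskedasticity.

```python
import math
import random

def invert(A):
    """Invert a small square matrix by Gauss-Jordan elimination."""
    k = len(A)
    M = [row[:] + [1.0 if i == j else 0.0 for j in range(k)] for i, row in enumerate(A)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        p = M[col][col]
        M[col] = [v / p for v in M[col]]
        for r in range(k):
            if r != col:
                f = M[r][col]
                M[r] = [vr - f * vc for vr, vc in zip(M[r], M[col])]
    return [row[k:] for row in M]

def ols(y, X):
    """Return (coefficients, standard errors, residuals)."""
    n, k = len(X), len(X[0])
    XtX = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    Xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(k)]
    inv = invert(XtX)
    b = [sum(inv[i][j] * Xty[j] for j in range(k)) for i in range(k)]
    resid = [yi - sum(bj * xj for bj, xj in zip(b, r)) for r, yi in zip(X, y)]
    sigma2 = sum(e * e for e in resid) / (n - k)
    se = [math.sqrt(sigma2 * inv[i][i]) for i in range(k)]
    return b, se, resid

random.seed(3)
n = 2000
z1 = [random.gauss(0, 1) for _ in range(n)]           # exogenous control
z2 = [random.gauss(0, 1) for _ in range(n)]           # excluded instrument
shock = [random.gauss(0, 1) for _ in range(n)]        # common shock: source of endogeneity
v2 = [s + random.gauss(0, 1) for s in shock]
u1 = [s + random.gauss(0, 1) for s in shock]
y2 = [0.5 + 0.5 * a + 1.0 * b + v for a, b, v in zip(z1, z2, v2)]
y1 = [1.0 + 2.0 * w + 1.0 * a + u for w, a, u in zip(y2, z1, u1)]

# Step 1: first stage, keep the residuals.
_, _, v2_hat = ols(y2, [[1.0, a, b] for a, b in zip(z1, z2)])

# Step 2: add the residuals to the structural equation and test delta = 0.
b_aug, se_aug, _ = ols(y1, [[1.0, w, a, v] for w, a, v in zip(y2, z1, v2_hat)])
delta_hat = b_aug[3]
t_delta = delta_hat / se_aug[3]

print(f"delta_hat = {delta_hat:.3f}, t = {t_delta:.1f}")
print(f"coefficient on y2 in augmented regression = {b_aug[1]:.3f}")
```

A useful side fact: the coefficient on y2 in the augmented regression equals the 2SLS estimate, which is why it lands near the true value of 2 here.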


5. Why does this work?

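The residual \(\hat{v}_2\) captures the part of \(y_2\) that is not explained by the exogenous variables and instruments.

Because the exogenous variables and instruments are uncorrelated with \(u_1\), \(y_2\) can be endogenous only through this leftover part. If it helps explain \(y_1\) once \(y_2\) is controlled for, then \(y_2\) contains endogenous variation.

So if \(\hat{v}_2\) is significant in the main equation, there is evidence of endogeneity.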


6. Decision rule

If p-value < 0.05:

Reject (H_0).

Conclusion:

The variable appears endogenous. Use IV/2SLS.

If p-value ≥ 0.05:

Fail to reject (H_0).

Conclusion:

There is not enough evidence of endogeneity. OLS may be acceptable.


7. Important caution

Failing to reject exogeneity does not prove OLS is perfectly valid.

It only means the test did not find strong evidence of endogeneity.

This can happen if:

  • sample size is small

  • instruments are weak

  • test has low power


8. Exam-ready answer

To test whether an explanatory variable is endogenous, we can use a Durbin-Wu-Hausman test. The null hypothesis is that the variable is exogenous. One regression-based method is to regress the suspected endogenous variable on all exogenous variables and instruments, obtain the residuals, and include those residuals in the structural equation. If the residual term is statistically significant, we reject exogeneity and conclude that IV/2SLS is needed.


Master Concept: OLS vs IV vs 2SLS

OLS

Uses actual (x).

Good if:

\[ Cov(x,u) = 0 \]

Bad if:

\[ Cov(x,u) \neq 0 \]


IV

Uses instrument (z) to isolate clean variation in (x).

Good if:

\[ Cov(z,x) \neq 0 \]

and

\[ Cov(z,u) = 0 \]


2SLS

Practical IV method for multiple regression.

Stage 1:

\[ x = f(z, \text{controls}) \]

Stage 2:

\[ y = f(\hat{x}, \text{controls}) \]


Very Important Exam Table

Concept               | Meaning                                    | Exam Keyword
----------------------|--------------------------------------------|--------------------------
Endogeneity           | X correlated with error                    | OLS biased
Instrument relevance  | Z predicts X                               | First stage
Instrument exogeneity | Z unrelated to error                       | Exclusion restriction
2SLS stage 1          | Predict endogenous variable                | Reduced form
2SLS stage 2          | Use predicted X                            | Structural equation
Weak instrument       | Z barely predicts X                        | Large bias, bad inference
Overidentified        | More instruments than endogenous variables | Can test instruments
Endogeneity test      | Check if OLS is valid                      | Hausman/DWH


Common Exam Questions and Strong Answers

Q1. What is a valid instrument?

A valid instrument is a variable that is correlated with the endogenous explanatory variable but uncorrelated with the structural error term. It affects the dependent variable only through the endogenous variable.


Q2. Why does OLS fail with endogeneity?

OLS fails because the explanatory variable contains variation related to unobserved factors in the error term. Therefore, OLS cannot separate the causal effect of the regressor from the effect of omitted variables.


Q3. Why does 2SLS work?

2SLS works by replacing the endogenous regressor with its predicted value from a first-stage regression using only exogenous variables and instruments. Because this predicted value is built only from exogenous variables, it is uncorrelated with the error term.


Q4. Why are IV standard errors usually larger than OLS?

IV uses only the variation in the endogenous regressor explained by the instruments. Since this is usually less variation than OLS uses, IV estimates are less precise and have larger standard errors.


Q5. What happens with weak instruments?

Weak instruments are only weakly correlated with the endogenous regressor. They make IV estimates unreliable, biased, and imprecise. Weak instruments can sometimes perform worse than OLS.
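
The reason weak instruments can be worse than OLS shows up in the large-sample limits of the two estimators, sketched here for the simple regression case. \(\sigma_u\) and \(\sigma_x\) are the standard deviations of the error and of the regressor.

```latex
\text{plim}\, \hat{\beta}_{1,IV} = \beta_1 + \frac{Corr(z,u)}{Corr(z,x)} \cdot \frac{\sigma_u}{\sigma_x},
\qquad
\text{plim}\, \hat{\beta}_{1,OLS} = \beta_1 + Corr(x,u) \cdot \frac{\sigma_u}{\sigma_x}
```

If \(Corr(z,x)\) is close to zero, even a tiny correlation between the instrument and the error is divided by a small number, so the IV inconsistency can exceed the OLS inconsistency.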


Q6. When is 2SLS exactly identified?

2SLS is exactly identified when the number of excluded instruments equals the number of endogenous explanatory variables.


Q7. When is 2SLS overidentified?

2SLS is overidentified when the number of excluded instruments is greater than the number of endogenous explanatory variables.


Q8. How do you test whether IV is needed?

Use a Durbin-Wu-Hausman endogeneity test. The null hypothesis is that the suspected variable is exogenous. If rejected, IV/2SLS is preferred over OLS.


Final Big Intuition

OLS asks:

Are people with more X different in Y?

IV asks:

When X changes because of an external source Z, does Y change?

That is why IV is closer to causal reasoning.

The entire chapter is about finding a clean source of variation.

In one sentence:

IV/2SLS helps estimate causal effects when the main explanatory variable is contaminated by omitted variables, measurement error, or reverse causality, by using an external instrument that affects the explanatory variable but does not directly affect the outcome.