Applied econometrics lectures 11-14 - Panel data


1

Panel data definition

repeated observations over time for the same individual

2

Pooled cross sectional data

• Two or more similar cross-sectional datasets, each relating to a different time period

• The variables contained in each cross section are the same

• The samples of observations are not the same (they may overlap)

3

Pooled vs panel data

The repetition of going back to the same people is what makes the data a panel - it contains a time series for each cross-sectional unit

• In pooled cross-section datasets, the samples in any two years are independent

• In panel datasets, the samples in any two years are not independent - they are the same units

4

Why is panel data the most powerful

allows us to control for unobserved (time-invariant) heterogeneity

5

Why panel data violates an OLS assumption

The error terms for some pairs of observations (the same unit in different periods) may be correlated with one another

But we can use this structure to our advantage

6

What happens if we ignore the time dimension with panel data

When estimating, you might not be estimating what you think you are, because the observations are not independent


We can run the regressions separately for each year to combat this

7

Possible problem with running regressions separately

Omitted variable bias (OVB) – zero conditional mean doesn’t hold

The effect of all those omitted variables sits in the error term, making it difficult to get an accurate estimate of the relationship we are interested in → high p-values

8

Solutions to OVB with panel data

• Add more explanatory variables

• Focus on the second year and add a lagged dependent variable

• Pool across the years so we have more observations => more power

  • Does this fix the independence issue? - NO

• Exploit the panel and think explicitly about the structure of the errors

9

Error term as 2 parts

Fixed effect - ai: captures all unobserved, time-constant factors that affect y

  • Geographical factors + institutional factors

Idiosyncratic error - uit: captures all other unobserved factors (those that vary over time)
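Putting the two parts together, a sketch of the standard error-components model in the usual textbook notation (the single regressor written here is an illustrative case, not taken from the slides):

\[
y_{it} = \beta_0 + \beta_1 x_{it} + a_i + u_{it}, \qquad i = 1, \dots, N, \quad t = 1, \dots, T
\]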

10

How to estimate the models with split errors

Pooled OLS

First differences

11

Pooled OLS with split errors

Assume ai & uit are both uncorrelated with xit → we can use OLS, as the zero conditional mean assumption is not broken

Combine the error terms back together: vit = ai + uit, with E(vit | xit) = 0

If they are correlated, then we get biased estimators

12

First differences model

When we use first differences, ai drops out because it is time-invariant - the same in both periods

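For T = 2, the first-differenced equation takes the standard textbook form (a sketch; δ0 here denotes the change in the intercept between the two periods):

\[
\Delta y_i = y_{i2} - y_{i1} = \delta_0 + \beta_1 \Delta x_i + \Delta u_i
\]

Because ai appears in both yi2 and yi1, it cancels out of the difference.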

13

First differences and OLS

After differencing, we no longer have two error components to deal with - only a single cross-sectional equation to estimate

We can use OLS to get unbiased estimates so long as the change in the idiosyncratic error is uncorrelated with the change in the explanatory variable (see the condition below)


We are no worse off than before - we still have to worry about this condition
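A sketch of that condition, the standard first-difference exogeneity requirement:

\[
E(\Delta u_i \mid \Delta x_i) = 0 \qquad \text{(equivalently } \operatorname{Cov}(\Delta x_i, \Delta u_i) = 0\text{)}
\]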

14

First differences constant / intercept

In the first-difference model, nothing tells us what the average level of crime would be if there were zero unemployment (the level relationship is gone).

The intercept tells us the average change in the dependent variable between the two periods (when Δx = 0)

15

First differences if xit varies little over time

First differencing eliminates the time-invariant error → reduces OVB → increases the accuracy of our estimates and our power

But when xit varies little over time, the OLS estimators of the differenced equation are extremely unstable, with huge standard errors - no relationship will be evident

16

Downsides of Panel data

Data availability & quality

Econometrically

17

Downsides of Panel data - Data availability and quality:

  • Panel data is costly to collect

    • tracking people/households/firms for a second/third/fourth … interview is time-consuming and even more costly than drawing a fresh sample each time

  • Sample attrition: some people/firms included in earlier rounds are not found in later rounds

    • Problematic if non-random

      • i.e. if the likelihood of finding them is systematically related to something we are interested in, e.g. better-performing firms are more likely to be findable the second time around

    • Over time, the sample may become less representative

      • e.g. it under-samples migrants in countries with recent high immigration, if the survey started before the immigration wave

  • Much less of a problem with administrative data (very popular in modern research)

    • No attrition

18

Downsides of panel data - econometrically

Differencing does not help if the variables of interest do not change or change very little over time

– If they don’t change at all: you can’t estimate anything

– If they change little: you might just be estimating noise

The potential for omitted variable bias still exists (time-varying omitted factors are not removed)

19

Extending the model to include t > 2

Add a dummy variable for each year (except the base year) - each dummy captures the average change between years in the panel, common to all respondents
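As a sketch for T = 3, with year dummies d2t and d3t and the base year omitted (the single regressor is illustrative):

\[
y_{it} = \beta_0 + \delta_2 d2_t + \delta_3 d3_t + \beta_1 x_{it} + a_i + u_{it}, \qquad t = 1, 2, 3
\]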

20

Extending the model - first differences

We can still use first differences when t > 2, as differencing still eliminates ai

However, we now need to worry about serial correlation:

  • if uit follows an AR(1) process, then the changes Δuit are serially correlated

  • if the uit are serially uncorrelated with constant variance, the correlation between adjacent changes Δuit and Δui,t−1 is −0.5

We also need to address heteroscedasticity
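A quick check of the −0.5 result (a sketch assuming uit is i.i.d. with variance σu²):

\[
\operatorname{Cov}(\Delta u_{it}, \Delta u_{i,t-1}) = \operatorname{Cov}(u_{it} - u_{i,t-1},\; u_{i,t-1} - u_{i,t-2}) = -\operatorname{Var}(u_{i,t-1}) = -\sigma_u^2
\]

\[
\operatorname{Corr}(\Delta u_{it}, \Delta u_{i,t-1}) = \frac{-\sigma_u^2}{\sqrt{2\sigma_u^2}\,\sqrt{2\sigma_u^2}} = -\frac{1}{2}
\]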

21

Fixed effect transformation

Another way to eliminate the time-invariant unobservables

The time average of ai is just ai (it is constant over t), so subtracting unit means removes it

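A sketch of the within (demeaning) transformation, writing bars for unit-specific time averages and double dots for demeaned variables (one-regressor case for illustration):

\[
\ddot{y}_{it} \equiv y_{it} - \bar{y}_i = \beta_1 (x_{it} - \bar{x}_i) + (u_{it} - \bar{u}_i) = \beta_1 \ddot{x}_{it} + \ddot{u}_{it}
\]

Both β0 and ai drop out because they are constant within i.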

22

Fixed effects estimator

When we apply OLS to the within-transformed (fixed-effects) data we get the FE estimators

We rely on differences within each sample unit (not between units) to identify the relationship (covariation of yit and xit within i)

We still need E(üit | ẍit) = 0 across all t to get unbiased estimates

23

Between estimator

OLS applied to the cross-section of unit time-averages: it uses only variation between units, ignoring the variation over time within units (the opposite of the within/FE estimator)
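A sketch of the time-averaged equation the between estimator fits (standard form, not recovered from the original card):

\[
\bar{y}_i = \beta_0 + \beta_1 \bar{x}_i + a_i + \bar{u}_i, \qquad \bar{y}_i = \frac{1}{T} \sum_{t=1}^{T} y_{it}
\]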

24

Fixed effects estimator and se(β̂1)

To get unbiased estimates of se(β̂1) we need to address heteroscedasticity

We also need uit to be serially uncorrelated

25

Fixed effects estimation - Degrees of freedom

df = NT − N − k

where i = 1, 2, …, N and t = 1, 2, …, T

• −N because we have to estimate the N unit means (equivalently, N fixed effects) - demeaning uses up one observation per unit

• −k, not −(k + 1): the overall constant is absorbed into the N estimated fixed effects, so it does not cost an extra degree of freedom
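A worked example with hypothetical numbers: with N = 500 units, T = 3 periods and k = 2 time-varying regressors,

\[
df = NT - N - k = 500 \times 3 - 500 - 2 = 998
\]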

26

When are Fixed effects and First difference equal

FE = FD when T = 2


27

Fixed effects and Least squares dummy variable model

FE = LSDV

Within transformation can be obtained by including dummies for all i

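A minimal sketch of this equivalence on simulated data (the data-generating process, variable names and use of statsmodels are assumptions for illustration, not part of the lectures): the slope on x from LSDV - OLS with a dummy for every unit i - matches the slope from OLS on within-demeaned data.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulate a hypothetical balanced panel with a unit fixed effect a_i
rng = np.random.default_rng(0)
N, T, beta1 = 200, 4, 0.5
a = rng.normal(size=N)                                      # unit fixed effects
df = pd.DataFrame({
    "id": np.repeat(np.arange(N), T),
    "year": np.tile(np.arange(T), N),
})
df["x"] = rng.normal(size=N * T) + a[df["id"].to_numpy()]   # x correlated with a_i
df["y"] = beta1 * df["x"] + a[df["id"].to_numpy()] + rng.normal(size=N * T)

# (1) LSDV: OLS with a dummy for every cross-sectional unit
lsdv = smf.ols("y ~ x + C(id)", data=df).fit()

# (2) Within transformation: demean y and x within each unit, then OLS without a constant
df["y_dd"] = df["y"] - df.groupby("id")["y"].transform("mean")
df["x_dd"] = df["x"] - df.groupby("id")["x"].transform("mean")
within = smf.ols("y_dd ~ x_dd - 1", data=df).fit()

print(lsdv.params["x"], within.params["x_dd"])              # identical slope estimates
```

The point estimates coincide; only the degrees-of-freedom correction in the reported standard errors differs.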
28

Fixed effects when there are fixed variables

FE estimation can't estimate the effect of fixed (time-invariant) variables, or of variables that follow the same time trend for every individual

We split the regressors into what varies over time and what doesn't

  • If a variable is fixed but its effect has a time trend, we drop the i and t subscripts and let t enter through the coefficient

  • If it is just fixed, we drop the t subscript

We include year dummies to see the effect in every year

ai & the fixed variables drop out when we apply the FE transformation

  • Fixed variables with a time trend fall out too, as they are constant for each individual

29

Time invariant regressors and Stata omitting

Stata omits the time-invariant regressors ("omitted due to collinearity") because, within each i, they behave like constants

Not a problem if the variable doesn't vary at all – but if it varies very little, Stata will still give an estimate, and it will affect everything else

30

Time invariant regressors and interaction terms

To get around Stata omitting the variables that only vary in i, we include an interaction term with the year dummy variables

The interaction now varies in both i and t

The Stata estimates now show the returns to that variable in every year

In the example, the educ × year estimate got bigger each year → the return to education increases over time, the longer people are in the labour market

31

Random effects model conditions needed

If ai is uncorrelated with every explanatory variable xjit in all time periods

  • then we don't need to get rid of it to obtain unbiased estimates → no OVB

32

RE model and composite errors

We combine ai & uit back together → vit

However, the composite error is serially correlated within i (across t), because every period shares the same ai → the usual OLS standard errors and inference are invalid

Therefore we use GLS

33

composite error

vit = ai + uit

34

Generalised Least Squares (GLS)

As long as we have large N relative to T we can use GLS - data on lots of units over few years

GLS involves partially demeaning our dependent and explanatory variables & our errors

35

GLS post partial demeaning

Partially demeaning - takes a proportion of the mean away (proportion = λ)

FE completely demeans

λ equation - a measure of the relative importance of the variance of the idiosyncratic error compared to that of the unobserved fixed effect

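A sketch of the standard quasi-demeaning formulas (textbook form, assumed rather than taken from the card's image):

\[
\lambda = 1 - \sqrt{\frac{\sigma_u^2}{\sigma_u^2 + T\,\sigma_a^2}}
\]

\[
y_{it} - \lambda \bar{y}_i = \beta_0 (1 - \lambda) + \beta_1 (x_{it} - \lambda \bar{x}_i) + (v_{it} - \lambda \bar{v}_i)
\]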
36

partially demeaned time-invariant explanatory variables

It remains in the model (it does not become 0)

Its coefficient can therefore be estimated

To get unbiased estimates we need to assume all explanatory variables are uncorrelated with both uit and ai in every t

37

Effect of σa² & T on λ

As σa² or T increases:

  • the more important is the variance of the time-invariant unobservable

    • so the larger we want λ to be

  • the closer λ is to 1, the more of the time-invariant part of the error we take away

    • in order to get rid of the serial correlation in the composite error

38

λ at the extremes

λ = 1 → FE transformation

λ = 0 → pooled OLS on the original sample

  • λ·ȳi = 0

    • so yit − λ·ȳi = yit

39

RE vs FE efficient

RE is more efficient

  • gives tighter confidence intervals + better inference

It makes better use of the data

  • demeaning in FE throws away some variation (more than is needed)

However, if the xit are not all uncorrelated with ai → FE is the only valid choice

40

Why do we compare RE & FE

By comparing FE & RE we can draw inferences about likely biases

41

What Hausman test tests for

We can test Cov(xjit, ai) = 0 by comparing the estimated coefficients on the time-varying x's in the FE & RE models

If ai is uncorrelated with the x's, then FE is unbiased but not efficient - its SEs are inflated relative to RE

42

Hausman test null

H0 : ai uncorrelated with included regressors

  • both FE & RE unbiased + consistent

  • Difference in coefficients not systematic

H1 : ai correlated with included regressors

  • RE inconsistent (FE remains consistent)
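A sketch of the usual form of the test statistic (standard textbook form, not taken from the slides), where the coefficient vectors cover the time-varying regressors:

\[
H = (\hat{\beta}_{FE} - \hat{\beta}_{RE})' \left[\widehat{\operatorname{Var}}(\hat{\beta}_{FE}) - \widehat{\operatorname{Var}}(\hat{\beta}_{RE})\right]^{-1} (\hat{\beta}_{FE} - \hat{\beta}_{RE}) \;\overset{H_0}{\sim}\; \chi^2_k
\]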

43

When do pooled 1st difference OLS and within estimator yield identical results

When T = 2