Applied Econometrics, L9
Chapter 9: Panel Data
Introduction to Different Data Types
Cross-sectional data: Snapshot of data collected at a single point in time for multiple entities.
So far we have always assumed this setting; for time series, simply replace the index i with t.
Example: (Y_i, X_i), for i = 1, ..., n.
Time series data: Observations on variables at T different points in time.
Example: (Y_t, X_t), for t = 1, ..., T.
Pooled data: Combines multiple sets of n entities observed across T different time periods.
Example: (Y_i, X_i), for i = 1, ..., Tn.
Panel data: Collection of the same n entities observed across T different time periods.
Example: (Y_it, X_it), with i = 1, ..., n and t = 1, ..., T.
A balanced panel means each entity is observed T times.
Think of a two-way table: individuals along one dimension, time along the other.
Pooled Model
A pooled model fits a single linear regression to the stacked data, ignoring which entity or period each observation comes from:
Formula: y_i = β_0 + β_1x_1i + ... + β_kx_ki + ε_i, where i = 1, ..., Tn.
Demonstration:
Generate a bunch of random variables (regressors), then run an OLS regression (reg).
Now simulate what happens if that sample is repeated 3 times (expand 3), i.e. copy-paste the same observations. This is the pooled model.
Result: lower standard errors without any additional information, so the gain in precision is artificial.
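The duplication experiment above can be sketched in Python with NumPy. This is only an illustration: the seed, sample size and true coefficients are made-up assumptions, and `ols` is a hypothetical helper, not part of the course material.

```python
import numpy as np

def ols(X, y):
    """Return OLS coefficients and conventional (homoskedastic) standard errors."""
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    resid = y - X @ beta
    dof = X.shape[0] - X.shape[1]
    sigma2 = resid @ resid / dof
    se = np.sqrt(sigma2 * np.diag(XtX_inv))
    return beta, se

rng = np.random.default_rng(0)            # hypothetical seed
n = 200                                   # hypothetical sample size
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + rng.normal(size=n)    # hypothetical true model
X = np.column_stack([np.ones(n), x])

beta_1, se_1 = ols(X, y)

# "expand 3": stack three identical copies of the same sample
X3 = np.vstack([X, X, X])
y3 = np.concatenate([y, y, y])
beta_3, se_3 = ols(X3, y3)

print("coefficients:", beta_1, beta_3)    # identical
print("std. errors :", se_1, se_3)        # smaller after duplication
print("SE ratio    :", se_3 / se_1)       # equals sqrt((n - 2) / (3n - 2))
```

The coefficients do not move, but the standard errors shrink by a factor close to 1/sqrt(3), although no new information was added.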
Important considerations:
Are errors correlated over time?
Is there group-wise heteroskedasticity?
How to spot it in practice?
Example: wages in several countries (3) observed over 12 months. If you simply stack all countries together, you get 300 entities each month.
Including Time Dummies in Models
Pooled models are not all bad, even though they have limitations.
Time dummies allow for different intercepts across time periods.
They can also be interacted with the regressors to allow for different slopes.
More generally, they let us see how the relationship changes over time, i.e. whether things that evolve with time have an impact on it.
Formula: y_it = β_1x_1it + ... + β_kx_kit + δ_1T_1 + ... + δ_TT_T + ε_it.
T_j is 1 if the observation is at time j, else 0.
Cannot identify individual effects with this model.
Variables that vary only over time (same value for every entity in a given period) cannot be included: they would be perfectly collinear with the time dummies.
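A small sketch of the time-dummy design matrix and of the collinearity problem, assuming a tiny stacked panel. The dimensions and the values of `macro` are hypothetical.

```python
import numpy as np

n, T = 4, 3                                    # hypothetical panel dimensions
time = np.tile(np.arange(T), n)                # period of each stacked row

# Time dummies T_1, ..., T_T: column j is 1 when the row is observed at time j
D = (time[:, None] == np.arange(T)).astype(float)

# A variable that varies only over time (hypothetical values, same for all entities)
macro = np.array([1.5, 0.3, 2.2])[time]

X_ok = D
X_bad = np.column_stack([D, macro])

rank_ok = np.linalg.matrix_rank(X_ok)
rank_bad = np.linalg.matrix_rank(X_bad)
print(X_ok.shape[1], rank_ok)     # 3 columns, rank 3
print(X_bad.shape[1], rank_bad)   # 4 columns, still rank 3: perfect multicollinearity
```

Because `macro` is an exact linear combination of the dummies, adding it increases the number of columns but not the rank, so its coefficient cannot be identified.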
Simpson's Paradox
A trend present within each separate group can disappear, or even reverse, when the groups are combined.
Fixed-effects estimation can address this unobserved heterogeneity.
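A minimal constructed example of the paradox, assuming three groups whose levels differ. All numbers are hypothetical and chosen so that the within-group slope is exactly -1 while the pooled slope is positive.

```python
import numpy as np

centers = np.array([0.0, 5.0, 10.0])      # hypothetical group locations in x
dev = np.array([-1.0, 0.0, 1.0])          # within-group variation in x
group = np.repeat(np.arange(3), 3)

x = centers[group] + np.tile(dev, 3)
# group level rises with the center, but inside each group y falls as x rises
y = 3.0 * centers[group] - 1.0 * np.tile(dev, 3)

def slope(x, y):
    """Slope of a simple regression of y on x (with intercept)."""
    xd, yd = x - x.mean(), y - y.mean()
    return (xd @ yd) / (xd @ xd)

pooled_slope = slope(x, y)

# Removing group means (what fixed effects do) leaves only within-group variation
x_gm = np.array([x[group == g].mean() for g in range(3)])[group]
y_gm = np.array([y[group == g].mean() for g in range(3)])[group]
within_slope = slope(x - x_gm, y - y_gm)

print(pooled_slope)   # positive
print(within_slope)   # -1
```

The pooled regression mixes the between-group differences with the within-group relationship; removing group means recovers the within-group slope.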
Fixed Effects Model
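Panel data let us control for unobserved variables that differ across entities but are constant over time.
This matters when we know there are confounding variables that could cause omitted variable bias (OVB) but that cannot be observed.
So how can we control for them?
Entity fixed effect: unobserved, constant over time, correlated with both the regressor and the dependent variable.
Time effect: unobserved, common to all entities, varying over time.
The pooled model, however, offers no way to control for these effects.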
True model:
y_it = α_i + β'x_it + ε_it
Where α_i are the unobserved effects.
Fixed Effects Estimation
Uses the Least Squares Dummy Variable (LSDV) approach: include one dummy per entity and estimate by OLS.
Formula:
y_it = α_1D_i1 + ... + α_nD_in + β'x_it + ε_it
where D_ij = 1 if i = j, else 0.
Example: studying how age and experience affect wages.
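A sketch of LSDV on simulated data, assuming a regressor that is correlated with the unobserved effect α_i. The seed, dimensions and true slope of 1.5 are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)             # hypothetical seed
n, T = 50, 6                               # hypothetical panel dimensions
entity = np.repeat(np.arange(n), T)

alpha = rng.normal(scale=2.0, size=n)      # unobserved entity effects
x = 0.8 * alpha[entity] + rng.normal(size=n * T)   # regressor correlated with alpha_i
y = alpha[entity] + 1.5 * x + rng.normal(scale=0.5, size=n * T)

# LSDV: one dummy D_ij per entity (no separate intercept), plus the regressor
D = (entity[:, None] == np.arange(n)).astype(float)
coef_lsdv, *_ = np.linalg.lstsq(np.column_stack([D, x]), y, rcond=None)
alpha_hat, beta_lsdv = coef_lsdv[:n], coef_lsdv[n]

# Pooled OLS for comparison: common intercept, alpha_i left in the error term
coef_pool, *_ = np.linalg.lstsq(np.column_stack([np.ones(n * T), x]), y, rcond=None)
beta_pool = coef_pool[1]

print("LSDV slope  :", beta_lsdv)   # close to the true 1.5
print("pooled slope:", beta_pool)   # biased upward by the omitted alpha_i
```

In this setup the pooled slope picks up the correlation between x and α_i, while the dummies absorb it.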
Relevance of Fixed Effects
Individual fixed effects require a large dataset: one parameter α_j is estimated per entity.
F-test (Fisher) for the relevance of the fixed effects:
H0: α_j = α for all j
Compare with pooled model: y_it = α + β'x_it + ε_it.
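The test compares the residual sum of squares of the restricted (pooled) and unrestricted (LSDV) models. A sketch under assumed simulated data follows; `rss` is a hypothetical helper, and the statistic would be compared with an F(n - 1, nT - n - k) critical value.

```python
import numpy as np

rng = np.random.default_rng(4)             # hypothetical seed
n, T, k = 30, 8, 1                         # hypothetical dimensions, k regressors
entity = np.repeat(np.arange(n), T)

alpha = rng.normal(scale=2.0, size=n)      # entity effects really differ here
x = rng.normal(size=n * T)
y = alpha[entity] + 1.5 * x + rng.normal(size=n * T)

def rss(X, y):
    """Residual sum of squares of an OLS fit of y on X."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    return resid @ resid

rss_pooled = rss(np.column_stack([np.ones(n * T), x]), y)   # H0: alpha_j = alpha
D = (entity[:, None] == np.arange(n)).astype(float)
rss_fe = rss(np.column_stack([D, x]), y)                    # unrestricted LSDV

df_num = n - 1             # number of restrictions under H0
df_den = n * T - n - k     # residual degrees of freedom of the LSDV model
F_stat = ((rss_pooled - rss_fe) / df_num) / (rss_fe / df_den)
print(F_stat)
```

A large value of the statistic rejects H0, i.e. the fixed effects are relevant and the pooled model is too restrictive.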
Estimation Techniques
Within Estimation (de-meaned):
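Goal: control for the entity fixed effects, not interpret each individual effect.
Using only the variation over time within each entity removes all time-invariant variables, observed or not.
Ex: wages of 100 individuals observed each month for a year, n = 100, T = 12.
→ demeaned wage = wage_it - mean(wage_i), where mean(wage_i) is individual i's average wage over the 12 months.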
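A sketch of the within transformation for the wage example, on simulated data. The variable `exper`, the seed and the coefficients are hypothetical. It also checks numerically that the within slope coincides with the LSDV slope.

```python
import numpy as np

rng = np.random.default_rng(3)             # hypothetical seed
n, T = 100, 12                             # 100 individuals, 12 months
entity = np.repeat(np.arange(n), T)

alpha = rng.normal(scale=2.0, size=n)      # hypothetical individual effects
exper = 0.8 * alpha[entity] + rng.normal(size=n * T)   # hypothetical regressor
wage = alpha[entity] + 1.5 * exper + rng.normal(scale=0.5, size=n * T)

def demean_by_entity(v):
    """Subtract each entity's time average from its own observations."""
    means = np.bincount(entity, weights=v) / np.bincount(entity)
    return v - means[entity]

wage_dm = demean_by_entity(wage)
exper_dm = demean_by_entity(exper)

# OLS on demeaned data (no intercept needed: demeaned variables have mean zero)
beta_within = (exper_dm @ wage_dm) / (exper_dm @ exper_dm)

# Same slope from LSDV with one dummy per individual
D = (entity[:, None] == np.arange(n)).astype(float)
coef, *_ = np.linalg.lstsq(np.column_stack([D, exper]), wage, rcond=None)
beta_lsdv = coef[-1]

print(beta_within, beta_lsdv)
```

Demeaning wipes out anything constant within an individual, including α_i, which is why no dummies are needed in the transformed regression.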
First Difference Model:
Differencing consecutive periods (y_it - y_i,t-1) eliminates the entity fixed effects α_i, since they are constant over time, and so removes the bias they would cause.
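A sketch of first differencing on a simulated panel stored as an (n, T) array. The seed, dimensions and true slope of 1.5 are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(5)             # hypothetical seed
n, T = 100, 5                              # hypothetical panel dimensions

alpha = rng.normal(scale=2.0, size=(n, 1))             # entity effects
x = 0.8 * alpha + rng.normal(size=(n, T))              # correlated with alpha_i
y = alpha + 1.5 * x + rng.normal(scale=0.5, size=(n, T))

# First differences along the time axis: row i holds y_it - y_i,t-1
dx = np.diff(x, axis=1)
dy = np.diff(y, axis=1)

# OLS of dy on dx without intercept (the model has no time trend)
beta_fd = (dx * dy).sum() / (dx ** 2).sum()

# alpha_i is constant over time, so its first difference is exactly zero
alpha_panel = np.repeat(alpha, T, axis=1)
print(np.abs(np.diff(alpha_panel, axis=1)).max())   # 0.0
print(beta_fd)                                       # close to 1.5
```

One period per entity is lost to differencing, so the regression uses n(T - 1) observations.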
Random Effects Model
Formulated as: y_it = α_i + β'x_it + ε_it
where α_i is treated as a random draw rather than a parameter to estimate, and is assumed uncorrelated with the regressors x_it.
Write α_i = β_0 + u_i: the random effects have mean β_0, and u_i becomes part of the error term.
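For reference, the textbook form of the random effects GLS estimator can be written as OLS on quasi-demeaned data. The symbols u_i, θ, σ_u and σ_ε are notation assumed here, not taken from the lecture.

```latex
\begin{aligned}
y_{it} &= \beta_0 + \beta' x_{it} + (u_i + \varepsilon_{it}),
  \qquad \alpha_i = \beta_0 + u_i, \\
y_{it} - \theta\,\bar{y}_i &= \beta_0 (1 - \theta)
  + \beta' (x_{it} - \theta\,\bar{x}_i) + \text{error}, \\
\theta &= 1 - \sqrt{\frac{\sigma_\varepsilon^2}{\sigma_\varepsilon^2 + T\,\sigma_u^2}}
\end{aligned}
```

With θ = 0 this reduces to pooled OLS, and with θ = 1 to the within estimator, so random effects sits between the two.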
Hausman Test
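Compares the fixed and random effects estimates: random effects is more efficient, but consistent only if α_i is uncorrelated with the regressors; fixed effects is consistent either way.
H0: α_i is uncorrelated with x_it, so both are consistent and β_Random is preferred (efficient). If H0 is rejected, only β_Fixed is consistent and it is preferred.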
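The usual form of the test statistic, given here as a reference sketch, with k the number of coefficients compared (notation assumed):

```latex
H = (\hat{\beta}_{FE} - \hat{\beta}_{RE})'
    \left[ \widehat{\mathrm{Var}}(\hat{\beta}_{FE}) - \widehat{\mathrm{Var}}(\hat{\beta}_{RE}) \right]^{-1}
    (\hat{\beta}_{FE} - \hat{\beta}_{RE})
    \;\sim\; \chi^2_k \quad \text{under } H_0
```

A large H means the two sets of estimates differ by more than sampling noise, which points to correlation between α_i and the regressors.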
Panel Data Example: Fixed Effects
Example from National Longitudinal Survey of Young Women:
Regression of ln_wage on age, msp, and ttl_exp using fixed effects yields significant results.
Observations: 28,494, R-squared values provided.
Panel Data Example: Random Effects
Conduct a random effects GLS regression with the same variables, noting R-squared and coefficient significance results.
Model Selection
Dougherty (2011) procedure for selecting panel data model:
Question whether observations can be treated as a random population sample.
Perform both fixed and random effects regressions.
Use the results of the relevant statistical tests (e.g. Hausman test, F-test) to choose between the fixed, random, or pooled models.
Dynamic Models
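Include lagged dependent variables in the regression framework, e.g. y_it = α_i + ρy_i,t-1 + β'x_it + ε_it.
After the within transformation, the demeaned lagged dependent variable is correlated with the demeaned error term, so the within estimator is biased unless T approaches infinity.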
Addressing Bias in Dynamic Models
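First-difference estimator: differencing removes α_i, and the differenced lagged dependent variable is then instrumented (e.g. with y_i,t-2) to eliminate the bias.
Arellano-Bond estimator: extends this idea by using further lags of the dependent variable as instruments, through GMM moment conditions.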
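The bias and the instrumented first-difference fix can be illustrated with a simulation of a pure autoregressive panel, y_it = α_i + ρ·y_i,t-1 + ε_it. This is a sketch under assumptions: ρ = 0.5, the dimensions, the seed and the burn-in are hypothetical, and the instrument choice y_i,t-2 follows the standard Anderson-Hsiao approach, not the full Arellano-Bond GMM estimator.

```python
import numpy as np

rng = np.random.default_rng(2)             # hypothetical seed
n, T, rho = 5000, 6, 0.5                   # large n, small T (hypothetical)
burn = 50                                  # burn-in periods, discarded below

alpha = rng.normal(size=n)
y = np.zeros((n, T + burn))
y[:, 0] = alpha / (1 - rho)                # start at each entity's long-run mean
for t in range(1, T + burn):
    y[:, t] = alpha + rho * y[:, t - 1] + rng.normal(size=n)
y = y[:, burn:]                            # keep the last T periods

# Within estimator: demean y_it and y_i,t-1 by entity, then OLS
y_cur, y_lag = y[:, 1:], y[:, :-1]
yc = y_cur - y_cur.mean(axis=1, keepdims=True)
yl = y_lag - y_lag.mean(axis=1, keepdims=True)
rho_within = (yl * yc).sum() / (yl ** 2).sum()

# First differences, with y_i,t-2 as instrument for the differenced lag
dy = np.diff(y, axis=1)                    # dy[:, j] = y[:, j + 1] - y[:, j]
dy_cur, dy_lag = dy[:, 1:], dy[:, :-1]     # differenced y and its lag
z = y[:, :-2]                              # level dated two periods back
rho_iv = (z * dy_cur).sum() / (z * dy_lag).sum()

print("within:", rho_within)   # well below 0.5 when T is small
print("FD-IV :", rho_iv)       # close to 0.5
```

The within estimate stays biased however large n is, because the bias depends on T; the instrumented first-difference estimate does not suffer from this.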
Binary Dependent Variable Issues
With a binary dependent variable, fixed effects work differently: models such as logit are nonlinear, so α_i cannot simply be removed by demeaning or differencing as in the linear model.
Conditional Fixed Effects and Likelihood
Conditional fixed-effects logit: condition the likelihood on each entity's total number of positive outcomes, a transformation that removes the individual effects α_i from the probabilities.
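The simplest case, T = 2 with a logit model, shows how conditioning removes α_i. This is the standard textbook derivation, written in notation assumed here.

```latex
\begin{aligned}
P(y_{it} = 1 \mid x_{it}, \alpha_i) &= \frac{\exp(\alpha_i + \beta' x_{it})}{1 + \exp(\alpha_i + \beta' x_{it})}, \\
P(y_{i1} = 0,\; y_{i2} = 1 \mid y_{i1} + y_{i2} = 1) &=
  \frac{\exp\!\big(\beta'(x_{i2} - x_{i1})\big)}{1 + \exp\!\big(\beta'(x_{i2} - x_{i1})\big)}
\end{aligned}
```

The conditional probability no longer contains α_i, so β can be estimated from entities whose outcome changes between the two periods; entities with y_i1 = y_i2 contribute nothing to the conditional likelihood.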
Difference-in-Difference Methodology
Evaluate treatment effects through comparisons of treated versus control units before and after intervention.
Can be estimated via a panel data model with a treatment-group dummy, a post-treatment dummy and their interaction; the coefficient on the interaction is the treatment effect.
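A tiny worked example showing that the difference of the before/after differences equals the interaction coefficient. The eight outcome values are made up for illustration.

```python
import numpy as np

# Hypothetical outcomes, two observations per (group, period) cell
treated = np.array([0, 0, 0, 0, 1, 1, 1, 1], dtype=float)
post    = np.array([0, 0, 1, 1, 0, 0, 1, 1], dtype=float)
y       = np.array([10, 12, 13, 15, 20, 22, 28, 30], dtype=float)

def cell_mean(g, p):
    """Mean outcome for group g (0 = control, 1 = treated) in period p (0 = before, 1 = after)."""
    return y[(treated == g) & (post == p)].mean()

# Difference-in-differences from the four cell means
did_means = (cell_mean(1, 1) - cell_mean(1, 0)) - (cell_mean(0, 1) - cell_mean(0, 0))

# Same number from a regression with an interaction term
X = np.column_stack([np.ones_like(y), treated, post, treated * post])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
did_reg = coef[3]

print(did_means)   # (29 - 21) - (14 - 11) = 5
print(did_reg)     # 5, up to floating-point error
```

The control group's change (3) estimates what would have happened to the treated group without treatment, under the parallel-trends assumption; the remaining 5 is attributed to the treatment.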
Conclusion
Discuss and compare the results of difference-in-difference analysis before and after treatment for specified outcomes.