Applied Econometrics, L9

Chapter 9: Panel Data

Cross-sectional data: Snapshot of data collected at a single point in time for multiple entities.
- So far we have always assumed this setting.
- Example: (Y_i, X_i), for i = 1, ..., n.
Time series data: Observations on variables at T different points in time.
- Same notation, with the index i replaced by t.
- Example: (Y_t, X_t), for t = 1, ..., T.
Pooled data: Combines several sets of n entities, observed in T different time periods, into one stacked sample; the entities need not be the same in each period.
- Example: (Y_i, X_i), for i = 1, ..., nT.
Panel data: Collection of the same n entities observed across T different time periods.
- Example: (Y_it, X_it), with i = 1, ..., n and t = 1, ..., T.
- A balanced panel means each entity is observed T times.
- Think of a two-way table: individuals along one dimension, time periods along the other.

A pooled model fits a single linear regression to the stacked data:
- Formula: y_i = β_0 + β_1 x_1i + ... + β_k x_ki + ε_i, where i = 1, ..., nT.
Illustration of why naive pooling is dangerous:
- Generate a bunch of random variables (regressors).
- Run a regression (OLS).
- Now simulate the sample being repeated 3 times (expand 3), i.e. copy-paste the same observations. This mimics the pooled model.
- Standard errors fall even though no additional information was added, so the gain in precision is artificial.
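The duplication experiment above can be sketched in Python as follows. This is a minimal illustration under assumptions that are not in the notes: one regressor, simulated data with hypothetical parameters (intercept 1, slope 2, n = 100), and a hand-written helper `ols_slope_se`. Repeating a list three times plays the role of `expand 3`.

```python
import math
import random

def ols_slope_se(x, y):
    """Simple OLS of y on a constant and x; returns (slope, standard error of slope)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    sigma2 = rss / (n - 2)          # residual variance with 2 estimated parameters
    return slope, math.sqrt(sigma2 / sxx)

random.seed(0)                       # hypothetical simulated sample
n = 100
x = [random.gauss(0, 1) for _ in range(n)]
y = [1.0 + 2.0 * xi + random.gauss(0, 1) for xi in x]

b1, se1 = ols_slope_se(x, y)         # original sample
b3, se3 = ols_slope_se(x * 3, y * 3) # same observations copied 3 times

print("slope:", b1, "vs", b3)        # the point estimate does not change
print("SE ratio (copied / original):", se3 / se1)
```

The slope is identical in both regressions, but the standard error shrinks by the factor sqrt((n - 2) / (3n - 2)), roughly 0.57 for n = 100, although no new information entered the sample.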
Important considerations:
- Are errors correlated over time?
- Is there group-wise heteroskedasticity?
- How to spot it in practice?
  - Example: wages in 3 countries over 12 months. If all countries are simply put together, we treat the data as 300 entities each month and ignore which country each entity belongs to.

Pooled models are not all bad, even though they have limitations.
Time dummies allow for different intercepts across time periods.
- Interacted with the regressors, they can also allow for different slopes.
- In general, they let us see how the relationship changes over time, i.e. whether things that evolve with time have an impact on it.
Formula: y_it = β_1 x_1it + ... + β_k x_kit + δ_1 T_1 + ... + δ_T T_T + ε_it.
- T_j is 1 if the observation is at time j, else 0.
Individual (entity) effects cannot be identified with this model.
- Variables that vary only over time (same value for every entity) cannot be included alongside the time dummies, because they would be perfectly collinear with them.
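A small sketch of how the time dummies T_j are built, and why a variable that only varies over time is perfectly collinear with them. The data are hypothetical (2 entities, 3 periods, an invented `inflation` series); only the construction of T_j follows the notes.

```python
periods = [1, 2, 3]                 # T = 3 time periods
obs_time = [1, 2, 3, 1, 2, 3]       # period of each stacked observation (2 entities)

# T_j = 1 if the observation is at time j, else 0
dummies = [[1 if t == j else 0 for j in periods] for t in obs_time]

# Hypothetical variable that varies over time only (same for all entities)
inflation_by_period = {1: 2.0, 2: 3.5, 3: 1.0}
inflation = [inflation_by_period[t] for t in obs_time]

# It is an exact linear combination of the dummies: sum_j value_j * T_j
rebuilt = [
    sum(inflation_by_period[j] * d for j, d in zip(periods, row))
    for row in dummies
]

print(dummies)
print(rebuilt == inflation)
```

Each row of dummies sums to 1, which is also why the formula with all T dummies has no separate intercept.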
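Simpson's Paradox: a relationship seen in the pooled data can weaken or even reverse once the groups (entities) are looked at separately.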

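Panel data let us control for unobserved variables that differ across entities but are constant over time.
This matters when we know there are confounding variables that could cause omitted variable bias (OVB) but cannot be observed.
So how can we control for that?
- Entity fixed effect: unobserved, time-invariant characteristics of an entity that are correlated with both the regressors and the dependent variable.
- Time effect: unobserved factors that change over time but are common to all entities.
However, the pooled model has no way to control for these unobserved effects.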
True model:
- y_it = α_i + β'x_it + ε_it
Where α_i are the unobserved entity effects.

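Individual fixed effects require a large dataset, since a separate intercept α_i is estimated for every entity.
F-test (Fisher test) for the relevance of the fixed effects:
- H0: α_j = α for all j (no entity-specific intercepts)
- Compare with the pooled model: y_it = α + β'x_it + ε_it.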
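For reference, the usual form of this F statistic compares the residual sums of squares of the two models. The symbols RSS_pooled, RSS_FE and k (number of regressors in x_it) are notation introduced here, not taken from the notes.

```latex
F = \frac{(RSS_{pooled} - RSS_{FE}) / (n - 1)}{RSS_{FE} / (nT - n - k)}
\;\sim\; F_{\,n-1,\; nT-n-k} \quad \text{under } H_0
```

There are n - 1 restrictions because H0 forces n separate intercepts to collapse into a single one.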

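Within Estimation (de-meaned):
- The focus is on controlling for entity fixed effects, not on interpreting the individual effects themselves.
- Only variation over time within each entity is used, so all time-invariant variables drop out.
- Ex: wages of 100 individuals each month for a year, n = 100, T = 12
  → demeaned wage_it = wage_it - mean_i(wage), where mean_i(wage) is individual i's average wage over the 12 months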
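The de-meaning step can be sketched on a tiny hypothetical panel (3 entities, 4 periods, no noise, true β = 2). The entity effects are deliberately chosen to be correlated with x, so that pooled OLS is misleading while the within estimator recovers β. All names and numbers here are illustrative assumptions.

```python
from statistics import mean

def slope(x, y):
    """OLS slope of y on x (with a constant)."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    return sxy / sxx

beta = 2.0
alpha = {"A": 0.0, "B": -10.0, "C": -20.0}          # unobserved entity effects
x_by_entity = {"A": [1, 2, 3, 4], "B": [5, 6, 7, 8], "C": [9, 10, 11, 12]}

# Stacked panel of (entity, x_it, y_it) with y_it = alpha_i + beta * x_it
panel = [(i, x, alpha[i] + beta * x) for i, xs in x_by_entity.items() for x in xs]

# Pooled OLS ignores alpha_i
beta_pooled = slope([x for _, x, _ in panel], [y for _, _, y in panel])

# Within transformation: subtract each entity's own time average
entities = sorted({i for i, _, _ in panel})
x_bar = {i: mean(x for j, x, _ in panel if j == i) for i in entities}
y_bar = {i: mean(y for j, _, y in panel if j == i) for i in entities}
x_within = [x - x_bar[i] for i, x, _ in panel]
y_within = [y - y_bar[i] for i, _, y in panel]
beta_within = slope(x_within, y_within)

print("pooled:", beta_pooled)
print("within:", beta_within)
```

In this constructed example the pooled slope is even negative, while de-meaning removes α_i and returns the true slope of 2.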
First Difference Model:
- Eliminates the entity fixed effects, and the bias they would cause, by differencing consecutive periods (α_i is constant over time, so it cancels).
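Written out for the model y_it = α_i + β'x_it + ε_it, the differencing step is:

```latex
\begin{aligned}
y_{it}     &= \alpha_i + \beta' x_{it}     + \varepsilon_{it} \\
y_{i,t-1}  &= \alpha_i + \beta' x_{i,t-1}  + \varepsilon_{i,t-1} \\
\Delta y_{it} &= \beta' \Delta x_{it} + \Delta \varepsilon_{it},
\qquad \Delta y_{it} = y_{it} - y_{i,t-1}
\end{aligned}
```

Subtracting the second line from the first removes α_i, at the cost of losing one time period per entity.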

Random effects model:
- Formulated as: y_it = α_i + β'x_it + ε_it
- α_i is treated as random with average β_0, i.e. α_i = β_0 + u_i, where u_i is an entity-specific random term assumed to be uncorrelated with x_it.

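Hausman test: compares the fixed and random effects estimators in terms of efficiency and consistency.
- H0: β_Random is consistent (α_i uncorrelated with x_it); it is then also efficient and therefore preferred. If H0 is rejected, β_Fixed is preferred.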
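The comparison between the fixed and random effects estimators is usually carried out with the Hausman statistic. The hats, the variance notation and k (number of compared coefficients) are notation introduced here for illustration.

```latex
H = (\hat\beta_{FE} - \hat\beta_{RE})'
    \left[ \widehat{\mathrm{Var}}(\hat\beta_{FE}) - \widehat{\mathrm{Var}}(\hat\beta_{RE}) \right]^{-1}
    (\hat\beta_{FE} - \hat\beta_{RE})
\;\sim\; \chi^2_k \quad \text{under } H_0
```

A large value of H means the two sets of estimates differ systematically, which speaks against the random effects assumption.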

Example from National Longitudinal Survey of Young Women:
- Regression of ln_wage on age, msp, and ttl_exp using fixed effects yields significant results.
Observations: 28,494, R-squared values provided.

Conduct a random effects GLS regression with the same variables, noting R-squared and coefficient significance results.

Dougherty (2011) procedure for selecting panel data model:
- Question whether observations can be treated as a random population sample.
  - Perform both fixed and random effects regressions.
- Investigate results of relevant statistical tests to choose between fixed, random, or pooled models.

Dynamic panel models include lagged dependent variables in the regression framework.
With the within estimator, the demeaned lagged dependent variable is correlated with the demeaned error term, which leads to bias unless T approaches infinity.

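First-difference estimator: differencing removes α_i, and the remaining bias is eliminated by instrumenting the differenced lagged dependent variable with an appropriate instrument (e.g. earlier lags of y).
Arellano-Bond estimator: extends this approach with GMM, using additional lags of the dependent variable as moment conditions.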

Fixed effects with a binary dependent variable operate differently: the within and first-difference transformations rely on a linear model, so they cannot remove α_i in nonlinear models such as logit or probit.

Analyze probabilities regarding binary outcomes through transformations related to individual effects.

Evaluate treatment effects through comparisons of treated versus control units before and after intervention.
Can be executed via a panel data model, focusing on key differences across treatment groups.
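The before/after comparison of treated and control units can be sketched with four group means. The observations below are hypothetical and only serve to show the arithmetic.

```python
from statistics import mean

# Hypothetical observations: (treated, post, outcome)
obs = [
    (0, 0, 10.0), (0, 0, 12.0),   # control, before
    (0, 1, 13.0), (0, 1, 15.0),   # control, after
    (1, 0, 20.0), (1, 0, 22.0),   # treated, before
    (1, 1, 28.0), (1, 1, 30.0),   # treated, after
]

def group_mean(treated, post):
    """Average outcome in one of the four treated/post cells."""
    return mean(y for d, p, y in obs if d == treated and p == post)

change_treated = group_mean(1, 1) - group_mean(1, 0)   # after minus before, treated
change_control = group_mean(0, 1) - group_mean(0, 0)   # after minus before, control
did = change_treated - change_control                  # difference-in-differences

print("DiD estimate:", did)
```

The same number is obtained as the coefficient on the interaction term in the regression y = b0 + b1*treated + b2*post + b3*(treated*post) + e, which is how the estimate fits into a panel data model.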

Discuss and compare the results of difference-in-difference analysis before and after treatment for specified outcomes.