Applied Econometrics, L9

Chapter 9: Panel Data

Introduction to Different Data Types

  • Cross-sectional data: Snapshot of data collected at a single point in time for multiple entities.

    • So far we have always assumed this setting; the time series case is analogous, with the index i replaced by t.

      • Example: (Y_i, X_i), for i = 1, ..., n.

  • Time series data: Observations on variables at T different points in time.

    • Example: (Y_t, X_t), for t = 1, ..., T.

  • Pooled data: Stacks the n entities observed in each of T different time periods into a single sample, without tracking entities over time.

    • Example: (Y_i, X_i), for i = 1, ..., nT.

  • Panel data: Collection of the same n entities observed across T different time periods.

    • Example: (Y_it, X_it), with i = 1, ..., n and t = 1, ..., T.

    • A balanced panel means each entity is observed T times.

    • Think of it as a two-way table: individuals along one dimension, time periods along the other.

Pooled Model

  • A pooled model fits a single linear regression to the stacked data, ignoring the panel structure:

    • Formula: y_i = β_0 + β_1x_1i + ... + β_kx_ki + ε_i, where i = 1, ..., Tn.

  • Simulation to see what pooling does:

    • Generate a set of random variables (regressors).

    • Run an OLS regression (reg).

    • Now simulate the sample being repeated 3 times (expand 3), i.e. copy-paste the same observations. This stacked sample is the pooled model.

  • The standard errors fall even though no new information was added: the gain in precision is artificial.

  • Important considerations:

    • Are errors correlated over time?

    • Is there group-wise heteroskedasticity?

    • How to spot it in practice?

      • Example: wages in 3 countries over 12 months. Stacking all the countries together gives 300 entities each month, but observations from the same country or the same entity are unlikely to be independent.
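The duplication experiment described above can be sketched in Python with NumPy. This is an illustrative sketch only: the data-generating process, the sample size, and the helper `ols_se` are assumptions made for the example, and `np.tile` plays the role of `expand 3`.

```python
import numpy as np

def ols_se(X, y):
    """Return OLS coefficients and classical (homoskedastic) standard errors."""
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    resid = y - X @ beta
    sigma2 = resid @ resid / (X.shape[0] - X.shape[1])  # residual variance
    return beta, np.sqrt(sigma2 * np.diag(XtX_inv))

rng = np.random.default_rng(0)
n = 200                                   # hypothetical sample size
x = rng.normal(size=n)
y = 1.0 + 0.5 * x + rng.normal(size=n)    # hypothetical true model
X = np.column_stack([np.ones(n), x])

beta_once, se_once = ols_se(X, y)

# "expand 3": stack three identical copies of the sample
X3 = np.tile(X, (3, 1))
y3 = np.tile(y, 3)
beta_thrice, se_thrice = ols_se(X3, y3)

se_ratio = se_once / se_thrice            # how much the standard errors shrink
print("coefficients unchanged:", np.allclose(beta_once, beta_thrice))
print("SE shrink factor:", se_ratio)
```

The coefficients are identical in both regressions, while the reported standard errors shrink by a factor of sqrt((3n - k) / (n - k)), which is close to sqrt(3), purely because the software believes it has three times as many independent observations.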

Including Time Dummies in Models

  • Pooled models are not all bad, even though they have limitations.

  • Time dummies allow for a different intercept in each time period.

  • They can also be interacted with regressors to allow for different slopes.

  • More generally, they show how the relationship changes over time, i.e. whether factors that evolve over time affect it.

  • Formula: y_it = β_1x_1it + ... + β_kx_kit + δ_1T_1 + ... + δ_TT_T + ε_it.

    • T_j is 1 if the observation is at time j, else 0.

  • Cannot identify individual effects with this model.

    • Variables that vary only over time (taking the same value for every entity in a given period) cannot be included, because they would be perfectly collinear with the time dummies.

  • Simpson's paradox: a trend that is present within separate groups can disappear, or even reverse, when the groups are combined.

  • Fixed-effects estimation can address this unobserved heterogeneity.
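A small simulation can make Simpson's paradox concrete. The sketch below is an assumption-laden illustration (three hypothetical groups, invented intercepts and means): within every group the slope is positive, but the group intercepts fall as the group means of x rise, so the pooled slope comes out negative.

```python
import numpy as np

def slope(x, y):
    """Slope of a simple OLS regression of y on x (with intercept)."""
    xc = x - x.mean()
    return (xc @ (y - y.mean())) / (xc @ xc)

rng = np.random.default_rng(1)
n_groups, obs_per_group = 3, 50                   # hypothetical design
true_slope = 1.0
group_mean_x = np.array([0.0, 5.0, 10.0])         # groups differ in their level of x
group_intercept = np.array([20.0, 10.0, 0.0])     # unobserved heterogeneity

g = np.repeat(np.arange(n_groups), obs_per_group) # group label of each row
x = group_mean_x[g] + rng.normal(size=g.size)
y = group_intercept[g] + true_slope * x + rng.normal(scale=0.5, size=g.size)

pooled_slope = slope(x, y)
group_slopes = [slope(x[g == j], y[g == j]) for j in range(n_groups)]

print("pooled slope:", pooled_slope)              # negative
print("within-group slopes:", group_slopes)       # each close to +1
```

Controlling for the group (for instance with group dummies) recovers the positive within-group relationship that the pooled regression hides.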

Fixed Effects Model

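  • Panel data let us control for unobserved variables that differ across entities but are constant over time.

  • This matters when we know there are confounding variables that could cause omitted variable bias (OVB) but cannot be observed.

  • How can we control for them?

    • Entity fixed effect: an unobserved entity-specific factor correlated with both the regressors and the dependent variable.

    • Time effect: an unobserved factor that changes over time but is common to all entities.

  • The pooled model offers no way to control for these effects.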

  • True model:

    • y_it = α_i + β'x_it + ε_it

  • where α_i are the unobserved entity-specific effects.

Fixed Effects Estimation

  • Uses the Least Squares Dummy Variable (LSDV) approach: include one dummy per entity and estimate by OLS.

  • Formula:

    • y_it = α_1D_i1 + ... + α_nD_in + β'x_it + ε_it

    • where D_ij = 1 if i = j, else 0.

  • Example: studying how age and experience affect wages.
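The LSDV idea can be sketched with NumPy as follows. Everything here is a simulated assumption (panel dimensions, a single regressor `x`, the true coefficient): the regressor is built to be correlated with α_i, so pooled OLS is biased while the regression with one dummy per entity is not.

```python
import numpy as np

rng = np.random.default_rng(2)
n, T = 30, 6                                   # hypothetical panel dimensions
beta_true = 0.8
entity = np.repeat(np.arange(n), T)            # entity index i of each row
alpha = rng.normal(scale=2.0, size=n)          # unobserved entity effects

# The regressor is correlated with alpha_i, which is what causes OVB in pooled OLS
x = alpha[entity] + rng.normal(size=n * T)
y = alpha[entity] + beta_true * x + rng.normal(scale=0.5, size=n * T)

# Dummy matrix: D[r, j] = 1 if row r belongs to entity j, else 0
D = (entity[:, None] == np.arange(n)[None, :]).astype(float)

# LSDV: regress y on all n dummies (no common intercept) plus x
coef_lsdv, *_ = np.linalg.lstsq(np.column_stack([D, x]), y, rcond=None)
alpha_hat, beta_lsdv = coef_lsdv[:n], coef_lsdv[n]

# Pooled OLS for comparison: common intercept, no entity dummies
coef_pooled, *_ = np.linalg.lstsq(np.column_stack([np.ones(n * T), x]), y, rcond=None)
beta_pooled = coef_pooled[1]

print("true beta:", beta_true)
print("LSDV beta:", beta_lsdv)
print("pooled beta:", beta_pooled)
```

In this setup the pooled slope is pushed well above the true value because x picks up the effect of α_i, whereas the LSDV slope stays close to it.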

Relevance of Fixed Effects

  • Individual fixed effects require a large dataset, since one intercept is estimated per entity.

  • F-test (Fisher test) for the relevance of the fixed effects:

    • H0: α_j = α for all j

    • Compare with pooled model: y_it = α + β'x_it + ε_it.
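The F statistic for this test compares the residual sum of squares of the restricted (pooled) model with that of the unrestricted (dummy variable) model: F = [(RSS_pooled - RSS_FE) / (n - 1)] / [RSS_FE / (nT - n - k)]. A minimal sketch on simulated data follows; the data-generating process and the helper `rss` are assumptions made for the example, and the p-value is left out to keep the example NumPy-only.

```python
import numpy as np

def rss(Z, y):
    """Residual sum of squares from an OLS regression of y on Z."""
    coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
    resid = y - Z @ coef
    return resid @ resid

rng = np.random.default_rng(3)
n, T, k = 20, 8, 1                             # hypothetical: 20 entities, 8 periods, 1 regressor
entity = np.repeat(np.arange(n), T)
alpha = rng.normal(scale=2.0, size=n)          # entity effects really differ here
x = rng.normal(size=n * T)
y = alpha[entity] + 0.5 * x + rng.normal(size=n * T)

D = (entity[:, None] == np.arange(n)).astype(float)

rss_pooled = rss(np.column_stack([np.ones(n * T), x]), y)   # restricted: alpha_j = alpha
rss_fe = rss(np.column_stack([D, x]), y)                    # unrestricted: one alpha_j each

q = n - 1                                      # number of restrictions under H0
df_resid = n * T - n - k                       # residual degrees of freedom, unrestricted model
F = ((rss_pooled - rss_fe) / q) / (rss_fe / df_resid)
print("F statistic:", F, "with", (q, df_resid), "degrees of freedom")
```

The statistic is compared with the critical value of an F(n - 1, nT - n - k) distribution; a large value rejects H0 and favours the fixed effects model over the pooled one.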

Estimation Techniques

  • Within Estimation (de-meaned):

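    • The goal is to control for the entity fixed effects, not to estimate or interpret each individual effect.

    • Only the variation over time within each entity is used, so all time-invariant variables (observed or not) drop out.

    • Ex: wages of 100 individuals each month for a year, n = 100, T = 12

      → wage_it − mean(wage_i) = demeaned wage_it, where mean(wage_i) is individual i's average wage over the 12 months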

  • First Difference Model:

    • Differencing consecutive periods eliminates the entity fixed effects: y_it − y_i,t−1 = β'(x_it − x_i,t−1) + (ε_it − ε_i,t−1), so α_i no longer appears.
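Both transformations can be sketched on a simulated panel with the same dimensions as the wage example (n = 100, T = 12). The variables `x` and `y` and the true coefficient are hypothetical; the point is only that demeaning and differencing both remove α_i.

```python
import numpy as np

rng = np.random.default_rng(4)
n, T = 100, 12                                 # 100 individuals, 12 months
beta_true = 0.3
alpha = rng.normal(scale=2.0, size=n)          # unobserved individual effects

# Arrays are n x T: row i is individual i, column t is month t
x = alpha[:, None] + rng.normal(size=(n, T))   # regressor correlated with alpha_i
y = alpha[:, None] + beta_true * x + rng.normal(scale=0.5, size=(n, T))

# Within estimator: subtract each individual's own time average
x_within = x - x.mean(axis=1, keepdims=True)
y_within = y - y.mean(axis=1, keepdims=True)
beta_within = (x_within * y_within).sum() / (x_within ** 2).sum()

# First-difference estimator: subtract the previous period's value
dx = np.diff(x, axis=1)                        # x_it - x_i,t-1, shape n x (T - 1)
dy = np.diff(y, axis=1)
beta_fd = (dx * dy).sum() / (dx ** 2).sum()

print("true beta:", beta_true)
print("within:", beta_within, "first difference:", beta_fd)
```

With a single regressor the OLS slope on transformed data reduces to a ratio of sums, which is why no matrix algebra is needed here.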

Random Effects Model

  • Formulated as: y_it = α_i + β'x_it + ε_it

    • where the α_i are treated as random draws rather than as fixed parameters to estimate.

  • The random effects model treats α_i as random with average β_0, i.e. α_i = β_0 + u_i, and assumes u_i is uncorrelated with the regressors x_it.
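As a sketch of how the random effects estimator is usually obtained (standard textbook notation, not taken from these notes): writing the entity effect as its mean β_0 plus a random deviation u_i gives a composite error, and GLS amounts to OLS on partially demeaned data. The symbols u_i, v_it, θ, σ_u² and σ_ε² are introduced here for illustration.

```latex
\begin{aligned}
y_{it} &= \beta_0 + \beta' x_{it} + v_{it}, \qquad v_{it} = u_i + \varepsilon_{it}, \quad \alpha_i = \beta_0 + u_i \\
y_{it} - \theta \bar{y}_i &= \beta_0 (1 - \theta) + \beta' \left( x_{it} - \theta \bar{x}_i \right) + \left( v_{it} - \theta \bar{v}_i \right) \\
\theta &= 1 - \sqrt{\frac{\sigma_\varepsilon^2}{\sigma_\varepsilon^2 + T \sigma_u^2}}
\end{aligned}
```

When θ = 0 this is pooled OLS, and when θ = 1 it is the within (fixed effects) estimator, so random effects sits between the two.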

Hausman Test

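  • Compares the fixed and random effects estimates: random effects is more efficient, but it is consistent only if α_i is uncorrelated with the regressors.

    • H0: α_i is uncorrelated with x_it, so both estimators are consistent and β_Random is preferred (efficient). If H0 is rejected, β_Random is inconsistent and β_Fixed is preferred.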
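For reference, the usual form of the Hausman statistic is sketched below. The notation (hats, the variance estimates, and k for the number of coefficients compared) is the standard textbook one and is an assumption relative to these notes.

```latex
H = \left( \hat{\beta}_{FE} - \hat{\beta}_{RE} \right)'
    \left[ \widehat{\operatorname{Var}}(\hat{\beta}_{FE}) - \widehat{\operatorname{Var}}(\hat{\beta}_{RE}) \right]^{-1}
    \left( \hat{\beta}_{FE} - \hat{\beta}_{RE} \right)
    \;\overset{a}{\sim}\; \chi^2_k \quad \text{under } H_0
```

A large H means the two sets of estimates differ by more than sampling noise can explain, which is evidence against the random effects assumption.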

Panel Data Example: Fixed Effects

  • Example from National Longitudinal Survey of Young Women:

    • Regression of ln_wage on age, msp, and ttl_exp using fixed effects yields significant results.

  • Observations: 28,494, R-squared values provided.

Panel Data Example: Random Effects

  • Conduct a random effects GLS regression with the same variables, noting R-squared and coefficient significance results.

Model Selection

  • Dougherty (2011) procedure for selecting panel data model:

    • Ask whether the observations can be treated as a random sample from a population.

      • If so, perform both fixed and random effects regressions.

    • Use the results of the relevant statistical tests (such as the Hausman test) to choose between the fixed, random, or pooled models.

Dynamic Models

  • Include lagged dependent variables in the regression framework.

  • In the within estimator, the demeaned lagged dependent variable is correlated with the demeaned error term, which biases the estimate unless T approaches infinity.
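The bias can be seen in a small simulation of the dynamic model y_it = α_i + ρ y_i,t−1 + ε_it. This is a sketch under assumed values (ρ = 0.5, n = 500, standard normal errors); `simulate_panel` and `within_ar1` are hypothetical helper names.

```python
import numpy as np

def simulate_panel(n, T, rho, rng, burn=50):
    """Simulate y_it = alpha_i + rho * y_i,t-1 + eps_it; returns an n x (T + 1) array."""
    alpha = rng.normal(size=n)
    y = np.zeros((n, T + burn + 1))
    y[:, 0] = alpha / (1 - rho)                # start at each entity's long-run mean
    for t in range(1, T + burn + 1):
        y[:, t] = alpha + rho * y[:, t - 1] + rng.normal(size=n)
    return y[:, burn:]                         # drop the burn-in periods

def within_ar1(y):
    """Within (demeaned) estimate of rho from a regression of y_it on y_i,t-1."""
    y_lag, y_cur = y[:, :-1], y[:, 1:]
    y_lag_w = y_lag - y_lag.mean(axis=1, keepdims=True)
    y_cur_w = y_cur - y_cur.mean(axis=1, keepdims=True)
    return (y_lag_w * y_cur_w).sum() / (y_lag_w ** 2).sum()

rng = np.random.default_rng(5)
rho = 0.5
rho_hat_short = within_ar1(simulate_panel(500, 5, rho, rng))    # short panel, T = 5
rho_hat_long = within_ar1(simulate_panel(500, 100, rho, rng))   # long panel, T = 100

print("true rho:", rho)
print("within estimate, T = 5:", rho_hat_short)
print("within estimate, T = 100:", rho_hat_long)
```

With T = 5 the estimate falls well below the true value even though n is large, while with T = 100 it is close to it: adding entities does not remove the bias, adding time periods does.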

Addressing Bias in Dynamic Models

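  • First-difference estimator: differencing removes α_i, and the differenced lagged dependent variable is then instrumented (for example with earlier lags of y) to remove its remaining correlation with the differenced error.

  • Arellano-Bond estimator: a GMM estimator that extends this idea by using the available further lags of the dependent variable as instruments (moment conditions).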
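A minimal sketch of the instrumenting idea, on simulated data: after first differencing, Δy_i,t−1 is instrumented with the level y_i,t−2, which is correlated with Δy_i,t−1 but not with Δε_it when the errors are serially uncorrelated. This simple just-identified version is usually attributed to Anderson and Hsiao; it is not the full Arellano-Bond GMM estimator, and all parameter values are assumptions.

```python
import numpy as np

rng = np.random.default_rng(6)
n, T, rho, burn = 5000, 6, 0.5, 50             # hypothetical design
alpha = rng.normal(size=n)

# Simulate y_it = alpha_i + rho * y_i,t-1 + eps_it and keep the last T periods
y = np.zeros((n, T + burn))
y[:, 0] = alpha / (1 - rho)
for t in range(1, T + burn):
    y[:, t] = alpha + rho * y[:, t - 1] + rng.normal(size=n)
y = y[:, burn:]                                # n x T

dy = np.diff(y, axis=1)                        # dy[:, s] = y[:, s + 1] - y[:, s]
dep = dy[:, 1:]                                # Delta y_it
lag = dy[:, :-1]                               # Delta y_i,t-1 (endogenous regressor)
inst = y[:, :-2]                               # y_i,t-2 (instrument)

# OLS on the differenced equation: biased, since Delta y_i,t-1 contains eps_i,t-1
rho_fd_ols = (lag * dep).sum() / (lag ** 2).sum()

# Just-identified IV estimator: sum(z * dep) / sum(z * lag)
rho_iv = (inst * dep).sum() / (inst * lag).sum()

print("true rho:", rho)
print("first-difference OLS:", rho_fd_ols)
print("first-difference IV:", rho_iv)
```

OLS on the differenced data is badly biased, whereas the instrumented estimate is centred near the true ρ, at the cost of a larger variance.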

Binary Dependent Variable Issues

  • With a binary dependent variable, fixed effects work differently: the demeaning and differencing transformations rely on a linear model and do not remove α_i from nonlinear models such as logit or probit.

Conditional Fixed Effects and Likelihood

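  • Conditional fixed-effects logit: the likelihood is conditioned on each entity's total number of successes (the sum of y_it over t), a transformation that removes the individual effects α_i from the probabilities.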
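The simplest case, T = 2 with a logit model, shows how conditioning removes the individual effect. The derivation below is the standard textbook result, written in the notation of the fixed effects model; treat it as a sketch rather than the exact formulation used in the lecture.

```latex
\begin{aligned}
P(y_{it} = 1 \mid x_{it}, \alpha_i) &= \frac{\exp(\alpha_i + \beta' x_{it})}{1 + \exp(\alpha_i + \beta' x_{it})} \\
P(y_{i1} = 0,\, y_{i2} = 1 \mid y_{i1} + y_{i2} = 1) &= \frac{\exp\!\left(\beta' (x_{i2} - x_{i1})\right)}{1 + \exp\!\left(\beta' (x_{i2} - x_{i1})\right)}
\end{aligned}
```

The second expression no longer contains α_i, so β can be estimated from the entities whose outcome changes between the two periods; entities with y_i1 = y_i2 contribute nothing to the conditional likelihood.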

Difference-in-Difference Methodology

  • Evaluate treatment effects through comparisons of treated versus control units before and after intervention.

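  • Can be estimated with a panel data model that includes a treatment-group indicator, a post-treatment indicator, and their interaction; the coefficient on the interaction is the difference-in-difference estimate.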
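A two-period sketch on simulated data shows both routes: the difference of before/after differences in group means, and the equivalent regression with an interaction term. The group sizes, the common time trend, and the treatment effect of 2.0 are assumptions chosen for the example.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 400                                        # hypothetical number of entities
treated = np.repeat([0, 1], n // 2)            # first half control, second half treated
alpha = rng.normal(size=n) + 1.0 * treated     # groups may differ in level
true_effect, time_trend = 2.0, 0.5

y_pre = alpha + rng.normal(scale=0.3, size=n)
y_post = alpha + time_trend + true_effect * treated + rng.normal(scale=0.3, size=n)

# Route 1: difference of before/after changes in group means
change_treated = y_post[treated == 1].mean() - y_pre[treated == 1].mean()
change_control = y_post[treated == 0].mean() - y_pre[treated == 0].mean()
did_means = change_treated - change_control

# Route 2: regression on group, post, and group x post
y = np.concatenate([y_pre, y_post])
post = np.concatenate([np.zeros(n), np.ones(n)])
group = np.concatenate([treated, treated]).astype(float)
X = np.column_stack([np.ones(2 * n), group, post, group * post])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
did_reg = coef[3]                              # interaction coefficient

print("DiD from means:", did_means)
print("DiD from regression:", did_reg)
```

The two numbers coincide because the regression is saturated in group and period. The level difference between groups and the common time trend both cancel, which is the parallel trends logic behind the method.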

Conclusion

  • Discuss and compare the results of difference-in-difference analysis before and after treatment for specified outcomes.