Notes on Applied Econometrics: Heteroscedasticity and Serial Correlation

Applied Econometrics Notes

Lecture 10: Heteroscedasticity and Serial Correlation

Instructor: Hossein Abbasi, Department of Economics, University of Maryland, College Park
Copyright © 2016 by Hossein Abbasi (Do not use without written permission)

Learning Outcomes

After this lecture, you will be able to:

  • Classify and describe the issue of heteroscedasticity and serial correlation (autocorrelation).

  • Distinguish the characteristics of time series data with serial correlation.

  • Identify and discuss the consequences of heteroscedasticity and serial correlation.

  • Identify and discuss the main idea behind methods of detecting heteroscedasticity through regressions, with an emphasis on the Breusch-Pagan test.

  • Describe the general idea of correcting heteroscedasticity using robust standard errors.

  • Identify and discuss the main idea behind methods of detecting serial correlation through regressions, with an emphasis on Durbin-Watson and Durbin’s alternative tests.


10.1 Violation of Classical Assumptions

Overview
  • The classical regression model operates under specific assumptions that help provide desirable properties for OLS estimators.

  • These assumptions are often violated in real-world data, necessitating the detection and correction of these violations.

Heteroscedasticity
  • Definition: Heteroscedasticity is a violation of the homoscedasticity assumption, which requires the variance of the error term to be constant across observations. Under heteroscedasticity, the error variance differs across observations.

  • Mathematically represented as:
      $\text{var}(\epsilon|x) = \sigma_i^2$

  • Variance that changes with the explanatory variables indicates heteroscedastic errors.

  • Example: Higher education is associated with higher earnings, so the fitted line of earnings on education is upward sloping. The spread of earnings around that line also grows with education: the errors have a larger variance at high levels of education than at low levels.

  • Figure 10.1 displays the regression of earnings on education and its residuals.

Effects of Heteroscedasticity
  • Unbiasedness: Heteroscedasticity does not affect the unbiasedness of OLS estimators, which depends only on assumptions 1-3.

  • Variance of Estimates: Non-constant variance invalidates the usual formula for the variance of the slope estimator, which under homoscedasticity is:
      $\text{var}(\hat{\beta}_1) = \frac{\sigma^2}{SST_x}$, where $SST_x = \sum_{i=1}^n (x_i - \bar{x})^2$

  • Under heteroscedasticity, the variance becomes:
      $\text{var}(\hat{\beta}_1) = \frac{\sum_{i=1}^n (x_i - \bar{x})^2 \sigma_i^2}{(SST_x)^2}$

  • Implications:
      - OLS estimators remain unbiased but are no longer the Best Linear Unbiased Estimators (BLUE), because they no longer have the smallest variance among linear unbiased estimators.
      - The usual standard errors are invalid, so significance tests may reject or fail to reject null hypotheses incorrectly.
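
The two variance formulas above can be compared numerically. The following is a minimal sketch, not part of the original notes: the function name, the education values, and the error variances are hypothetical, and the "common" variance in the conventional formula is assumed to be the average of the $\sigma_i^2$.

```python
def slope_variances(x, sigma2):
    """Return (conventional, heteroscedastic) variance of the OLS slope.

    x      : regressor values
    sigma2 : error variances sigma_i^2, one per observation (hypothetical)
    """
    n = len(x)
    x_bar = sum(x) / n
    dev2 = [(xi - x_bar) ** 2 for xi in x]
    sst_x = sum(dev2)

    # Conventional formula: one common variance divided by SST_x.
    # Assumption for illustration: the common variance is the average sigma_i^2.
    avg_sigma2 = sum(sigma2) / n
    conventional = avg_sigma2 / sst_x

    # Heteroscedastic formula: each sigma_i^2 is weighted by (x_i - x_bar)^2.
    correct = sum(d * s for d, s in zip(dev2, sigma2)) / sst_x ** 2
    return conventional, correct


years_of_education = [8, 10, 12, 14, 16]

# Constant variance: both formulas give the same number.
homo = slope_variances(years_of_education, [4.0] * 5)

# Variance rising with education: the two formulas disagree.
hetero = slope_variances(years_of_education, [1.0, 2.0, 4.0, 8.0, 16.0])

print("homoscedastic:  ", homo)
print("heteroscedastic:", hetero)
```

In this made-up example the conventional formula understates the variance when the error variance rises with education, because the largest variances occur at the observations farthest from the mean of x.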

Serial Correlation (Autocorrelation)
  • Definition: Serial correlation occurs when error terms in a model are correlated across time periods, violating the independence assumption of errors.

  • Common in time series data where the error of one observation may depend on prior observations, leading to dependency among errors.

  • Example: Test scores influenced by prior knowledge exemplify the correlation of errors over time.

  • Figure 10.2 illustrates correlated errors vs. non-correlated errors.

Effects of Serial Correlation
  • Similar to heteroscedasticity, OLS estimates are still unbiased but no longer BLUE.

  • The usual standard errors are invalid, so standard significance tests can lead to inaccurate conclusions about relationships between variables.


10.2 Heteroscedasticity: Detection

Detection Approaches
  • Graphical Analysis: Visual inspection of regression residuals can hint at heteroscedasticity. Figure 10.1 highlights a regression model with evident heteroscedasticity.

  • Formal Testing: A formal test checks whether the error variance is statistically related to the explanatory variables. Because the errors have mean zero, the conditional variance equals the conditional mean of the squared error:
      $\text{var}(\epsilon|x) = E[(\epsilon - E(\epsilon))^2|x] = E(\epsilon^2|x)$

  • Breusch-Pagan Test: A common test for heteroscedasticity, which regresses the squared OLS residuals on the explanatory variable:
      $e^2 = \delta_0 + \delta_1 x + \text{error}$
      - Null hypothesis: $\delta_1 = 0$, so that $R^2 = 0$ in this regression (no relationship between the variance and x, which implies homoscedasticity).
      - Rejecting the null hypothesis indicates heteroscedasticity.
      - In practice, a significant F-statistic in the regression of the squared residuals on x is evidence of heteroscedasticity.
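
The mechanics of the auxiliary regression can be sketched in a few lines. This is an illustrative sketch only: the helper names `ols` and `breusch_pagan_f` and the six data points are hypothetical, and the F-statistic is the standard overall F for a regression with one explanatory variable. With so few observations the test has little power, so the example shows the steps rather than a meaningful result.

```python
def ols(x, y):
    """Simple OLS of y on a constant and x; returns (b0, b1, residuals, r2)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    resid = [yi - b0 - b1 * xi for xi, yi in zip(x, y)]
    sst = sum((yi - y_bar) ** 2 for yi in y)
    r2 = 1 - sum(e ** 2 for e in resid) / sst
    return b0, b1, resid, r2


def breusch_pagan_f(x, y):
    """Regress squared OLS residuals on x; return (delta_1, aux R^2, F)."""
    n = len(x)
    _, _, resid, _ = ols(x, y)                 # step 1: original regression
    e2 = [e ** 2 for e in resid]               # step 2: squared residuals
    _, delta_1, _, r2_aux = ols(x, e2)         # step 3: auxiliary regression
    # Overall F-statistic with 1 regressor and n - 2 residual degrees of freedom.
    f_stat = (r2_aux / 1) / ((1 - r2_aux) / (n - 2))
    return delta_1, r2_aux, f_stat


# Hypothetical data: the size of the error grows with x.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
errors = [-1.0, 2.0, -3.0, 4.0, -5.0, 6.0]
y = [2 + 3 * xi + ei for xi, ei in zip(x, errors)]

d1, r2_aux, f_stat = breusch_pagan_f(x, y)
print(f"delta_1 = {d1:.3f}, auxiliary R^2 = {r2_aux:.3f}, F = {f_stat:.3f}")
```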

Example Results
  • Running the regression for earnings and education yields:
      $\text{earnings} = -20.93 + 3.85\,\text{edu} + \varepsilon$
      - $R^2 = 0.3125, n = 500$

  • Regressing the squared residuals on education yields:
      $\hat{e}^2 = -2033 + 199.7\,\text{edu}$
      - $R^2 = 0.0317, F = 16.30$

  • The F-statistic is significant, which indicates the presence of heteroscedasticity.
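
As a quick arithmetic check, the reported F-statistic can be recovered from the reported $R^2$ and sample size. The sketch below assumes the standard overall F formula, $F = \frac{R^2/k}{(1-R^2)/(n-k-1)}$, with $k = 1$ explanatory variable; the variable names are illustrative.

```python
n = 500          # sample size reported for the earnings regression
k = 1            # one explanatory variable (edu) in the auxiliary regression
r2_aux = 0.0317  # R^2 from regressing squared residuals on edu

# Overall F-statistic of the auxiliary regression.
f_stat = (r2_aux / k) / ((1 - r2_aux) / (n - k - 1))

print(f"F = {f_stat:.2f}")
```

The result matches the reported value of 16.30 to two decimal places.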

Breusch-Pagan Test Using Wald Statistic
  • Stata can run the test by regressing normalized squared residuals on the predicted values; the resulting test statistic (Wald statistic) follows a chi-squared distribution under the null hypothesis.

  • Application example:
      - Test statistic = 468.3; the critical value for $\chi^2(1)$ at the 5% level is 3.84, so homoscedasticity is rejected.
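
The comparison with the critical value can also be expressed as a p-value. The sketch below is an illustration using only the Python standard library; the function name is hypothetical. It relies on the fact that the square of a standard normal variable follows a chi-squared distribution with one degree of freedom, so the upper-tail probability is `erfc(sqrt(stat / 2))`.

```python
import math


def chi2_1_pvalue(stat):
    """Upper-tail probability of a chi-squared(1) variable.

    If Z is standard normal, Z^2 is chi-squared(1), so
    P(Z^2 > stat) = P(|Z| > sqrt(stat)) = erfc(sqrt(stat / 2)).
    """
    return math.erfc(math.sqrt(stat / 2))


# The 5% critical value should give a tail probability of about 0.05.
p_at_critical = chi2_1_pvalue(3.84)

# The test statistic from the application example.
p_example = chi2_1_pvalue(468.3)

print(f"p-value at 3.84:  {p_at_critical:.4f}")
print(f"p-value at 468.3: {p_example:.3e}")
```

A statistic of 468.3 lies far beyond the critical value, so its p-value is effectively zero.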


10.3 Issue of Heteroscedasticity: Corrections

Methods of Correction
  • Model Specification: Adjust functional forms or include omitted variables to address model misspecification.

  • Use of Robust Standard Errors: Use heteroscedasticity-robust standard errors (available in most econometrics packages) so that standard errors, t-statistics, and confidence intervals remain valid.
      - Caveat: Requires a large sample size; unreliable in small samples.

  • Weighted Least Squares (WLS): Applicable when the form of the relationship between the variance and the explanatory variables is known. An example in the Appendix details the methodology.
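
To make the idea of robust standard errors concrete, the sketch below computes the slope's conventional standard error and a heteroscedasticity-robust one for a simple regression. It assumes the basic White form (often labelled HC0), which replaces each unknown $\sigma_i^2$ in the heteroscedastic variance formula with the squared residual $e_i^2$. Econometrics packages typically apply additional small-sample adjustments, so their output may differ. The function name and data are hypothetical.

```python
import math


def ols_with_robust_se(x, y):
    """Return (slope, conventional SE, robust HC0 SE) for y = b0 + b1*x + e."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    dev = [xi - x_bar for xi in x]
    sst_x = sum(d ** 2 for d in dev)

    b1 = sum(d * (yi - y_bar) for d, yi in zip(dev, y)) / sst_x
    b0 = y_bar - b1 * x_bar
    resid = [yi - b0 - b1 * xi for xi, yi in zip(x, y)]

    # Conventional: estimate one common variance, then divide by SST_x.
    s2 = sum(e ** 2 for e in resid) / (n - 2)
    se_conventional = math.sqrt(s2 / sst_x)

    # Robust (HC0): weight each squared residual by (x_i - x_bar)^2.
    meat = sum(d ** 2 * e ** 2 for d, e in zip(dev, resid))
    se_robust = math.sqrt(meat / sst_x ** 2)
    return b1, se_conventional, se_robust


# Hypothetical data: the size of the error grows with x.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
errors = [-1.0, 2.0, -3.0, 4.0, -5.0, 6.0]
y = [2 + 3 * xi + ei for xi, ei in zip(x, errors)]

b1, se_conv, se_rob = ols_with_robust_se(x, y)
print(f"slope = {b1:.3f}, conventional SE = {se_conv:.3f}, robust SE = {se_rob:.3f}")
```

The slope estimate is the same in both cases; only the standard error changes. With six observations the comparison is purely mechanical, which is consistent with the caveat that robust standard errors are unreliable in small samples.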


10.4 Issue of Serial Correlation: Detection

Detection Techniques
  • Graphical Techniques: Visual observation of residuals over time to identify potential serial correlations.

  • Formal Testing: Regression of current errors against past errors (e.g., testing the correlation between errors at time t and t-1):
      $y_t = \beta_0 + \beta_1 x_t + \epsilon_t$
      $\epsilon_t = \rho \epsilon_{t-1} + u_t$

  • Durbin-Watson Test: A widely used test for autocorrelation. Its test statistic is approximately equal to $2(1 - \rho)$, so it ranges between 0 and 4.
      - Interpretation:
        - A statistic near 2 suggests no serial correlation.
        - A statistic below 2 indicates positive serial correlation; a statistic above 2 indicates negative serial correlation.
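
The Durbin-Watson statistic is computed from the regression residuals as $\sum_{t=2}^{n}(e_t - e_{t-1})^2 / \sum_{t=1}^{n} e_t^2$. The sketch below applies that formula to two hypothetical residual series and also estimates $\rho$ by regressing $e_t$ on $e_{t-1}$ without an intercept. The function names and residual values are illustrative, and in samples this small the approximation $2(1 - \rho)$ is only rough.

```python
def durbin_watson(resid):
    """Durbin-Watson statistic: sum of squared changes over sum of squares."""
    num = sum((resid[t] - resid[t - 1]) ** 2 for t in range(1, len(resid)))
    den = sum(e ** 2 for e in resid)
    return num / den


def ar1_coefficient(resid):
    """Slope from regressing e_t on e_{t-1} without an intercept."""
    num = sum(resid[t] * resid[t - 1] for t in range(1, len(resid)))
    den = sum(resid[t - 1] ** 2 for t in range(1, len(resid)))
    return num / den


# Hypothetical residuals that drift slowly (positive serial correlation).
persistent = [1.0, 0.8, 0.6, 0.3, -0.2, -0.5, -0.7, -0.4]

# Hypothetical residuals that flip sign each period (negative serial correlation).
alternating = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0]

dw_pos = durbin_watson(persistent)
dw_neg = durbin_watson(alternating)
rho_pos = ar1_coefficient(persistent)
rho_neg = ar1_coefficient(alternating)

print(f"persistent:  DW = {dw_pos:.3f}, rho_hat = {rho_pos:.3f}")
print(f"alternating: DW = {dw_neg:.3f}, rho_hat = {rho_neg:.3f}")
```

The slowly drifting series gives a statistic well below 2, and the sign-flipping series gives a statistic well above 2, in line with the interpretation above.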


10.5 Issue of Serial Correlation: Correction

Correction Steps
  • Specify the model correctly; avoiding misspecification minimizes the effects of serial correlation.

  • Use econometrics packages that report standard errors corrected for both heteroscedasticity and serial correlation, such as Newey-West standard errors.

  • Transform the model mathematically when the form of the serial correlation is known; the example in the Appendix applies FGLS (Feasible Generalized Least Squares).
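
One common way to carry out such a transformation, when the errors are assumed to follow $\epsilon_t = \rho \epsilon_{t-1} + u_t$, is to estimate $\rho$ from the OLS residuals and then run OLS on the quasi-differenced data $y_t - \hat{\rho} y_{t-1}$ and $x_t - \hat{\rho} x_{t-1}$. The sketch below shows a single pass of this procedure (often called Cochrane-Orcutt). It is an assumption that this matches the spirit of the Appendix example, which is not reproduced here; the function names and data are hypothetical.

```python
def ols(x, y):
    """Simple OLS of y on a constant and x; returns (b0, b1, residuals)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    resid = [yi - b0 - b1 * xi for xi, yi in zip(x, y)]
    return b0, b1, resid


def quasi_difference(series, rho):
    """Return s_t - rho * s_{t-1} for t = 2..n (the first observation is dropped)."""
    return [series[t] - rho * series[t - 1] for t in range(1, len(series))]


def fgls_ar1(x, y):
    """One-pass feasible GLS for AR(1) errors; returns (rho_hat, b0, b1)."""
    # Step 1: OLS on the original data to obtain residuals.
    _, _, resid = ols(x, y)
    # Step 2: estimate rho by regressing e_t on e_{t-1} (no intercept).
    num = sum(resid[t] * resid[t - 1] for t in range(1, len(resid)))
    den = sum(e ** 2 for e in resid[:-1])
    rho_hat = num / den
    # Step 3: OLS on the quasi-differenced data.
    a0, b1, _ = ols(quasi_difference(x, rho_hat), quasi_difference(y, rho_hat))
    # The transformed intercept equals beta_0 * (1 - rho), so rescale it.
    b0 = a0 / (1 - rho_hat)
    return rho_hat, b0, b1


# Hypothetical time series with slowly drifting errors.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
errors = [1.0, 0.8, 0.6, 0.3, -0.2, -0.5, -0.7, -0.4]
y = [1 + 2 * xi + ei for xi, ei in zip(x, errors)]

rho_hat, b0_fgls, b1_fgls = fgls_ar1(x, y)
print(f"rho_hat = {rho_hat:.3f}, intercept = {b0_fgls:.3f}, slope = {b1_fgls:.3f}")
```

In practice the steps are often repeated until $\hat{\rho}$ stops changing, and packages may treat the first observation differently; this sketch simply drops it.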