Notes on Applied Econometrics: Heteroscedasticity and Serial Correlation

After this lecture, you will be able to:

Classify and describe the problems of heteroscedasticity and serial correlation (autocorrelation).
Distinguish the characteristics of time series data that exhibit serial correlation.
Identify and discuss the consequences of heteroscedasticity and serial correlation.
Identify and discuss the main idea behind regression-based methods for detecting heteroscedasticity, with an emphasis on the Breusch-Pagan test.
Describe the general idea of correcting for heteroscedasticity using robust standard errors.
Identify and discuss the main idea behind regression-based methods for detecting serial correlation, with an emphasis on the Durbin-Watson test and Durbin’s alternative test.

The classical linear regression model rests on a set of assumptions that give OLS estimators their desirable properties.
These assumptions are often violated in real-world data, so the violations need to be detected and corrected.

Definition: Heteroscedasticity is a violation of the homoscedasticity assumption, which requires the variance of the error term to be constant across observations. Under heteroscedasticity, the error variance differs across observations:
$\text{var}(\epsilon_i|x_i) = \sigma_i^2$
When the error variance changes with the explanatory variables, the errors are heteroscedastic.
Example: Higher education is associated with higher earnings, and the spread of earnings around the regression line also grows with education: higher earnings have larger error variances than lower earnings. The fitted line slopes upward.
Figure 10.1 displays the regression of earnings on education and its residuals.

Unbiasedness: Heteroscedasticity does not affect the unbiasedness of the OLS estimators, which depends only on assumptions 1-3.
Variance of Estimates: Non-constant error variance invalidates the usual formula for the variance of the slope estimator:
$\text{var}(\hat{\beta}_1) = \frac{\sigma^2}{SST_x}, \quad SST_x = \sum_{i=1}^n (x_i - \bar{x})^2$
Under heteroscedasticity, the variance becomes:
$\text{var}(\hat{\beta}_1) = \frac{\sum_{i=1}^n (x_i - \bar{x})^2 \sigma_i^2}{SST_x^2}$
which reduces to the previous formula when $\sigma_i^2 = \sigma^2$ for all $i$.
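A small numerical sketch of the two variance formulas above, assuming hypothetical error variances that grow with education ($\sigma_i = 2\,\text{edu}_i$, for 8 to 20 years of schooling). The "homoscedastic" value plugs the average $\sigma_i^2$ into the constant-variance formula; the function name and numbers are illustrative only.

```python
import numpy as np

def slope_variances(x, sigma2):
    """Return (homoscedastic formula, heteroscedastic formula) for var(beta1_hat).

    x      : regressor values x_i
    sigma2 : error variance sigma_i^2 of each observation (assumed known here)
    """
    x = np.asarray(x, dtype=float)
    sigma2 = np.asarray(sigma2, dtype=float)
    dev2 = (x - x.mean()) ** 2          # (x_i - xbar)^2
    sst_x = dev2.sum()                  # SST_x
    # Homoscedastic formula, using the average variance as a single sigma^2
    naive = sigma2.mean() / sst_x
    # Heteroscedastic formula: sum (x_i - xbar)^2 sigma_i^2 / SST_x^2
    correct = (dev2 * sigma2).sum() / sst_x ** 2
    return naive, correct

# Hypothetical data: 8 to 20 years of education, error s.d. = 2 * edu
edu = np.arange(8, 21)
naive, correct = slope_variances(edu, (2.0 * edu) ** 2)
print(f"homoscedastic: {naive:.4f}  heteroscedastic: {correct:.4f}")
# homoscedastic: 4.6154  heteroscedastic: 4.8571
```

Because the variance is largest where $(x_i - \bar{x})^2$ is also large (high education), the constant-variance formula understates the true sampling variance here; with equal $\sigma_i^2$ the two numbers coincide.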
Implications:
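- OLS estimators remain unbiased but are no longer the Best Linear Unbiased Estimators (BLUE): when the $\sigma_i^2$ differ, other linear unbiased estimators, such as weighted least squares, have smaller variance.
- The usual standard errors are invalid, which undermines significance tests and the ability to correctly reject null hypotheses.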

Definition: Serial correlation (autocorrelation) occurs when error terms from different time periods are correlated, violating the assumption that the errors are uncorrelated across observations.
It is common in time series data, where the error in one period often depends on the errors in earlier periods.
Example: A student's test scores over time: unobserved prior knowledge that raises or lowers one score also affects later scores, so the errors are correlated over time.
Figure 10.2 illustrates correlated errors vs. non-correlated errors.

As with heteroscedasticity, OLS estimates remain unbiased but are no longer BLUE.
The usual standard errors are invalid, so standard significance tests can lead to incorrect conclusions about relationships between variables.

Graphical Analysis: Visual inspection of regression residuals can hint at heteroscedasticity. Figure 10.1 highlights a regression model with evident heteroscedasticity.
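Formal tests check whether the error variance is statistically related to the explanatory variables. Because $E(\epsilon|x) = 0$, the conditional variance equals the conditional mean of the squared error:
$\text{var}(\epsilon|x) = E[(\epsilon - E(\epsilon|x))^2|x] = E(\epsilon^2|x)$
Testing for heteroscedasticity therefore amounts to testing whether $E(\epsilon^2|x)$ depends on $x$.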
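Breusch-Pagan Test: A common test for heteroscedasticity. Because the errors are unobserved, the squared OLS residuals are regressed on the explanatory variables:
  $\hat{e}^2 = \delta_0 + \delta_1 x + v$
  - Null hypothesis: $\delta_1 = 0$ (the variance is unrelated to x, which implies homoscedasticity); equivalently, the population $R^2$ of this regression is zero.
  - Test statistic: the F-statistic of this auxiliary regression (or the LM statistic $nR^2$, compared with a $\chi^2(1)$ distribution).
  - Rejecting the null (a significant statistic) indicates heteroscedasticity.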
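A minimal from-scratch sketch of the Breusch-Pagan test using NumPy, run on simulated data. The data-generating process, the seed, and the function names are hypothetical and are not the lecture's dataset. The function returns the auxiliary $R^2$, its F statistic, and the LM statistic $nR^2$.

```python
import numpy as np

def ols_resid(X, y):
    """OLS residuals; X must already contain a constant column."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ beta

def breusch_pagan(y, x):
    """Breusch-Pagan test for heteroscedasticity in y = b0 + b1*x + e.

    Returns (R^2, F, LM) from the auxiliary regression e_hat^2 = d0 + d1*x + v.
    Compare F with F(k, n-k-1) and LM = n*R^2 with chi^2(k); here k = 1.
    """
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    n = len(y)
    X = np.column_stack([np.ones(n), x])
    e2 = ols_resid(X, y) ** 2                # squared OLS residuals
    v = ols_resid(X, e2)                     # residuals of the auxiliary regression
    r2 = 1 - (v @ v) / np.sum((e2 - e2.mean()) ** 2)
    k = X.shape[1] - 1                       # number of slope regressors
    f_stat = (r2 / k) / ((1 - r2) / (n - k - 1))
    lm_stat = n * r2
    return r2, f_stat, lm_stat

# Simulated (hypothetical) data whose error s.d. grows with education
rng = np.random.default_rng(0)
edu = rng.integers(8, 21, size=500).astype(float)
earnings = -20.0 + 4.0 * edu + rng.normal(0.0, 1.5 * edu)
r2, f_stat, lm_stat = breusch_pagan(earnings, edu)
print(f"R^2 = {r2:.4f}, F = {f_stat:.2f}, LM = {lm_stat:.2f}")
```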

Running the regression of earnings on education yields:
$\widehat{\text{earnings}} = -20.93 + 3.85\,\text{edu}$
- $R^2 = 0.3125, n = 500$
Regressing the squared residuals on education yields:
$\hat{e}^2 = -2033 + 199.7\,\text{edu}$
- $R^2 = 0.0317, F = 16.30$
The F-statistic of 16.30 is far above the 5% critical value of $F(1, 498)$ (about 3.86), so homoscedasticity is rejected: the errors are heteroscedastic.
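The reported F statistic can be recovered from the auxiliary $R^2$ alone, using $F = \frac{R^2/k}{(1-R^2)/(n-k-1)}$ with $k = 1$ regressor. The LM form of the Breusch-Pagan test, $nR^2$, is computed alongside for comparison with the $\chi^2(1)$ critical value of 3.84.

```python
# Auxiliary-regression figures reported above
n, k, r2 = 500, 1, 0.0317

f_stat = (r2 / k) / ((1 - r2) / (n - k - 1))  # F(1, 498)
lm_stat = n * r2                               # LM statistic, compared to chi^2(1)
print(round(f_stat, 2), round(lm_stat, 2))     # 16.3 15.85
```

Both versions lead to the same conclusion here: each statistic is roughly four times its 5% critical value.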

Stata's estat hettest command (the Breusch-Pagan/Cook-Weisberg test) by default regresses the normalized squared residuals on the fitted values; the resulting test statistic follows a chi-squared distribution under the null of homoscedasticity.
Application example:
- Test statistic = 468.3, far above the 5% critical value of $\chi^2(1)$, 3.84, so homoscedasticity is rejected.
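A sketch of the statistic described above, following the original Breusch-Pagan/Cook-Weisberg construction: the squared residuals are divided by $RSS/n$, regressed on a constant and the fitted values, and half the explained sum of squares is compared with $\chi^2(1)$. To our understanding this is what estat hettest reports by default, but the code has not been checked against Stata output; the function name and the simulated data are hypothetical.

```python
import numpy as np

def bp_cook_weisberg(y, X):
    """Breusch-Pagan/Cook-Weisberg statistic using the fitted values.

    X must include a constant column. Returns ESS / 2 from the regression of
    the normalized squared residuals on a constant and the fitted values;
    compare it with chi^2(1) (5% critical value 3.84).
    """
    y = np.asarray(y, dtype=float)
    n = len(y)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    yhat = X @ beta
    e = y - yhat
    g = e ** 2 / (e @ e / n)                  # normalized squared residuals, mean 1
    Z = np.column_stack([np.ones(n), yhat])
    d, *_ = np.linalg.lstsq(Z, g, rcond=None)
    ess = np.sum((Z @ d - g.mean()) ** 2)     # explained sum of squares
    return ess / 2

# Hypothetical simulated data with error s.d. growing with education
rng = np.random.default_rng(0)
edu = rng.integers(8, 21, size=500).astype(float)
earnings = -20.0 + 4.0 * edu + rng.normal(0.0, 1.5 * edu)
X = np.column_stack([np.ones_like(edu), edu])
stat = bp_cook_weisberg(earnings, X)
print(f"chi2(1) = {stat:.2f}, reject at 5%: {stat > 3.84}")
```

Normalizing by $RSS/n$ makes the statistic invariant to the units of $y$; this version relies on normally distributed errors, whereas the $nR^2$ form does not.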

Model Specification: Heteroscedasticity can be a symptom of misspecification, so first adjust the functional form or include omitted variables.
Use of Robust Standard Errors: Report heteroscedasticity-robust standard errors (available in most econometrics packages), which give valid t and F statistics without specifying the form of the heteroscedasticity.
- Caveat: Their validity is asymptotic, so they require a large sample and can be unreliable in small samples.
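A minimal NumPy sketch of heteroscedasticity-robust (White "sandwich") standard errors, computed next to the conventional ones. It uses the HC1 variant, which rescales by $n/(n-k)$; packages offer several variants (HC0 to HC3), so numbers may differ slightly from a given package. The function name and the simulated data are hypothetical.

```python
import numpy as np

def ols_with_robust_se(y, X):
    """OLS with conventional and heteroscedasticity-robust (HC1) standard errors.

    X must include a constant column.
    """
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    e = y - X @ beta
    # Conventional: one error variance sigma^2 for every observation
    sigma2 = e @ e / (n - k)
    se_conv = np.sqrt(np.diag(sigma2 * XtX_inv))
    # Sandwich: (X'X)^-1 [sum_i e_i^2 x_i x_i'] (X'X)^-1, scaled by n/(n-k)
    meat = (X * e[:, None] ** 2).T @ X
    cov_hc1 = n / (n - k) * XtX_inv @ meat @ XtX_inv
    se_robust = np.sqrt(np.diag(cov_hc1))
    return beta, se_conv, se_robust

# Hypothetical data: error s.d. grows sharply with x
rng = np.random.default_rng(42)
x = rng.uniform(0.0, 10.0, size=5000)
y = 1.0 + 2.0 * x + rng.normal(0.0, 0.1 + x ** 2)
X = np.column_stack([np.ones_like(x), x])
beta, se_conv, se_rob = ols_with_robust_se(y, X)
print("slope:", beta[1], "conventional SE:", se_conv[1], "robust SE:", se_rob[1])
```

The coefficients are the ordinary OLS estimates; only the standard errors change. Here the conventional formula understates the slope's uncertainty because the largest errors occur at the largest values of $x$.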
Weighted Least Squares (WLS): Applicable when the relationship between the error variance and the explanatory variables is known. The example in the Appendix details the methodology.
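A sketch of WLS under the assumption that $\text{var}(\epsilon_i|x_i) = \sigma^2 h(x_i)$ with $h$ known, here the hypothetical choice $h(\text{edu}) = \text{edu}$. Dividing every variable, including the constant, by $\sqrt{h_i}$ produces a transformed model with constant error variance, which OLS then estimates. This is a generic illustration, not the Appendix example.

```python
import numpy as np

def wls(y, X, h):
    """WLS estimates when var(e_i | x_i) = sigma^2 * h_i and h_i is known.

    X must include a constant column; it is divided by sqrt(h_i) like every
    other column, so the transformed model has no ordinary intercept.
    """
    w = 1.0 / np.sqrt(np.asarray(h, dtype=float))
    Xw = np.asarray(X, dtype=float) * w[:, None]   # x_ij / sqrt(h_i)
    yw = np.asarray(y, dtype=float) * w            # y_i / sqrt(h_i)
    beta, *_ = np.linalg.lstsq(Xw, yw, rcond=None)
    return beta

# Hypothetical data with var(e | edu) proportional to edu
rng = np.random.default_rng(1)
edu = rng.integers(8, 21, size=500).astype(float)
h = edu                                  # assumed variance function h(edu) = edu
earnings = -20.0 + 4.0 * edu + rng.normal(0.0, 3.0 * np.sqrt(h))
X = np.column_stack([np.ones_like(edu), edu])
print("WLS estimates (b0, b1):", wls(earnings, X, h))
```

As a cross-check, np.polyfit(edu, earnings, 1, w=1/np.sqrt(h)) minimizes the same weighted sum of squares and returns the coefficients in reverse order (slope first).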

Graphical Techniques: Visual observation of residuals over time to identify potential serial correlations.
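Formal Testing: Regress the current residuals on past residuals (e.g., to test the correlation between the errors at time t and t-1) in the model:
$y_t = \beta_0 + \beta_1 x_t + \epsilon_t$
$\epsilon_t = \rho \epsilon_{t-1} + u_t$
where $u_t$ is uncorrelated over time and $|\rho| < 1$; the null of no serial correlation is $H_0: \rho = 0$.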
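A sketch that simulates the AR(1) model above and runs the residual regression. The auxiliary regression includes an intercept, as is common in practice; with include_x=True it also includes $x_t$, which is the form used by Durbin's alternative test. The simulation settings ($T = 300$, $\rho = 0.6$, the seed) and the function names are hypothetical.

```python
import numpy as np

def simulate_ar1_regression(T=300, rho=0.6, seed=1):
    """Draw y_t = 1 + 2 x_t + eps_t with eps_t = rho * eps_{t-1} + u_t."""
    rng = np.random.default_rng(seed)
    u = rng.normal(size=T)
    eps = np.empty(T)
    eps[0] = u[0] / np.sqrt(1 - rho ** 2)    # start from the stationary distribution
    for t in range(1, T):
        eps[t] = rho * eps[t - 1] + u[t]
    x = rng.normal(size=T)
    return 1.0 + 2.0 * x + eps, x

def ar1_residual_test(y, x, include_x=False):
    """Regress OLS residuals e_t on e_{t-1}; return (rho_hat, t statistic)."""
    n = len(y)
    X = np.column_stack([np.ones(n), x])
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    e = y - X @ b                              # OLS residuals
    cols = [np.ones(n - 1), e[:-1]]            # constant and e_{t-1}
    if include_x:
        cols.append(x[1:])                     # x_t (Durbin's alternative test)
    Z = np.column_stack(cols)
    g, *_ = np.linalg.lstsq(Z, e[1:], rcond=None)
    u = e[1:] - Z @ g
    s2 = u @ u / (len(u) - Z.shape[1])         # residual variance
    se = np.sqrt(s2 * np.linalg.inv(Z.T @ Z)[1, 1])
    return g[1], g[1] / se

y, x = simulate_ar1_regression()
print(ar1_residual_test(y, x))
print(ar1_residual_test(y, x, include_x=True))
```

A t statistic on $\hat{\rho}$ well above 2 in absolute value rejects $H_0: \rho = 0$.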
Durbin-Watson Test: A widely used test for first-order serial correlation, based on the statistic $DW = \frac{\sum_{t=2}^T (\hat{e}_t - \hat{e}_{t-1})^2}{\sum_{t=1}^T \hat{e}_t^2} \approx 2(1 - \hat{\rho})$, which ranges between 0 and 4.
  - Interpretation:
    - A statistic near 2 suggests no serial correlation.
    - A statistic well below 2 indicates positive serial correlation; one well above 2 indicates negative serial correlation.
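A sketch of the Durbin-Watson statistic. Expanding its numerator gives the exact identity $DW = 2(1 - \hat{\rho}) - (e_1^2 + e_T^2)/\sum_t e_t^2$ with $\hat{\rho} = \sum_{t=2}^T e_t e_{t-1} / \sum_{t=1}^T e_t^2$, which is why $DW \approx 2(1-\hat{\rho})$ in large samples. The simulated residual series ($\rho = 0.6$, seed) is hypothetical.

```python
import numpy as np

def durbin_watson(e):
    """DW = sum_{t=2}^T (e_t - e_{t-1})^2 / sum_{t=1}^T e_t^2."""
    e = np.asarray(e, dtype=float)
    return np.sum(np.diff(e) ** 2) / np.sum(e ** 2)

# Hypothetical AR(1) residual series with rho = 0.6
rng = np.random.default_rng(2)
e = np.zeros(500)
for t in range(1, 500):
    e[t] = 0.6 * e[t - 1] + rng.normal()

dw = durbin_watson(e)
rho_hat = np.sum(e[1:] * e[:-1]) / np.sum(e ** 2)
end_terms = (e[0] ** 2 + e[-1] ** 2) / np.sum(e ** 2)   # vanishes as T grows
print(f"DW = {dw:.3f}, 2(1 - rho_hat) = {2 * (1 - rho_hat):.3f}")

print(durbin_watson([1, -1, 1, -1]))   # 3.0 (alternating signs: negative correlation)
print(durbin_watson([1, 1, 1, 1]))     # 0.0 (no change at all: extreme positive correlation)
```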

Model Specification: Choose a correctly specified model, since misspecification can itself produce serial correlation.
Robust Standard Errors: Most econometrics packages provide standard errors that are robust to both heteroscedasticity and serial correlation, such as Newey-West standard errors.
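A minimal NumPy sketch of Newey-West standard errors for OLS. It adds the autocovariances of $x_t e_t$ up to a chosen lag $L$, weighted by the Bartlett kernel $1 - l/(L+1)$, which keeps the covariance estimate positive semi-definite. The lag choice, the function name, and the simulated data are assumptions, and no small-sample degrees-of-freedom correction is applied, so numbers may differ slightly from a given package. With $L = 0$ it reduces to White's (HC0) robust standard errors.

```python
import numpy as np

def newey_west_se(y, X, lags):
    """OLS coefficients and Newey-West (HAC) standard errors.

    X must include a constant column; `lags` is the truncation lag L.
    """
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    e = y - X @ beta
    s = X * e[:, None]                    # row t is x_t' e_t
    S = s.T @ s                           # lag-0 (White) term
    for l in range(1, lags + 1):
        w = 1.0 - l / (lags + 1.0)        # Bartlett weight
        gamma = s[l:].T @ s[:-l]          # sum_t x_t e_t e_{t-l} x_{t-l}'
        S += w * (gamma + gamma.T)
    cov = XtX_inv @ S @ XtX_inv           # sandwich estimator
    return beta, np.sqrt(np.diag(cov))

# Hypothetical data: both x_t and the errors follow AR(1) processes with rho = 0.7
rng = np.random.default_rng(3)
T = 400
x = np.zeros(T)
eps = np.zeros(T)
for t in range(1, T):
    x[t] = 0.7 * x[t - 1] + rng.normal()
    eps[t] = 0.7 * eps[t - 1] + rng.normal()
y = 1.0 + 2.0 * x + eps
X = np.column_stack([np.ones(T), x])
for L in (0, 4):
    print(L, newey_west_se(y, X, L)[1])
```

When both the regressor and the errors are positively autocorrelated, allowing for lags typically raises the slope's standard error relative to $L = 0$.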
FGLS: If the form of the serial correlation is known or can be estimated, transform the model accordingly; the example in the appendix applies FGLS (Feasible Generalized Least Squares).
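As a stand-in for the appendix example (which is not reproduced here), this is a minimal sketch of one standard FGLS procedure for AR(1) errors, iterated Cochrane-Orcutt: estimate $\rho$ from the residuals, quasi-difference the data ($y_t - \hat{\rho} y_{t-1}$, $x_t - \hat{\rho} x_{t-1}$), re-estimate by OLS, and repeat. It drops the first observation (the Prais-Winsten variant keeps it with a rescaling). The function name, the fixed number of iterations, and the simulated data are assumptions.

```python
import numpy as np

def cochrane_orcutt(y, x, n_iter=20):
    """Iterated Cochrane-Orcutt FGLS for y_t = b0 + b1 x_t + e_t, e_t = rho e_{t-1} + u_t.

    Returns (array([b0, b1]), rho_hat).
    """
    y = np.asarray(y, dtype=float)
    X = np.column_stack([np.ones(len(y)), np.asarray(x, dtype=float)])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)       # start from OLS
    rho = 0.0
    for _ in range(n_iter):
        e = y - X @ beta                                # residuals in levels
        rho = (e[1:] @ e[:-1]) / (e[:-1] @ e[:-1])      # regress e_t on e_{t-1}
        ys = y[1:] - rho * y[:-1]                       # quasi-differenced data
        Xs = X[1:] - rho * X[:-1]                       # constant column becomes 1 - rho
        beta, *_ = np.linalg.lstsq(Xs, ys, rcond=None)  # coefficients stay in levels
    return beta, rho

# Hypothetical data with AR(1) errors, rho = 0.6
rng = np.random.default_rng(4)
T = 300
x = rng.normal(size=T)
eps = np.zeros(T)
for t in range(1, T):
    eps[t] = 0.6 * eps[t - 1] + rng.normal()
y = 1.0 + 2.0 * x + eps
beta, rho = cochrane_orcutt(y, x)
print("b0, b1 =", beta, "rho =", rho)
```

Quasi-differencing the constant column (rather than dropping it) keeps the estimated intercept on the original scale, so beta can be read directly as $(\beta_0, \beta_1)$.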