Notes on Heteroskedasticity in OLS Regression
Introduction to Heteroskedasticity
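Definition: Heteroskedasticity occurs when the variance of the error term in a regression model is not constant across observations, for example when it changes with the level of an independent variable. The residuals are the sample estimates of these errors, so the pattern shows up in their spread.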
For example, in a model predicting household income based on education level, the variance of the residuals may increase as education level increases, indicating that the predictions for higher education levels are less precise.
Key Concepts
Ordinary Least Squares (OLS): A method used for estimating the unknown parameters in a linear regression model. For OLS to be efficient:
The population error term must have constant variance, a property known as Homoskedasticity.
If the error variance differs across observations, the errors exhibit Heteroskedasticity.
For example, in a regression of house prices on size and location, larger houses may show a wider range of prices than smaller houses.
Implications of Heteroskedasticity
Effect on OLS: Heteroskedasticity reduces the efficiency of OLS estimates; thus:
OLS is no longer the Best Linear Unbiased Estimator (BLUE). For instance, if a model predicting sales revenue shows heteroskedasticity, the confidence intervals around the revenue estimates may be misleading.
The OLS coefficient estimates remain unbiased as long as the other OLS assumptions hold, but the conventional standard error formulas are no longer valid, which makes hypothesis tests unreliable.
Tests affected by Heteroskedasticity:
Hypothesis Tests (F-test, t-test) become unreliable due to incorrect standard errors. For example, conclusions drawn from a t-test on an OLS regression coefficient may not be trustworthy if heteroskedasticity is present.
Confidence intervals computed from the model will also be incorrect: they may be too narrow or too wide, so the significance of predictors can be overstated or understated.
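The claim that coefficients stay unbiased while the usual standard errors go wrong can be checked with a small simulation. The sketch below is illustrative only: the true line (intercept 2, slope 3), the error scales, and the function names are assumptions, not part of these notes. It repeatedly draws data, fits OLS, and compares the actual spread of the slope estimates with the average conventional standard error.

```python
import math
import random
import statistics


def ols_slope_and_se(x, y):
    """Fit y = a + b*x by OLS; return (b, conventional standard error of b)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = y_bar - b * x_bar
    ssr = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))
    # Conventional formula: assumes a single common error variance.
    se = math.sqrt(ssr / (n - 2) / sxx)
    return b, se


def simulate(heteroskedastic, reps=2000, n=100, seed=0):
    """Return (mean slope, empirical SD of slopes, mean conventional SE)."""
    rng = random.Random(seed)
    x = [1 + 9 * i / (n - 1) for i in range(n)]  # fixed design on [1, 10]
    # Assumed error scale: grows with x^2, or stays constant at 3.
    sigma = [0.1 * xi ** 2 if heteroskedastic else 3.0 for xi in x]
    slopes, ses = [], []
    for _ in range(reps):
        y = [2.0 + 3.0 * xi + rng.gauss(0, s) for xi, s in zip(x, sigma)]
        b, se = ols_slope_and_se(x, y)
        slopes.append(b)
        ses.append(se)
    return statistics.mean(slopes), statistics.stdev(slopes), statistics.mean(ses)


if __name__ == "__main__":
    for flag in (False, True):
        mean_b, sd_b, mean_se = simulate(flag)
        print(f"heteroskedastic={flag}: mean slope={mean_b:.3f}, "
              f"empirical SD={sd_b:.3f}, mean conventional SE={mean_se:.3f}")
```

In both settings the mean slope should sit near the true value of 3. Under this particular heteroskedastic design the conventional standard error tends to understate the real sampling variability, which is why t-tests and confidence intervals built on it cannot be trusted. Other variance patterns can make it overstate instead.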
Identifying Homoskedasticity vs. Heteroskedasticity
Homoskedasticity Characteristics:
Spread of residuals does not change with the value of the independent variables. An example would be a model predicting student test scores where variance remains consistent across all levels of hours studied.
Heteroskedasticity Characteristics:
Spread of residuals changes depending on independent variable values. For example, in a model predicting expenditure based on income, higher incomes tend to show a wider spread of expenditure values, indicating heteroskedasticity.
Residual plots will show variation in residual spread based on independent variable levels. If a plot exhibits a funnel shape, this may indicate heteroskedasticity.
Visual Evidence
Residual Plots: An informal way to detect heteroskedasticity is to plot the residuals:
A scatterplot of the residuals against the fitted values or an independent variable can provide visual evidence of variance patterns. If the residuals fan out as the value of the independent variable increases, it suggests heteroskedasticity.
Under homoskedasticity, the residuals should scatter around zero with roughly constant spread and no systematic pattern. If the spread changes or another pattern appears, further investigation is necessary, as it may indicate heteroskedasticity.
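The funnel shape described above can also be checked numerically when a plot is not at hand. The sketch below is a rough stand-in for a residual plot, not a formal test: it sorts the residuals by the independent variable, splits them into equal-sized bins, and reports the residual standard deviation in each bin. The simulated income and expenditure data, the coefficients, and all function names are hypothetical.

```python
import random
import statistics


def ols_residuals(x, y):
    """Residuals from an OLS fit of y = a + b*x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    a = y_bar - b * x_bar
    return [yi - a - b * xi for xi, yi in zip(x, y)]


def residual_spread_by_bin(x, residuals, bins=3):
    """Sort by x, split into equal-sized bins, return the residual SD per bin.

    Any leftover observations (when len(x) is not divisible by bins) are dropped.
    """
    ordered = [r for _, r in sorted(zip(x, residuals))]
    size = len(ordered) // bins
    return [statistics.stdev(ordered[k * size:(k + 1) * size]) for k in range(bins)]


def simulate_expenditure(n=300, seed=1, heteroskedastic=True):
    """Hypothetical data: the error SD grows with income when heteroskedastic."""
    rng = random.Random(seed)
    income = [rng.uniform(1, 10) for _ in range(n)]
    spend = []
    for xi in income:
        sd = 0.5 * xi if heteroskedastic else 1.5
        spend.append(5.0 + 2.0 * xi + rng.gauss(0, sd))
    return income, spend


if __name__ == "__main__":
    income, spend = simulate_expenditure()
    spreads = residual_spread_by_bin(income, ols_residuals(income, spend))
    print("residual SD by income bin (low to high):", [round(s, 2) for s in spreads])
```

A spread that rises steadily from the lowest bin to the highest is the numerical counterpart of residuals fanning out in a plot. Roughly equal spreads are what homoskedasticity would produce.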
Solutions for Heteroskedasticity
Adjusting Weights: Unlike the homoskedastic case, where every observation counts equally:
In heteroskedastic models, give higher weights to observations with lower variances and lower weights to those with higher variances. Weighted Least Squares (WLS) regression does this by weighting each observation in inverse proportion to its error variance, so the more precise observations have more influence on the fit.
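A minimal sketch of the weighting idea for a single regressor is given here. It assumes the error variances are known up to a constant, which is rarely true in practice, where they usually have to be estimated. The function name and the example data are hypothetical.

```python
def wls_fit(x, y, weights):
    """Weighted least squares for y = a + b*x; returns (a, b).

    Minimizes sum(w_i * (y_i - a - b*x_i)**2), so observations with
    larger weights pull the fitted line toward themselves more strongly.
    """
    w_sum = sum(weights)
    x_bar = sum(w * xi for w, xi in zip(weights, x)) / w_sum  # weighted mean of x
    y_bar = sum(w * yi for w, yi in zip(weights, y)) / w_sum  # weighted mean of y
    sxx = sum(w * (xi - x_bar) ** 2 for w, xi in zip(weights, x))
    sxy = sum(w * (xi - x_bar) * (yi - y_bar) for w, xi, yi in zip(weights, x, y))
    b = sxy / sxx
    return y_bar - b * x_bar, b


if __name__ == "__main__":
    x = [1.0, 2.0, 3.0, 4.0, 5.0]
    y = [2.1, 3.9, 6.5, 7.2, 11.0]
    # Assumption: error variance is proportional to x^2, so weights are 1 / x^2.
    weights = [1.0 / xi ** 2 for xi in x]
    print("WLS (intercept, slope):", wls_fit(x, y, weights))
    print("OLS (intercept, slope):", wls_fit(x, y, [1.0] * len(x)))
```

With equal weights the function reduces to ordinary OLS, which makes it easy to compare the two fits on the same data.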
Various formal tests, such as the Breusch-Pagan and White tests, can quantitatively assess the presence of heteroskedasticity, but visual methods like residual plots are often quicker and more intuitive for an initial assessment.
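As one illustration of a formal test, the sketch below implements the studentized (Koenker) form of the Breusch-Pagan test for a single regressor. This particular test is a choice made here, not something the notes prescribe. It regresses the squared OLS residuals on the independent variable and uses n times the R-squared of that auxiliary regression as the test statistic, which is compared with a chi-square distribution with one degree of freedom. The function names and simulated data are hypothetical.

```python
import math
import random


def simple_ols(x, y):
    """Return (intercept, slope, R^2) for an OLS fit of y = a + b*x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    b = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    a = y_bar - b * x_bar
    ssr = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))
    r2 = 1.0 - ssr / syy if syy > 0 else 0.0
    return a, b, r2


def breusch_pagan(x, y):
    """Studentized Breusch-Pagan test with one regressor; returns (LM, p-value)."""
    a, b, _ = simple_ols(x, y)
    squared_resid = [(yi - a - b * xi) ** 2 for xi, yi in zip(x, y)]
    # Auxiliary regression: do the squared residuals vary with x?
    _, _, r2_aux = simple_ols(x, squared_resid)
    lm = max(0.0, len(x) * r2_aux)  # guard against tiny negative rounding error
    # Under homoskedasticity LM is approximately chi-square(1), whose
    # upper-tail probability equals erfc(sqrt(LM / 2)).
    p_value = math.erfc(math.sqrt(lm / 2))
    return lm, p_value


if __name__ == "__main__":
    rng = random.Random(42)
    x = [rng.uniform(1, 10) for _ in range(500)]
    # Assumed data-generating process: error SD proportional to x.
    y = [1.0 + 2.0 * xi + rng.gauss(0, 0.5 * xi) for xi in x]
    lm, p = breusch_pagan(x, y)
    print(f"LM = {lm:.2f}, p-value = {p:.4g}")
```

A small p-value is evidence against constant error variance. A large one means only that this test did not detect a variance pattern that is linear in x, not that heteroskedasticity is absent.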