Study Notes for ECON0019: Sampling Distributions of OLS Estimators and t-statistics
Lecture Information
Instructor: Professor Dennis Kristensen
University: University College London (UCL)
Date: October 18, 2021
Course: ECON0019
Contents Overview
The slides cover Sections 4.1–4.2 of Wooldridge's "Introductory Econometrics":
Sampling Distributions of the OLS Estimators
Sampling distribution of t-statistics
Recap of Previous Material
The regression model is expressed as: $y = \beta_0 + \beta_1 x_1 + \dots + \beta_k x_k + u$
Key points covered up to this lecture:
Definition and interpretation of the Multiple Linear Regression (MLR) model.
Mechanics of Ordinary Least Squares (OLS) for a given sample.
First two moments of the distribution of the OLS estimators.
MLR assumptions (MLR.1–MLR.5) imply that OLS is the Best Linear Unbiased Estimator (BLUE).
Sampling Distributions of the OLS Estimators
Objective: Test hypotheses about the coefficients $\beta_j$.
Hypothesis testing involves claiming a population parameter has a certain value and checking data against it.
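Example: Wage regression with education among the regressors, e.g. $wage = \beta_0 + \beta_1 educ + \dots + u$
Null hypothesis: Education has no effect on wages, expressed as: $H_0: \beta_1 = 0$
Use the estimator $\hat\beta_1$ to examine the validity of $H_0$.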
Key Relationships
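The relationship between the estimators $\hat\beta_j$ and population parameters $\beta_j$ is given by:
Expectation: $E(\hat\beta_j \mid X) = \beta_j$
Variance: $\mathrm{Var}(\hat\beta_j \mid X) = \dfrac{\sigma^2}{SST_j (1 - R_j^2)}$, where $SST_j$ is the total sample variation in $x_j$ and $R_j^2$ is the R-squared from regressing $x_j$ on the other regressors.
Hypothesis testing requires knowledge of the entire distribution of the estimators $\hat\beta_j$, not only these two moments.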
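The two moments can be checked by simulation. The sketch below is a minimal illustration, not part of the lecture: it uses a simple regression ($k = 1$), where there are no other regressors, so $R_1^2 = 0$ and the variance formula reduces to $\sigma^2 / SST_x$. The regressor values, parameter values and the uniform error distribution are all assumptions chosen for illustration; the errors are deliberately non-normal, since unbiasedness and the variance formula need only MLR.1–MLR.5.

```python
import math
import random
import statistics

def simulate_slope_estimates(x, beta0, beta1, sigma, reps, seed=0):
    """Draw `reps` samples of y for fixed x and return the OLS slope estimates."""
    rng = random.Random(seed)
    x_bar = statistics.fmean(x)
    sst_x = sum((xi - x_bar) ** 2 for xi in x)
    half_width = sigma * math.sqrt(3.0)   # Uniform(-a, a) has variance a^2 / 3 = sigma^2
    slopes = []
    for _ in range(reps):
        # New errors in every replication; the regressors stay fixed (conditioning on X).
        y = [beta0 + beta1 * xi + rng.uniform(-half_width, half_width) for xi in x]
        y_bar = statistics.fmean(y)
        s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
        slopes.append(s_xy / sst_x)
    return slopes, sst_x

x = [float(i) for i in range(20)]   # hypothetical regressor values
slopes, sst_x = simulate_slope_estimates(x, beta0=1.0, beta1=0.5, sigma=1.0, reps=5000)

print("mean of slope estimates:    ", statistics.fmean(slopes))     # close to beta1 = 0.5
print("variance of slope estimates:", statistics.variance(slopes))  # close to the next line
print("sigma^2 / SST_x:            ", 1.0 ** 2 / sst_x)
```

The simulation matches the mean and variance, but says nothing yet about the shape of the distribution, which is what a test needs.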
Distribution of the Error Term
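The OLS estimator is expressed as: $\hat\beta_j = \beta_j + \sum_{i=1}^n w_{ij} u_i$
Here, the weights $w_{ij}$ are functions of the regressors $X$ only, for $i = 1, \dots, n$.
Conditional on $X$, the distribution of $\hat\beta_j$ inherits its properties from the distribution of the error term $u$.
Under MLR.4 and MLR.5, we have:
Expectation of errors: $E(u_i \mid X) = 0$
Variance of errors: $\mathrm{Var}(u_i \mid X) = \sigma^2$
The remaining features of the distribution of $u$ are unknown, so the sampling distribution of $\hat\beta_j$ could take many different shapes.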
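The representation of the OLS estimator as the true coefficient plus a weighted sum of the errors can be verified numerically. The sketch below assumes a simple regression, where the weights are $w_i = (x_i - \bar{x}) / SST_x$; the data-generating values are hypothetical, and the errors are observable only because the data are simulated.

```python
import random

rng = random.Random(1)
n = 50
beta0, beta1 = 2.0, -0.7                      # hypothetical population parameters
x = [rng.uniform(0.0, 10.0) for _ in range(n)]
u = [rng.gauss(0.0, 1.5) for _ in range(n)]   # errors, known here only because we simulate
y = [beta0 + beta1 * xi + ui for xi, ui in zip(x, u)]

x_bar = sum(x) / n
y_bar = sum(y) / n
sst_x = sum((xi - x_bar) ** 2 for xi in x)

# OLS slope computed the usual way from (x, y).
beta1_hat = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sst_x

# The weights depend on the regressors only, not on y or u.
w = [(xi - x_bar) / sst_x for xi in x]
beta1_from_errors = beta1 + sum(wi * ui for wi, ui in zip(w, u))

print(beta1_hat, beta1_from_errors)   # the two numbers agree up to rounding error
```

Because the weights are fixed once we condition on the regressors, all randomness in the estimator comes from the errors.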
Strengthening Assumptions on Errors
Normality Assumption (MLR.6)
MLR.6 states that the population error $u$ is independent of $x_1, \dots, x_k$ and normally distributed with mean zero and variance $\sigma^2$, denoted as: $u \sim \text{Normal}(0, \sigma^2)$
This assumption introduces full independence between the errors and the independent variables, hence the name "independent variables".
MLR.6 implies MLR.4 and MLR.5 and is stronger than both: it specifies the entire distribution of $u$, namely the bell-shaped normal distribution.
Evaluating the Normality Assumption
Normality is commonly assumed but can be violated in practical applications.
Justification for normality often relies on the central limit theorem:
If $u$ is the sum of many unobserved factors, say $u = e_1 + \dots + e_M$ for a large $M$, and the factors $e_m$ are independent and all follow the same distribution, then the sum is approximately normal.
Complications arise if the factors have different distributions or dependencies, making MLR.6 a convenience assumption.
Statistical inference without MLR.6 is challenging; however, for large samples, this assumption can sometimes be relaxed.
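The central limit argument can be illustrated with a short simulation. The sketch below is an assumption-laden toy example: the factors are taken to be independent Exponential(1) draws, which are strongly right-skewed, and skewness is used as a rough measure of distance from normality (a normal distribution has skewness 0). For a sum of $M$ such factors the theoretical skewness is $2 / \sqrt{M}$, so it shrinks as more factors are added.

```python
import random

def skewness_of_sum(num_factors, draws=20000, seed=0):
    """Sample skewness of a sum of `num_factors` i.i.d. Exponential(1) factors."""
    rng = random.Random(seed)
    sums = [sum(rng.expovariate(1.0) for _ in range(num_factors)) for _ in range(draws)]
    mean = sum(sums) / draws
    var = sum((s - mean) ** 2 for s in sums) / draws
    third = sum((s - mean) ** 3 for s in sums) / draws
    return third / var ** 1.5

skews = {m: skewness_of_sum(m) for m in (1, 10, 100)}
for m, skew in skews.items():
    print(f"factors = {m:3d}  skewness of the sum = {skew:.3f}")
```

Even with 100 factors some skewness remains, which is one way to see why normality of the error is an approximation rather than a guarantee.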
Theorem on Normal Sampling Distributions
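Under the combined MLR assumptions (MLR.1 to MLR.6), conditional on $X$: $\hat\beta_j \sim \text{Normal}(\beta_j, \mathrm{Var}(\hat\beta_j))$
The standardized random variable is given by: $\dfrac{\hat\beta_j - \beta_j}{sd(\hat\beta_j)} \sim \text{Normal}(0, 1)$, where $sd(\hat\beta_j) = \sqrt{\mathrm{Var}(\hat\beta_j)}$.
Because the standard normal distribution does not depend on $X$, this second result holds even without conditioning on $X$.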
Proof of the Theorem (Part 1)
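An established fact about independent normal random variables is that any linear combination of them remains normally distributed: if $Z_1, \dots, Z_n$ are independent with $Z_i \sim \text{Normal}(\mu_i, \sigma_i^2)$, then $\sum_{i=1}^n Z_i \sim \text{Normal}\left(\sum_{i=1}^n \mu_i, \sum_{i=1}^n \sigma_i^2\right)$
Generalizing this, if $a_0, a_1, \dots, a_n$ are constants, then: $a_0 + \sum_{i=1}^n a_i Z_i \sim \text{Normal}\left(a_0 + \sum_{i=1}^n a_i \mu_i, \sum_{i=1}^n a_i^2 \sigma_i^2\right)$
Given that $\hat\beta_j = \beta_j + \sum_{i=1}^n w_{ij} u_i$, where conditional on $X$ the weights $w_{ij}$ are constants and the errors $u_i$ are independent $\text{Normal}(0, \sigma^2)$, we state: $\hat\beta_j \mid X \sim \text{Normal}\left(\beta_j, \sigma^2 \sum_{i=1}^n w_{ij}^2\right)$
This means: $\hat\beta_j \mid X \sim \text{Normal}(\beta_j, \mathrm{Var}(\hat\beta_j))$, since $\mathrm{Var}(\hat\beta_j \mid X) = \sigma^2 \sum_{i=1}^n w_{ij}^2$.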
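The variance of the normal distribution obtained here agrees with the usual OLS variance formula $\sigma^2 / [SST_j (1 - R_j^2)]$. The derivation below is a sketch that assumes the textbook form of the weights, $w_{ij} = \hat r_{ij} / SSR_j$, where $\hat r_{ij}$ are the residuals from regressing $x_j$ on the other regressors and $SSR_j = \sum_{i=1}^n \hat r_{ij}^2 = SST_j (1 - R_j^2)$; this notation is not spelled out in the notes themselves.

```latex
\begin{align*}
\mathrm{Var}(\hat\beta_j \mid X)
  &= \mathrm{Var}\Big(\beta_j + \sum_{i=1}^n w_{ij} u_i \,\Big|\, X\Big)
   = \sum_{i=1}^n w_{ij}^2 \,\mathrm{Var}(u_i \mid X)
   = \sigma^2 \sum_{i=1}^n w_{ij}^2 \\
  &= \sigma^2 \, \frac{\sum_{i=1}^n \hat r_{ij}^2}{SSR_j^2}
   = \frac{\sigma^2}{SSR_j}
   = \frac{\sigma^2}{SST_j \,(1 - R_j^2)} .
\end{align*}
```

The second equality uses the independence of the errors across observations, and the third uses homoskedasticity (MLR.5).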
Proof of Theorem (Part 2)
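A key rule states that if a random variable $W \sim \text{Normal}(\mu, \sigma_W^2)$, then the standardized variable is standard normal: $\dfrac{W - \mu}{\sigma_W} \sim \text{Normal}(0, 1)$
Notably, irrespective of the distribution of $W$, the standardized variable satisfies: $E\left(\dfrac{W - \mu}{\sigma_W}\right) = 0$ and $\mathrm{Var}\left(\dfrac{W - \mu}{\sigma_W}\right) = 1$. Normality of $W$ is what makes the standardized variable normal as well.
Combining this with the conclusion from the first part of the proof leads to: $\dfrac{\hat\beta_j - \beta_j}{sd(\hat\beta_j)} \sim \text{Normal}(0, 1)$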
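The distinction between the two standardization facts (mean 0 and variance 1 for any distribution, normality only when the original variable is normal) can be seen in a small simulation. The sketch below standardizes an Exponential(1) variable, chosen here purely as an example of a non-normal distribution with known mean 1 and standard deviation 1.

```python
import random
import statistics

rng = random.Random(2)
draws = 50000

# Exponential(1) has mean 1 and standard deviation 1, and is strongly right-skewed.
mu, sd = 1.0, 1.0
z = [(rng.expovariate(1.0) - mu) / sd for _ in range(draws)]

z_mean = statistics.fmean(z)
z_var = statistics.pvariance(z, mu=z_mean)
z_skew = statistics.fmean([(zi - z_mean) ** 3 for zi in z]) / z_var ** 1.5

print("mean:    ", z_mean)   # close to 0
print("variance:", z_var)    # close to 1
print("skewness:", z_skew)   # far from 0, so z is not standard normal
```

Standardizing fixes the first two moments only; the normal shape of the standardized OLS estimator comes from MLR.6.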
The t-statistic
The result cannot be applied directly to hypothesis testing because $sd(\hat\beta_j)$ depends on $\sigma$, which is unknown.
Instead, the estimator $\hat\sigma$ is used in place of $\sigma$, giving the standard error $se(\hat\beta_j)$.
The resulting t-statistic is defined as: $\dfrac{\hat\beta_j - \beta_j}{se(\hat\beta_j)}$
This statistic can be computed once a value for $\beta_j$ is specified, as it is under a null hypothesis.
Theorem on t Distribution for Standardized Estimators
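Under MLR assumptions (MLR.1–MLR.6): $\dfrac{\hat\beta_j - \beta_j}{se(\hat\beta_j)} \sim t_{n-k-1} = t_{df}$, where $k + 1$ is the number of unknown parameters in the model and $df = n - k - 1$.
The t-distribution is similar to the normal distribution but has a greater spread than $\text{Normal}(0, 1)$:
Expected value: $E(t_{df}) = 0 \text{ if } df > 1$
Variance: $\mathrm{Var}(t_{df}) = \frac{df}{df-2} \text{ if } df > 2$
As the degrees of freedom $df$ approach infinity, the t-distribution converges to the standard normal distribution: $t_{df} \to \text{Normal}(0, 1)$. The difference from the normal distribution is negligible for $df > 120$.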
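The effect of replacing the unknown $\sigma$ with its estimate can be seen in a small Monte Carlo experiment. The sketch below is an illustrative assumption rather than lecture material: it uses a simple regression with $n = 8$ observations (so $df = n - k - 1 = 6$), normal errors, and hypothetical parameter values. In every replication it standardizes the slope estimate twice, once with the true standard deviation and once with the standard error.

```python
import math
import random

def simulate_z_and_t(n=8, reps=5000, beta0=1.0, beta1=0.5, sigma=2.0, seed=3):
    """Return slope estimates standardized by the true sd (z) and by the standard error (t)."""
    rng = random.Random(seed)
    x = [float(i) for i in range(n)]              # fixed, hypothetical regressor values
    x_bar = sum(x) / n
    sst_x = sum((xi - x_bar) ** 2 for xi in x)
    sd_slope = sigma / math.sqrt(sst_x)           # requires the (normally unknown) sigma
    z_stats, t_stats = [], []
    for _ in range(reps):
        y = [beta0 + beta1 * xi + rng.gauss(0.0, sigma) for xi in x]
        y_bar = sum(y) / n
        b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sst_x
        b0 = y_bar - b1 * x_bar
        ssr = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
        sigma_hat = math.sqrt(ssr / (n - 2))      # df = n - k - 1 with k = 1
        se_slope = sigma_hat / math.sqrt(sst_x)
        z_stats.append((b1 - beta1) / sd_slope)
        t_stats.append((b1 - beta1) / se_slope)
    return z_stats, t_stats

def tail_share(stats, cutoff=1.96):
    """Share of statistics larger than `cutoff` in absolute value."""
    return sum(abs(s) > cutoff for s in stats) / len(stats)

z_stats, t_stats = simulate_z_and_t()
print("share |z| > 1.96:", tail_share(z_stats))   # near 0.05, as for Normal(0, 1)
print("share |t| > 1.96:", tail_share(t_stats))   # noticeably larger, since df = 6
```

Using the normal cutoff of 1.96 with the t-statistic would therefore reject a true null too often in small samples, which is why critical values are taken from the $t_{df}$ distribution.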
Visual Comparison of Distributions
Figure: density of the t-distribution with 6 degrees of freedom compared with the standard normal density.
The t-distribution shows greater spread (heavier tails) than the standard normal.
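The comparison in the figure can be reproduced numerically from the two density formulas. The sketch below evaluates the standard normal density and the t density at a few points; the function names are illustrative, and `math.lgamma` is used for the gamma-function terms to avoid overflow at large degrees of freedom.

```python
import math

def normal_pdf(x):
    """Density of the standard normal distribution."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def t_pdf(x, df):
    """Density of the t-distribution with `df` degrees of freedom."""
    log_const = (math.lgamma((df + 1) / 2.0) - math.lgamma(df / 2.0)
                 - 0.5 * math.log(df * math.pi))
    return math.exp(log_const - (df + 1) / 2.0 * math.log(1.0 + x * x / df))

# The t_6 density is lower at the center and higher in the tails than the normal,
# while t_120 is almost indistinguishable from the normal.
for x in (0.0, 1.0, 2.0, 3.0):
    print(f"x = {x:.0f}   t_6: {t_pdf(x, 6):.4f}   t_120: {t_pdf(x, 120):.4f}   "
          f"normal: {normal_pdf(x):.4f}")
```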
Practical Application of the t-statistic
The t-statistic serves as a significant tool for testing hypotheses concerning regression coefficients.
It can evaluate the validity of the null hypothesis that an independent variable has no partial effect: $H_0: \beta_j = 0$
The t-statistic is defined as: $t_{\hat\beta_j} = \dfrac{\hat\beta_j}{se(\hat\beta_j)}$
The statistic measures how far $\hat\beta_j$ deviates from the hypothesized value of $\beta_j$ (here zero) relative to its standard error: values far from zero are evidence against $H_0$.
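A worked computation makes the steps concrete. The sketch below uses a hypothetical five-observation data set and a simple regression ($k = 1$), chosen only so that the arithmetic is easy to follow by hand: it computes the OLS estimates, the estimate of $\sigma^2$ with $n - k - 1$ degrees of freedom, the standard error of the slope, and the t-statistic for the null hypothesis that the slope is zero.

```python
import math

# Hypothetical toy data, chosen only so the arithmetic is easy to check by hand.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]
n, k = len(x), 1

x_bar = sum(x) / n
y_bar = sum(y) / n
sst_x = sum((xi - x_bar) ** 2 for xi in x)

# OLS estimates for y = beta0 + beta1 * x + u.
beta1_hat = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sst_x
beta0_hat = y_bar - beta1_hat * x_bar

# Estimate sigma^2 from the residuals, dividing by the degrees of freedom.
residuals = [yi - beta0_hat - beta1_hat * xi for xi, yi in zip(x, y)]
df = n - k - 1
sigma2_hat = sum(r * r for r in residuals) / df
se_beta1 = math.sqrt(sigma2_hat / sst_x)

t_stat = beta1_hat / se_beta1   # t-statistic for H0: beta_1 = 0

print(f"beta1_hat = {beta1_hat:.3f}, se = {se_beta1:.3f}, t = {t_stat:.3f}, df = {df}")
# prints: beta1_hat = 0.600, se = 0.283, t = 2.121, df = 3
```

The resulting statistic would be compared with critical values of the $t_3$ distribution rather than the standard normal, because only three degrees of freedom are available.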