Study Notes for ECON0019: Sampling Distributions of OLS Estimators and t-statistics

The slides cover Sections 4.1–4.2 of Wooldridge's "Introductory Econometrics":
1. Sampling Distributions of the OLS Estimators
2. Sampling distribution of t-statistics

The regression model is expressed as:
$y = \beta_0 + \beta_1 x_1 + … + \beta_k x_k + u$
Key points covered up to this lecture:
1. Definition and interpretation of the Multiple Linear Regression (MLR) model.
2. Mechanics of Ordinary Least Squares (OLS) for a given sample.
3. First two moments of the distribution of the OLS estimators.
4. MLR assumptions (MLR.1–MLR.5) imply that OLS is the Best Linear Unbiased Estimator (BLUE).

Objective: Test hypotheses about the coefficients $\beta_j$ .
Hypothesis testing involves claiming a population parameter has a certain value and checking data against it.
Example: Wage regression: $\text{lwage} = \beta_0 + \beta_1 \text{educ} + \beta_2 \text{IQ} + u$
- Null hypothesis: Education has no effect on wages, expressed as:
 $H_0 : \beta_1 = 0$
- Use the estimator $\hat{\beta}_1$ to examine the validity of $H_0$ .

The relationship between the estimators and population parameters is given by:
- Expectation: $E[\hat{\beta}_j | X_n] = \beta_j$
- Variance: $Var(\hat{\beta}_j | X_n) = \frac{\sigma^2}{SST_j (1 - R_j^2)}$
Hypothesis testing necessitates knowledge about the entire distribution of the estimators $\hat{\beta}_j$ .
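
As a rough numerical illustration of the variance expression, read as $Var(\hat{\beta}_j | X_n) = \sigma^2 / (SST_j (1 - R_j^2))$, the Python sketch below evaluates it for made-up inputs. The function name `slope_variance` and all numbers are assumptions for illustration, not values from the slides.

```python
def slope_variance(sigma2, sst_j, r2_j):
    """Var(beta_hat_j | X_n) = sigma^2 / (SST_j * (1 - R_j^2))."""
    if not 0.0 <= r2_j < 1.0:
        raise ValueError("R_j^2 must lie in [0, 1)")
    return sigma2 / (sst_j * (1.0 - r2_j))


# Hypothetical inputs, chosen only to illustrate the formula.
sigma2 = 4.0    # error variance
sst_j = 200.0   # total sample variation in x_j

# x_j unrelated to the other regressors (R_j^2 = 0)
var_uncorrelated = slope_variance(sigma2, sst_j, 0.0)
# x_j highly collinear with the other regressors (R_j^2 = 0.9)
var_collinear = slope_variance(sigma2, sst_j, 0.9)

print(var_uncorrelated, var_collinear)
```

Holding $\sigma^2$ and $SST_j$ fixed, raising $R_j^2$ from 0 to 0.9 multiplies the variance by a factor of 10, which is why strong collinearity among the regressors makes $\hat{\beta}_j$ imprecise.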

The OLS estimator is expressed as: $\hat{\beta}_j = \beta_j + \sum_{i=1}^{n} w_{ij} u_i$
- Here, $w_{ij}$ are functions of $X_n = \{(x_{i1}, …, x_{ik}) : i = 1, …, n\}$ .
Conditional on $X_n$ , the distribution of $\hat{\beta}_j$ inherits properties from the distribution of the error term $u$ .
Under MLR.4 and MLR.5, we have:
- Expectation of errors: $E[u | x_1, …, x_k] = E[u] = 0$
- Variance of errors: $Var(u | x_1, …, x_k) = Var(u) = \sigma^2$
The remaining features of the distribution of $u$ are left unrestricted, so the sampling distribution of $\hat{\beta}_j$ could take almost any shape.
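
The representation $\hat{\beta}_j = \beta_j + \sum_i w_{ij} u_i$ can be checked numerically. The sketch below does this for the simple regression case ($k = 1$), where the weights have the closed form $w_i = (x_i - \bar{x}) / SST_x$. The parameter values, sample size, and distributions are assumptions chosen for illustration.

```python
import random

random.seed(0)
n = 50
beta0, beta1 = 1.0, 2.0                      # hypothetical population parameters
x = [random.uniform(0, 10) for _ in range(n)]
u = [random.gauss(0, 1) for _ in range(n)]   # errors, observable only in a simulation
y = [beta0 + beta1 * xi + ui for xi, ui in zip(x, u)]

x_bar = sum(x) / n
sst_x = sum((xi - x_bar) ** 2 for xi in x)

# The weights depend on the regressors only, not on y or u.
w = [(xi - x_bar) / sst_x for xi in x]

# OLS slope computed from the observed data.
beta1_hat = sum(wi * yi for wi, yi in zip(w, y))

# The same estimate written as beta1 plus a weighted sum of the errors.
beta1_decomposed = beta1 + sum(wi * ui for wi, ui in zip(w, u))

print(beta1_hat, beta1_decomposed)
```

The two printed numbers agree up to floating-point rounding, because the weights sum to zero and satisfy $\sum_i w_i x_i = 1$.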

MLR.6 states that the population error $u$ is independent of $x_1, …, x_k$ and normally distributed with mean zero and variance $\sigma^2$ , denoted as:
$u \sim N(0, \sigma^2)$
This assumption introduces full independence between the errors and the independent variables, hence the name "independent variables".
MLR.6 implies MLR.4 and MLR.5 and is strictly stronger than both, because it also specifies a distribution for $u$ : the bell-shaped normal distribution.

Normality is commonly assumed but can be violated in practical applications.
Justification for normality often relies on the central limit theorem, where:
- If $u = f_1 + f_2 + … + f_m$ for a large $m$ , and the factors $f_1, …, f_m$ are independent and follow the same distribution, the sum is approximately normal.
Complications arise if the factors have different distributions or depend on one another, which makes MLR.6 an assumption of convenience.
Statistical inference without MLR.6 is challenging; however, for large samples, this assumption can sometimes be relaxed.
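
The central limit argument can be illustrated with a small simulation. The sketch below builds each error as a sum of independent, identically distributed factors that are clearly non-normal (uniform on $[-0.5, 0.5]$); the number of factors, the factor distribution, and the number of draws are all assumptions made for illustration.

```python
import random

random.seed(1)
m = 30          # number of unobserved factors per error (hypothetical)
draws = 20000   # number of simulated errors


def one_error(m):
    # Each factor is uniform on [-0.5, 0.5]: mean 0, variance 1/12.
    return sum(random.uniform(-0.5, 0.5) for _ in range(m))


u = [one_error(m) for _ in range(draws)]

mean_u = sum(u) / draws
sd_u = (sum((ui - mean_u) ** 2 for ui in u) / draws) ** 0.5
sd_theory = (m / 12) ** 0.5   # standard deviation of a sum of m independent factors

# A normal variable lies within 1.96 standard deviations of its mean about 95% of the time.
share_within = sum(abs(ui) <= 1.96 * sd_theory for ui in u) / draws

print(mean_u, sd_u, sd_theory, share_within)
```

If the share within 1.96 standard deviations is close to 0.95, the sum behaves approximately like a normal variable even though no individual factor is normal.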

Under the combined MLR assumptions (MLR.1 to MLR.6), conditional on $X_n$ :
$\hat{\beta}_j \sim N(\beta_j, Var(\hat{\beta}_j | X_n) )$
The standardized random variable is given by:
$\frac{\hat{\beta}_j - \beta_j}{\text{sd}(\hat{\beta}_j | X_n)} \sim N(0, 1)$
Because the $N(0, 1)$ distribution does not depend on $X_n$ , the standardized variable is standard normal even without conditioning on $X_n$ .

An established fact about independent normal random variables is that any linear combination of them is again normally distributed. For independent $u_1, u_2 \sim N(0, \sigma^2)$ and constants $c_1, c_2$ :
$c_1 u_1 + c_2 u_2 \sim N(0, \sigma^2(c_1^2 + c_2^2))$
Generalizing this, if $c_1, …, c_n$ are constants, then:
$\sum_{i=1}^{n} c_i u_i \sim N(0, \sigma^2 \sum_{i=1}^{n} c_i^2)$
Given that $\hat{\beta}_j = \beta_j + \sum_{i=1}^{n} w_{ij} u_i$ and that the $w_{ij}$ are constants once we condition on $X_n$ , we have: $\sum_{i=1}^{n} w_{ij} u_i \sim N \left(0, \sigma^2 \sum_{i=1}^{n} w_{ij}^2 \right)$
Adding the constant $\beta_j$ shifts the mean without changing the variance, so:
$\hat{\beta}_j \sim N(\beta_j, Var(\hat{\beta}_j | X_n) )$
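
A short sketch of why the variance $\sigma^2 \sum_i w_{ij}^2$ agrees with the earlier variance formula. It assumes the standard partialling-out representation of OLS, which the notes do not spell out: let $\hat{r}_{ij}$ (notation introduced here) be the residual from regressing $x_j$ on all other regressors, so that $w_{ij} = \hat{r}_{ij} / \sum_{i=1}^{n} \hat{r}_{ij}^2$ and $\sum_{i=1}^{n} \hat{r}_{ij}^2 = SST_j (1 - R_j^2)$ . Then:
$\sum_{i=1}^{n} w_{ij}^2 = \frac{\sum_{i=1}^{n} \hat{r}_{ij}^2}{\left( \sum_{i=1}^{n} \hat{r}_{ij}^2 \right)^2} = \frac{1}{\sum_{i=1}^{n} \hat{r}_{ij}^2} = \frac{1}{SST_j (1 - R_j^2)}$
$Var(\hat{\beta}_j | X_n) = \sigma^2 \sum_{i=1}^{n} w_{ij}^2 = \frac{\sigma^2}{SST_j (1 - R_j^2)}$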

A key rule states that if a random variable $z \sim N(\mu, \sigma^2)$ , then the standardized variable $z^\prime = (z - \mu) / \sigma$ is standard normal:
$\frac{z - \mu}{\sigma} \sim N(0, 1)$
Notably, irrespective of the distribution of $z$ , the standardized variable $z^\prime$ satisfies:
- $E[z^\prime] = 0$
- $Var(z^\prime) = 1$
Normality of $z$ is what makes $z^\prime$ normal as well. Combining this with the conclusion from the first part of the theorem leads to:
$\hat{\beta}_j \sim N(\beta_j, Var(\hat{\beta}_j | X_n) ), \quad \frac{\hat{\beta}_j - \beta_j}{\text{sd}(\hat{\beta}_j | X_n)} \sim N(0, 1).$
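
The difference between "mean 0 and variance 1" and "standard normal" can be seen in a small simulation. The sketch below standardizes draws from an exponential distribution, which is an arbitrary non-normal choice made for illustration: the first two moments come out as expected, but the shape is not that of $N(0, 1)$ .

```python
import random

random.seed(2)
n_draws = 100000
mu, sigma = 1.0, 1.0   # an Exponential(1) variable has mean 1 and standard deviation 1

z = [random.expovariate(1.0) for _ in range(n_draws)]
z_std = [(zi - mu) / sigma for zi in z]

mean_std = sum(z_std) / n_draws
var_std = sum((zi - mean_std) ** 2 for zi in z_std) / n_draws

# Standardizing fixes the first two moments but not the shape:
# a N(0, 1) variable falls below -1 about 16% of the time,
# whereas this variable is bounded below by -1.
share_below_minus_one = sum(zi < -1 for zi in z_std) / n_draws

print(mean_std, var_std, share_below_minus_one)
```

This is why the normality assumption on $u$ matters for the $N(0, 1)$ result: standardization alone delivers only the mean and the variance.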

Direct application of the result $\frac{\hat{\beta}_j - \beta_j}{\text{sd}(\hat{\beta}_j | X_n)} \sim N(0, 1)$ for hypothesis testing is not feasible because $\text{sd}(\hat{\beta}_j | X_n)$ depends on $\sigma = \text{sd}(u)$ , which is unknown.
Instead, the estimator $\hat{\sigma}$ is used in place of $\sigma$ , which gives the standard error $se(\hat{\beta}_j)$ .
The resulting t-statistic is defined as:
$t_{\hat{\beta}_j} = \frac{\hat{\beta}_j - \beta_j}{se(\hat{\beta}_j)}$
This statistic can be computed once a value for $\beta_j$ is specified, which is exactly what a null hypothesis provides.
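
A minimal sketch of the computation for the simple regression case ($k = 1$), using simulated data. It assumes the usual estimator $\hat{\sigma}^2 = SSR / (n - k - 1)$, which these notes do not write out, and all parameter values and the helper name `t_statistic` are hypothetical.

```python
import random

random.seed(3)
n, k = 40, 1
beta0, beta1 = 0.5, 1.5   # hypothetical population values
x = [random.uniform(0, 5) for _ in range(n)]
y = [beta0 + beta1 * xi + random.gauss(0, 2) for xi in x]

x_bar = sum(x) / n
y_bar = sum(y) / n
sst_x = sum((xi - x_bar) ** 2 for xi in x)

# OLS estimates for the simple regression.
beta1_hat = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sst_x
beta0_hat = y_bar - beta1_hat * x_bar

# sigma_hat^2 = SSR / (n - k - 1) (assumed estimator of sigma^2).
residuals = [yi - beta0_hat - beta1_hat * xi for xi, yi in zip(x, y)]
sigma_hat = (sum(r ** 2 for r in residuals) / (n - k - 1)) ** 0.5

# With a single regressor R_j^2 = 0, so se(beta1_hat) = sigma_hat / sqrt(SST_x).
se_beta1 = sigma_hat / sst_x ** 0.5


def t_statistic(beta_hat, beta_value, se):
    """(estimate - candidate value of beta_j) / standard error."""
    return (beta_hat - beta_value) / se


# Computable here only because the simulation fixes beta1.
t_at_truth = t_statistic(beta1_hat, beta1, se_beta1)
# Value of the statistic when the candidate value is zero.
t_at_zero = t_statistic(beta1_hat, 0.0, se_beta1)

print(beta1_hat, se_beta1, t_at_truth, t_at_zero)
```

With real data `t_at_truth` cannot be computed, since $\beta_1$ is unknown; only a candidate value of $\beta_1$ can be plugged in.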

Under MLR assumptions (MLR.1–MLR.6):
$t_{\hat{\beta}_j} = \frac{\hat{\beta}_j - \beta_j}{se(\hat{\beta}_j)} \sim t_{n-k-1}$
The t-distribution is similar to the normal distribution but features a greater spread compared to $N(0, 1)$ :
- Expected value: $E(t_{df}) = 0 \text{ if } df > 1$
- Variance: $Var(t_{df}) = \frac{df}{df-2} \text{ if } df > 2$
As the degrees of freedom $df = n - k - 1$ grow, the t-distribution converges to the standard normal distribution: $t_{df} \to N(0, 1)$ as $df \to \infty$ . For $df > 120$ the difference from the normal distribution is negligible.
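
The extra spread of the t-statistic in small samples can be checked by Monte Carlo. The sketch below repeatedly draws samples from a simple regression with normal errors and $n = 8$, so that $df = n - k - 1 = 6$, and compares the simulated variance of the t-statistic with $df / (df - 2)$. The design (fixed regressor values, parameter values, number of replications) is an assumption made for illustration.

```python
import random

random.seed(4)
n, k = 8, 1
df = n - k - 1                           # 6 degrees of freedom
beta0, beta1, sigma = 1.0, 0.5, 1.0      # hypothetical population values
x = [float(i) for i in range(1, n + 1)]  # regressors held fixed across replications
x_bar = sum(x) / n
sst_x = sum((xi - x_bar) ** 2 for xi in x)


def simulate_t():
    """Draw one sample with normal errors and return the t-statistic at the true beta1."""
    y = [beta0 + beta1 * xi + random.gauss(0, sigma) for xi in x]
    y_bar = sum(y) / n
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sst_x
    b0 = y_bar - b1 * x_bar
    ssr = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    se = (ssr / df / sst_x) ** 0.5       # sqrt(sigma_hat^2 / SST_x)
    return (b1 - beta1) / se


reps = 50000
t_draws = [simulate_t() for _ in range(reps)]

mean_t = sum(t_draws) / reps
var_t = sum((t - mean_t) ** 2 for t in t_draws) / reps
# Under N(0, 1) only about 5% of draws would exceed 1.96 in absolute value.
share_beyond = sum(abs(t) > 1.96 for t in t_draws) / reps

print(mean_t, var_t, df / (df - 2), share_beyond)
```

The simulated variance should be close to $6/4 = 1.5$ rather than 1, and the share of draws beyond $\pm 1.96$ should be noticeably above 5%, reflecting the heavier tails of $t_6$ .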

Figure: density of the t-distribution with 6 degrees of freedom compared with the standard normal density.
- The t-distribution shows greater spread (heavier tails) than the standard normal.

The t-statistic is the main tool for testing hypotheses about regression coefficients.
It can be used to assess the null hypothesis that an independent variable has no partial effect:
$H_0 : \beta_j = 0$
Under this null hypothesis, the t-statistic becomes:
$t_{\hat{\beta}_j} = \frac{\hat{\beta}_j}{se(\hat{\beta}_j)}$
It measures how far $\hat{\beta}_j$ lies from the hypothesized value of $\beta_j$ (here zero) in units of its standard error; a value far from zero is evidence against $H_0$ .
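
To make the "relative to its standard error" reading concrete, the sketch below computes the statistic for two hypothetical sets of regression output that share the same coefficient estimate. The helper name `t_stat_zero_null` and the numbers are invented for illustration and do not come from any data set.

```python
def t_stat_zero_null(beta_hat, se):
    """t-statistic for H0: beta_j = 0, i.e. the estimate measured in standard errors."""
    if se <= 0:
        raise ValueError("standard error must be positive")
    return beta_hat / se


# Hypothetical (estimate, standard error) pairs, not taken from any data set.
examples = {
    "precise estimate": (0.08, 0.02),
    "noisy estimate": (0.08, 0.10),
}

t_values = {name: t_stat_zero_null(b, se) for name, (b, se) in examples.items()}
for name, t in t_values.items():
    print(f"{name}: t = {t:.2f}")
```

The same estimate of 0.08 lies four standard errors from zero in the first case and less than one in the second, so only the first is hard to reconcile with $H_0$ .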