Study Notes for ECON0019: Sampling Distributions of OLS Estimators and t-statistics

ECON0019 Sampling Distributions of OLS Estimators and their t-statistics

Lecture Information

  • Instructor: Professor Dennis Kristensen

  • University: University College London (UCL)

  • Date: October 18, 2021

  • Course: ECON0019

Contents Overview

  • The slides cover Sections 4.1–4.2 of Wooldridge's "Introductory Econometrics":

    1. Sampling Distributions of the OLS Estimators

    2. Sampling distribution of t-statistics

Recap of Previous Material

  • The regression model is expressed as:
    $y = \beta_0 + \beta_1 x_1 + \dots + \beta_k x_k + u$

  • Key points covered up to this lecture:

    1. Definition and interpretation of the Multiple Linear Regression (MLR) model.

    2. Mechanics of Ordinary Least Squares (OLS) for a given sample.

    3. First two moments of the distribution of the OLS estimators.

    4. MLR assumptions (MLR.1–MLR.5) imply that OLS is the Best Linear Unbiased Estimator (BLUE).

Sampling Distributions of the OLS Estimators

  • Objective: Test hypotheses about the coefficients $\beta_j$.

  • A hypothesis test posits a value for a population parameter and checks whether the data are consistent with it.

  • Example: Wage regression: $\text{lwage} = \beta_0 + \beta_1 \text{educ} + \beta_2 \text{IQ} + u$

    • Null hypothesis: Education has no effect on wages, expressed as:
      $H_0 : \beta_1 = 0$

    • Use the estimator $\hat{\beta}_1$ to examine the validity of $H_0$.

Key Relationships
  • The relationship between the estimators and population parameters is given by:

    • Expectation: $E[\hat{\beta}_j | X_n] = \beta_j$

    • Variance: $Var(\hat{\beta}_j | X_n) = \frac{\sigma^2}{SST_j (1 - R_j^2)}$

  • Hypothesis testing requires knowing the entire sampling distribution of the estimators $\hat{\beta}_j$, not just these first two moments.
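
  The variance formula can be checked numerically. The sketch below is only an illustration on simulated regressors (the sample size, error variance and data-generating choices are assumptions, not part of the lecture): it compares $\sigma^2 / (SST_j (1 - R_j^2))$ with the corresponding diagonal entry of the matrix expression $\sigma^2 (X'X)^{-1}$.

```python
import numpy as np

rng = np.random.default_rng(0)
n, sigma2 = 200, 4.0                      # hypothetical sample size and error variance

# Two correlated regressors plus an intercept column.
x1 = rng.normal(size=n)
x2 = 0.6 * x1 + rng.normal(size=n)
X = np.column_stack([np.ones(n), x1, x2])

# Matrix form: Var(beta_hat | X_n) = sigma^2 (X'X)^{-1}; take the entry for x1.
var_matrix = sigma2 * np.linalg.inv(X.T @ X)[1, 1]

# Scalar form: sigma^2 / (SST_1 (1 - R_1^2)), with R_1^2 from regressing x1 on the others.
Z = np.column_stack([np.ones(n), x2])
resid = x1 - Z @ np.linalg.lstsq(Z, x1, rcond=None)[0]
sst_1 = np.sum((x1 - x1.mean()) ** 2)
r2_1 = 1.0 - np.sum(resid ** 2) / sst_1
var_scalar = sigma2 / (sst_1 * (1.0 - r2_1))

print(var_matrix, var_scalar)             # the two expressions agree up to rounding error
```

  The stronger the correlation between x1 and x2, the closer $R_1^2$ is to one and the larger the variance of the slope estimator.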

Distribution of the Error Term

  • The OLS estimator can be written as: $\hat{\beta}_j = \beta_j + \sum_{i=1}^{n} w_{ij} u_i$

    • Here, the weights $w_{ij}$ are functions of the regressors only, $X_n = \{(x_{i1}, \dots, x_{ik}) : i = 1, \dots, n\}$.

  • Conditional on $X_n$, the distribution of $\hat{\beta}_j$ inherits its properties from the distribution of the error term $u$.

  • Under MLR.4 and MLR.5, we have:

    • Expectation of errors: $E[u | x_1, \dots, x_k] = E[u] = 0$

    • Variance of errors: $Var(u | x_1, \dots, x_k) = Var(u) = \sigma^2$

  • The remaining features of the distribution of $u$ are unknown, so the sampling distribution of $\hat{\beta}_j$ could take almost any shape.
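
  The representation of the OLS estimator as the true coefficient plus a weighted sum of errors can be verified on simulated data. This is a minimal sketch under assumed values (the coefficients, sample size and error scale are made up for illustration); the weights are taken to be the rows of $(X'X)^{-1}X'$, which depend on the regressors only.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50
beta = np.array([1.0, 0.5, -0.2])          # hypothetical population coefficients

X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
u = rng.normal(scale=2.0, size=n)
y = X @ beta + u

# Row j of W holds the weights w_{ij}, i = 1..n; W depends on X only.
W = np.linalg.inv(X.T @ X) @ X.T
beta_hat = W @ y

# The estimation error equals the weighted sum of the errors.
print(beta_hat - beta)
print(W @ u)
```

  Both printed vectors coincide, which is why the distribution of the estimator, given the regressors, is determined entirely by the distribution of the errors.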

Strengthening Assumptions on Errors

Normality Assumption (MLR.6)
  • MLR.6 states that the population error $u$ is independent of $x_1, \dots, x_k$ and normally distributed with mean zero and variance $\sigma^2$, denoted as:
    $u \sim N(0, \sigma^2)$

  • This assumption imposes full independence between the error and the explanatory variables, hence the name "independent variables".

  • MLR.6 is stronger than MLR.4 and MLR.5: it implies both and additionally specifies the distribution of $u$, namely the bell-shaped normal distribution.

Evaluating the Normality Assumption

  • Normality is commonly assumed but can be violated in practical applications.

  • Justification for normality often relies on the central limit theorem, where:

    • If $u = f_1 + f_2 + \dots + f_m$ for a large $m$, and the factors $f_1, \dots, f_m$ are independent and follow the same distribution with finite variance, then $u$ is approximately normally distributed.

  • Complications arise if the factors have different distributions or dependencies, making MLR.6 a convenience assumption.

  • Statistical inference without MLR.6 is challenging; however, for large samples, this assumption can sometimes be relaxed.
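
  The central limit argument can be illustrated by simulation. In the sketch below the factors are assumed to be independent centered exponential draws (a deliberately skewed choice made only for this example); the skewness of their sum is compared for one factor and for one hundred factors.

```python
import numpy as np

rng = np.random.default_rng(7)
draws = 20_000

def skewness(v):
    """Sample skewness: third moment of the standardized values."""
    s = (v - v.mean()) / v.std()
    return np.mean(s ** 3)

# Each factor is a centered Exponential(1) draw: strongly skewed, far from normal.
skew_by_m = {}
for m in (1, 100):
    factors = rng.exponential(size=(draws, m)) - 1.0
    u = factors.sum(axis=1)               # u = f_1 + ... + f_m
    skew_by_m[m] = skewness(u)

print(skew_by_m)   # skewness shrinks toward 0 (the normal value) as m grows
```

  With dependent or differently distributed factors this shrinkage is not guaranteed, which is the caveat raised in the notes.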

Theorem on Normal Sampling Distributions

  • Under the combined MLR assumptions (MLR.1 to MLR.6):
    $\hat{\beta}_j \sim N(\beta_j, Var(\hat{\beta}_j | X_n))$

  • The standardized random variable is given by:
    $\frac{\hat{\beta}_j - \beta_j}{\text{sd}(\hat{\beta}_j | X_n)} \sim N(0, 1)$

  • The standardized variable is standard normal both conditional on $X_n$ and unconditionally, because the $N(0, 1)$ distribution does not depend on $X_n$.
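
  A small Monte Carlo experiment can make the theorem concrete. The sketch below assumes a simple regression with made-up coefficients, holds the regressors fixed, redraws normal errors many times and standardizes the slope estimator with its true conditional standard deviation.

```python
import numpy as np

rng = np.random.default_rng(2)
n, sigma, reps = 30, 1.5, 20000
beta = np.array([1.0, 0.5])                # hypothetical intercept and slope

x = rng.normal(size=n)                     # regressors drawn once, then held fixed
X = np.column_stack([np.ones(n), x])
W = np.linalg.inv(X.T @ X) @ X.T
sd_slope = sigma * np.sqrt(np.linalg.inv(X.T @ X)[1, 1])

U = rng.normal(scale=sigma, size=(reps, n))    # one row of errors per replication
slope_hat = beta[1] + U @ W[1]                 # beta_hat_1 = beta_1 + sum_i w_i1 u_i
z = (slope_hat - beta[1]) / sd_slope

print(z.mean(), z.var())                       # close to 0 and 1
print(np.mean(np.abs(z) > 1.96))               # close to 0.05
```

  The standardization here uses the true $\sigma$, which is available only because the data are simulated.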

Proof of the Theorem (Part 1)

  • A standard fact: any linear combination of independent normal random variables is itself normally distributed. For independent $u_1, u_2 \sim N(0, \sigma^2)$ and constants $c_1, c_2$:
    $c_1 u_1 + c_2 u_2 \sim N(0, \sigma^2 (c_1^2 + c_2^2))$

  • Generalizing this, if $c_1, \dots, c_n$ are constants, then:
    $\sum_{i=1}^{n} c_i u_i \sim N(0, \sigma^2 \sum_{i=1}^{n} c_i^2)$

  • Since $\hat{\beta}_j = \beta_j + \sum_{i=1}^{n} w_{ij} u_i$ and the weights $w_{ij}$ are constants once we condition on $X_n$, it follows that: $\sum_{i=1}^{n} w_{ij} u_i \sim N \left(0, \sigma^2 \sum_{i=1}^{n} w_{ij}^2 \right)$

  • This means:
    $\hat{\beta}_j \sim N(\beta_j, Var(\hat{\beta}_j | X_n))$
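
  The mean and variance in the linear-combination result follow from a short calculation, sketched here for reference. It covers the two moments only; normality of the sum comes from the fact about normal variables quoted above. Independence of the $u_i$ is what removes the covariance terms in the second line.

```latex
\begin{aligned}
E\Big[\sum_{i=1}^{n} c_i u_i\Big] &= \sum_{i=1}^{n} c_i E[u_i] = 0, \\
Var\Big(\sum_{i=1}^{n} c_i u_i\Big) &= \sum_{i=1}^{n} c_i^2 Var(u_i) = \sigma^2 \sum_{i=1}^{n} c_i^2 .
\end{aligned}
```

  Setting $c_i = w_{ij}$ gives $\sigma^2 \sum_{i=1}^{n} w_{ij}^2$, which is the conditional variance $Var(\hat{\beta}_j | X_n)$.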

Proof of Theorem (Part 2)

  • A key rule states that if a random variable $z \sim N(\mu, \sigma^2)$, then the standardized variable satisfies:
    $\frac{z - \mu}{\sigma} \sim N(0, 1)$

  • Notably, irrespective of the distribution of $z$, the standardized variable $z^\prime = (z - \mu)/\sigma$ satisfies:

    • $E[z^\prime] = 0$

    • $Var(z^\prime) = 1$

  • Combining this with the conclusion from the first part of the theorem leads to:
    $\hat{\beta}_j \sim N(\beta_j, Var(\hat{\beta}_j | X_n)), \quad \frac{\hat{\beta}_j - \beta_j}{\text{sd}(\hat{\beta}_j | X_n)} \sim N(0, 1).$
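
  The point that standardization fixes the mean and variance for any distribution, while normality is needed for the standardized variable to be $N(0, 1)$, can be seen in a quick simulation. The exponential distribution below is an assumed example chosen because it is clearly non-normal.

```python
import numpy as np

rng = np.random.default_rng(3)
mu, sd = 2.0, 2.0                         # an Exponential with scale 2 has mean 2 and sd 2
z = rng.exponential(scale=2.0, size=100_000)
z_std = (z - mu) / sd                     # standardized variable

skew = np.mean(z_std ** 3)                # third moment; equals 0 for a normal variable
print(z_std.mean(), z_std.var(), skew)
```

  The mean and variance come out near 0 and 1, but the skewness stays far from zero, so the standardized variable is not standard normal.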

The t-statistic

  • The result $\frac{\hat{\beta}_j - \beta_j}{\text{sd}(\hat{\beta}_j | X_n)} \sim N(0, 1)$ cannot be used directly for hypothesis testing because $\text{sd}(\hat{\beta}_j | X_n)$ depends on $\sigma = \text{sd}(u)$, which is unknown.

  • Instead, the estimator $\hat{\sigma}$ is used in place of $\sigma$, which yields the standard error $se(\hat{\beta}_j)$.

  • The resulting t-statistic is defined as:
    $t_{\hat{\beta}_j} = \frac{\hat{\beta}_j - \beta_j}{se(\hat{\beta}_j)}$

  • This statistic can be computed only once a value for $\beta_j$ is specified, for example the value hypothesized under $H_0$.
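
  The construction of the standard error and the t-statistic can be sketched on simulated data. The coefficients and sample size below are assumptions made for illustration; the error variance is estimated by the sum of squared residuals divided by $n - k - 1$, and the standard errors use the matrix expression $\hat{\sigma}^2 (X'X)^{-1}$.

```python
import numpy as np

rng = np.random.default_rng(4)
n, k = 100, 2
beta = np.array([1.0, 0.5, 0.0])          # hypothetical coefficients; the last slope is zero

X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
y = X @ beta + rng.normal(scale=1.0, size=n)

XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ y
resid = y - X @ beta_hat

sigma2_hat = resid @ resid / (n - k - 1)          # SSR / (n - k - 1)
se = np.sqrt(sigma2_hat * np.diag(XtX_inv))       # se(beta_hat_j) for each j

t_at_truth = (beta_hat - beta) / se               # needs beta, which is unknown in practice
print(se)
print(t_at_truth)
```

  In real data the true coefficients are not observed, so the value plugged in for $\beta_j$ is the one stated in the hypothesis being tested.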

Theorem on t Distribution for Standardized Estimators

  • Under MLR assumptions (MLR.1–MLR.6):
    $t_{\hat{\beta}_j} = \frac{\hat{\beta}_j - \beta_j}{se(\hat{\beta}_j)} \sim t_{n-k-1}$

  • The t-distribution is similar to the normal distribution but features a greater spread compared to $N(0, 1)$:

    • Expected value: $E(t_{df}) = 0 \text{ if } df > 1$

    • Variance: $Var(t_{df}) = \frac{df}{df-2} \text{ if } df > 2$

  • As the degrees of freedom $df = n - k - 1$ approach infinity, the t-distribution converges to the standard normal distribution: $t_{df} \to N(0, 1)$. For $df > 120$ the differences from the normal distribution are negligible.
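
  The variance formula and the convergence to the standard normal can be checked by drawing from the t-distribution directly. This is a rough simulation sketch; the number of draws and the three degrees-of-freedom values are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(5)
draws = 200_000

results = {}
for df in (6, 30, 120):
    t = rng.standard_t(df, size=draws)
    # Store: simulated variance, theoretical variance df / (df - 2),
    # and the share of draws beyond the N(0, 1) two-sided 5% cutoff of 1.96.
    results[df] = (t.var(), df / (df - 2), np.mean(np.abs(t) > 1.96))

for df, (sim_var, theo_var, tail) in results.items():
    print(df, round(sim_var, 3), round(theo_var, 3), round(tail, 4))
```

  As df grows, the variance approaches 1 and the tail share beyond 1.96 approaches the normal value of 0.05.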

Visual Comparison of Distributions

  • The slide plots the density of the t-distribution with 6 degrees of freedom against the standard normal density:

    • The t-density has visibly greater spread, with more mass in the tails, than the standard normal.

Practical Application of the t-statistic

  • The t-statistic is the main tool for testing hypotheses about regression coefficients.

  • It is used to test the null hypothesis that an independent variable has no partial effect:
    $H_0 : \beta_j = 0$

  • Under this null hypothesis, the t-statistic simplifies to:
    $t_{\hat{\beta}_j} = \frac{\hat{\beta}_j}{se(\hat{\beta}_j)}$

  • The statistic measures how far $\hat{\beta}_j$ lies from the hypothesized value of $\beta_j$ in units of its standard error; values far from zero are evidence against $H_0$.
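
  Why the t-distribution, and not the normal, is the right reference in small samples can be illustrated with a simulation in which the null hypothesis is true. The sketch below assumes a simple regression with 8 observations (so 6 degrees of freedom) and made-up parameter values; 2.447 is the tabulated 97.5% quantile of the t-distribution with 6 degrees of freedom.

```python
import numpy as np

rng = np.random.default_rng(6)
n, reps = 8, 20000                        # n - k - 1 = 8 - 1 - 1 = 6 degrees of freedom
x = rng.normal(size=n)                    # regressor held fixed across replications
X = np.column_stack([np.ones(n), x])
XtX_inv = np.linalg.inv(X.T @ X)

t_stats = np.empty(reps)
for r in range(reps):
    y = 1.0 + 0.0 * x + rng.normal(size=n)        # H0 true: the slope is zero
    b = XtX_inv @ X.T @ y
    resid = y - X @ b
    se_slope = np.sqrt(resid @ resid / (n - 2) * XtX_inv[1, 1])
    t_stats[r] = b[1] / se_slope                  # t-statistic for H0: beta_1 = 0

reject_normal = np.mean(np.abs(t_stats) > 1.96)   # N(0, 1) critical value
reject_t6 = np.mean(np.abs(t_stats) > 2.447)      # tabulated 97.5% quantile of t with 6 df
print(reject_normal, reject_t6)
```

  Using the normal cutoff rejects a true null hypothesis noticeably more often than 5% of the time, while the t cutoff keeps the rejection rate near 5%.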