Presenter: Abhimanyu Gupta
Date: November 11, 2024
Sampling Distribution of the OLS Estimator
Normality assumption on the error term
Result on the distribution of the OLS estimators
Hypothesis Testing: t-test
Introduction to hypothesis testing
One-sided alternatives vs Two-sided alternatives
p-values and Confidence Intervals
Reading: Wooldridge (2018), Introductory Econometrics, Chapter 4
To understand how to test hypotheses about the parameters in the population regression model.
Statistical inference (Lecture 2)
Mean and variance of the OLS estimators (Lecture 5)
Determining the sampling distribution of the OLS estimators
Conditional distribution of y given x's determines the distribution of (βˆ0, βˆ1, …, βˆk)
The population error u must:
Be independent of explanatory variables (x1, x2, ..., xk)
Be normally distributed with zero mean and variance σ2: u ∼ Normal(0, σ2)
This assumption is convenient in practice but must be examined in each specific application.
Expectation: E(u|x1, ..., xk) = E(u) = 0, thus MLR.4 is satisfied.
Variance: Var(u|x1, ..., xk) = Var(u) = σ2, thus MLR.5 is satisfied.
The classical linear model consists of Gauss-Markov assumptions plus the distribution assumption of the error term.
OLS estimators are minimum variance unbiased estimators under these assumptions, leading to stronger efficiency properties than just the Gauss-Markov conditions.
Distribution: y|x1, …, xk ∼ N(β0 + β1x1 + β2x2 + … + βk xk, σ2)
Linear combination of x's as conditional mean
Constant variance σ2
Economic theory may offer limited guidance for normality assumptions.
Examples where normality might fail:
Hourly wages with a minimum wage floor: y ≥ ymin
Number of children born (y ≥ 0)
Empirical investigation of residuals can assess how well normality holds.
Log transformations can help improve normality assumptions:
Example: log(wages) is typically much closer to normal than raw wages.
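A minimal sketch of this point using simulated data (the lognormal parameters below are illustrative, not estimates from real wage data): right-skewed "wages" become roughly symmetric after taking logs.

```python
import numpy as np
from scipy.stats import skew

rng = np.random.default_rng(0)
# Simulated "wages": lognormal draws are right-skewed, like raw wage data
wages = rng.lognormal(mean=2.5, sigma=0.6, size=10_000)

raw_skew = skew(wages)          # strongly positive: long right tail
log_skew = skew(np.log(wages))  # near zero: roughly symmetric

print(f"skewness of wages:      {raw_skew:.2f}")
print(f"skewness of log(wages): {log_skew:.2f}")
```

By construction log(wages) is exactly normal here; with real data the log transformation only brings the distribution closer to normality.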
If CLM assumptions (MLR.1-MLR.6) hold, then:
βˆj ∼ Normal(βj, Var(βˆj))
Thus: (βˆj − βj) / sd(βˆj) ∼ Normal(0, 1)
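A Monte Carlo sketch of this result for the simple regression slope (all parameter values below are illustrative): with normal errors, the standardized slope estimator should have mean 0 and standard deviation 1 across replications.

```python
import numpy as np

# Check that (b1_hat - beta1) / sd(b1_hat) ~ Normal(0, 1) under the CLM
# assumptions. beta0, beta1, sigma, n are assumed values for the sketch.
rng = np.random.default_rng(1)
beta0, beta1, sigma, n, reps = 1.0, 2.0, 1.5, 50, 5_000

x = rng.uniform(0, 10, size=n)          # hold x fixed across replications
sxx = np.sum((x - x.mean()) ** 2)
sd_b1 = sigma / np.sqrt(sxx)            # true sd of the OLS slope given x

z = np.empty(reps)
for r in range(reps):
    u = rng.normal(0, sigma, size=n)    # normal errors (MLR.6)
    y = beta0 + beta1 * x + u
    b1 = np.sum((x - x.mean()) * (y - y.mean())) / sxx  # OLS slope
    z[r] = (b1 - beta1) / sd_b1

print(f"mean {z.mean():.3f}, std {z.std():.3f}")  # close to 0 and 1
```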
The conditional mean function E(y|x) shows how the distribution of y surrounds its mean with a normally distributed error term.
Population model: y = β0 + β1x1 + … + βkxk + u
Use OLS estimators to test parameters.
Under CLM assumptions:
(βˆj − βj) / se(βˆj) ∼ tn−k−1
Replacing sd(βˆj) with se(βˆj) requires estimating σ2 by σˆ2 = SSR/(n − k − 1); this extra estimation step turns the standard normal into a t-distribution with n − k − 1 degrees of freedom.
Null hypothesis (H0): βj = 0
t-statistic: tβˆj ≡ βˆj / se(βˆj) measures how many estimated standard deviations βˆj lies from zero.
Significance levels guide rejection of H0 based on the computed t-statistic.
Estimate βˆj and se(βˆj)
Compute t-statistic
Determine degrees of freedom
Use t-table for critical value
Apply rejection rule.
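The five steps above can be sketched end to end on simulated data (the design, true coefficients, and seed are illustrative; OLS is computed here via least squares rather than a regression package):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n, k = 100, 2
X = np.column_stack([np.ones(n), rng.normal(size=n), rng.normal(size=n)])
beta = np.array([1.0, 0.5, 0.0])        # true beta2 = 0, so H0 holds for j = 2
y = X @ beta + rng.normal(0, 1.0, size=n)

# Step 1: estimate beta_hat and se(beta_hat_j)
b_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ b_hat
df = n - k - 1                          # Step 3: degrees of freedom
sigma2_hat = resid @ resid / df         # sigma^2 estimate: SSR/(n - k - 1)
se = np.sqrt(sigma2_hat * np.diag(np.linalg.inv(X.T @ X)))

# Step 2: t-statistic for H0: beta_j = 0
j = 2
t_stat = b_hat[j] / se[j]

# Step 4: two-sided critical value at the 5% level
c = stats.t.ppf(0.975, df)

# Step 5: rejection rule
reject = abs(t_stat) > c
print(f"t = {t_stat:.3f}, c = {c:.3f}, reject H0: {reject}")
```

Because the true beta2 is zero, H0 is correct here and should be rejected only about 5% of the time across repeated samples.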
Null hypothesis: H0: βj = 0
Rejection rule: H0 is rejected if |tβˆj| > c
Interpretation when H0 is rejected or not.
The p-value is P(|T| > |t|): the probability of observing a t-statistic at least as extreme as the one computed, assuming H0 is true.
Small p-values are evidence against the null hypothesis.
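The two-sided p-value can be computed directly from the t-distribution; the t-statistic and degrees of freedom below are assumed values for the sketch:

```python
from scipy import stats

# Two-sided p-value for an illustrative t-statistic and degrees of freedom
t_stat, df = 2.1, 40
p_value = 2 * stats.t.sf(abs(t_stat), df)   # P(|T| > |t|)
print(f"p = {p_value:.4f}")                  # small p => evidence against H0
```

Here p is below 0.05, so H0 would be rejected at the 5% level with these illustrative values.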
Failure to reject H0 doesn’t mean acceptance.
Distinction between statistical significance (size of tβˆj) vs economic significance (magnitude of βˆj).
95% CI for βj using the t-distribution: βˆj ± c · se(βˆj), where c is the 97.5th percentile of tn−k−1.
Common mistake: applying unnecessary sample-size adjustments when forming the CI (the tn−k−1 critical value already reflects the sample size).
CI and critical value application to assess statistical significance.
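A short sketch of constructing the CI and reading off significance (the point estimate, standard error, and degrees of freedom below are illustrative values, not results from a real regression):

```python
from scipy import stats

# 95% CI for beta_j; b_hat_j, se_j, df are assumed values for the sketch
b_hat_j, se_j, df = 0.52, 0.21, 97
c = stats.t.ppf(0.975, df)                  # two-sided 5% critical value
ci = (b_hat_j - c * se_j, b_hat_j + c * se_j)
print(f"95% CI: [{ci[0]:.3f}, {ci[1]:.3f}]")

# If 0 lies outside the interval, beta_j is significant at the 5% level
significant = not (ci[0] <= 0.0 <= ci[1])
```

This makes the duality explicit: the 95% CI excludes zero exactly when the two-sided t-test rejects H0: βj = 0 at the 5% level.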
p-values and CI yield insights to reject or fail to reject hypotheses.
Continue with Multiple Regression Analysis: Inference.