ECMT2150 Lecture 2
Deriving the OLS Estimator
Introduction to the OLS Estimator
The Ordinary Least Squares (OLS) estimator is a fundamental tool in econometrics for estimating the relationship between variables.
Fitted Values: The predicted values from the regression model, $\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_i$.
Residuals: The difference between observed values and fitted values.
Mathematical Notation
The residual for observation $i$ is written as $\hat{u}_i = y_i - \hat{y}_i = y_i - (\hat{\beta}_0 + \hat{\beta}_1 x_i)$.
Minimization Process
The goal of OLS is to choose the estimates $\hat{\beta}_0$ and $\hat{\beta}_1$ that minimize the sum of squared residuals, $\sum_{i=1}^{n} \hat{u}_i^2 = \sum_{i=1}^{n} (y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i)^2$; the resulting closed-form estimators are given below.
Graphical Interpretation: Geometrically, OLS chooses the line that minimizes the sum of squared vertical distances between the data points and the regression line.
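For reference, solving the first-order conditions of this minimization problem yields the familiar closed-form expressions for the simple regression estimators:
$$\hat{\beta}_1 = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n}(x_i - \bar{x})^2}, \qquad \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}.$$
The slope estimate is the sample covariance of $x$ and $y$ divided by the sample variance of $x$, which requires that $x$ varies in the sample.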
Sample Regression Function
Definition
The estimated regression line, or sample regression function (SRF), is the relationship estimated from the sample data: $\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x$,
where $\hat{y}$ denotes the sample estimate of $E(y \mid x)$.
Importance of SRF
Serves as the sample equivalent to the population regression function (PRF).
Important for evaluating how explanatory variables influence the dependent variable.
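As a minimal illustration (not from the lecture; it assumes numpy is available and uses simulated data with made-up parameter values), the SRF can be computed directly from the closed-form formulas above:

```python
import numpy as np

# Simulate data from a known population regression function (PRF):
# y = 1.0 + 0.5*x + u, with u ~ N(0, 1)  (parameter values chosen for illustration)
rng = np.random.default_rng(0)
n = 200
x = rng.uniform(0, 10, size=n)
u = rng.normal(0, 1, size=n)
y = 1.0 + 0.5 * x + u

# OLS estimates via the closed-form simple regression formulas
beta1_hat = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
beta0_hat = y.mean() - beta1_hat * x.mean()

# The sample regression function (SRF): fitted values and residuals
y_hat = beta0_hat + beta1_hat * x
residuals = y - y_hat

print(f"SRF: y_hat = {beta0_hat:.3f} + {beta1_hat:.3f} * x")
```

With 200 simulated observations, the estimates should land close to the simulated intercept 1.0 and slope 0.5.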
Incorporating Non-Linearities
Introduction to Non-Linearities
It is essential to consider non-linear relationships to obtain more accurate and meaningful models.
The course discusses different ways of incorporating non-linearities into multiple linear regression (MLR), including quadratic terms and interaction terms; a generic quadratic specification is sketched below.
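As an illustration (the variables here are generic placeholders, not a specific dataset from the lecture), a quadratic term allows the marginal effect of a regressor to depend on its level:
$$y = \beta_0 + \beta_1 x + \beta_2 x^2 + u, \qquad \frac{\partial y}{\partial x} = \beta_1 + 2\beta_2 x.$$
Similarly, an interaction term $x_1 x_2$ lets the effect of $x_1$ depend on the level of $x_2$.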
Semi-Logarithmic Forms
Regression of Log Wages on Years of Education: $\log(wage) = \beta_0 + \beta_1\, educ + u$.
This formulation allows interpretation in terms of percentage changes.
Example: a slope estimate of 0.083 implies that one additional year of education is associated with approximately an 8.3% increase in wages (the approximation is made precise below).
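The percentage interpretation rests on the approximation $\%\Delta wage \approx 100 \cdot \hat{\beta}_1 \cdot \Delta educ$; the exact change implied by the semi-log form, which matters for larger coefficients, is
$$\%\Delta wage = 100 \cdot \left(e^{\hat{\beta}_1 \Delta educ} - 1\right) \approx 8.65\% \quad \text{for } \hat{\beta}_1 = 0.083,\ \Delta educ = 1.$$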
Log-Log Forms
Used when analyzing relationships such as CEO salary and firm sales.
This implies a constant-elasticity model: the slope coefficient is interpreted as an elasticity, i.e., the percentage change in the dependent variable associated with a 1% change in the regressor, as written out below.
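In equation form, the CEO salary example takes the log-log specification
$$\log(salary) = \beta_0 + \beta_1 \log(sales) + u,$$
where $\beta_1$ is the elasticity of salary with respect to sales: a 1% increase in sales is associated with an estimated $\hat{\beta}_1$% change in salary, holding other factors fixed.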
Properties of the OLS Estimator
Algebraic Properties
The OLS estimators exhibit several key algebraic properties applicable across both simple and multiple regression contexts.
The sample average of the OLS residuals is exactly zero (a direct consequence of the first-order conditions), so the fitted values have the same sample mean as the observed values.
Another important property is that the sample covariance between each independent variable and the OLS residuals is zero; both properties are stated in symbols below.
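For a regression with an intercept and regressors $x_1, \dots, x_k$, the OLS first-order conditions imply
$$\sum_{i=1}^{n} \hat{u}_i = 0, \qquad \sum_{i=1}^{n} x_{ij}\,\hat{u}_i = 0 \ \text{ for } j = 1, \dots, k, \qquad \bar{y} = \hat{\beta}_0 + \hat{\beta}_1 \bar{x}_1 + \dots + \hat{\beta}_k \bar{x}_k.$$
The last equality says the point of sample means always lies on the estimated regression line.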
Fit of the Model
Goodness-of-fit measures are used to assess how well the model explains the variation in the dependent variable.
The key quantities are the Total Sum of Squares (SST), the Explained Sum of Squares (SSE), and the Residual Sum of Squares (SSR); their definitions are given below.
R-squared: Measures the fraction of the sample variation in the dependent variable explained by the regressors; caution is required, because a high R-squared does not guarantee a causal relationship.
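Written out, these quantities, their decomposition, and the resulting R-squared are
$$SST = \sum_{i=1}^{n}(y_i - \bar{y})^2, \qquad SSE = \sum_{i=1}^{n}(\hat{y}_i - \bar{y})^2, \qquad SSR = \sum_{i=1}^{n}\hat{u}_i^2, \qquad SST = SSE + SSR,$$
$$R^2 = \frac{SSE}{SST} = 1 - \frac{SSR}{SST}.$$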
Statistical Properties of OLS
Key Concepts
OLS coefficients are considered random variables as they derive from random samples.
Unbiasedness: An estimator is unbiased if its expected value equals the true population parameter (illustrated by the simulation sketch after this list).
Efficiency: Under the Gauss-Markov theorem, OLS has the smallest variance among all linear unbiased estimators, making it the Best Linear Unbiased Estimator (BLUE) under the stated assumptions.
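A minimal simulation sketch (illustrative only; it assumes numpy, and the "true" parameter values below are made up for the demonstration) shows what unbiasedness means in practice: averaged over many random samples, the OLS slope estimates center on the true slope.

```python
import numpy as np

# Monte Carlo illustration of unbiasedness: the OLS slope estimates,
# averaged over many random samples, should be close to the true slope.
rng = np.random.default_rng(42)
beta0, beta1 = 2.0, 0.7   # "true" population parameters (chosen for illustration)
n, reps = 100, 5000

slope_estimates = np.empty(reps)
for r in range(reps):
    x = rng.uniform(0, 10, size=n)
    u = rng.normal(0, 2, size=n)          # homoskedastic error with E(u|x) = 0
    y = beta0 + beta1 * x + u
    slope_estimates[r] = (np.sum((x - x.mean()) * (y - y.mean()))
                          / np.sum((x - x.mean()) ** 2))

print(f"True slope: {beta1}, mean of estimates: {slope_estimates.mean():.4f}")
```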
Key Assumptions
Assumption MLR.1 (Linear in Parameters): The relationship must be linear in parameters, not necessarily in independent variables.
Assumption MLR.2 (Random Sampling): The data are a random sample drawn from the population described by the model.
Assumption MLR.3 (No Perfect Collinearity): Independent variables must vary and should not have perfect linear relationships among them.
Assumption MLR.4 (Zero Conditional Mean): Explanatory variables must not convey information about the mean of the error term.
Assumption MLR.5 (Homoskedasticity): The variability of the unobserved factors (the error term) is the same for any values of the explanatory variables; MLR.4 and MLR.5 are stated in symbols below.
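In symbols, the last two assumptions are
$$\text{MLR.4: } E(u \mid x_1, \dots, x_k) = 0, \qquad \text{MLR.5: } \operatorname{Var}(u \mid x_1, \dots, x_k) = \sigma^2.$$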
Conclusion
In summary, adherence to these assumptions underpins the statistical validity of OLS: under MLR.1–MLR.4 the estimators are unbiased, and adding MLR.5 makes OLS BLUE.
A practical implication is that sampling variability must be taken into account when drawing inferences from estimated coefficients.