Course: ECO440/ECO640
Institution: Niagara University
Goal of OLS Regression: Estimating the empirical regression equation.
Mechanics of OLS Estimator: Parameter estimates in both univariate and multivariate regression.
OLS Regression: Intuition and interpretation of results.
Importance of Fit: Understanding the decomposition of variance and R².
Purpose of Regression Analysis: Transition from a theoretical equation to an estimated empirical regression equation.
Key Questions:
What empirical regression line should be used?
How to choose among several alternative versions of the regression equation?
Aim for a regression line that resembles the theoretical model, e.g., a linear function connecting schooling to wages.
The chosen equation should fit the data as closely as possible, which is judged through its estimated parameters.
Example: How does an additional year of schooling affect wage levels?
Analyze and assess the fit of candidate regression equations.
Evaluate prediction errors (residuals) using actual vs predicted wage values.
Consider slope implications regarding education's impact on wages.
The best-fitting regression line is derived using OLS methods.
Discuss reasons why this regression equation is favored over others.
OLS as the primary method for obtaining regression estimates.
Goal: Minimize the squared errors (residuals).
Rationale: Summing raw errors lets positive and negative residuals cancel, so many poor fits would score well; squaring the errors removes the cancellation and penalizes large misses.
Estimator: A technique applied to sample data to estimate population regression coefficients.
Estimate: The computed value of a regression coefficient.
Key components: Parameters (unknown betas, β) vs variables (Y, X).
Advantages of OLS:
Ease of use.
Conceptual appeal of minimizing squared errors.
Properties:
Residuals sum to zero.
Under the classical (Gauss–Markov) assumptions, OLS is the BLUE (Best Linear Unbiased Estimator).
OLS minimizes the squared error:
Mathematically, the method chooses the coefficient values that minimize ∑(Y − Ŷ)², the sum of squared residuals.
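The reason OLS minimizes squared rather than raw errors can be shown with a tiny sketch (hypothetical numbers): a poor fit and a good fit can have the same raw-residual sum, because positive and negative errors cancel, while the squared sum clearly separates them.

```python
# Hypothetical illustration: raw residual sums cancel, squared sums do not.
Y = [2.0, 4.0, 6.0]          # observed values
flat_fit = [4.0, 4.0, 4.0]   # a poor fit: predicts the mean everywhere
good_fit = [2.1, 3.9, 6.0]   # a fit that tracks each observation closely

def raw_sum(y, yhat):
    # Sum of raw residuals (Y - Yhat): positives and negatives offset.
    return sum(a - b for a, b in zip(y, yhat))

def squared_sum(y, yhat):
    # Sum of squared residuals: every miss adds a positive penalty.
    return sum((a - b) ** 2 for a, b in zip(y, yhat))

print(raw_sum(Y, flat_fit))      # ~0: cancellation hides the misses
print(raw_sum(Y, good_fit))      # ~0: same raw sum as the poor fit
print(squared_sum(Y, flat_fit))  # large: squared errors expose the poor fit
print(squared_sum(Y, good_fit))  # small: the better fit wins
```

Both lines "pass" the raw-sum criterion, but only the squared-error criterion ranks them correctly.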
Derivation of OLS estimates is based on theoretical equations.
Fundamental equations for computation: (2.1), (2.4), (2.5).
The numerator is proportional to the sample covariance between X and Y, capturing how the two variables move together.
The denominator is proportional to the sample variance of X, capturing the dispersion of the regressor.
Interpretation: The slope estimate (β₁) is the ratio of the two, and gives the estimated change in Y associated with a one-unit increase in X.
Steps:
Calculate means for X and Y.
Compute residuals and sums of products.
Derive estimates using manual calculations or software.
Data and intermediate calculations are crucial for estimating coefficients.
Illustration using height/weight data to compute regression coefficients.
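The manual steps above can be sketched in a few lines of Python. The height/weight numbers here are hypothetical stand-ins for the illustration, not the course data: compute the means, form the sums of cross-products and squared deviations, and take their ratio for the slope.

```python
import statistics

# Hypothetical height (inches) / weight (pounds) sample for illustration.
heights = [60, 62, 65, 68, 70, 72]
weights = [110, 120, 135, 150, 160, 175]

# Step 1: means of X and Y.
x_bar = statistics.mean(heights)
y_bar = statistics.mean(weights)

# Step 2: sum of cross-products (numerator) and squared deviations (denominator).
num = sum((x - x_bar) * (y - y_bar) for x, y in zip(heights, weights))
den = sum((x - x_bar) ** 2 for x in heights)

# Step 3: the OLS estimates.
beta1 = num / den              # slope: estimated weight change per inch
beta0 = y_bar - beta1 * x_bar  # intercept: line passes through (x_bar, y_bar)

# OLS property: the residuals sum to (numerically) zero.
residuals = [y - (beta0 + beta1 * x) for x, y in zip(heights, weights)]
print(beta1, beta0, sum(residuals))
```

Statistical software reproduces exactly these sums internally; doing it once by hand makes the "covariance over variance" form of the slope concrete.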
Multivariate models are needed because variation in Y can rarely be explained by a single X.
General form of multivariate regression with K independent variables:
Y = β₀ + β₁X₁ + β₂X₂ + ... + βₖXₖ + ε.
Coefficients in multivariate models indicate the change in Y associated with a one-unit increase in the corresponding X, holding the other variables constant.
Example 1: Demand for beef in the U.S.
Consumption relation to income, controlling for price.
Example 2: Financial aid effects based on parents’ contribution and student GPA.
Insights drawn from the coefficients and their implications on aid calculations.
Importance of understanding fit, focusing on how well the model predicts Y based on X.
Total sum of squares (TSS) measures variation in Y.
TSS comprises two components:
Explained sum of squares (ESS): Variation explained by regression.
Residual sum of squares (RSS): Variation not explained.
For a good fit, ESS should account for a large share of TSS.
Introduces the coefficient of determination, R².
R² is the ratio of ESS to TSS (equivalently, 1 − RSS/TSS), indicating how well the regression explains the data.
Values vary between 0 and 1, with higher indicating better fit.
Adding variables can artificially inflate R², potentially without meaningful impact on the model.
Each added variable requires estimating another coefficient and so uses up a degree of freedom, which should be weighed against any gain in fit.
Adjusted R² accounts for degrees of freedom, allowing for more meaningful comparisons when adding variables.
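The decomposition and the two fit measures can be sketched directly. The observations and fitted values below are hypothetical; the sketch assumes a fit with one slope coefficient (k = 1), so the adjusted R² divides RSS by n − k − 1 and TSS by n − 1.

```python
# Hypothetical observed values and fitted values from some regression.
y    = [3.0, 5.0, 7.0, 6.0, 9.0]
yhat = [3.4, 4.8, 6.2, 7.0, 8.6]
n, k = len(y), 1                    # n observations, k slope coefficients

y_bar = sum(y) / n
TSS = sum((yi - y_bar) ** 2 for yi in y)              # total variation in Y
RSS = sum((yi - fi) ** 2 for yi, fi in zip(y, yhat))  # unexplained variation

r2 = 1 - RSS / TSS                                    # coefficient of determination
adj_r2 = 1 - (RSS / (n - k - 1)) / (TSS / (n - 1))    # penalizes extra regressors
print(round(r2, 3), round(adj_r2, 3))                 # → 0.9 0.867
```

Note that adjusted R² is always below R² (for k ≥ 1), and falls when an added variable reduces RSS by less than the lost degree of freedom is worth.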
R² is useful for comparing equations with the same dependent variable but not for different ones.
Warning against optimizing R² at the expense of meaningful theory behind model choice.
Example with mozzarella cheese consumption illustrates the danger of adding nonsensical adjustments to the model, leading to misleading conclusions about R².