Simple Linear Regression
Week Two Content Overview
Focus: Using simple regression for predictions (Chapters 4 and 5 of the Stock and Watson textbook).
Big Picture Goals
Understanding how to apply regression to make predictions.
Learning Objectives
Define and understand key concepts:
Conditional mean and population regression function.
Interpret Ordinary Least Squares (OLS) outcomes, including intercept and slope.
Distinguish causation from correlation in regression analysis.
Key Concepts
Conditional Mean
Conditional Mean Function: This represents the expected value of the dependent variable given specific values of the independent variable(s).
Example: In football, when attempting a field goal, the unconditional mean aim is the center of the goal. Once wind conditions are factored in, however, the optimal aim shifts, showing how conditioning on additional information changes the prediction.
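The field-goal idea can be sketched numerically: an unconditional mean averages over everything, while a conditional mean averages only within a given condition. The numbers and labels below are made up purely for illustration.

```python
import numpy as np

# Hypothetical aim offsets (yards left of center) for six kicks,
# labeled by the wind condition during each kick.
wind = np.array(["calm", "calm", "left", "left", "left", "calm"])
aim = np.array([0.0, 0.2, 1.5, 1.8, 1.2, -0.2])

unconditional_mean = aim.mean()               # mean over all kicks
mean_given_calm = aim[wind == "calm"].mean()  # E[aim | wind = "calm"]
mean_given_left = aim[wind == "left"].mean()  # E[aim | wind = "left"]

print(unconditional_mean, mean_given_calm, mean_given_left)
```

Here the best prediction differs sharply once the wind condition is known, which is exactly what the conditional mean function captures.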
Ordinary Least Squares (OLS)
Ordinary Least Squares (OLS): A method for estimating the parameters of a linear regression model.
Purpose: To minimize the sum of the squares of the residuals (the differences between observed and predicted values).
Interpretation of Coefficients:
Intercept (β0): The expected value of Y when X is 0.
Slope (β1): Indicates how much the value of Y is expected to change for a one-unit change in X. Often discussed as the “slope coefficient.”
Establishing Causation
Important point: Determine whether changes in X cause changes in Y.
The statistical significance of an estimated relationship is assessed through hypothesis testing, though significance alone does not establish causation:
T-statistics: Used to determine whether the null hypothesis that a coefficient equals zero can be rejected. Tests the significance of predictors in the model.
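The slope's t-statistic can be sketched by hand: divide the estimated slope by its standard error. The sketch below uses the classical homoskedastic standard-error formula and made-up toy data; textbooks such as Stock and Watson often prefer robust standard errors, which differ in the details.

```python
import numpy as np

# Toy data (made-up numbers for illustration).
X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
Y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])
n = len(X)

# OLS fit (closed form for simple regression).
beta1 = np.sum((X - X.mean()) * (Y - Y.mean())) / np.sum((X - X.mean()) ** 2)
beta0 = Y.mean() - beta1 * X.mean()
residuals = Y - (beta0 + beta1 * X)

# Classical (homoskedastic) standard error of the slope.
s2 = np.sum(residuals ** 2) / (n - 2)  # residual variance
se_beta1 = np.sqrt(s2 / np.sum((X - X.mean()) ** 2))

# t-statistic for H0: beta1 = 0.
t_stat = beta1 / se_beta1
print(t_stat)
```

A t-statistic far from zero (conventionally |t| > 1.96 at the 5% level for large samples) leads to rejecting the null hypothesis that the slope is zero.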
Model Evaluation
R-Squared (R²): The fraction of the variation in the dependent variable that the regression model explains.
It ranges from 0 to 1; a higher R² indicates that the independent variable(s) predict the dependent variable better in-sample.
Predicted Value: Derived from the regression model based on known X values, representing the expected Y value:
Ŷ = β0 + β1 × X
Residuals / Error Term (U): The difference between actual and predicted values. Reflects the unexplained randomness within the data.
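Predicted values, residuals, and R² fit together: the residual is what the prediction misses, and R² is one minus the residual sum of squares over the total sum of squares. A minimal sketch with made-up toy data:

```python
import numpy as np

# Toy data (made-up numbers for illustration).
X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
Y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

# OLS fit (closed form for simple regression).
beta1 = np.sum((X - X.mean()) * (Y - Y.mean())) / np.sum((X - X.mean()) ** 2)
beta0 = Y.mean() - beta1 * X.mean()

Y_pred = beta0 + beta1 * X  # predicted (fitted) values
residuals = Y - Y_pred      # unexplained part, the empirical counterpart of U

# R^2: share of Y's variation explained by the regression.
ss_res = np.sum(residuals ** 2)
ss_tot = np.sum((Y - Y.mean()) ** 2)
r_squared = 1 - ss_res / ss_tot
print(r_squared)
```

Note that actual values always decompose exactly into prediction plus residual: Y = Ŷ + residual.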
Causation vs. Predictions
Focus on establishing direct relationships. Key assumptions include:
Assumption that X has a causal influence on Y.
To isolate the relationship, other factors must be held constant (ceteris paribus).
Regression Basics
Simple Regression Model: Involves one independent variable (X) predicting one dependent variable (Y).
Y (dependent variable): Value being predicted.
X (independent variable): Value used for prediction.
After estimating model parameters:
Derive regression equation:
Y = β0 + β1 × X
Interpretation of Parameters
β0 (Intercept): Expected value of Y when X = 0. Indicates the mean value of the dependent variable when the independent variable is zero.
β1 (Slope): The constant effect of changing X by one unit on Y.
Example Application
If analyzing how hours spent on homework (X) influences GPA (Y):
Create a dataset with observations on both X and Y.
Estimation will yield β0 (intercept) and β1 (slope). Then evaluate whether there is a significant positive correlation between hours spent on homework and GPA.
The estimated relationship can suggest how much an additional hour of homework is associated with a change in GPA, bearing in mind that association alone does not prove causation.
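The homework/GPA example can be sketched end to end. The dataset below is entirely hypothetical, and `np.polyfit` with degree 1 is used here as a convenient least-squares fitter; it returns the slope first, then the intercept.

```python
import numpy as np

# Hypothetical dataset: weekly homework hours (X) and GPA (Y).
hours = np.array([2, 4, 5, 7, 8, 10, 12, 14], dtype=float)
gpa = np.array([2.4, 2.7, 2.9, 3.0, 3.2, 3.4, 3.5, 3.8])

# Fit Y = beta0 + beta1 * X by least squares.
beta1, beta0 = np.polyfit(hours, gpa, 1)
print(f"intercept = {beta0:.3f}, slope = {beta1:.3f}")

# Slope interpretation: in this made-up sample, each extra weekly homework
# hour is associated with roughly beta1 additional GPA points.
predicted_gpa_9h = beta0 + beta1 * 9.0
```

The fitted line can then be used for prediction at any X value, as with `predicted_gpa_9h` above.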
Importance of Error Term (U)
Introduces randomness:
Real-life data always includes factors beyond those considered in the model, leading to uncertainty.
An example from sports prediction illustrates this randomness:
When Team A is