Linear Regression: Derivations
Supervised vs. Unsupervised Learning
Supervised Learning:
- Goal: Find a mapping function from inputs (x) to outputs (y).
- Aim to predict or classify outcomes based on input data.
Unsupervised Learning:
- Goal: Discover relationships between features or observations (x) without a predefined output (y).
- Focus on finding patterns, clusters, or reducing dimensionality in the data.
Linear Regression
Objective: Find the best-fit line that minimizes the error between predicted and actual values.
Model Equation: $\hat{y} = \beta_0 + \beta_1 x$
- $\beta_0$: Intercept (where the line crosses the y-axis).
- $\beta_1$: Slope (the change in y for each unit change in x).
- Goal: Determine the optimal values for $\beta_0$ and $\beta_1$.
Finding Beta Zero and Beta One Equations:
- Calculate averages (means) of x and y variables.
- Determine the Pearson correlation between the two variables.
- Compute the standard deviations of x and y (from the sums of squared deviations).
Pearson Correlation:
- Measures the strength and direction of the linear relationship between two variables.
- Ranges from -1 to 1.
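- $r = \dfrac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i-\bar{x})^2}\,\sqrt{\sum_{i=1}^{n}(y_i-\bar{y})^2}}$
- $\bar{x}$ and $\bar{y}$ represent the means of x and y, respectively.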
- (+) value means positive correlation: if one variable increases, the other tends to increase.
- (-) value means negative correlation: if one variable increases, the other tends to decrease.
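Mean (Average): $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$
Standard Deviation: $s_x = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(x_i-\bar{x})^2}$ (sample form)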
- Measures the amount of variation or dispersion in a set of values.
Example Calculation:
- Given a dataset with six data points, calculate the Pearson correlation:
- Budget: 1.2, 1.5, 2.1, 2.8, 3.2, 3.9
- Revenue: 2.0, 2.5, 3.0, 3.5, 4.0, 4.5
- Calculate the mean for both budget and revenue.
- Subtract the mean from each observation to get the deviations.
- Multiply the differences for each observation.
- Sum up the multiplications.
- Square each deviation for x and for y, and sum the squares.
- Calculate the Pearson correlation (r). For the values listed above, r ≈ 0.99, which indicates a strong positive correlation.
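The steps of the example calculation can be sketched in plain Python. This is a minimal illustration, not a reference implementation; the function name `pearson_r` and the variable names are assumptions introduced here, and the data are the budget and revenue values listed in the example.

```python
import math

# Budget and revenue values from the example above.
budget = [1.2, 1.5, 2.1, 2.8, 3.2, 3.9]
revenue = [2.0, 2.5, 3.0, 3.5, 4.0, 4.5]


def pearson_r(x, y):
    """Pearson correlation, following the listed steps one by one."""
    n = len(x)
    # Step 1: means of x and y.
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    # Step 2: deviation of each observation from its mean.
    dx = [xi - x_mean for xi in x]
    dy = [yi - y_mean for yi in y]
    # Steps 3-4: multiply the paired deviations and sum the products.
    s_xy = sum(a * b for a, b in zip(dx, dy))
    # Step 5: sums of squared deviations for x and y.
    s_xx = sum(a * a for a in dx)
    s_yy = sum(b * b for b in dy)
    # Step 6: combine into r.
    return s_xy / math.sqrt(s_xx * s_yy)


r = pearson_r(budget, revenue)
print(f"r = {r:.3f}")
```

Keeping the intermediate sums (`s_xy`, `s_xx`, `s_yy`) as separate names mirrors the hand calculation, which makes it easy to check each step against a worked table.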
Calculating Beta One and Beta Zero:
- After calculating the Pearson correlation, calculate the standard deviations.
- Plug into the equations $\beta_1 = r\,\frac{s_y}{s_x}$ and $\beta_0 = \bar{y} - \beta_1\bar{x}$ to find $\beta_1$ and $\beta_0$.
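Example values for $\beta_1$ and $\beta_0$: for the budget and revenue values listed above, $\beta_1 \approx 0.90$ and $\beta_0 \approx 1.05$.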
- With these values, we now have a linear regression model.
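As a rough sketch of this step, the snippet below computes the slope from the correlation and the standard deviations, then the intercept from the means, using the relations beta_1 = r * s_y / s_x and beta_0 = y_mean - beta_1 * x_mean. The sample (n - 1) form of the standard deviation is an assumption; the ratio s_y / s_x is the same with n in the denominator. The helper `predict` is a hypothetical name added for illustration.

```python
import math

# Budget (x) and revenue (y) values from the example above.
budget = [1.2, 1.5, 2.1, 2.8, 3.2, 3.9]
revenue = [2.0, 2.5, 3.0, 3.5, 4.0, 4.5]

n = len(budget)
x_mean = sum(budget) / n
y_mean = sum(revenue) / n

# Sums of squared deviations and of cross products.
s_xx = sum((x - x_mean) ** 2 for x in budget)
s_yy = sum((y - y_mean) ** 2 for y in revenue)
s_xy = sum((x - x_mean) * (y - y_mean) for x, y in zip(budget, revenue))

# Pearson correlation and sample standard deviations (n - 1 is an assumption).
r = s_xy / math.sqrt(s_xx * s_yy)
s_x = math.sqrt(s_xx / (n - 1))
s_y = math.sqrt(s_yy / (n - 1))

# Slope and intercept of the fitted line.
beta_1 = r * s_y / s_x
beta_0 = y_mean - beta_1 * x_mean


def predict(x):
    """Predicted revenue for a given budget under the fitted line."""
    return beta_0 + beta_1 * x


print(f"beta_1 = {beta_1:.3f}, beta_0 = {beta_0:.3f}")
```

By construction the fitted line passes through the point of means, so `predict(x_mean)` returns `y_mean` up to rounding.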
Model Assessment
Error Assessment: Evaluate the difference between actual data points and predicted values from the model.
Goal of the Model: Minimize the error; the predicted values should closely match the observed values.
Error Measurement
Error: The difference between observed value and predicted value.
- Error = Observed – Predicted
Issue with Summing Errors:
- Positive and negative errors can cancel each other out, leading to a misleadingly low overall error.
- Individual errors can be large while their sum is still zero (or close to it) because of this canceling effect.
Mean Squared Error (MSE):
- To address the issue of errors canceling each other out, square the errors.
- MSE is the mean of the squared errors: $\text{MSE} = \frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2$
- $y_i$ is the actual value.
- $\hat{y}_i$ is the predicted value.
- The goal is to find $\beta_0$ and $\beta_1$ that minimize the MSE.
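A small sketch of why squaring helps. The observed and predicted values below are hypothetical numbers chosen only so that the raw errors cancel exactly; the function name `mean_squared_error` is likewise an assumption for illustration.

```python
def mean_squared_error(observed, predicted):
    """Mean of the squared differences between observed and predicted values."""
    n = len(observed)
    return sum((y - y_hat) ** 2 for y, y_hat in zip(observed, predicted)) / n


# Hypothetical data: every prediction is off by 2, alternating in sign.
observed = [2.0, 4.0, 6.0, 8.0]
predicted = [4.0, 2.0, 8.0, 6.0]

# Error = Observed - Predicted, as defined above.
errors = [y - y_hat for y, y_hat in zip(observed, predicted)]

print("errors:", errors)
print("sum of errors:", sum(errors))  # cancels out despite poor predictions
print("MSE:", mean_squared_error(observed, predicted))
```

The summed error suggests a perfect fit even though every prediction is wrong, while the MSE stays positive because squared errors cannot cancel.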
Advantage of Using Squared Errors:
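- The squared error is differentiable everywhere, which makes analytical derivatives straightforward.
- The squared-error loss is convex in $\beta_0$ and $\beta_1$, so the point where the derivatives are zero is a global minimum.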
Workshop Focus
- Learn how to implement linear regression using code.
- Go beyond simply using pre-built functions and understand the underlying principles.
- Understand the math that goes into creating the model.
- Understand why certain mathematical choices are made over others.
Specific Example
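Data: Four data points (x, y).
- (1, 6), (2, 5), (3, 7), (4, 10)
Model: $\hat{y} = \beta_0 + \beta_1 x$
Objective: Find $\beta_0$ and $\beta_1$ that minimize the error.
Minimize the sum of squared errors:
- $S(\beta_0, \beta_1) = (6 - \beta_0 - \beta_1)^2 + (5 - \beta_0 - 2\beta_1)^2 + (7 - \beta_0 - 3\beta_1)^2 + (10 - \beta_0 - 4\beta_1)^2$
Calculus Optimization: Derive partial derivatives of the loss function (sum of squared errors) with respect to $\beta_0$ and $\beta_1$.
Set the partial derivatives to zero.
Solve the system of equations to find the values of $\beta_0$ and $\beta_1$ that minimize the sum of squared errors.
Derivation Example (w.r.t. $\beta_0$):
- $\frac{\partial S}{\partial \beta_0} = -2\left[(6 - \beta_0 - \beta_1) + (5 - \beta_0 - 2\beta_1) + (7 - \beta_0 - 3\beta_1) + (10 - \beta_0 - 4\beta_1)\right] = 8\beta_0 + 20\beta_1 - 56$
- The same steps w.r.t. $\beta_1$ give $\frac{\partial S}{\partial \beta_1} = 20\beta_0 + 60\beta_1 - 154$
Set both derivatives to zero and solve for $\beta_0$ and $\beta_1$: $\beta_0$ = 3.5, $\beta_1$ = 1.4.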
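The two equations obtained by setting the derivatives to zero can also be solved numerically. The sketch below uses the points (1, 6), (2, 5), (3, 7), (4, 10), with the third point taken as (3, 7) because that is the value that reproduces the stated solution of 3.5 and 1.4. Solving the 2x2 system with Cramer's rule is a choice made here for brevity, not necessarily the method used in the workshop.

```python
# Four data points (x, y); the third point is assumed to be (3, 7).
points = [(1, 6), (2, 5), (3, 7), (4, 10)]

n = len(points)
sum_x = sum(x for x, _ in points)
sum_y = sum(y for _, y in points)
sum_xx = sum(x * x for x, _ in points)
sum_xy = sum(x * y for x, y in points)

# Setting both partial derivatives to zero gives the system
#   n * beta_0     + sum_x * beta_1  = sum_y
#   sum_x * beta_0 + sum_xx * beta_1 = sum_xy
# which is solved here with Cramer's rule.
det = n * sum_xx - sum_x ** 2
beta_0 = (sum_y * sum_xx - sum_x * sum_xy) / det
beta_1 = (n * sum_xy - sum_x * sum_y) / det

print(f"beta_0 = {beta_0}, beta_1 = {beta_1}")
```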
Generalizing Equations
Goal: Find general equations for $\beta_0$ and $\beta_1$ that work with any dataset.
Representations:
- n data points (n x's, n y's)
- Minimize the sum of the squared difference between predicted and actual values.
- Sum of squared errors: $S(\beta_0, \beta_1) = \sum_{i=1}^{n}(y_i - \beta_0 - \beta_1 x_i)^2$
Derivation Process:
- Calculate partial derivatives with respect to $\beta_0$ and $\beta_1$.
- Set the derivatives to zero.
- Solve the system of equations.
Calculus Rules:
- Derivative of a sum is the sum of derivatives.
- Apply the chain rule.
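Applying these two rules to the sum of squared errors can be written out as follows. This is a sketch of the standard derivation; the symbol $S$ for the loss is a notational assumption.

$$
S(\beta_0, \beta_1) = \sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_i)^2
$$

$$
\frac{\partial S}{\partial \beta_0}
= \sum_{i=1}^{n} 2\,(y_i - \beta_0 - \beta_1 x_i)\cdot(-1)
= -2 \sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_i)
$$

$$
\frac{\partial S}{\partial \beta_1}
= \sum_{i=1}^{n} 2\,(y_i - \beta_0 - \beta_1 x_i)\cdot(-x_i)
= -2 \sum_{i=1}^{n} x_i\,(y_i - \beta_0 - \beta_1 x_i)
$$

The sum rule lets each term be differentiated separately, and the chain rule supplies the inner factors $-1$ and $-x_i$.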
Useful Definitions:
- Average: $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$, so $\sum_{i=1}^{n} x_i = n\bar{x}$
- Identity: $\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y}) = \sum_{i=1}^{n} x_i y_i - n\bar{x}\bar{y}$
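Solving the System: After a series of derivations, the following equations are obtained:
- $\beta_1 = \dfrac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{\sum_{i=1}^{n}(x_i-\bar{x})^2}$
- $\beta_0 = \bar{y} - \beta_1\bar{x}$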
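A minimal sketch of the general closed-form solution for simple linear regression: the slope is the sum of cross deviations divided by the sum of squared x deviations, and the intercept is the mean of y minus the slope times the mean of x. The function name `fit_line` and its input checks are assumptions added for illustration.

```python
def fit_line(x, y):
    """Return (beta_0, beta_1) for the least-squares line through (x, y)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, at least 2")

    x_mean = sum(x) / n
    y_mean = sum(y) / n

    # Sum of cross deviations and sum of squared x deviations.
    s_xy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_mean) ** 2 for xi in x)
    if s_xx == 0:
        raise ValueError("all x values are identical; the slope is undefined")

    beta_1 = s_xy / s_xx
    beta_0 = y_mean - beta_1 * x_mean
    return beta_0, beta_1


# Usage with the four-point example, third point taken as (3, 7).
beta_0, beta_1 = fit_line([1, 2, 3, 4], [6, 5, 7, 10])
print(f"beta_0 = {beta_0:.2f}, beta_1 = {beta_1:.2f}")
```

Because the function only needs sums over the data, it works for any number of points, which is the purpose of the generalized equations.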
Model Understanding
Variance Explained: Understanding how much of the variance in the dependent variable is explained by the independent variable(s).
Intuition: Compare the model with a simple model that only uses the average of the dependent variable to predict the values.
Unexplained Variation: The difference between the predicted and observed values.
Explained Variation: The amount of variation that the model predicts.
Coefficient of Determination (R-squared):
- A helpful measure of how well the model explains the data.
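- $R^2 = 1 - \frac{\text{SSE}}{\text{SST}}$
- SSE (Sum of Squared Errors): Sum of squared differences between predicted and actual values, $\text{SSE} = \sum_{i=1}^{n}(y_i - \hat{y}_i)^2$.
- SST (Total Sum of Squares): Total variation in the data around its mean, $\text{SST} = \sum_{i=1}^{n}(y_i - \bar{y})^2$.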
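The comparison against the average-only model can be sketched as below, using R-squared = 1 - SSE / SST. The data are the four-point example with the third point taken as (3, 7), together with the coefficients 3.5 and 1.4 quoted for it; the function name `r_squared` is an assumption for illustration.

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SSE / SST."""
    y_mean = sum(observed) / len(observed)
    # Unexplained variation: squared gaps between observed and predicted.
    sse = sum((y - y_hat) ** 2 for y, y_hat in zip(observed, predicted))
    # Total variation: squared gaps between observed values and their mean.
    sst = sum((y - y_mean) ** 2 for y in observed)
    return 1 - sse / sst


x = [1, 2, 3, 4]
y = [6, 5, 7, 10]

# Predictions from the fitted line y_hat = 3.5 + 1.4 * x.
line_predictions = [3.5 + 1.4 * xi for xi in x]

# Baseline model: always predict the mean of y.
y_mean = sum(y) / len(y)
baseline_predictions = [y_mean] * len(y)

r2_line = r_squared(y, line_predictions)
r2_baseline = r_squared(y, baseline_predictions)
print(f"R^2 (fitted line) = {r2_line:.2f}")
print(f"R^2 (mean only)   = {r2_baseline:.2f}")
```

The mean-only model scores exactly zero, so R-squared reads as the share of total variation that the fitted line explains beyond that baseline.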
Uses for Regression Models
Prediction: Use the mapping function to predict future outcomes.
Interpretation: Find and analyze the coefficients ($\beta_0$, $\beta_1$) to understand relationships between variables.
Prediction vs. Interpretation:
- Prediction: Focus is on accuracy, favoring models with high predictive power.
- Interpretation: Focus is on the model coefficients, to learn the direction and size of the association between variables.
Example: Price of housing and features in the house.
- Look at the coefficients to understand what impacts house prices.
- Example features include quality, living area, and house size.