Linear Regression: Derivations

Supervised vs. Unsupervised Learning

Supervised Learning:
- Goal: Find a mapping function from inputs (x) to outputs (y).
- Aim to predict or classify outcomes based on input data.
Unsupervised Learning:
- Goal: Discover relationships between features or observations (x) without a predefined output (y).
- Focus on finding patterns, clusters, or reducing dimensionality in the data.

Linear Regression

Objective: Find the best-fit line that minimizes the error between predicted and actual values.
Model Equation:
- $\hat{y} = \beta_0 + \beta_1 x$
- $\beta_0$: Intercept (where the line crosses the y-axis).
- $\beta_1$: Slope (the change in y for each unit change in x).
- Goal: Determine the optimal values for $\beta_0$ and $\beta_1$.
Finding Beta Zero and Beta One Equations:
- Calculate averages (means) of x and y variables.
- Determine the Pearson correlation between the two variables.
- Compute the standard deviations of x and y.
Pearson Correlation:
- Measures the strength and direction of the linear relationship between two variables.
- Ranges from -1 to 1.
- $r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \sum_{i=1}^{n} (y_i - \bar{y})^2}}$
  - $\bar{x}$ and $\bar{y}$ represent the means of x and y, respectively.
- (+) value means positive correlation: if one variable increases, the other tends to increase.
- (-) value means negative correlation: if one variable increases, the other tends to decrease.
Mean (Average):
- $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$
Standard Deviation:
- Measures the amount of variation or dispersion in a set of values.
Example Calculation:
- Given a dataset with six data points, calculate the Pearson correlation:
  - Budget: 1.2, 1.5, 2.1, 2.8, 3.2, 3.9
  - Revenue: 2.0, 2.5, 3.0, 3.5, 4.0, 4.5
  - Calculate the mean for both budget and revenue.
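  - Subtract the mean from each observation to get the deviations.
  - Multiply the x and y deviations for each observation.
  - Sum the products (this is the numerator of r).
  - Square each deviation for x and y, sum the squares for each variable, multiply the two sums, and take the square root (this is the denominator of r).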
  - Calculate the Pearson correlation (r) by dividing the sum of products by the square root of the product of the two sums of squares. For the values listed above, r ≈ 0.99, which is a strong positive correlation.
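The steps above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library and the budget and revenue values listed above; the function name `pearson_r` is a hypothetical helper, not something from the workshop material.

```python
import math

budget = [1.2, 1.5, 2.1, 2.8, 3.2, 3.9]
revenue = [2.0, 2.5, 3.0, 3.5, 4.0, 4.5]


def pearson_r(xs, ys):
    """Pearson correlation computed step by step (hypothetical helper)."""
    n = len(xs)
    # Step 1: means of both variables.
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Step 2: deviations of each observation from its mean.
    dx = [x - x_bar for x in xs]
    dy = [y - y_bar for y in ys]
    # Step 3: sum of the products of paired deviations (numerator).
    numerator = sum(a * b for a, b in zip(dx, dy))
    # Step 4: square root of the product of the sums of squares (denominator).
    denominator = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    return numerator / denominator


r = pearson_r(budget, revenue)
print(f"r = {r:.4f}")
```

Running it on other paired lists of equal length works the same way; a perfectly decreasing relationship such as `pearson_r([1, 2, 3], [3, 2, 1])` gives a value of -1, up to floating-point rounding.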
Calculating Beta One and Beta Zero:
- After calculating the Pearson correlation, calculate the standard deviations $s_x$ and $s_y$.
- Plug into the equations to find $\beta_1$ and $\beta_0$: $\beta_1 = r \frac{s_y}{s_x}$ and $\beta_0 = \bar{y} - \beta_1 \bar{x}$.
Example values for $\beta_0$ and $\beta_1$, computed from the budget and revenue values above: $\beta_0 \approx 1.05$, $\beta_1 \approx 0.90$.
- With these values, we now have a linear regression model.
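As a rough sketch of this calculation, the snippet below assumes the equations in question are $\beta_1 = r \cdot s_y / s_x$ and $\beta_0 = \bar{y} - \beta_1 \bar{x}$, and applies them to the budget and revenue values from the example. The function name `fit_from_correlation` is hypothetical.

```python
import math
import statistics

budget = [1.2, 1.5, 2.1, 2.8, 3.2, 3.9]
revenue = [2.0, 2.5, 3.0, 3.5, 4.0, 4.5]


def fit_from_correlation(xs, ys):
    """Slope and intercept from r, the standard deviations, and the means."""
    x_bar = statistics.mean(xs)
    y_bar = statistics.mean(ys)
    # Sample standard deviations; only their ratio matters for the slope.
    s_x = statistics.stdev(xs)
    s_y = statistics.stdev(ys)
    # Pearson correlation from the deviations.
    num = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    den = math.sqrt(
        sum((x - x_bar) ** 2 for x in xs) * sum((y - y_bar) ** 2 for y in ys)
    )
    r = num / den
    beta1 = r * s_y / s_x          # slope
    beta0 = y_bar - beta1 * x_bar  # intercept
    return beta0, beta1


beta0, beta1 = fit_from_correlation(budget, revenue)
print(f"beta0 = {beta0:.3f}, beta1 = {beta1:.3f}")
```

A prediction for a new budget value `x` is then `beta0 + beta1 * x`.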

Model Assessment

Error Assessment: Evaluate the difference between actual data points and predicted values from the model.
Goal of the Model: Minimize the error; the predicted values should closely match the observed values.

Error Measurement

Error: The difference between observed value and predicted value.
- Error = Observed – Predicted
Issue with Summing Errors:
- Positive and negative errors can cancel each other out, leading to a misleadingly low overall error.
- Individual errors can be large while their sum is still zero or close to it.
Mean Squared Error (MSE):
- To address the issue of errors canceling each other out, square the errors.
- MSE = mean of the squared errors.
- $MSE = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$
  - $y_i$ is the actual value.
  - $\hat{y}_i$ is the predicted value.
- The goal is to find $\beta_0$ and $\beta_1$ that minimize the MSE.
Advantage of Using Squared Errors:
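- The squared error is differentiable everywhere, so its derivatives can be worked out analytically.
- The squared-error loss is convex in $\beta_0$ and $\beta_1$, so any point where the derivatives are zero is the global minimum.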
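To make the cancelling effect concrete, here is a small sketch with made-up numbers (the `actual` and `predicted` lists are hypothetical, chosen so that every prediction is off by 3). The plain mean of the errors hides the problem, while the MSE does not.

```python
def mean_error(actual, predicted):
    """Average of (observed - predicted); positive and negative errors cancel."""
    return sum(y - y_hat for y, y_hat in zip(actual, predicted)) / len(actual)


def mean_squared_error(actual, predicted):
    """Average of squared errors; every error contributes a non-negative amount."""
    return sum((y - y_hat) ** 2 for y, y_hat in zip(actual, predicted)) / len(actual)


# Hypothetical data: each prediction misses by exactly 3, alternating in sign.
actual = [2.0, 4.0, 6.0, 8.0]
predicted = [5.0, 1.0, 9.0, 5.0]

me = mean_error(actual, predicted)
mse = mean_squared_error(actual, predicted)
print(f"mean error = {me}, MSE = {mse}")
```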

Workshop Focus

Learn how to implement linear regression using code.
Go beyond simply using pre-built functions and understand the underlying principles.
Understand the math that goes into creating the model.
Understand why certain mathematical choices are made over others.

Specific Example

Data: Four data points (x, y).
- (1, 6), (2, 5), (3, 7), (4, 10)
Model: $\hat{y} = \beta_0 + \beta_1 x$
Objective: Find $\beta_0$ and $\beta_1$ that minimize the error.
Minimize the sum of squared errors.
Calculus Optimization: Derive partial derivatives of the loss function (sum of squared errors) with respect to $\beta_0$ and $\beta_1$.
Set the partial derivatives to zero.
Solve the system of equations to find the values of $\beta_0$ and $\beta_1$ that minimize the sum of squared errors.
Derivation Example (w.r.t. $\beta_0$):
- $8\beta_0 + 20\beta_1 - 56 = 0$
Solve the resulting system to find the values for $\beta_0$ and $\beta_1$: $\beta_0 = 3.5$, $\beta_1 = 1.4$.
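The following sketch builds both equations from the data and solves them with Cramer's rule. It uses the points (1, 6), (2, 5), (3, 7), (4, 10); the third point is taken as (3, 7) because that is the value consistent with the equation $8\beta_0 + 20\beta_1 - 56 = 0$ and with the stated solution. Variable names are illustrative only.

```python
points = [(1, 6), (2, 5), (3, 7), (4, 10)]

n = len(points)
sum_x = sum(x for x, _ in points)
sum_y = sum(y for _, y in points)
sum_xx = sum(x * x for x, _ in points)
sum_xy = sum(x * y for x, y in points)

# Partial derivative w.r.t. beta0 set to zero:
#   2n*beta0 + 2*sum_x*beta1 - 2*sum_y = 0    ->  8*beta0 + 20*beta1 - 56 = 0
# Partial derivative w.r.t. beta1 set to zero:
#   2*sum_x*beta0 + 2*sum_xx*beta1 - 2*sum_xy = 0  ->  20*beta0 + 60*beta1 - 154 = 0
a11, a12, c1 = 2 * n, 2 * sum_x, 2 * sum_y
a21, a22, c2 = 2 * sum_x, 2 * sum_xx, 2 * sum_xy

# Solve the 2x2 linear system with Cramer's rule.
det = a11 * a22 - a12 * a21
beta0 = (c1 * a22 - a12 * c2) / det
beta1 = (a11 * c2 - a21 * c1) / det

print(f"beta0 = {beta0}, beta1 = {beta1}")
```

Cramer's rule is used here only because the system is 2x2; for more than one input variable a general linear solver would be the more practical choice.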

Generalizing Equations

Goal: Find general equations for $\beta_0$ and $\beta_1$ that work with any dataset.
Representations:
- n data points (n x's, n y's)
- Minimize the sum of the squared difference between predicted and actual values.
- Sum of squared errors: $\sum_{i=1}^{n} (y_i - (\beta_0 + \beta_1 x_i))^2$
Derivation Process:
- Calculate partial derivatives with respect to $\beta_0$ and $\beta_1$.
- Set the derivatives to zero.
- Solve the system of equations.
Calculus Rules:
- Derivative of a sum is the sum of derivatives.
- Apply the chain rule.
Useful Definitions:
- Average: $\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$
- Identity: $\sum_{i=1}^{n} (x_i - \bar{x})^2 = \sum_{i=1}^{n} x_i^2 - n\bar{x}^2$
Solving the System: After a series of derivations, the following equations are obtained:
- $\beta_0 = \bar{y} - \beta_1\bar{x}$
- $\beta_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}$
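One way the intermediate steps can be filled in is sketched below. Here $S$ denotes the sum of squared errors (a symbol introduced only for this sketch), and the last step uses the cross-product analogue of the identity above, $\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}) = \sum_{i=1}^{n} x_i y_i - n\bar{x}\bar{y}$.

$$
\begin{aligned}
S(\beta_0, \beta_1) &= \sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_i)^2 \\
\frac{\partial S}{\partial \beta_0} &= -2 \sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_i) = 0
\;\Rightarrow\; \sum_{i=1}^{n} y_i = n\beta_0 + \beta_1 \sum_{i=1}^{n} x_i
\;\Rightarrow\; \beta_0 = \bar{y} - \beta_1 \bar{x} \\
\frac{\partial S}{\partial \beta_1} &= -2 \sum_{i=1}^{n} x_i (y_i - \beta_0 - \beta_1 x_i) = 0
\;\Rightarrow\; \sum_{i=1}^{n} x_i y_i = \beta_0 \sum_{i=1}^{n} x_i + \beta_1 \sum_{i=1}^{n} x_i^2 \\
\text{Substituting } \beta_0 &= \bar{y} - \beta_1 \bar{x}: \quad
\sum_{i=1}^{n} x_i y_i - n\bar{x}\bar{y} = \beta_1 \left( \sum_{i=1}^{n} x_i^2 - n\bar{x}^2 \right) \\
\beta_1 &= \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}
\end{aligned}
$$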

Model Understanding

Variance Explained: Understanding how much of the variance in the dependent variable is explained by the independent variable(s).
Intuition: Compare the model with a simple model that only uses the average of the dependent variable to predict the values.
Unexplained Variation: The variation left in the errors, i.e., the differences between the predicted and observed values.
Explained Variation: The variation of the model's predictions around the average of the dependent variable.
Coefficient of Determination (R-squared):
- A helpful measure of how well the model explains the data.
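- $R^2 = 1 - \frac{SSE}{SST}$
  - SSE (Sum of Squared Errors): Sum of squared differences between predicted and actual values, $\sum_{i=1}^{n} (y_i - \hat{y}_i)^2$.
  - SST (Total Sum of Squares): Total variation in the data, i.e., the sum of squared differences between actual values and their average, $\sum_{i=1}^{n} (y_i - \bar{y})^2$.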
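As an illustration, the sketch below computes R-squared for the four-point example, assuming the points (1, 6), (2, 5), (3, 7), (4, 10) and the coefficients $\beta_0 = 3.5$, $\beta_1 = 1.4$ from the Specific Example section. The function name `r_squared` is hypothetical.

```python
def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SSE / SST."""
    y_bar = sum(actual) / len(actual)
    # SSE: squared differences between actual and predicted values.
    sse = sum((y - y_hat) ** 2 for y, y_hat in zip(actual, predicted))
    # SST: squared differences between actual values and their average.
    sst = sum((y - y_bar) ** 2 for y in actual)
    return 1 - sse / sst


xs = [1, 2, 3, 4]
ys = [6, 5, 7, 10]
beta0, beta1 = 3.5, 1.4  # assumed from the Specific Example section

predicted = [beta0 + beta1 * x for x in xs]
r2 = r_squared(ys, predicted)
print(f"R^2 = {r2:.2f}")

# Baseline model that always predicts the average of y: explains none of the variation.
baseline = [sum(ys) / len(ys)] * len(ys)
r2_baseline = r_squared(ys, baseline)
```

The baseline comparison mirrors the intuition above: a model that only predicts the average has SSE equal to SST, so its R-squared is 0.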

Uses for Regression Models

Prediction: Use the mapping function to predict future outcomes.
Interpretation: Find and analyze the coefficients (the betas) to understand relationships between variables.
Prediction vs. Interpretation:
- Prediction: Focus is on accuracy, using models with high predictive power.
- Interpretation: Focus is on the model coefficients, to learn the direction and size of the association between variables.
Example: Price of housing and features in the house.
- Look at the coefficients to understand what impacts house prices.
- Example features are quality, living area, and size of the house.