Linear Regression: Derivations

Supervised vs. Unsupervised Learning

  • Supervised Learning:

    • Goal: Find a mapping function from inputs (x) to outputs (y).
    • Aim to predict or classify outcomes based on input data.
  • Unsupervised Learning:

    • Goal: Discover relationships between features or observations (x) without a predefined output (y).
    • Focus on finding patterns, clusters, or reducing dimensionality in the data.

Linear Regression

  • Objective: Find the best-fit line that minimizes the error between predicted and actual values.

  • Model Equation:

    • $\hat{y} = \beta_0 + \beta_1 x$
    • $\beta_0$: Intercept (where the line crosses the y-axis).
    • $\beta_1$: Slope (the change in y for each unit change in x).
    • Goal: Determine the optimal values for $\beta_0$ and $\beta_1$.
  • Finding Beta Zero and Beta One Equations:

    • Calculate averages (means) of x and y variables.
    • Determine the Pearson correlation between the two variables.
    • Compute the sums of squared deviations of x and y (these give the standard deviations).
  • Pearson Correlation:

    • Measures the strength and direction of the linear relationship between two variables.
    • Ranges from -1 to 1.
    • $r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \sum_{i=1}^{n} (y_i - \bar{y})^2}}$
      • $\bar{x}$ and $\bar{y}$ represent the means of x and y, respectively.
    • (+) value means positive correlation: if one variable increases, the other tends to increase.
    • (-) value means negative correlation: if one variable increases, the other tends to decrease.
  • Mean (Average):

    • $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$
  • Standard Deviation:

    • Measures the amount of variation or dispersion in a set of values.
  • Example Calculation:

    • Given a dataset with six data points, calculate the Pearson correlation:
      • Budget: 1.2, 1.5, 2.1, 2.8, 3.2, 3.9
      • Revenue: 2.0, 2.5, 3.0, 3.5, 4.0, 4.5
      • Calculate the mean for both budget and revenue.
      • Subtract the mean from each observation (for both x and y).
      • Multiply the paired differences for each observation.
      • Sum the products.
      • Square each difference for x and for y, and sum the squares separately.
      • Divide the sum of products by the square root of the product of the two sums of squares to get the Pearson correlation (r). For the six points listed, r ≈ 0.995, a very strong positive correlation.
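
The steps of the example calculation can be sketched in plain Python. This is a minimal illustration using the budget and revenue values listed above; the function name `pearson_r` and the intermediate variable names are chosen here for illustration and are not from the original notes.

```python
from math import sqrt

# Budget (x) and revenue (y) values listed in the example.
budget = [1.2, 1.5, 2.1, 2.8, 3.2, 3.9]
revenue = [2.0, 2.5, 3.0, 3.5, 4.0, 4.5]


def pearson_r(xs, ys):
    """Pearson correlation, computed step by step from deviations."""
    n = len(xs)
    x_bar = sum(xs) / n                        # mean of x
    y_bar = sum(ys) / n                        # mean of y
    dx = [x - x_bar for x in xs]               # deviations of x from its mean
    dy = [y - y_bar for y in ys]               # deviations of y from its mean
    s_xy = sum(a * b for a, b in zip(dx, dy))  # sum of products of deviations
    s_xx = sum(a * a for a in dx)              # sum of squared x deviations
    s_yy = sum(b * b for b in dy)              # sum of squared y deviations
    return s_xy / sqrt(s_xx * s_yy)


r = pearson_r(budget, revenue)
print(f"r = {r:.3f}")
```
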
  • Calculating Beta One and Beta Zero:

    • After calculating the Pearson correlation, calculate the standard deviations.
    • Plug into the equations $\beta_1 = r \frac{s_y}{s_x}$ and $\beta_0 = \bar{y} - \beta_1 \bar{x}$, where $s_x$ and $s_y$ are the standard deviations of x and y.
  • For the budget and revenue data above: $\beta_0 \approx 1.05$, $\beta_1 \approx 0.90$.

    • With these values, we now have a linear regression model.
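
A short sketch of this route from correlation to coefficients follows. It assumes the standard relations $\beta_1 = r \cdot s_y / s_x$ and $\beta_0 = \bar{y} - \beta_1 \bar{x}$, with sample standard deviations (dividing by n - 1); the function name `fit_from_correlation` is hypothetical.

```python
from statistics import mean, stdev

# Budget (x) and revenue (y) values from the example calculation.
budget = [1.2, 1.5, 2.1, 2.8, 3.2, 3.9]
revenue = [2.0, 2.5, 3.0, 3.5, 4.0, 4.5]


def fit_from_correlation(xs, ys):
    """Return (beta_0, beta_1) using means, standard deviations, and r."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)  # sample standard deviations (n - 1)
    # Sample covariance, then Pearson r = cov / (s_x * s_y).
    cov = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)
    r = cov / (s_x * s_y)
    beta_1 = r * s_y / s_x             # slope
    beta_0 = y_bar - beta_1 * x_bar    # intercept
    return beta_0, beta_1


beta_0, beta_1 = fit_from_correlation(budget, revenue)
print(f"beta_0 = {beta_0:.2f}, beta_1 = {beta_1:.2f}")
```

Whether the standard deviations divide by n or n - 1 does not change the slope, because the factor cancels in the ratio $s_y / s_x$ and in r.
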

Model Assessment

  • Error Assessment: Evaluate the difference between actual data points and predicted values from the model.

  • Goal of the Model: Minimize the error; the predicted values should closely match the observed values.

Error Measurement

  • Error: The difference between observed value and predicted value.

    • Error = Observed – Predicted
  • Issue with Summing Errors:

    • Positive and negative errors can cancel each other out, leading to a misleadingly low overall error.
    • Individual errors can be large even when their sum is zero, because positive and negative errors offset each other.
  • Mean Squared Error (MSE):

    • To address the issue of errors canceling each other out, square the errors.
    • MSE = mean of the squared errors.
    • $MSE = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$
      • $y_i$ is the actual value.
      • $\hat{y}_i$ is the predicted value.
    • The goal is to find $\beta_0$ and $\beta_1$ that minimize the MSE.
  • Advantage of Using Squared Errors:

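    • The squared error is differentiable everywhere, so the minimum can be found analytically by taking derivatives.
    • The squared-error loss is convex in $\beta_0$ and $\beta_1$, so a point where the derivatives are zero is a global minimum.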
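
The cancelling effect can be seen in a small sketch. The observed and predicted values below are hypothetical numbers chosen so that every prediction is off by one unit, yet the raw errors sum to zero.

```python
# Hypothetical values, chosen only to illustrate error cancellation.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [2.0, 1.0, 4.0, 3.0]

# Error = Observed - Predicted, one value per data point.
errors = [o - p for o, p in zip(observed, predicted)]

total_error = sum(errors)                        # positives and negatives cancel
mse = sum(e ** 2 for e in errors) / len(errors)  # squaring removes the signs

print("sum of errors:", total_error)
print("MSE:", mse)
```

The summed error suggests a perfect fit, while the MSE reflects that each prediction misses by one unit.
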

Workshop Focus

  • Learn how to implement linear regression using code.
  • Go beyond simply using pre-built functions and understand the underlying principles.
  • Understand the math that goes into creating the model.
  • Understand why certain mathematical approaches are chosen over others.

Specific Example

  • Data: Four data points (x, y).

    • (1, 6), (2, 5), (3, 7), (4, 10)
  • Model: $\hat{y} = \beta_0 + \beta_1 x$

  • Objective: Find $\beta_0$ and $\beta_1$ that minimize the error.

  • Minimize sum of squared errors.

  • Calculus Optimization: Derive partial derivatives of the loss function (sum of squared errors) with respect to $\beta_0$ and $\beta_1$.

  • Set the partial derivatives to zero.

  • Solve the system of equations to find the values of $\beta_0$ and $\beta_1$ that minimize the sum of squared errors.

  • Derivation Example (w.r.t. $\beta_0$):

    • $8\beta_0 + 20\beta_1 - 56 = 0$
  • Setting the derivative w.r.t. $\beta_1$ to zero in the same way gives $20\beta_0 + 60\beta_1 - 154 = 0$.

  • Solve the two equations together to find $\beta_0 = 3.5$ and $\beta_1 = 1.4$.
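
As a check on this example, the two equations can be built from the data and solved directly. This is a minimal sketch in plain Python using Cramer's rule for the 2x2 system; it assumes the third data point is (3, 7), which is what the equation $8\beta_0 + 20\beta_1 - 56 = 0$ implies (the x-values must sum to 10). Variable names are illustrative.

```python
# Four data points; the third is taken as (3, 7), consistent with
# the derivative equation 8*beta_0 + 20*beta_1 - 56 = 0.
points = [(1, 6), (2, 5), (3, 7), (4, 10)]

n = len(points)
sum_x = sum(x for x, _ in points)        # 10
sum_y = sum(y for _, y in points)        # 28
sum_xx = sum(x * x for x, _ in points)   # 30
sum_xy = sum(x * y for x, y in points)   # 77

# Setting dS/d(beta_0) = 0:  2n*beta_0      + 2*sum_x*beta_1  = 2*sum_y
# Setting dS/d(beta_1) = 0:  2*sum_x*beta_0 + 2*sum_xx*beta_1 = 2*sum_xy
a11, a12, b1 = 2 * n, 2 * sum_x, 2 * sum_y        # 8, 20, 56
a21, a22, b2 = 2 * sum_x, 2 * sum_xx, 2 * sum_xy  # 20, 60, 154

# Solve the 2x2 linear system with Cramer's rule.
det = a11 * a22 - a12 * a21
beta_0 = (b1 * a22 - a12 * b2) / det
beta_1 = (a11 * b2 - a21 * b1) / det

print(beta_0, beta_1)
```
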

Generalizing Equations

  • Goal: Find general equations for $\beta_0$ and $\beta_1$ that work with any dataset.

  • Representations:

    • n data points (n x's, n y's)
    • Minimize the sum of the squared difference between predicted and actual values.
    • Sum of squared errors: $\sum_{i=1}^{n} (y_i - (\beta_0 + \beta_1 x_i))^2$
  • Derivation Process:

    • Calculate partial derivatives with respect to $\beta_0$ and $\beta_1$.
    • Set the derivatives to zero.
    • Solve the system of equations.
  • Calculus Rules:

    • Derivative of a sum is the sum of derivatives.
    • Apply the chain rule.
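
Applying these two rules to the sum of squared errors gives one equation per coefficient. The block below is a sketch of that step; the symbol $S$ for the loss is introduced here for convenience and does not appear in the original notes.

```latex
\begin{align*}
S(\beta_0, \beta_1) &= \sum_{i=1}^{n} \bigl(y_i - (\beta_0 + \beta_1 x_i)\bigr)^2 \\
\frac{\partial S}{\partial \beta_0}
  &= -2 \sum_{i=1}^{n} \bigl(y_i - \beta_0 - \beta_1 x_i\bigr) = 0
  \;\Longrightarrow\;
  \sum_{i=1}^{n} y_i = n\beta_0 + \beta_1 \sum_{i=1}^{n} x_i \\
\frac{\partial S}{\partial \beta_1}
  &= -2 \sum_{i=1}^{n} x_i \bigl(y_i - \beta_0 - \beta_1 x_i\bigr) = 0
  \;\Longrightarrow\;
  \sum_{i=1}^{n} x_i y_i = \beta_0 \sum_{i=1}^{n} x_i + \beta_1 \sum_{i=1}^{n} x_i^2
\end{align*}
```

Dividing the first equation by $n$ expresses $\beta_0$ in terms of the two means and $\beta_1$, which can then be substituted into the second equation.
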
  • Useful Definitions:

    • Average: $\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$
    • Identity: $\sum_{i=1}^{n} (x_i - \bar{x})^2 = \sum_{i=1}^{n} x_i^2 - n\bar{x}^2$
  • Solving the System: After a series of derivations, the following equations are obtained:

    • $\beta_0 = \bar{y} - \beta_1 \bar{x}$
    • $\beta_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}$
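
The two closed-form equations translate directly into code. The following is a minimal sketch, not a production implementation; the function name `fit_line` and the input checks are additions made here for illustration.

```python
def fit_line(xs, ys):
    """Least-squares estimates (beta_0, beta_1) for y = beta_0 + beta_1 * x."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Numerator: sum of products of deviations from the means.
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    # Denominator: sum of squared deviations of x.
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    if s_xx == 0:
        raise ValueError("all x values are identical; the slope is undefined")
    beta_1 = s_xy / s_xx
    beta_0 = y_bar - beta_1 * x_bar
    return beta_0, beta_1


# Hypothetical points lying exactly on y = 1 + 2x.
beta_0, beta_1 = fit_line([0, 1, 2, 3], [1, 3, 5, 7])
print(beta_0, beta_1)
```

Because the usage points lie exactly on a line, the function should recover that line's intercept and slope.
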

Model Understanding

  • Variance Explained: Understanding how much of the variance in the dependent variable is explained by the independent variable(s).

  • Intuition: Compare the model with a simple model that only uses the average of the dependent variable to predict the values.

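  • Unexplained Variation: The difference between the predicted and observed values (the residuals).

  • Explained Variation: The variation of the predicted values around the mean of the dependent variable, i.e. the part of the total variation that the model accounts for.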

  • Coefficient of Determination (R-squared):

    • A helpful measure of how well the model explains the data.
    • $R^2 = 1 - \frac{SSE}{SST}$
      • SSE (Sum of Squared Errors): Sum of squared differences between predicted and actual values.
      • SST (Total Sum of Squares): Sum of squared differences between the actual values and their mean, i.e. the total variation in the data.
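
A small sketch of the R-squared calculation, including the comparison with the mean-only baseline model. It reuses the four-point example with $\beta_0 = 3.5$ and $\beta_1 = 1.4$, assuming x-values 1, 2, 3, 4; the function name `r_squared` is hypothetical.

```python
def r_squared(ys, y_hats):
    """R^2 = 1 - SSE / SST for actual values ys and predictions y_hats."""
    y_bar = sum(ys) / len(ys)
    sse = sum((y - y_hat) ** 2 for y, y_hat in zip(ys, y_hats))  # unexplained
    sst = sum((y - y_bar) ** 2 for y in ys)                      # total
    return 1 - sse / sst


# Four-point example (x-values assumed to be 1, 2, 3, 4).
xs = [1, 2, 3, 4]
ys = [6, 5, 7, 10]
beta_0, beta_1 = 3.5, 1.4

model_predictions = [beta_0 + beta_1 * x for x in xs]
baseline_predictions = [sum(ys) / len(ys)] * len(ys)  # always predict the mean

r2_model = r_squared(ys, model_predictions)
r2_baseline = r_squared(ys, baseline_predictions)
print(f"model R^2 = {r2_model:.2f}, baseline R^2 = {r2_baseline:.2f}")
```

The baseline that always predicts the mean has SSE equal to SST, so its R-squared is zero; the fitted line is judged by how much it improves on that.
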

Uses for Regression Models

  • Prediction: Use the mapping function to predict future outcomes.

  • Interpretation: Find and analyze the coefficients (the betas) to understand relationships between variables.

  • Prediction vs. Interpretation:

    • Prediction: Focus is on accuracy; prefer models with high predictive power.
    • Interpretation: Focus is on the model coefficients, to learn the direction and size of the association between variables.
  • Example: Price of housing and features in the house.

    • Look at the coefficients to understand what impacts house prices.
    • Examples: quality, living area, and size of the house.