Linear Regression Notes

Linear Regression Overview

  • Definition: Linear regression is a statistical method to model the relationship between a dependent variable (e.g., mouse size) and one or more independent variables (e.g., mouse weight).

  • Alternative Name: linear regression is often introduced as the first member of the family of General Linear Models.

  • Importance: Linear regression is a powerful and widely-used technique in statistics and data analysis.

Key Concepts in Linear Regression

  1. Least Squares Method

    • Purpose: To fit a line to the data.

    • Process:

      • Calculate the distance from the line to the data points, known as residuals.

      • Square each residual and sum them up to obtain the total sum of squares of residuals.

      • We square residuals instead of taking absolute values because:

        1. Squared errors are differentiable, giving a simple closed-form solution.

        2. Squaring penalizes large mistakes more, which is usually desirable.

        3. Squared error corresponds to the normal distribution, the foundation for classical regression inference.

      • Repeat by adjusting the line's slope and intercept (rotating it) and calculating the new residuals and their squared sum.

      • Plot the sum of squared residuals against the line's rotation to find the position that minimizes this sum.

    • Result: The line with the minimum sum of squared residuals is chosen as the best fit line.
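The rotate-and-measure procedure above can be sketched in a few lines of Python. The data values and the grid of candidate slopes are hypothetical; a real fit would use the closed-form least-squares solution, but the brute-force search mirrors the steps described:

```python
# Toy mouse data (hypothetical values for illustration)
weights = [1.0, 2.0, 3.0, 4.0, 5.0]
sizes   = [2.1, 3.9, 6.2, 7.8, 10.1]

def ssr(slope, intercept):
    """Sum of squared residuals for a candidate line."""
    return sum((y - (slope * x + intercept)) ** 2
               for x, y in zip(weights, sizes))

mean_x = sum(weights) / len(weights)
mean_y = sum(sizes) / len(sizes)

# Brute-force "rotation": try many slopes, keeping each candidate line
# centered on the means, and keep the slope with the smallest SSR.
best_ssr, best_slope = min(
    (ssr(m, mean_y - m * mean_x), m)
    for m in (i / 1000 for i in range(0, 5000))
)
print(best_slope)  # 1.99 — matches the closed-form least-squares slope
```

On this grid the minimum lands exactly on the closed-form slope (Sxy/Sxx = 19.9/10 = 1.99); in general the search only gets as close as the grid spacing allows.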

  2. R-squared (R²)

    • Function: A statistical measure that represents the proportion of the variance for a dependent variable that's explained by an independent variable (or variables) in a regression model.

    • Calculation:

      • Total Sum of Squares Formula:
        SS_{mean} = \text{Sum of Squares around the Mean} = \sum (\text{Data} - \text{Mean})^2

      • Variation around the Mean: \text{Variance around Mean} = \frac{SS_{mean}}{n}

      • Sum of Squares for the Fit:
        SS_{fit} = \text{Sum of Squares around the Fit} = \sum (\text{Data} - \text{Fitted Line})^2

      • Variation around the Fit: \text{Variance around Fit} = \frac{SS_{fit}}{n}

      • R² Formula:
        R^2 = \frac{SS_{mean} - SS_{fit}}{SS_{mean}}

    • Interpretation:

      • High R² indicates that a significant amount of variance in the dependent variable is accounted for by the independent variable(s).

      • Example Results:

      • If R² = 0.6, it means 60% of the variance in mouse size can be explained by mouse weight.

      • If R² = 1, mouse weight explains 100% of the variance in mouse size.

      • If R² = 0, mouse weight does not explain any variance.

  3. Calculating R-squared Examples

    • Example Calculations:

      • Given:

      • Variation around the Mean = 11.1

      • Variation around the Fit = 4.4

      • Calculation: R^2 = \frac{11.1 - 4.4}{11.1} \approx 0.60

      • Interpretation: 60% reduction in variance upon accounting for weight.
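The R² arithmetic above can be checked with a small sketch. The `r_squared` helper is a name of my choosing; the 11.1 and 4.4 values come from the example (since n cancels in the ratio, variances can be used in place of sums of squares):

```python
def r_squared(data, fitted):
    """R^2 = (SS_mean - SS_fit) / SS_mean."""
    mean = sum(data) / len(data)
    ss_mean = sum((y - mean) ** 2 for y in data)
    ss_fit = sum((y - f) ** 2 for y, f in zip(data, fitted))
    return (ss_mean - ss_fit) / ss_mean

# Using the example values (variation around the mean and around the fit):
var_mean, var_fit = 11.1, 4.4
print(round((var_mean - var_fit) / var_mean, 2))  # 0.6

# Sanity check: a perfect fit gives R^2 = 1.
print(r_squared([1, 2, 3], [1, 2, 3]))  # 1.0
```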

A common point of confusion: wanting low variance and low bias does NOT mean you want a small R².

First: bias–variance and R² measure different things

Bias–variance

  • A property of the model class and the training process

  • About how well your method generalizes to unseen data

  • High variance = overfits

  • High bias = underfits

R² (coefficient of determination)

  • A measure of how well your model fits the existing training data

  • Does NOT measure generalization

  • Can be high even when the model has terrible variance

  • Can be low even when the model generalizes perfectly (e.g., very noisy data)

So the two concepts are not aligned.

“Since we want low variance, maybe a small R² is better.”

No — and here’s why.

If R² is “small”

It usually means:

Your model is underfitting

→ High bias

→ Model is too simple

→ You are not capturing real patterns

R² close to 0 usually means the model learned almost nothing.

It can also mean the data is inherently noisy

→ R² cannot be improved no matter what

→ Bias–variance has nothing to do with it

If R² is “big”

It usually means:

Your model fits the data well

and that the independent variables explain a significant proportion of the variance in the dependent variable.

But beware:

  • R² can be high because of overfitting

  • A neural network can achieve R² = 0.999 on training data but generalize terribly

This is why ML uses test sets, cross-validation, and regularization.
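A minimal sketch of this point, using a deliberately overfit model — a hypothetical "memorize the training data" predictor (a 1-nearest-neighbor lookup) on made-up noisy linear data. The training R² is perfect, but the test R² is worse:

```python
import random
random.seed(0)

def make_data(n):
    """Hypothetical noisy data: y = 2x + Gaussian noise."""
    xs = [random.uniform(0, 10) for _ in range(n)]
    ys = [2 * x + random.gauss(0, 1) for x in xs]
    return xs, ys

train_x, train_y = make_data(20)
test_x, test_y = make_data(20)

def r_squared(actual, predicted):
    mean = sum(actual) / len(actual)
    ss_mean = sum((y - mean) ** 2 for y in actual)
    ss_fit = sum((y - p) ** 2 for y, p in zip(actual, predicted))
    return (ss_mean - ss_fit) / ss_mean

def memorize(x):
    """Overfit model: return the y of the nearest memorized training x."""
    return min(zip(train_x, train_y), key=lambda p: abs(p[0] - x))[1]

train_r2 = r_squared(train_y, [memorize(x) for x in train_x])
test_r2 = r_squared(test_y, [memorize(x) for x in test_x])
print(train_r2)             # 1.0 — perfect fit on training data
print(test_r2 < train_r2)   # True — generalization is worse
```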

So what SHOULD you optimize?

Not R².

You optimize test error, like:

  • MAE

  • RMSE

  • MSE

  • Cross-entropy

  • Accuracy

Your validation/test performance tells you whether bias and variance are balanced.

R² is just descriptive — it doesn’t guide ML model selection.
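The error metrics listed above can be computed directly; the target and prediction values here are made up for illustration:

```python
import math

# Hypothetical test-set targets and model predictions:
actual    = [3.0, 5.0, 7.0, 9.0]
predicted = [2.5, 5.5, 6.0, 9.5]

errors = [a - p for a, p in zip(actual, predicted)]
mae  = sum(abs(e) for e in errors) / len(errors)   # mean absolute error
mse  = sum(e * e for e in errors) / len(errors)    # mean squared error
rmse = math.sqrt(mse)                              # root mean squared error
print(mae, mse, round(rmse, 3))
```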

Clean 15-second explanation (interview-ready)

R² only measures how well the model fits the training data.

A small R² usually means high bias (underfitting).

A high R² can still mean high variance (overfitting).

Bias–variance tradeoff is evaluated on validation/test error, not R².

Additional Concepts and Scenarios in Linear Regression

  • When knowing mouse weight allows perfect predictions, R² would be 1 (100% explained variance).

  • If knowing mouse weight does not provide predictive power, R² would be 0.

  • R² applies to models of any complexity, because it depends only on comparing the sum of squares around the mean with the sum of squares around the fit.

Multi-Variable Linear Regression

  1. Modeling with Multiple Predictors

    • Scenario: Predicting body length using both mouse weight and tail length.

    • Visualization: Use a 3D graph to plot weight, tail length, and body length.

    • Fitting Process: least squares works the same way, but fits a plane (defined by the two predictor variables) instead of a line.

    • Result: adding predictors can never decrease R², because least squares can always assign a useless predictor a coefficient of zero; R² therefore stays the same or improves.
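Fitting a plane with two predictors can be sketched by solving the normal equations XᵀXb = Xᵀy directly. The data below are hypothetical and noiseless, so the fitted coefficients recover the plane exactly (a real library would use `numpy.linalg.lstsq` or similar; the tiny solver here just keeps the example self-contained):

```python
# Hypothetical data: body length = 0.5*weight + 1.2*tail + 3 (noiseless)
weights = [1, 2, 3, 4, 5, 6]
tails   = [2, 1, 4, 3, 6, 5]
bodies  = [0.5 * w + 1.2 * t + 3 for w, t in zip(weights, tails)]

# Design matrix with columns [intercept, weight, tail]:
X = [[1.0, w, t] for w, t in zip(weights, tails)]
XtX = [[sum(X[k][i] * X[k][j] for k in range(len(X))) for j in range(3)]
       for i in range(3)]
Xty = [sum(X[k][i] * bodies[k] for k in range(len(X))) for i in range(3)]

def solve3(A, b):
    """Gaussian elimination with partial pivoting for a 3x3 system."""
    A = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(3):
        piv = max(range(col, 3), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        for r in range(col + 1, 3):
            f = A[r][col] / A[col][col]
            for c in range(col, 4):
                A[r][c] -= f * A[col][c]
    x = [0.0] * 3
    for r in range(2, -1, -1):
        x[r] = (A[r][3] - sum(A[r][c] * x[c] for c in range(r + 1, 3))) / A[r][r]
    return x

intercept, b_weight, b_tail = solve3(XtX, Xty)
print(round(intercept, 3), round(b_weight, 3), round(b_tail, 3))
```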

  2. Parameters and Adjusted R-squared

    • Definition: Adjusted R² adjusts the R² value based on the number of parameters in the model to prevent overfitting by adding unnecessary predictors.
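The standard adjusted-R² formula is 1 − (1 − R²)(n − 1)/(n − p − 1), where n is the number of observations and p the number of predictors. A small sketch (the function name and example values are mine):

```python
def adjusted_r2(r2, n, p):
    """Adjusted R^2 penalizes R^2 for the number of predictors p,
    given n observations (standard formula)."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

# Same raw R^2, but more predictors -> lower adjusted R^2:
print(round(adjusted_r2(0.6, 20, 1), 3))  # one predictor
print(round(adjusted_r2(0.6, 20, 5), 3))  # five predictors, penalized more
```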

Evaluating R-squared Significance

  1. Role of P-value

    • Importance: Indicates whether the R² value is statistically significant or due to random chance.

    • Calculation:

      • Derived from the F-statistic F, the ratio of the variance explained by the model to the variance not explained:

        F = \frac{(SS_{mean} - SS_{fit}) / (p_{fit} - p_{mean})}{SS_{fit} / (n - p_{fit})}

        where p_{fit} and p_{mean} are the number of parameters in the fitted line and in the mean line, and n is the number of data points.

      • The degrees of freedom (p_{fit} - p_{mean} and n - p_{fit}) standardize the F-value so it can be compared against the F-distribution for significance testing.

  2. P-value Computation Steps

    • Generate many random datasets and, for each, calculate the sums of squares around the mean and the fit, and the resulting F-value.

    • Plot the random F-values in a histogram to approximate the F-distribution.

    • The p-value is the proportion of these random F-values that are at least as extreme as the F-value from the original data.
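The simulation steps above can be sketched with a permutation-style variant: shuffling y destroys any real relationship with x, so the shuffled F-values approximate what chance alone produces. The data and helper names here are hypothetical:

```python
import random
random.seed(1)

def f_stat(xs, ys, p_fit=2, p_mean=1):
    """F = ((SS_mean - SS_fit)/(p_fit - p_mean)) / (SS_fit/(n - p_fit))
    for a simple one-predictor regression."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_mean = sum((y - my) ** 2 for y in ys)
    ss_fit = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    return ((ss_mean - ss_fit) / (p_fit - p_mean)) / (ss_fit / (n - p_fit))

# Original (hypothetical) data with a real trend:
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [1.2, 2.1, 2.8, 4.2, 4.9, 6.1, 7.2, 7.9]
f_orig = f_stat(xs, ys)

# Shuffle y many times and count how often chance beats the original F:
n_sims = 2000
exceed = sum(f_stat(xs, random.sample(ys, len(ys))) >= f_orig
             for _ in range(n_sims))
p_value = exceed / n_sims
print(p_value)  # a small p-value: the observed trend is unlikely to be chance
```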

  3. Final Takeaways

    • Necessity of R² to quantify explained variance in regression analysis.

    • Importance of p-value for establishing the reliability of the R² value.

    • Ideal outcome in regression analysis is both a large R² and a small p-value.

Conclusion

  • Linear regression is a fundamental tool in statistics for quantifying relationships between variables, and it requires careful consideration of R² and p-values for valid interpretations.

  • Importance of understanding these concepts for effective data analysis and drawing accurate conclusions in research and applied statistics.