Unit 3

Linear Regression Overview

  • Regression: Modeling the relationship between a dependent variable and one or more independent variables in order to predict outcomes.

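  • Linear Regression: A supervised machine learning algorithm that models the relationship between a continuous dependent variable and one or more independent variables by fitting a linear equation to observed data; the fitted equation is then used to predict new values.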

    • Simple Linear Regression: One independent variable.

    • Multiple Linear Regression: More than one independent variable.

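    • Univariate vs Multivariate: Refers to the number of dependent variables; univariate regression predicts one dependent variable, multivariate regression predicts two or more.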

Importance of Linear Regression

  • Interpretability: Each coefficient shows how much the dependent variable is expected to change for a one-unit change in the corresponding independent variable, holding the others constant.

  • Simplicity: Easy to implement and fast to train; it is the foundation for more complex algorithms.

Types of Linear Regression

Simple Linear Regression

  • Exactly one independent variable (X) and one dependent variable (Y).

  • Equation: Y = B0 + B1X.

Multiple Linear Regression

  • More than one independent variable.

  • Equation: Y = B0 + B1X1 + B2X2 + ... + BnXn.
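For the multiple-regression equation above, a minimal sketch using NumPy's least-squares solver (the notes do not name a library, so NumPy is an assumption). The helper name `fit_multiple_linear_regression` and the data are hypothetical; the data are generated without noise, so the known coefficients should be recovered.

```python
import numpy as np

def fit_multiple_linear_regression(X, y):
    """Return (B0, [B1..Bn]) minimizing the squared error of y ≈ B0 + X @ B."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Prepend a column of ones so the first coefficient acts as the intercept B0.
    design = np.column_stack([np.ones(len(X)), X])
    # lstsq returns (solution, residuals, rank, singular values); keep the solution.
    coeffs, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coeffs[0], coeffs[1:]

# Hypothetical data generated exactly from Y = 1 + 2*X1 + 3*X2 (no noise).
X = np.array([[0, 1], [1, 0], [2, 3], [3, 1], [4, 2]])
y = 1 + 2 * X[:, 0] + 3 * X[:, 1]

b0, b = fit_multiple_linear_regression(X, y)
print(b0, b)  # recovers B0 ≈ 1 and [B1, B2] ≈ [2, 3]
```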

Best Fit Line

  • Objective: Find the line that minimizes the error between predicted and actual values, usually measured as the sum of squared errors (ordinary least squares).

  • Best Fit Line Equation: Represents the relationship between the dependent and independent variables.
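For simple linear regression the least-squares line has a closed form: B1 = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)² and B0 = ȳ − B1·x̄. A plain-Python sketch of that formula; the function name `least_squares_line` and the sample points are hypothetical.

```python
def least_squares_line(xs, ys):
    """Closed-form least-squares estimates (B0, B1) for Y = B0 + B1*X."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # B1 = sum((x - mean_x) * (y - mean_y)) / sum((x - mean_x)^2)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b1 = sxy / sxx
    # The best fit line always passes through the point (mean_x, mean_y).
    b0 = mean_y - b1 * mean_x
    return b0, b1

xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
b0, b1 = least_squares_line(xs, ys)
print(b0, b1)  # b0 ≈ 2.2, b1 ≈ 0.6
```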

Hypothesis Function in Linear Regression

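  • Example: predicting salary (Y) from years of experience (X) using a linear relationship.

  • Ŷ = B0 + B1X, where Ŷ ("Y-hat") is the predicted value.

  • B0 is the intercept and B1 the slope; their values are updated to minimize the error between predicted (Ŷ) and actual (Y) values.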
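The "update values to minimize error" step can be sketched with gradient descent on the mean squared error, one common way (not the only one) to find B0 and B1. The salary figures, the learning rate `lr`, and the number of `epochs` below are hypothetical choices for illustration.

```python
def gradient_descent(xs, ys, lr=0.05, epochs=5000):
    """Fit Y^ = B0 + B1*X by gradient descent on the mean squared error."""
    n = len(xs)
    b0, b1 = 0.0, 0.0  # start from an arbitrary guess
    for _ in range(epochs):
        # Prediction errors (Y^ - Y) for the current B0, B1.
        errors = [b0 + b1 * x - y for x, y in zip(xs, ys)]
        # Partial derivatives of MSE = (1/n) * sum((Y^ - Y)^2).
        grad_b0 = 2 / n * sum(errors)
        grad_b1 = 2 / n * sum(e * x for e, x in zip(errors, xs))
        # Step against the gradient to reduce the error.
        b0 -= lr * grad_b0
        b1 -= lr * grad_b1
    return b0, b1

experience = [1, 2, 3, 4, 5]   # hypothetical years of experience (X)
salary = [30, 35, 40, 45, 50]  # hypothetical salary in thousands (Y)
b0, b1 = gradient_descent(experience, salary)
print(round(b0, 3), round(b1, 3))  # close to B0 = 25, B1 = 5
```

If the learning rate is too large the updates overshoot and diverge; too small and convergence is slow.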

Cost Function for Linear Regression

  • Mean Squared Error (MSE): The average of the squared differences between predicted and actual values.
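Written out for the hypothesis Ŷ = B0 + B1X over n data points, the MSE and its partial derivatives (the quantities used to update B0 and B1 when minimizing it iteratively) are:

```latex
\begin{aligned}
\hat{Y}_i &= B_0 + B_1 X_i \\
\mathrm{MSE}(B_0, B_1) &= \frac{1}{n}\sum_{i=1}^{n}\left(Y_i - \hat{Y}_i\right)^2 \\
\frac{\partial\,\mathrm{MSE}}{\partial B_0} &= -\frac{2}{n}\sum_{i=1}^{n}\left(Y_i - \hat{Y}_i\right) \\
\frac{\partial\,\mathrm{MSE}}{\partial B_1} &= -\frac{2}{n}\sum_{i=1}^{n} X_i\left(Y_i - \hat{Y}_i\right)
\end{aligned}
```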

Python Implementation Example

  • Use libraries such as Matplotlib and SciPy to plot the data, compute the slope and intercept (for example with scipy.stats.linregress), and draw the regression line.
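A minimal sketch of that workflow, assuming SciPy and Matplotlib are installed. The data points and the output file name `regression.png` are hypothetical; `stats.linregress` returns the slope, intercept, correlation r, p-value, and standard error.

```python
import matplotlib
matplotlib.use("Agg")  # render to a file; no display window needed
import matplotlib.pyplot as plt
from scipy import stats

# Hypothetical sample data, roughly following Y = 1 + 2X.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [3.1, 4.9, 7.2, 8.8, 11.1, 13.0, 14.8, 17.2]

# Compute slope, intercept, correlation r, p-value and standard error.
slope, intercept, r, p, std_err = stats.linregress(x, y)

def line(value):
    """Evaluate the fitted regression line at a single X value."""
    return slope * value + intercept

fitted = [line(value) for value in x]

# Scatter the raw data and draw the regression line over it.
plt.scatter(x, y, label="data")
plt.plot(x, fitted, color="red", label=f"Y = {intercept:.2f} + {slope:.2f}X")
plt.xlabel("X")
plt.ylabel("Y")
plt.legend()
plt.savefig("regression.png")  # hypothetical output path
```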

Relationship Measurement - Coefficient of Correlation

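  • R value (Pearson's r): Ranges from -1 to 1 and measures the strength and direction of the linear relationship between X and Y. Values near 1 or -1 indicate a strong relationship; values near 0 indicate little or no linear relationship.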
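The r value can be computed directly from its definition, r = Σ(x − x̄)(y − ȳ) / √(Σ(x − x̄)² · Σ(y − ȳ)²). A plain-Python sketch; `pearson_r` is a hypothetical helper name, and the inputs are toy examples chosen to hit the extremes.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient r between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))  # 1.0: perfect positive relationship
print(pearson_r([1, 2, 3, 4], [8, 6, 4, 2]))  # -1.0: perfect negative relationship
print(pearson_r([1, 2, 3, 4], [1, 3, 3, 1]))  # 0.0: no linear relationship
```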

Predicting Future Values

  • Once B0 and B1 have been estimated from existing data, substitute a new X value into the fitted equation to predict Y.
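A sketch of fitting and then predicting, assuming NumPy; `np.polyfit(x, y, 1)` returns the slope first and the intercept second. The advertising/sales data and the `predict` helper are hypothetical.

```python
import numpy as np

# Hypothetical historical data: advertising spend (X) and sales (Y).
x = np.array([10, 20, 30, 40, 50], dtype=float)
y = np.array([25, 45, 65, 85, 105], dtype=float)

# Fit Y = B0 + B1*X; polyfit orders coefficients from highest power down: [B1, B0].
b1, b0 = np.polyfit(x, y, 1)

def predict(new_x):
    """Apply the fitted linear function to unseen X values."""
    return b0 + b1 * np.asarray(new_x, dtype=float)

print(predict([60, 70]))  # about [125, 145]
```

Predictions for X values far outside the observed range (extrapolation) assume the linear trend continues, which may not hold.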

Polynomial Regression

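  • Used when the relationship between X and Y is curved and a straight line fits poorly. It fits a polynomial such as Y = B0 + B1X + B2X^2 + ... + BnX^n; the model is still linear in its coefficients, so it can be fitted with the same least-squares approach.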
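A sketch comparing a straight-line fit with a degree-2 polynomial fit on curved data, assuming NumPy (`np.polyfit` plus `np.poly1d`). The quadratic that generates the data is hypothetical and noise-free, so the degree-2 fit should recover its coefficients.

```python
import numpy as np

# Hypothetical data following a curve: Y = 2 - 3X + 0.5X^2.
x = np.linspace(-5, 5, 11)
y = 2 - 3 * x + 0.5 * x**2

# A straight line (degree 1) versus a quadratic (degree 2) fit.
linear = np.poly1d(np.polyfit(x, y, 1))
quadratic = np.poly1d(np.polyfit(x, y, 2))

def mse(model):
    """Mean squared error of a fitted polynomial on the sample above."""
    return np.mean((y - model(x)) ** 2)

print("linear MSE:", mse(linear))        # clearly above zero: the curve is missed
print("quadratic MSE:", mse(quadratic))  # near zero: the curve is captured
print(quadratic.coeffs)                  # ≈ [0.5, -3, 2] (highest power first)
```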

Evaluation Metrics for Linear Regression

  • Mean Absolute Error (MAE): Average absolute difference between predicted and actual values.

  • Mean Squared Error (MSE): Average of squared prediction errors.

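  • R-squared (R2) Score: Proportion (often expressed as a percentage) of the variance in the dependent variable explained by the model; 1 means a perfect fit.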

  • Root Mean Squared Error (RMSE): Square root of the MSE; measures the typical size of prediction errors in the same units as the dependent variable.
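The four metrics can be computed side by side from their definitions. A plain-Python sketch; `regression_metrics` and the sample values are hypothetical, and R2 is computed here as 1 − SS_res / SS_tot.

```python
import math

def regression_metrics(actual, predicted):
    """Return MAE, MSE, RMSE and R2 for paired actual/predicted values."""
    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e ** 2 for e in errors) / n
    rmse = math.sqrt(mse)
    # R2 compares the model's squared error with that of always predicting the mean.
    mean_actual = sum(actual) / n
    ss_res = sum(e ** 2 for e in errors)
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    r2 = 1 - ss_res / ss_tot
    return {"MAE": mae, "MSE": mse, "RMSE": rmse, "R2": r2}

metrics = regression_metrics([3, 5, 7], [2, 5, 9])
print(metrics)  # MAE = 1.0, MSE ≈ 1.667, RMSE ≈ 1.291, R2 = 0.375
```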

Regularization Techniques

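  • Lasso Regression (L1): Adds a penalty proportional to the sum of the absolute values of the coefficients; it can shrink some coefficients to exactly zero, effectively selecting features.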

  • Ridge Regression (L2): Adds a penalty proportional to the sum of the squared coefficients; it shrinks coefficients toward zero but rarely to exactly zero.

  • Elastic Net: Combines L1 and L2 regularization.
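One way to see all three penalties in a single routine is coordinate descent on the elastic-net objective, where `l1_ratio = 1` reduces to Lasso and `l1_ratio = 0` to Ridge. This sketch assumes NumPy and uses the objective form scikit-learn documents for `ElasticNet`; the function names, data, and `alpha` values are hypothetical, and it assumes no feature column is constant.

```python
import numpy as np

def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold; return 0 if |value| <= threshold."""
    return np.sign(value) * max(abs(value) - threshold, 0.0)

def elastic_net(X, y, alpha, l1_ratio, n_iter=1000):
    """Coordinate descent minimizing
    (1/(2n))*||y - B0 - X@B||^2 + alpha*l1_ratio*||B||_1
    + 0.5*alpha*(1 - l1_ratio)*||B||_2^2."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    # Center the data so the intercept B0 is not penalized.
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    B = np.zeros(p)
    l1 = alpha * l1_ratio        # weight of the L1 (Lasso) penalty
    l2 = alpha * (1 - l1_ratio)  # weight of the L2 (Ridge) penalty
    for _ in range(n_iter):
        for j in range(p):
            # Residual with feature j's current contribution added back.
            partial_residual = yc - Xc @ B + Xc[:, j] * B[j]
            rho = Xc[:, j] @ partial_residual / n
            z = Xc[:, j] @ Xc[:, j] / n
            # Exact minimizer of the objective along coordinate j.
            B[j] = soft_threshold(rho, l1) / (z + l2)
    B0 = y_mean - x_mean @ B
    return B0, B

# Hypothetical data: Y depends only on the first feature (Y = 10 + 3*X1).
X = np.array([[1, 1], [2, -2], [3, 0], [4, 2], [5, -1]])
y = 10 + 3 * X[:, 0]

for name, l1_ratio in [("Lasso", 1.0), ("Ridge", 0.0), ("Elastic Net", 0.5)]:
    B0, B = elastic_net(X, y, alpha=1.0, l1_ratio=l1_ratio)
    print(name, np.round(B, 3))
```

Because the second feature is unrelated to Y in this data, its coefficient stays at 0 under every penalty, while the first coefficient is shrunk below its unpenalized value of 3.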

Bias-Variance Tradeoff

  • The balance between underfitting (high bias) and overfitting (high variance) is crucial for model performance.

  • A model with both low bias and low variance generalizes best to unseen data.
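The tradeoff can be observed by fitting polynomials of increasing degree to the same noisy sample and comparing training and test error. A sketch assuming NumPy; the sine-shaped true function, noise level, sample sizes, random seed, and chosen degrees are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def true_function(x):
    """Hypothetical underlying relationship the data are drawn from."""
    return np.sin(2 * np.pi * x)

# Noisy training and test samples from the same underlying curve.
x_train = np.sort(rng.uniform(0, 1, 20))
y_train = true_function(x_train) + rng.normal(0, 0.2, x_train.size)
x_test = np.sort(rng.uniform(0, 1, 200))
y_test = true_function(x_test) + rng.normal(0, 0.2, x_test.size)

def mse(model, x, y):
    """Mean squared error of a fitted polynomial on (x, y)."""
    return np.mean((y - model(x)) ** 2)

results = {}
for degree in (1, 3, 9):
    model = np.poly1d(np.polyfit(x_train, y_train, degree))
    results[degree] = (mse(model, x_train, y_train), mse(model, x_test, y_test))
    train_err, test_err = results[degree]
    print(f"degree {degree}: train MSE {train_err:.3f}, test MSE {test_err:.3f}")
```

Training error can only fall as the degree rises (the models are nested), but test error typically rises again once the model starts fitting noise: degree 1 tends toward high bias, degree 9 toward high variance. The exact numbers depend on the random sample.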