Unit 3
Linear Regression Overview
Regression: Modeling the relationship between variables in order to predict an outcome.
Linear Regression: A supervised machine learning algorithm that predicts the value of a continuous dependent variable by fitting a linear equation to observed data.
Simple Linear Regression: One independent variable.
Multiple Linear Regression: More than one independent variable.
Univariate vs Multivariate: Refers to the number of dependent variables. Univariate regression has one dependent variable; multivariate regression has more than one.
Importance of Linear Regression
Interpretability: Each coefficient shows how much the predicted value changes for a one-unit change in the corresponding independent variable, with the other variables held fixed.
Simplicity: Easy to implement, and a foundation for more complex algorithms.
Types of Linear Regression
Simple Linear Regression
One independent variable (X) and one dependent variable (Y).
Equation: Y = B0 + B1X.
Multiple Linear Regression
More than one independent variable.
Equation: Y = B0 + B1X1 + B2X2 + ... + BnXn.
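As a rough illustration of the multiple regression equation above, the coefficients B0, B1, B2 can be estimated by ordinary least squares. This is only a sketch: the data is made up so that it lies exactly on Y = 1 + 2*X1 + 3*X2, and the variable names are assumptions, not part of the original notes.

```python
import numpy as np

# Hypothetical data: two independent variables (columns X1 and X2).
X = np.array([[1.0, 2.0],
              [2.0, 1.0],
              [3.0, 4.0],
              [4.0, 3.0],
              [5.0, 5.0]])

# Dependent variable generated from Y = 1 + 2*X1 + 3*X2 (no noise).
y = 1.0 + 2.0 * X[:, 0] + 3.0 * X[:, 1]

# Prepend a column of ones so the first coefficient acts as the intercept B0.
design = np.column_stack([np.ones(len(X)), X])

# Ordinary least squares: minimizes the sum of squared errors.
coeffs, *_ = np.linalg.lstsq(design, y, rcond=None)

b0, b1, b2 = coeffs
print(f"B0={b0:.3f}, B1={b1:.3f}, B2={b2:.3f}")
```

Because the data has no noise, the fit should recover the generating coefficients up to floating-point error. With real data the estimates would only approximate the underlying relationship.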
Best Fit Line
Objective: Minimize the error between predicted and actual values.
Best Fit Line Equation: Represents the relationship between the dependent and independent variables.
Hypothesis Function in Linear Regression
Example: predicting salary Y from years of experience X using a linear relationship.
Y^ = B0 + B1X.
B0: intercept; B1: slope. Both values are adjusted during training to minimize the error between predicted and actual values.
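The notes do not say how B0 and B1 are updated. One common approach is gradient descent on the mean squared error, sketched below. The data, learning rate, step count, and function name are all assumptions chosen for illustration.

```python
# Hypothetical data lying exactly on Y = 1 + 2X.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]


def fit_gradient_descent(xs, ys, learning_rate=0.05, steps=5000):
    """Fit Y^ = B0 + B1*X by gradient descent on the mean squared error."""
    b0, b1 = 0.0, 0.0  # start from an arbitrary initial guess
    n = len(xs)
    for _ in range(steps):
        # Prediction error (predicted minus actual) for every observation.
        errors = [(b0 + b1 * x) - y for x, y in zip(xs, ys)]
        # Partial derivatives of the MSE with respect to B0 and B1.
        grad_b0 = (2.0 / n) * sum(errors)
        grad_b1 = (2.0 / n) * sum(e * x for e, x in zip(errors, xs))
        # Move each parameter a small step against its gradient.
        b0 -= learning_rate * grad_b0
        b1 -= learning_rate * grad_b1
    return b0, b1


b0, b1 = fit_gradient_descent(xs, ys)
print(f"intercept B0 = {b0:.4f}, slope B1 = {b1:.4f}")
```

For simple linear regression a closed-form least squares solution also exists, so gradient descent is not required here. It is shown because the same update idea carries over to models without a closed-form solution.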
Cost Function for Linear Regression
Mean Squared Error (MSE): The average of the squared differences between predicted and actual values.
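Written out for the simple hypothesis Y^ = B0 + B1X, the cost function takes the form below. The symbol J, the index i, and n (the number of observations) are notation assumed here, not taken from the notes.

```latex
\hat{Y}_i = B_0 + B_1 X_i, \qquad
J(B_0, B_1) = \frac{1}{n} \sum_{i=1}^{n} \left( \hat{Y}_i - Y_i \right)^2
```

Squaring makes every term non-negative and penalizes large errors more heavily than small ones.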
Python Implementation Example
Use libraries such as Matplotlib and SciPy to plot the data, compute the slope and intercept, and draw the regression line.
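A minimal sketch of that workflow, assuming SciPy and Matplotlib are installed. The data values and the helper names predict and plot_fit are made up for illustration. scipy.stats.linregress returns the slope, intercept, and correlation coefficient of the least squares line.

```python
from scipy import stats

# Hypothetical observations.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

# Least squares fit of y = intercept + slope * x.
result = stats.linregress(x, y)
slope = result.slope
intercept = result.intercept
r = result.rvalue  # coefficient of correlation


def predict(x_value):
    """Return the fitted line's value at x_value."""
    return slope * x_value + intercept


def plot_fit():
    """Draw the observations and the fitted regression line."""
    import matplotlib.pyplot as plt  # imported here so the fit runs without it

    fig, ax = plt.subplots()
    ax.scatter(x, y, label="observed")
    ax.plot(x, [predict(v) for v in x], color="red", label="fitted line")
    ax.set_xlabel("X")
    ax.set_ylabel("Y")
    ax.legend()
    return fig


print(f"slope={slope:.3f}, intercept={intercept:.3f}, r={r:.3f}")
```

Calling plot_fit() and then matplotlib.pyplot.show() displays the chart.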
Relationship Measurement - Coefficient of Correlation
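R value: Ranges from -1 to 1 and measures the strength and direction of the linear relationship between X and Y. Values near 1 or -1 indicate a strong positive or negative relationship; values near 0 indicate little or no linear relationship.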
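The R value described above is the Pearson correlation coefficient. The sketch below computes it directly from its definition; the function name and data are illustrative assumptions.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations for each variable.
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


x = [1, 2, 3, 4, 5]

# Moderately strong positive relationship.
r_positive = pearson_r(x, [2, 4, 5, 4, 5])

# Y falls by exactly 2 for every unit of X: a perfect negative relationship.
r_negative = pearson_r(x, [10, 8, 6, 4, 2])

print(f"r_positive = {r_positive:.3f}, r_negative = {r_negative:.3f}")
```

The first data set gives a value of about 0.77, and the second gives -1.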
Predicting Future Values
Once the line has been fitted to existing data, the linear function is used to predict Y for new values of X.
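As a small worked sketch, the example below fits a line with the standard least squares formulas and then predicts a value outside the observed data. The salary figures (in thousands) and function names are invented for illustration.

```python
# Hypothetical data: years of experience and salary in thousands.
experience = [1, 2, 3, 4, 5]
salary = [30, 35, 40, 45, 50]


def fit_line(xs, ys):
    """Return (intercept, slope) of the least squares line through the data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return intercept, slope


b0, b1 = fit_line(experience, salary)


def predict(years):
    """Predicted salary for a given number of years of experience."""
    return b0 + b1 * years


# Predict for a value not present in the existing data.
predicted = predict(6)
print(f"Predicted salary for 6 years: {predicted:.1f}")
```

Predictions far outside the range of the observed X values should be treated with caution, since the linear relationship may not hold there.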
Polynomial Regression
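Used when the data does not fit a straight line well. The model adds powers of X (X^2, X^3, ...) as extra terms while remaining linear in the coefficients.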
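To illustrate, the sketch below fits both a straight line and a degree-2 polynomial to data generated from a quadratic. The data and the helper name mse are assumptions. numpy.polyfit returns coefficients from the highest power down.

```python
import numpy as np

# Hypothetical data generated from Y = 2X^2 - 3X + 1 (no noise).
x = np.array([-2.0, -1.0, 0.0, 1.0, 2.0, 3.0])
y = 2.0 * x**2 - 3.0 * x + 1.0

# Least squares fits of degree 1 (straight line) and degree 2 (quadratic).
linear = np.polyfit(x, y, 1)
quadratic = np.polyfit(x, y, 2)


def mse(coeffs):
    """Mean squared error of the polynomial with these coefficients."""
    return float(np.mean((np.polyval(coeffs, x) - y) ** 2))


print("linear fit MSE:   ", mse(linear))
print("quadratic fit MSE:", mse(quadratic))
print("quadratic coefficients (X^2, X, constant):", quadratic)
```

The straight line leaves a large error on this curved data, while the quadratic fit should recover the generating coefficients almost exactly.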
Evaluation Metrics for Linear Regression
Mean Absolute Error (MAE): Average absolute difference between predicted and actual values.
Mean Squared Error (MSE): Average of squared prediction errors.
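R-squared (R2) Score: Proportion of the variance in the dependent variable that is explained by the model. A value of 1 means a perfect fit.
Root Mean Squared Error (RMSE): Square root of the MSE. It is expressed in the same units as the dependent variable, which makes it easier to interpret than MSE.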
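The four metrics above can be computed directly from their definitions, as in this sketch. The actual and predicted values are made up for illustration.

```python
import math

# Hypothetical actual values and model predictions.
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.5, 5.5, 7.0, 10.0]

n = len(y_true)
errors = [t - p for t, p in zip(y_true, y_pred)]

# Mean Absolute Error: average size of the errors, ignoring sign.
mae = sum(abs(e) for e in errors) / n

# Mean Squared Error: average of the squared errors.
mse = sum(e ** 2 for e in errors) / n

# Root Mean Squared Error: square root of the MSE, in the units of Y.
rmse = math.sqrt(mse)

# R-squared: 1 minus (residual sum of squares / total sum of squares).
mean_true = sum(y_true) / n
ss_res = sum(e ** 2 for e in errors)
ss_tot = sum((t - mean_true) ** 2 for t in y_true)
r2 = 1.0 - ss_res / ss_tot

print(f"MAE={mae:.3f}  MSE={mse:.3f}  RMSE={rmse:.3f}  R2={r2:.3f}")
```

For this data the values work out to MAE = 0.5, MSE = 0.375, RMSE of about 0.612, and R2 of about 0.925.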
Regularization Techniques
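Lasso Regression (L1): Adds a penalty proportional to the sum of the absolute values of the coefficients. It can shrink some coefficients to exactly zero.
Ridge Regression (L2): Adds a penalty proportional to the sum of the squared coefficients. It shrinks coefficients toward zero without eliminating them.
Elastic Net: Combines the L1 and L2 penalties.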
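The sketch below writes out the three penalty terms and shows the shrinking effect of ridge regression using its closed-form solution. It makes several assumptions: the function names, the data, the penalty strength lam, and the mixing weight l1_ratio are all illustrative, no intercept is fitted, and libraries may scale these penalties differently.

```python
import numpy as np


def l1_penalty(coeffs, lam):
    """Lasso penalty: lam times the sum of absolute coefficient values."""
    return lam * np.sum(np.abs(coeffs))


def l2_penalty(coeffs, lam):
    """Ridge penalty: lam times the sum of squared coefficient values."""
    return lam * np.sum(coeffs ** 2)


def elastic_net_penalty(coeffs, lam, l1_ratio):
    """Weighted mix of the L1 and L2 penalties (one simple parameterization)."""
    return (l1_ratio * l1_penalty(coeffs, lam)
            + (1.0 - l1_ratio) * l2_penalty(coeffs, lam))


def ridge_fit(X, y, lam):
    """Closed-form ridge solution (no intercept): (X^T X + lam*I)^-1 X^T y."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(n_features), X.T @ y)


# Hypothetical data generated from Y = 2*X1 + 3*X2.
X = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0], [5.0, 5.0]])
y = 2.0 * X[:, 0] + 3.0 * X[:, 1]

ols = ridge_fit(X, y, lam=0.0)     # lam = 0 reduces to ordinary least squares
ridge = ridge_fit(X, y, lam=10.0)  # a positive lam shrinks the coefficients

print("OLS coefficients:  ", ols)
print("Ridge coefficients:", ridge)
```

Lasso has no closed-form solution in general and is usually fitted with an iterative solver, so only its penalty term is shown here.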
Bias-Variance Tradeoff
The balance between underfitting (high bias) and overfitting (high variance) is crucial for model performance.
A model with both low bias and low variance performs best on unseen data.
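One way to see the tradeoff is to fit polynomials of increasing degree to a small noisy sample and compare the error on the training data with the error on separate test data. The sketch below does this under assumed settings: the sine-shaped target, the noise level, the sample sizes, and the degrees 1, 3, and 9 are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the run is repeatable


def make_data(n):
    """Noisy samples from a hypothetical nonlinear relationship."""
    x = rng.uniform(-1.0, 1.0, n)
    y = np.sin(3.0 * x) + rng.normal(0.0, 0.2, n)
    return x, y


x_train, y_train = make_data(20)   # small sample used for fitting
x_test, y_test = make_data(200)    # unseen data used only for evaluation


def mse(coeffs, x, y):
    """Mean squared error of a fitted polynomial on the given data."""
    return float(np.mean((np.polyval(coeffs, x) - y) ** 2))


results = {}
for degree in (1, 3, 9):
    coeffs = np.polyfit(x_train, y_train, degree)
    results[degree] = (mse(coeffs, x_train, y_train),
                       mse(coeffs, x_test, y_test))
    train_err, test_err = results[degree]
    print(f"degree {degree}: train MSE={train_err:.4f}, test MSE={test_err:.4f}")
```

Training error can only fall as the degree rises, because each higher-degree model contains the lower-degree ones. Test error typically falls and then rises again once the model starts fitting noise, which is the overfitting side of the tradeoff.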