Machine Learning Module 4: Supervised Learning - Regression
Manar Mohaisen
Department of Computer Science
Table of Contents
Supervised Learning
Regression
Linear Regression
Overfitting and Regularization
Ridge, Lasso, Elastic Net Regularizations
Polynomial Regression
Batch Gradient Descent, Minibatch Gradient Descent, Stochastic Gradient Descent
Questions and Feedback
Supervised Learning
The dataset is represented as pairs of feature vectors and their labels.
Notation: (X, y), where X is an N × M feature matrix and y is a vector of N labels.
Where:
N = dataset size (number of rows)
M = number of features (number of columns)
Each feature vector corresponds to a label (illustrated in code below).
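As a quick illustration of this notation (a minimal sketch with made-up values):

import numpy as np

# Hypothetical dataset: N = 4 rows (samples), M = 2 columns (features).
X = np.array([[1.0, 2.0],
              [2.0, 1.0],
              [3.0, 4.0],
              [4.0, 3.0]])
y = np.array([3.5, 3.0, 8.0, 7.5])  # one label per feature vector

N, M = X.shape
print(N, M, y.shape)  # 4 2 (4,)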
Regression
Definition: Regression is a supervised learning method that aims to establish an approximate relationship between a continuous dependent variable and one or more independent variables.
Linear Regression
Simple Linear Regression:
Involves a single dependent variable and a single independent variable.
The fitting procedure is commonly referred to as ordinary least squares (OLS).
The objective is to find the best-fitting line through the data points.
Finding Weight and Bias in OLS
To determine the weight and bias in OLS, minimize the total error between the actual outputs and the model's predictions:
Mean Squared Error (MSE) is used for minimization.
Partial derivatives of the MSE with respect to the coefficients are set to zero to find the optimal values.
Mathematical Formulation
The regression model can be expressed as: \hat{y}_i = w_0 + w_1 x_i
Minimize \mathrm{MSE} = \frac{1}{N} \sum_{i=1}^{N} \bigl( y_i - (w_0 + w_1 x_i) \bigr)^2
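Setting the partial derivatives of the MSE to zero yields closed-form solutions for w1 and w0. A minimal sketch in Python (the dataset values are made up for illustration):

import numpy as np

# Hypothetical 1-D dataset.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# From the zeroed partial derivatives:
#   w1 = sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
#   w0 = mean(y) - w1 * mean(x)
w1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
w0 = y.mean() - w1 * x.mean()

y_hat = w0 + w1 * x
mse = np.mean((y - y_hat) ** 2)
print(f"w0 = {w0:.3f}, w1 = {w1:.3f}, MSE = {mse:.4f}")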
Overfitting and Regularization
Overfitting:
Occurs when a model fits the training data too closely and fails to generalize to unseen data, producing high variance.
Solutions:
Regularization
Reducing model complexity
Regularization Techniques
Ridge Regression (L2 Regularization):
Adds a penalty proportional to the square of the coefficients.
Formulated as: J(\mathbf{w}) = \mathrm{MSE} + \alpha \sum_{j=1}^{M} w_j^2
Lasso Regression (L1 Regularization):
Adds a penalty proportional to the absolute value of the coefficients.
Formulated as: J(\mathbf{w}) = \mathrm{MSE} + \alpha \sum_{j=1}^{M} |w_j|
Elastic Net Regularization:
Combines the Lasso and Ridge penalties: J(\mathbf{w}) = \mathrm{MSE} + \alpha_1 \sum_{j=1}^{M} |w_j| + \alpha_2 \sum_{j=1}^{M} w_j^2 (a code sketch of all three follows).
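A brief sketch of these penalties in practice, assuming scikit-learn's Ridge, Lasso, and ElasticNet estimators (the alpha and l1_ratio values here are arbitrary illustrations, not recommendations):

import numpy as np
from sklearn.linear_model import Ridge, Lasso, ElasticNet

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))  # 100 samples, 5 features
y = X @ np.array([1.5, 0.0, -2.0, 0.0, 0.5]) + rng.normal(scale=0.1, size=100)

# alpha scales the penalty; l1_ratio mixes the L1 and L2 terms in Elastic Net.
for model in (Ridge(alpha=1.0), Lasso(alpha=0.1), ElasticNet(alpha=0.1, l1_ratio=0.5)):
    model.fit(X, y)
    print(type(model).__name__, np.round(model.coef_, 2))

Note that Lasso and Elastic Net tend to drive some coefficients exactly to zero, which Ridge does not.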
Polynomial Regression
Definition: A special case of linear regression in which the model includes polynomial terms of the features up to a specified order; the model remains linear in the weights.
Examples:
Order 2 with one feature: \hat{y} = w_0 + w_1 x + w_2 x^2
Order 2 with two features: \hat{y} = w_0 + w_1 x_1 + w_2 x_2 + w_3 x_1^2 + w_4 x_2^2 + w_5 x_1 x_2 (expanded in code below)
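A minimal sketch of this order-2 expansion, assuming scikit-learn's PolynomialFeatures (the sample values are made up):

import numpy as np
from sklearn.preprocessing import PolynomialFeatures

X = np.array([[2.0, 3.0]])            # one sample with two features
poly = PolynomialFeatures(degree=2)   # includes the bias term by default
X_poly = poly.fit_transform(X)

print(poly.get_feature_names_out())   # ['1' 'x0' 'x1' 'x0^2' 'x0 x1' 'x1^2']
print(X_poly)                         # [[1. 2. 3. 4. 6. 9.]]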
Regularization in Polynomial Regression
Polynomial regression is prone to overfitting because the expanded feature set increases model complexity; regularization can improve performance, especially with noisy data.
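One common way to combine the two, sketched with a scikit-learn pipeline (the degree, alpha, and noise level are arbitrary illustrative choices):

import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import Ridge

rng = np.random.default_rng(1)
x = np.linspace(0, 1, 30).reshape(-1, 1)
y = np.sin(2 * np.pi * x).ravel() + rng.normal(scale=0.2, size=30)  # noisy data

# A degree-10 polynomial would overfit badly; the L2 penalty damps the coefficients.
model = make_pipeline(PolynomialFeatures(degree=10), Ridge(alpha=1e-3))
model.fit(x, y)
print(np.round(model.predict(x[:3]), 2))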
Gradient Descent Algorithm
Definition: A common optimization algorithm used to train machine learning models by minimizing the cost function.
Cost Function: Represents the difference between actual and predicted outputs; also referred to as the loss function and optimization criterion.
Gradient Descent Process
Initialization: Start with initial (e.g., random or zero) weights at iteration t = 0.
For each iteration, compute the gradient of the cost function with respect to each parameter.
Update the parameters in the direction opposite to the gradient: \mathbf{w}^{(t+1)} = \mathbf{w}^{(t)} - \eta \nabla J(\mathbf{w}^{(t)}), where \eta is the learning rate.
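A compact sketch of these steps for linear regression with batch gradient descent (the learning rate, iteration count, and synthetic data are arbitrary illustrative choices):

import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(200, 3))                  # N = 200 samples, M = 3 features
y = X @ np.array([1.0, -2.0, 0.5]) + 3.0 + rng.normal(scale=0.1, size=200)

Xb = np.hstack([np.ones((200, 1)), X])         # prepend a bias column
w = np.zeros(4)                                # initialization at t = 0
eta = 0.1                                      # learning rate

for t in range(500):
    grad = (2 / len(y)) * Xb.T @ (Xb @ w - y)  # gradient of the MSE
    w -= eta * grad                            # update step
print(np.round(w, 2))                          # approx. [3.0, 1.0, -2.0, 0.5]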
Variants of Gradient Descent
Batch Gradient Descent: Uses the entire dataset to compute each gradient, giving accurate steps at a higher computational cost per iteration.
Stochastic Gradient Descent (SGD): Updates the weights using a single randomly chosen sample, speeding up learning at the cost of noisier steps.
Minibatch Gradient Descent: Uses a small random subset of the training data for each update, balancing efficiency and gradient accuracy (see the sketch below).
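A minibatch variant of the same loop (a sketch; the batch size of 32 is an arbitrary choice):

import numpy as np

rng = np.random.default_rng(3)
Xb = np.hstack([np.ones((200, 1)), rng.normal(size=(200, 3))])
y = Xb @ np.array([3.0, 1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=200)

w, eta, batch_size = np.zeros(4), 0.05, 32
for epoch in range(100):
    order = rng.permutation(len(y))            # reshuffle the data each epoch
    for start in range(0, len(y), batch_size):
        idx = order[start:start + batch_size]
        grad = (2 / len(idx)) * Xb[idx].T @ (Xb[idx] @ w - y[idx])
        w -= eta * grad                        # one update per minibatch
print(np.round(w, 2))                          # approx. [3.0, 1.0, -2.0, 0.5]

Setting batch_size = 1 recovers SGD, and batch_size = len(y) recovers batch gradient descent.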
Tuning the Learning Rate
Affects convergence speed.
Large learning rate: Risk of overshooting the minimum or diverging.
Small learning rate: Slower convergence.
Solutions: tuning the learning rate, using variable learning rate schedules, or implementing momentum-based methods (sketched below).
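A sketch of a momentum-based update added to the batch loop above (the momentum coefficient 0.9 is a common but arbitrary default):

import numpy as np

rng = np.random.default_rng(4)
Xb = np.hstack([np.ones((200, 1)), rng.normal(size=(200, 3))])
y = Xb @ np.array([3.0, 1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=200)

w, v = np.zeros(4), np.zeros(4)
eta, beta = 0.05, 0.9                          # learning rate, momentum
for t in range(300):
    grad = (2 / len(y)) * Xb.T @ (Xb @ w - y)
    v = beta * v + grad                        # velocity accumulates past gradients
    w -= eta * v                               # step along the smoothed direction
print(np.round(w, 2))                          # approx. [3.0, 1.0, -2.0, 0.5]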
Conclusion
Regularization techniques and optimization algorithms like gradient descent are vital for improving the performance of machine learning models.
Understanding and applying these concepts leads to better model generalization and prediction accuracy. By carefully managing hyperparameters such as the regularization strength and the learning rate, practitioners can enhance their models' ability to learn from data while avoiding overfitting, ensuring robustness across applications.