Machine Learning Module 4: Supervised Learning - Regression

Manar Mohaisen
Department of Computer Science


Table of Contents

  • Supervised Learning

  • Regression

  • Linear Regression

  • Overfitting and Regularization

    • Ridge, Lasso, Elastic Net Regularizations

  • Polynomial Regression

  • Batch Gradient Descent, Minibatch Gradient Descent, Stochastic Gradient Descent

  • Questions and Feedback


Supervised Learning

  • The dataset is represented as pairs

    • Notation: (X, y)

    • Where:

      • N = dataset size (number of rows of X)

      • M = number of features (number of columns of X)

    • Each feature vector (a row of X) corresponds to one label in y, as illustrated in the sketch below.
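
For concreteness, a minimal NumPy sketch of this layout (the data values are made up purely for illustration):

    import numpy as np

    # X: N = 4 examples (rows), M = 2 features (columns); values are illustrative
    X = np.array([[1.0, 2.0],
                  [2.0, 0.5],
                  [3.0, 1.5],
                  [4.0, 3.0]])
    # y: one continuous label per row of X
    y = np.array([3.5, 2.1, 4.8, 7.2])

    print(X.shape)  # (4, 2) -> N = 4, M = 2
    print(y.shape)  # (4,)   -> one label per feature vector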


Regression

  • Definition: Regression is a method that models an approximate relationship between a continuous dependent variable and one or more independent variables.


Linear Regression

  • Simple Linear Regression:

    • Involves a single dependent variable and a single independent variable.

    • Commonly fitted using ordinary least squares (OLS), which minimizes the sum of squared errors.

    • The objective is to find the best-fitting line through the data points.


Finding Weight and Bias in OLS

  • To determine the weight and bias in OLS, minimize the total error between the actual outputs and the model’s outputs:

    • The Mean Squared Error (MSE) is the quantity being minimized.

    • Setting the partial derivatives with respect to the coefficients to zero yields the optimal values (a worked sketch follows the formulation below).

Mathematical Formulation

  • The regression model can be expressed as: $y_i = w_0 + w_1 x_i$

    • Minimize $\mathrm{MSE} = \frac{1}{N} \sum_{i=1}^{N} \big( y_i - (w_0 + w_1 x_i) \big)^2$
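
Setting those partial derivatives to zero gives the well-known closed-form estimates $w_1 = \sum_i (x_i - \bar{x})(y_i - \bar{y}) / \sum_i (x_i - \bar{x})^2$ and $w_0 = \bar{y} - w_1 \bar{x}$. A minimal NumPy sketch (the data values are illustrative):

    import numpy as np

    # toy data (illustrative values only)
    x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
    y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

    x_bar, y_bar = x.mean(), y.mean()
    # closed-form OLS estimates obtained by zeroing the partial derivatives
    w1 = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)
    w0 = y_bar - w1 * x_bar

    mse = np.mean((y - (w0 + w1 * x)) ** 2)
    print(w0, w1, mse)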


Overfitting and Regularization

  • Overfitting:

    • Occurs when a model fits the training data too closely and fails to generalize to unseen data, producing high variance.

    • Solutions:

      • Regularization

      • Reducing model complexity

Regularization Techniques

  • Ridge Regression (L2 Regularization):

    • Adds a penalty proportional to the square of the coefficients.

    • Formulated as:
      $W^* = \arg\min_W \; \|XW - y\|^2 + \lambda \sum_i w_i^2$

  • Lasso Regression (L1 Regularization):

    • Adds a penalty proportional to the absolute value of the coefficients.

    • Formulated as:
      $W^* = \arg\min_W \; \|XW - y\|^2 + \lambda \sum_i |w_i|$

  • Elastic Net Regularization:

    • Combines the Lasso and Ridge penalties: $W^* = \arg\min_W \; \|XW - y\|^2 + \lambda_1 \sum_i |w_i| + \lambda_2 \sum_i w_i^2$; all three penalties are sketched in code below.
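
A minimal sketch of all three penalties, assuming scikit-learn is available (the alpha and l1_ratio values are illustrative, not recommendations):

    import numpy as np
    from sklearn.linear_model import Ridge, Lasso, ElasticNet

    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 5))                       # 100 examples, 5 features
    true_w = np.array([1.5, 0.0, -2.0, 0.0, 0.5])       # illustrative ground truth
    y = X @ true_w + rng.normal(scale=0.1, size=100)

    for model in (Ridge(alpha=1.0),                     # L2 penalty: shrinks weights
                  Lasso(alpha=0.1),                     # L1 penalty: zeroes weights out
                  ElasticNet(alpha=0.1, l1_ratio=0.5)): # mix of L1 and L2
        model.fit(X, y)
        print(type(model).__name__, model.coef_.round(2))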


Polynomial Regression

  • Definition: A special case of linear regression where the model includes polynomial terms up to a specified order.

  • Examples:

    • Order 2 with one feature: $y = w_0 + w_1 x + w_2 x^2$

    • Order 2 with two features: $y = w_0 + w_1 a + w_2 b + w_3 a^2 + w_4 b^2 + w_5 ab$ (expanded in the sketch below)
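
A short sketch, assuming scikit-learn, of how the order-2 expansion with two features produces exactly the terms above (the feature names a and b mirror the example):

    import numpy as np
    from sklearn.preprocessing import PolynomialFeatures

    X = np.array([[2.0, 3.0]])                     # one example with a = 2, b = 3
    poly = PolynomialFeatures(degree=2, include_bias=False)
    X_poly = poly.fit_transform(X)

    print(poly.get_feature_names_out(["a", "b"]))  # ['a' 'b' 'a^2' 'a b' 'b^2']
    print(X_poly)                                  # [[2. 3. 4. 6. 9.]]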

Regularization in Polynomial Regression

  • Polynomial regression can benefit from regularization to improve generalization, especially with noisy data; a pipeline sketch follows below.
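
One way to combine the two, again assuming scikit-learn: chain the polynomial expansion with a Ridge (L2) penalty in a pipeline. The degree, alpha, and synthetic data below are illustrative:

    import numpy as np
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import PolynomialFeatures
    from sklearn.linear_model import Ridge

    rng = np.random.default_rng(0)
    X = rng.uniform(-3, 3, size=(50, 1))           # one feature, noisy quadratic target
    y = 1.0 + 2.0 * X[:, 0] - 0.5 * X[:, 0] ** 2 + rng.normal(scale=0.2, size=50)

    # order-2 polynomial features followed by an L2-regularized linear fit
    model = make_pipeline(PolynomialFeatures(degree=2), Ridge(alpha=1.0))
    model.fit(X, y)
    print(model.predict([[1.5]]))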


Gradient Descent Algorithm

  • Definition: A common optimization algorithm used to train machine learning models by minimizing the cost function.

  • Cost Function: Represents the difference between actual and predicted outputs; also referred to as the loss function or optimization criterion.

Gradient Descent Process

  1. Initialization: Start with weights at time t = 0.

  2. For each iteration, compute the gradient with respect to each parameter.

  3. Update parameters based on the computed gradients:

    • $w(t+1) = w(t) - \eta \, \nabla E(w)$, where $\eta$ is the learning rate (a full loop is sketched below)
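
A from-scratch NumPy sketch of these three steps for the simple linear model above (eta, the epoch count, and the data are illustrative choices):

    import numpy as np

    def batch_gradient_descent(x, y, eta=0.01, epochs=1000):
        w0, w1 = 0.0, 0.0                         # step 1: initialize weights at t = 0
        n = len(x)
        for _ in range(epochs):
            err = (w0 + w1 * x) - y               # model output minus actual output
            grad_w0 = (2.0 / n) * np.sum(err)     # step 2: dMSE/dw0
            grad_w1 = (2.0 / n) * np.sum(err * x) # step 2: dMSE/dw1
            w0 -= eta * grad_w0                   # step 3: w(t+1) = w(t) - eta * grad
            w1 -= eta * grad_w1
        return w0, w1

    x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
    y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])
    print(batch_gradient_descent(x, y))           # approaches the OLS solution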

Variants of Gradient Descent
  1. Batch Gradient Descent: Utilizes the entire dataset for each iteration, leading to higher computational costs.

  2. Stochastic Gradient Descent (SGD): Updates weights using a single sample at random, speeding up learning.

  3. Minibatch Gradient Descent: A subset of the training data is used for each update, balancing efficiency and accuracy; the sketch below selects the variant via the batch size.
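
A hedged sketch showing how a single loop covers all three variants, selected by the batch size (batch_size = len(x) gives batch GD, 1 gives SGD, anything in between gives minibatch; all values here are illustrative):

    import numpy as np

    def gradient_descent(x, y, eta=0.01, epochs=200, batch_size=2):
        w0, w1 = 0.0, 0.0
        n = len(x)
        rng = np.random.default_rng(0)
        for _ in range(epochs):
            idx = rng.permutation(n)              # visit samples in random order
            for start in range(0, n, batch_size):
                b = idx[start:start + batch_size] # current minibatch indices
                err = (w0 + w1 * x[b]) - y[b]
                w0 -= eta * (2.0 / len(b)) * np.sum(err)
                w1 -= eta * (2.0 / len(b)) * np.sum(err * x[b])
        return w0, w1

    x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
    y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])
    print(gradient_descent(x, y, batch_size=len(x)))  # batch gradient descent
    print(gradient_descent(x, y, batch_size=1))       # stochastic gradient descent
    print(gradient_descent(x, y, batch_size=2))       # minibatch gradient descent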

Tuning the Learning Rate
  • The learning rate affects convergence speed.

    • Large learning rate: Risk of overshooting the global minimum.

    • Small learning rate: Slower convergence.

    • Solutions: tuning the learning rate, using variable (scheduled) learning rates, or implementing momentum-based methods (sketched below).
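
As one example of a momentum-based method, a minimal sketch of classical momentum (mu and eta are illustrative hyperparameters, not recommendations):

    import numpy as np

    def momentum_descent(grad_fn, w, eta=0.01, mu=0.9, steps=200):
        v = np.zeros_like(w)                  # velocity: decaying sum of past gradients
        for _ in range(steps):
            v = mu * v - eta * grad_fn(w)     # momentum damps oscillations across steps
            w = w + v                         # move along the accumulated direction
        return w

    # example: minimize E(w) = (w - 3)^2, whose gradient is 2 * (w - 3)
    print(momentum_descent(lambda w: 2.0 * (w - 3.0), np.array([0.0])))  # -> ~3.0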


Conclusion

  • Regularization techniques and optimization algorithms like gradient descent are vital for improving the performance of machine learning models.

  • Understanding and applying these concepts leads to better model generalization and prediction accuracy. By carefully managing hyperparameters such as the regularization strength and the learning rate, practitioners can improve a model's ability to learn from data while avoiding overfitting, ensuring robustness across applications.