8-Linear Regression

Regression Overview
  • Definition: Regression models in supervised learning predict a continuous numerical label based on a set of features.

  • Examples of Regression Applications:

    • Predicting customer credit card activities from demographics and historical data.

    • Estimating driving/pickup time from point A to point B.

  • Classification vs Regression Models:

    • Classification models predict probabilities of categories (e.g., class A, B).

    • Regression models predict numerical values.

  • Note: Decision trees can also be used for regression tasks, with specific methods for determining root and leaf nodes based on variance. If the label you are predicting is categorical (non-numerical), the task calls for a classification model; if the label is numerical, it calls for a regression model.

Linear Regression
  • Definition: A supervised learning algorithm used to predict continuous values (label) based on input features.

  • Functionality: Models the relationship between input features (X) and output values (Y) as a linear function.

  • Importance: Considered fundamental in machine learning; serves as a basis for more complex models (e.g., deep learning).

Supervised Learning Components
  • Core Elements:

    • Parameters: Coefficients that define the mapping function.

    • Features (X): Input variables used for prediction.

    • Outputs: The predicted continuous values.

    • Labels: Actual outcomes associated with the training data.

    • Cost and Loss: Metrics used for evaluating model performance.

Notation for Linear Regression
  • Variables:

    • n: Number of training examples

    • m: Number of features

    • xi: Feature vector of the ith training example

    • yi: Label of the ith training example

    • w: Set of parameters (coefficients) of the regression model ([w0, w1, …, wm]).

    • hw(x): Predicted value based on the mapping function.

Learning Process in Linear Regression
  1. Choose a Mapping Function:

    • The parameters are unknown and are initialized with random values.

  2. Define Loss Function:

    • Assess the difference between predicted and actual values.

  3. Optimize the Loss Function:

    • Minimize the loss to obtain the best parameter values.

    • Example Equation: Price = w0 + w1 * sqft + w2 * year + w3 * location.

Step 1: Mapping Function
  • For linear regression, the mapping function is a linear combination of the features: hw(x) = w0 + w1*x1 + w2*x2 + … + wm*xm (a minimal sketch follows below).
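
  • Sketch (illustrative, not from the notes): a minimal NumPy version of this mapping function, using the house-price example from the learning-process steps; the coefficient and feature values are made up.

      import numpy as np

      def h(w, x):
          """Mapping function hw(x) = w0 + w1*x1 + ... + wm*xm."""
          return w[0] + np.dot(w[1:], x)

      # Illustrative: Price = w0 + w1*sqft + w2*year + w3*location_score
      w = np.array([50_000.0, 120.0, 30.0, 10_000.0])  # [w0, w1, w2, w3], made-up values
      x = np.array([1_500.0, 1995.0, 0.8])             # one example's features
      print(h(w, x))                                   # predicted price for this example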

Step 2: Loss Function in Linear Regression
  • Functionality: Measures the error between predicted values and actual values.

  • Objective: Minimize the sum of squared errors to find the best-fitting hyperplane (optimal w).

  • Loss Function Formula: Loss(w) = sum over i = 1..n of (hw(xi) - yi)^2, the sum of squared differences between the predicted values hw(xi) and the true labels yi (a short sketch follows).
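
  • Sketch (illustrative): the same loss in NumPy, assuming X holds one training example per row and y the true labels; the tiny dataset below is made up.

      import numpy as np

      def sse_loss(w, X, y):
          """Loss(w) = sum over i of (hw(xi) - yi)^2."""
          predictions = w[0] + X @ w[1:]   # hw(x) for every training example
          errors = predictions - y
          return np.sum(errors ** 2)

      # Made-up data: 3 examples, 2 features each
      X = np.array([[1.0, 2.0], [2.0, 0.0], [3.0, 1.0]])
      y = np.array([5.0, 4.0, 8.0])
      w = np.array([1.0, 1.0, 1.0])        # [w0, w1, w2]
      print(sse_loss(w, X, y))             # 11.0 for these made-up values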

Step 3: Optimization with Gradient Descent
  • Gradient descent: Fundamental optimization algorithm to minimize the loss function in machine learning and find the optimal parameters for a model.

  • Widely used in training neural networks, linear regression, and logistic regression.

  • Steps Involved:

    • Compute the gradient of the loss function with respect to parameters.

    • Update parameters: adjust parameters in the direction that reduces the loss.

    • Repeat until convergence. Evaluate the model performance on a validation dataset to ensure that it generalizes well to unseen data.

Gradient Descent Steps
  1. Pick an initial value for w.

  2. Calculate the gradient of the loss function with respect to the parameters.

  3. Update parameters using:

    • w = w - learning_rate * gradient

  4. Repeat until a stopping criterion is met (see the sketch after this list).
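
  • Sketch (illustrative): gradient descent on the sum-of-squared-errors loss for linear regression; the learning rate, iteration count, and data below are assumptions for the example.

      import numpy as np

      def gradient_descent(X, y, lr=0.001, n_iters=5000):
          """Fit [w0, w1, ..., wm] by gradient descent on the sum of squared errors."""
          n, m = X.shape
          Xb = np.hstack([np.ones((n, 1)), X])   # prepend a column of 1s for the intercept w0
          w = np.random.randn(m + 1) * 0.01      # step 1: pick an initial (random) value for w
          for _ in range(n_iters):
              errors = Xb @ w - y                # hw(xi) - yi for every example
              grad = 2 * Xb.T @ errors           # step 2: gradient of the loss w.r.t. w
              w = w - lr * grad                  # step 3: w = w - learning_rate * gradient
          return w                               # step 4: here the stopping criterion is a fixed iteration count

      # Made-up data generated from y = 3 + 2*x plus noise
      rng = np.random.default_rng(0)
      X = rng.uniform(0, 1, size=(100, 1))
      y = 3 + 2 * X[:, 0] + rng.normal(0, 0.1, size=100)
      print(gradient_descent(X, y))              # should land near [3, 2]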

Gradient Descent Worked Example
  • Starting point: learning rate (lr) is set to 0.1, initial w = -10.

  • Toy loss function: loss = w^2 + 2, so the gradient simplifies to d(loss)/dw = 2w (a short sketch of the updates follows).
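
  • Sketch of this example: with loss = w^2 + 2 each update multiplies w by (1 - 0.1*2) = 0.8, so w shrinks toward the minimum at w = 0, where the loss is 2.

      w, lr = -10.0, 0.1          # initial w and learning rate from the example
      for step in range(10):
          grad = 2 * w            # d(loss)/dw for loss = w**2 + 2
          w = w - lr * grad       # gradient descent update: w = w - lr * gradient
          print(step, round(w, 4), round(w**2 + 2, 4))
      # w approaches 0, where the loss reaches its minimum value of 2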

Implications of Learning Rate
  • Effects:

    • Too small: Slow convergence.

    • Too high: May overshoot the minimum, preventing convergence.

Linear Regression Interpretability
  • Coefficient Interpretation:

    • Coefficients (w1, w2, ..., wm) indicate the effect of features on the predicted label (y).

    • Example Interpretation: A coefficient (e.g., w1 = 2) suggests a one-unit increase in x1 leads to a two-unit increase in y, holding the other features constant.

    • The sign (+ or -) of the coefficients indicates the direction of the relationship between the features and the label.

  • Normalization: When features are normalized, the magnitude of a coefficient indicates the importance of that feature for the label: a feature with a larger (absolute) coefficient has a greater impact on the prediction of y than features with smaller coefficients (see the sketch below).
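
  • Sketch (illustrative, scikit-learn): fitting on standardized features and reading off the coefficients; the dataset and feature names are made up.

      import numpy as np
      from sklearn.linear_model import LinearRegression
      from sklearn.preprocessing import StandardScaler

      rng = np.random.default_rng(0)
      X = rng.normal(size=(200, 3))                 # made-up features x1, x2, x3
      y = 2 * X[:, 0] - 0.5 * X[:, 1] + rng.normal(0, 0.1, size=200)   # x3 is irrelevant

      X_scaled = StandardScaler().fit_transform(X)  # normalize so magnitudes are comparable
      model = LinearRegression().fit(X_scaled, y)

      for name, coef in zip(["x1", "x2", "x3"], model.coef_):
          print(f"{name}: {coef:+.3f}")             # sign = direction, magnitude ~ importance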

Model Complexity in Machine Learning
  • Machine learning model is fitted to a training dataset to learn the model parameters.

  • Aim: Avoid underfitting (too simplistic) and overfitting (too complex).

  • Overfitting: When a model learns the training data too well instead of generalizing from it. Causes: excessive model complexity, too many features, insufficient training data.

  • Underfitting: When a model is too simple to capture the underlying patterns in the data, leading to poor performance on both training and test datasets. Causes: Insufficient model complexity, lack of relevant features, or overly restrictive assumptions.

  • To strike a balance, it's essential to select the right model complexity and apply techniques such as cross-validation to ensure the model generalizes well (a short sketch follows).
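
  • Sketch (illustrative, scikit-learn): 5-fold cross-validation of a linear regression; the data below is made up, and the mean held-out R^2 is used as a rough check of generalization.

      import numpy as np
      from sklearn.linear_model import LinearRegression
      from sklearn.model_selection import cross_val_score

      rng = np.random.default_rng(0)
      X = rng.normal(size=(100, 2))
      y = 1.5 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(0, 0.2, size=100)

      # Average R^2 across the 5 held-out folds estimates how well the model generalizes
      scores = cross_val_score(LinearRegression(), X, y, cv=5, scoring="r2")
      print(scores.mean())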

Strategies to Avoid Overfitting
  • Increase Data: Add more training examples (rows) rather than more features (columns). Obtaining more training data may not always be feasible.

  • Feature Selection: Choose the most relevant features, potentially using domain knowledge or filter methods.

  • Regularization: Add a penalty term to the loss function (e.g., Lasso regression).

Lasso Regression
  • Concept: Lasso regression (L1 regularization) encourages a sparse model with only a few non-zero coefficients. Features that are not important/relevant end up with a coefficient of exactly 0. This aids feature selection, as it effectively reduces the number of variables in the model, leading to improved interpretability and reduced overfitting.

  • Rule of Thumb: Ensure n (number of examples) is greater than 10 times m (number of features).

  • Points:

    • Irrelevant/uninformative features will have a zero coefficient.

    • A sparse regression model is also good for explanation: it helps identify the informative features.

    • Lasso regression is a simple technique to reduce model complexity and prevent the overfitting that may result from simple linear regression (see the sketch below).
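
  • Sketch (illustrative, scikit-learn): Lasso on made-up data where only 2 of 10 features are informative; the penalty strength alpha is an assumption. Most coefficients come out exactly 0 (sparse).

      import numpy as np
      from sklearn.linear_model import Lasso

      rng = np.random.default_rng(0)
      X = rng.normal(size=(200, 10))                 # 10 features, only the first 2 are informative
      y = 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(0, 0.1, size=200)

      lasso = Lasso(alpha=0.1).fit(X, y)             # alpha sets the strength of the L1 penalty
      print(np.round(lasso.coef_, 3))                # most coefficients are exactly 0
      print("non-zero features:", np.flatnonzero(lasso.coef_))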

Linear Regression Summary
  • Applications: House price prediction, sales forecasting, stock price prediction.

  • Pros: Simple, fast, interpretable.

  • Cons: Assumes linear relationships, sensitive to outliers, not suitable for complex/non-linear patterns.