Module 3: Machine Learning Algorithms

CLASSIFICATION

Classification teaches a machine to sort items into categories.
It learns from labeled examples (e.g., emails marked as "spam" or "not spam").
After learning, it can categorize new items (e.g., identifying if a new email is spam).
Example: A model trained on images of dogs and cats can predict the class of new images based on features like color, texture, and shape.
In a plot of this example, the horizontal axis represents the combined values of color and texture features, and the vertical axis represents the combined values of shape and size features.
Each colored dot represents an individual image, with the color indicating the model's prediction (dog or cat).
Shaded areas show the region the model assigns to each class; the border between them is the decision boundary, which the model uses to decide which category an image belongs to.

Types of Classification

Classification sorts data into categories based on features.

1. Binary Classification

Sorts data into two distinct categories.
Like making a choice between two options.
Example: A system that sorts emails into spam or not spam.
It examines email features (keywords, sender details) and decides if it's spam.

2. Multiclass Classification

Sorts data into more than two categories.
The model picks the category that best matches the input.
Example: An image recognition system that sorts pictures of animals into categories like cat, dog, and bird.
The system looks at features (shape, color, texture) and chooses the most likely animal.

3. Multi-Label Classification

A single piece of data can belong to multiple categories at once.
Differs from multiclass classification, where each data point belongs to only one class.
Example: A movie recommendation system tags a movie as both action and comedy.
The system checks features (plot, actors, genre tags) and assigns multiple labels to a single piece of data.

How Classification in Machine Learning Works

Classification involves training a model using a labeled dataset.
Each input is paired with its correct output label.
The model learns patterns and relationships in the data.
It can then predict labels for new, unseen inputs.

Steps:

Data Collection: Start with a dataset where each item is labeled with the correct class (e.g., "cat" or "dog").
Feature Extraction: Identify features (color, shape, texture) that distinguish one class from another.
Model Training: The classification algorithm uses the labeled data to learn how to map the features to the correct class, looking for patterns and relationships.
Model Evaluation: Test the trained model on unseen data and check how well it performs using metrics such as classification accuracy.
Prediction: The model predicts the class of new data based on the learned features.

If the quality metric is not satisfactory, adjust the ML algorithm or hyperparameters and retrain the model until satisfactory performance is achieved.
Classification in machine learning involves using labeled data to teach the model how to predict the class of new, unlabeled data based on learned patterns.
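The steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the dataset is hypothetical (two made-up numeric features per item), and the classifier is a nearest-centroid rule chosen only because it is short; it is not an algorithm prescribed by these notes. The helper names (`train_test_split`, `train`, `predict`, `accuracy`) are illustrative.

```python
import math
import random

# 1. Data collection: hypothetical labeled examples (features, label).
#    2. Feature extraction is assumed done: features are made-up numeric scores.
data = [
    ((1.0, 1.2), "cat"), ((0.8, 1.0), "cat"), ((1.2, 0.9), "cat"),
    ((1.1, 1.1), "cat"), ((0.9, 1.3), "cat"), ((1.0, 0.8), "cat"),
    ((3.0, 3.1), "dog"), ((3.2, 2.9), "dog"), ((2.8, 3.0), "dog"),
    ((3.1, 3.3), "dog"), ((2.9, 2.7), "dog"), ((3.0, 2.8), "dog"),
]

def train_test_split(rows, test_fraction=0.25, seed=0):
    """Shuffle a copy of the rows and hold out a fraction as unseen test data."""
    rows = rows[:]
    random.Random(seed).shuffle(rows)
    n_test = max(1, int(len(rows) * test_fraction))
    return rows[n_test:], rows[:n_test]

def train(rows):
    """3. Model training: compute the mean feature vector (centroid) of each class."""
    sums, counts = {}, {}
    for features, label in rows:
        acc = sums.setdefault(label, [0.0] * len(features))
        for j, value in enumerate(features):
            acc[j] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}

def predict(model, features):
    """5. Prediction: return the class whose centroid is closest (Euclidean distance)."""
    return min(model, key=lambda label: math.dist(features, model[label]))

def accuracy(model, rows):
    """4. Model evaluation: fraction of held-out rows classified correctly."""
    correct = sum(predict(model, f) == label for f, label in rows)
    return correct / len(rows)

train_rows, test_rows = train_test_split(data)
model = train(train_rows)
print("test accuracy:", accuracy(model, test_rows))
print("new item:", predict(model, (2.7, 3.2)))
```

If the test accuracy were unsatisfactory, this is the point where one would change the algorithm or its settings and retrain, as described above.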

Examples of Machine Learning Classification in Real Life

Email spam filtering
Credit risk assessment: Predicts loan default likelihood based on credit score, income, and loan history.
Medical diagnosis: Classifies whether a patient has a condition (e.g., cancer, diabetes) based on medical data.
Image classification: Used in facial recognition, autonomous driving, and medical imaging.
Sentiment analysis: Determines if the sentiment of a text is positive, negative, or neutral.
Fraud detection: Detects fraudulent activities by analyzing transaction patterns.
Recommendation systems: Recommends products or content based on past user behavior.

Classification Modeling in Machine Learning

Classification modeling uses machine learning algorithms to categorize data into predefined classes or labels.

Key characteristics:

Class Separation: Distinguishes between distinct classes.
Decision Boundaries: Draws decision boundaries in the feature space.
Sensitivity to Data Quality: Requires well-labeled, representative data.
Handling Imbalanced Data: Uses techniques like resampling or weighting to handle class imbalances.
Interpretability: Some algorithms like Decision Trees offer higher interpretability.

Classification Algorithms

Linear Classifiers:

Create a linear decision boundary between classes.
Simple and computationally efficient.
Examples: Logistic Regression, Support Vector Machines (with linear kernel), Single-layer Perceptron, Stochastic Gradient Descent (SGD) Classifier

Non-linear Classifiers:

Create a non-linear decision boundary between classes.
Capture more complex relationships between input features and the target variable.
Examples: K-Nearest Neighbors, Kernel SVM, Naive Bayes, Decision Tree Classification, Multi-layer Artificial Neural Networks

Ensemble learning classifiers:

Combine the predictions of several models.
Examples: Random Forests, AdaBoost, Bagging Classifier, Voting Classifier, Extra Trees Classifier

Decision Tree in Machine Learning

A supervised learning algorithm used for classification and regression tasks.
Models decisions as a tree-like structure.
- Internal nodes represent attribute tests.
- Branches represent attribute values.
- Leaf nodes represent final decisions or predictions.
Versatile, interpretable, and widely used for predictive modeling.

Intuition behind the Decision Tree

Imagine you’re deciding whether to buy an umbrella:
1. Ask a Question (Root Node): Is it raining? If yes, you might decide to buy an umbrella. If no, you move to the next question.
2. More Questions (Internal Nodes):
  - Is it likely to rain later? If yes, you buy an umbrella; if no, you don’t.
3. Decision (Leaf Node): Based on your answers, you either buy or skip the umbrella.

Example: Predicting Whether a Person Likes Computer Games

Start with the Root Question (Age):
- The first question is: Is the person's age less than 15?
  - If Yes, move to the left.
  - If No, move to the right.
Branch Based on Age:
- If the person is younger than 15, they are likely to enjoy computer games (+2 prediction score).
- If the person is 15 or older, ask the next question: Is the person male?
Branch Based on Gender (For Age 15+):
- If the person is male, they are somewhat likely to enjoy computer games (+0.1 prediction score).
- If the person is not male, they are less likely to enjoy computer games (-1 prediction score).

Example: Predicting Whether a Person Likes Computer Games Using Two Decision Trees

Tree 1: Age and Gender
- The first tree asks two questions:
  - Is the person’s age less than 15?
    - If Yes, they get a score of +2. If No, proceed to the next question.
  - Is the person male?
    - If Yes, they get a score of +0.1. If No, they get a score of -1.
Tree 2: Computer Usage
- The second tree focuses on daily computer usage:
  - Does the person use a computer daily?
    - If Yes, they get a score of +0.9. If No, they get a score of -0.9.
Combining Trees: Final Prediction
- The final prediction score is the sum of the scores from both trees.
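The two-tree example can be written as plain Python functions to make the "sum of scores" step concrete. This is a sketch: the function names are hypothetical, and the thresholds and scores are the ones given in the example.

```python
def tree1_score(age, is_male):
    """Tree 1: split on age first, then on gender for people aged 15 or older."""
    if age < 15:
        return 2.0
    return 0.1 if is_male else -1.0

def tree2_score(uses_computer_daily):
    """Tree 2: split on daily computer usage."""
    return 0.9 if uses_computer_daily else -0.9

def combined_score(age, is_male, uses_computer_daily):
    """Final prediction: the sum of the scores from both trees."""
    return tree1_score(age, is_male) + tree2_score(uses_computer_daily)

# A 10-year-old who uses a computer daily: 2 + 0.9
print(round(combined_score(10, True, True), 2))    # 2.9
# A 40-year-old woman who does not use a computer daily: -1 + (-0.9)
print(round(combined_score(40, False, False), 2))  # -1.9
```

Rounding is applied only for display, since adding decimal scores in floating point can leave tiny representation errors.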

Attribute Selection Measures

Information Gain
Gini Index

Information Gain

Measures the usefulness of a question (or feature) for splitting data into groups.
Tells us how much the uncertainty decreases after the split.
A good question creates clearer groups.
The feature with the highest Information Gain is chosen to make the decision.
Example: Splitting people into "Young" and "Old" based on age, where all young people bought a product while all old people did not, would result in high Information Gain because the split perfectly separates the groups with no uncertainty.
$Gain(S, A) = Entropy(S) - \sum_{v \in Values(A)} \frac{|S_v|}{|S|} Entropy(S_v)$
- Where:
 - S is a set of instances
 - A is an attribute
 - $S_v$ is the subset of S for which attribute A has the value v
 - v represents an individual value that the attribute A can take
 - Values(A) is the set of all possible values of A
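The Information Gain formula can be computed directly. The sketch below uses hypothetical data that mirrors the "Young"/"Old" example, plus a made-up "city" attribute that carries no information, so the gain-based choice of split is visible. Function names are illustrative.

```python
import math
from collections import Counter

def entropy(labels):
    """Entropy (base 2) of a list of class labels."""
    n = len(labels)
    return sum(-(c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(rows, attribute, target):
    """Gain(S, A) = Entropy(S) - sum over values v of |S_v|/|S| * Entropy(S_v)."""
    gain = entropy([r[target] for r in rows])
    for v in {r[attribute] for r in rows}:
        subset = [r[target] for r in rows if r[attribute] == v]
        gain -= len(subset) / len(rows) * entropy(subset)
    return gain

# Hypothetical data: age separates buyers perfectly; "city" tells us nothing.
people = [
    {"age": "Young", "city": "A", "bought": "Yes"},
    {"age": "Young", "city": "B", "bought": "Yes"},
    {"age": "Old", "city": "A", "bought": "No"},
    {"age": "Old", "city": "B", "bought": "No"},
]
gains = {a: information_gain(people, a, "bought") for a in ("age", "city")}
print(gains)                      # {'age': 1.0, 'city': 0.0}
print(max(gains, key=gains.get))  # age
```

The attribute with the highest gain ("age") is the one a decision tree would split on first.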

Entropy

Measures the uncertainty of a random variable.
Characterizes the impurity of an arbitrary collection of examples.
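Higher entropy means more uncertainty (impurity) in the class labels.
Example: If a dataset has an equal number of "Yes" and "No" outcomes, the entropy is high because it's uncertain which outcome to predict. If all outcomes are the same, the entropy is 0.
$Entropy(S) = -\sum_{i=1}^{c} p_i \log_2 p_i$, where c is the number of classes and $p_i$ is the proportion of instances in S that belong to class i.
Entropy is the quantity that Information Gain compares before and after a split:
$Gain(S, A) = Entropy(S) - \sum_{v \in Values(A)} \frac{|S_v|}{|S|} Entropy(S_v)$
- Where:
 - S is a set of instances, A is an attribute,
 - $S_v$ is the subset of S with A = v, and
 - Values(A) is the set of all possible values of A.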
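A small sketch reproduces the entropy example (balanced "Yes"/"No" versus all-identical outcomes). For comparison it also computes the Gini Index, the other attribute selection measure listed earlier, using the standard definition $Gini(S) = 1 - \sum_i p_i^2$; that formula is not given in these notes and is added here as background.

```python
import math
from collections import Counter

def entropy(labels):
    """Entropy(S) = -sum_i p_i * log2(p_i) over the classes present in S."""
    n = len(labels)
    return sum(-(c / n) * math.log2(c / n) for c in Counter(labels).values())

def gini(labels):
    """Gini index = 1 - sum_i p_i^2 (0 for a pure set, larger when mixed)."""
    n = len(labels)
    return 1 - sum((c / n) ** 2 for c in Counter(labels).values())

balanced = ["Yes"] * 5 + ["No"] * 5
pure = ["Yes"] * 10
print(entropy(balanced), entropy(pure))  # 1.0 0.0
print(gini(balanced), gini(pure))        # 0.5 0.0
```

Both measures are zero for a pure set and largest for an evenly mixed one, which is why either can be used to score candidate splits.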

ARTIFICIAL NEURAL NETWORKS

Artificial Neural Networks contain artificial neurons which are called units.
These units are arranged in a series of layers that together constitute the whole artificial neural network.
A layer can have a dozen units or millions of units, depending on how complex the hidden patterns are that the network must learn from the dataset.
Commonly, an artificial neural network has an input layer, an output layer, and one or more hidden layers.
The input layer receives data from the outside world which the neural network needs to analyze or learn about.
Then this data passes through one or more hidden layers that transform the input into data that is valuable for the output layer.
Finally, the output layer produces the network's response to the input data.
In the majority of neural networks, units are interconnected from one layer to another.
Each of these connections has a weight that determines the influence of one unit on another.
As data passes from one layer to the next, each layer transforms it further until the output layer produces a result; the network learns by adjusting these connection weights during training.

Neural Networks Architecture

Input Layer
Hidden Layers
Output Layer
The structures and operations of human neurons serve as the basis for artificial neural networks.
They are also known as neural networks or neural nets.
The input layer of an artificial neural network is the first layer, and it receives input from external sources and releases it to the hidden layer, which is the second layer.
In the hidden layer, each neuron receives input from the neurons in the previous layer, computes the weighted sum of these inputs (usually plus a bias term), applies an activation function, and sends the result to the neurons in the next layer.
These connections are weighted: the influence of each input from the previous layer is scaled up or down by its own weight, and these weights are adjusted during the training process to improve the model's performance.
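The layer-by-layer computation described above (weighted sum, plus bias, through an activation function) can be sketched as a forward pass. The network shape (2 inputs, 3 hidden units, 1 output), the weights, and the choice of a sigmoid activation are all hypothetical; in practice the weights are learned during training.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def dense_layer(inputs, weights, biases):
    """Each neuron: weighted sum of the previous layer's outputs plus a bias,
    passed through an activation function (sigmoid here, an assumption)."""
    return [
        sigmoid(sum(w * x for w, x in zip(neuron_weights, inputs)) + b)
        for neuron_weights, b in zip(weights, biases)
    ]

# Hypothetical network: 2 inputs -> 3 hidden units -> 1 output unit.
hidden_w = [[0.5, -0.2], [0.3, 0.8], [-0.6, 0.1]]
hidden_b = [0.0, -0.1, 0.2]
output_w = [[1.0, -1.5, 0.7]]
output_b = [0.05]

x = [1.0, 2.0]                            # input layer
h = dense_layer(x, hidden_w, hidden_b)    # hidden layer
y = dense_layer(h, output_w, output_b)    # output layer
print(h, y)
```

Each row of a weight matrix holds one neuron's incoming weights, so the number of rows sets the size of the next layer.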

Artificial neurons vs Biological neurons

The concept of artificial neural networks comes from biological neurons found in animal brains,
so the two share many similarities in structure and function.

Structure:

The structure of artificial neural networks is inspired by biological neurons.
A biological neuron has a cell body or soma to process the impulses, dendrites to receive them, and an axon that transfers them to other neurons.
- The input nodes of artificial neural networks receive input signals, the hidden layer nodes compute these input signals, and the output layer nodes compute the final output by processing the hidden layer's results using activation functions.

Synapses:

Synapses are the junctions between biological neurons that transmit impulses from the axon of one neuron to the dendrites of the next.
In artificial neural networks, the counterparts of synapses are the weighted connections that join nodes in one layer to nodes in the next layer.
The strength of the links is determined by the weight value.

Learning:

In biological neurons, the cell body (soma), which contains the nucleus, processes the incoming impulses.
An action potential is produced and travels through the axon if the impulses are strong enough to reach the threshold.
Learning is made possible by synaptic plasticity, the ability of synapses to become stronger or weaker over time in response to changes in their activity.
In artificial neural networks, backpropagation is the technique used for learning: it adjusts the weights between nodes according to the error, i.e., the difference between predicted and actual outcomes.

Activation:

In biological neurons, activation is the firing rate of the neuron which happens when the impulses are strong enough to reach the threshold.
In artificial neural networks, a mathematical function known as an activation function maps a neuron's weighted input to its output, determining how strongly the neuron is activated.
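A few activation functions, sketched below, show the idea. The step function mirrors the biological "fire only above a threshold" behavior; sigmoid and ReLU are common choices in practice. These particular functions are examples added for illustration; the notes do not name specific activation functions.

```python
import math

def step(z, threshold=0.0):
    """Fires (1) only if the input reaches the threshold, like a biological neuron."""
    return 1 if z >= threshold else 0

def sigmoid(z):
    """Smooth S-shaped curve mapping any input into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def relu(z):
    """Passes positive inputs through unchanged and maps negative inputs to 0."""
    return max(0.0, z)

for z in (-2.0, 0.0, 2.0):
    print(z, step(z), round(sigmoid(z), 3), relu(z))
```

Smooth functions such as sigmoid are preferred for training with backpropagation because they have useful gradients, whereas the step function's gradient is zero almost everywhere.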

How do Artificial Neural Networks learn?

Artificial neural networks are trained using a training set.
For example, suppose you want to teach an ANN to recognize a cat.
Then it is shown thousands of different images of cats so that the network can learn to identify a cat.
Once the neural network has been trained enough using images of cats, you need to check whether it can identify cat images correctly.
This is done by having the ANN classify the images it is given, deciding whether or not each one is a cat image.
The ANN's output is compared with a human-provided label stating whether the image is a cat image or not.
If the ANN's prediction is incorrect, backpropagation is used to adjust what it has learned.
Backpropagation fine-tunes the weights of the connections between ANN units based on the error obtained.
This process continues until the artificial neural network can correctly recognize a cat in an image with the lowest possible error rate.
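For a network with a single sigmoid neuron, backpropagation reduces to one gradient step per weight, which makes the "compare with the label, then adjust the weights" loop easy to see. The sketch below is a hedged illustration: the training set (the logical OR of two binary inputs) stands in for labeled images, and the learning rate, epoch count, and cross-entropy loss are assumptions chosen for the example.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical labeled training set: inputs and the correct output (logical OR).
samples = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 1)]

weights, bias, lr = [0.0, 0.0], 0.0, 0.5
for epoch in range(2000):
    for x, target in samples:
        pred = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
        # With a sigmoid output and cross-entropy loss, the error signal
        # propagated back to the weighted sum is simply (prediction - label).
        error = pred - target
        weights = [w - lr * error * xi for w, xi in zip(weights, x)]
        bias -= lr * error

predictions = [
    round(sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias))
    for x, _ in samples
]
print(predictions)  # [0, 1, 1, 1]
```

In a multi-layer network the same error signal is passed backwards through each layer using the chain rule, so every weight receives its own adjustment.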

What are the types of Artificial Neural Networks?

Feedforward Neural Network
Convolutional Neural Network
Modular Neural Network
Radial basis function Neural Network
Recurrent Neural Network

Feedforward Neural Network:

The feedforward neural network is one of the most basic artificial neural networks.
In this ANN, the data or the input provided travels in a single direction.
It enters into the ANN through the input layer and exits through the output layer while hidden layers may or may not exist.
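In other words, data propagates only forward, with no loops feeding outputs back into earlier layers. Feedforward networks are nevertheless usually trained with backpropagation; "feedforward" describes how data flows when the network makes a prediction.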

Convolutional Neural Network:

A convolutional neural network has some similarities to the feedforward neural network: the connections between units have weights that determine the influence of one unit on another.
But a CNN has one or more convolutional layers, which apply a convolution operation to the input and pass the result to the next layer.
CNNs have applications in speech and image processing and are particularly useful in computer vision.
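The convolution operation at the heart of a convolutional layer can be sketched on a tiny image. Assumptions: "valid" convolution (no padding) with stride 1, and, as in most deep-learning libraries, the kernel is not flipped (strictly speaking, cross-correlation). The image and the edge-detecting kernel are hypothetical; in a CNN the kernel values are learned weights.

```python
def conv2d(image, kernel):
    """'Valid' 2-D convolution as used in CNN layers: slide the kernel over the
    image and take the weighted sum at each position (no padding, stride 1)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    return [
        [
            sum(image[i + a][j + b] * kernel[a][b] for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]

# Hypothetical 4x4 image with a vertical edge, and a 2x2 vertical-edge kernel.
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
kernel = [[-1, 1], [-1, 1]]
print(conv2d(image, kernel))  # [[0, 2, 0], [0, 2, 0], [0, 2, 0]]
```

The output is largest exactly where the edge is, which is how a convolutional layer turns raw pixels into feature maps for the next layer.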

Modular Neural Network:

A Modular Neural Network contains a collection of different neural networks that work independently towards obtaining the output with no interaction between them.
Each of the neural networks performs a different sub-task, using inputs that are distinct from those of the other networks.
The advantage of this modular neural network is that it breaks down a large and complex computational process into smaller components, thus decreasing its complexity while still obtaining the required output.

Radial basis function Neural Network:

Radial basis functions are functions whose value depends on the distance of a point from a center.
RBF networks have two layers.
In the first layer, the input is mapped onto all the radial basis functions in the hidden layer; in the next step, the output layer computes the output.
Radial basis function networks are normally used to model data that follows an underlying trend or function.
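The two-step computation of an RBF network can be sketched as follows. Assumptions: Gaussian radial basis functions with a single shared width, and hypothetical centers and output weights (in practice these are fitted to data).

```python
import math

def gaussian_rbf(x, center, width):
    """Response that depends only on the distance between x and the center."""
    dist_sq = sum((xi - ci) ** 2 for xi, ci in zip(x, center))
    return math.exp(-dist_sq / (2 * width ** 2))

def rbf_network(x, centers, width, output_weights, output_bias=0.0):
    """Hidden layer: one RBF per center. Output layer: weighted sum of RBF outputs."""
    hidden = [gaussian_rbf(x, c, width) for c in centers]
    return sum(w * h for w, h in zip(output_weights, hidden)) + output_bias

# Hypothetical centers and output weights.
centers = [[0.0, 0.0], [1.0, 1.0]]
output = rbf_network([0.0, 0.0], centers, width=0.5, output_weights=[1.0, -1.0])
print(output)  # 1 - exp(-4), about 0.982
```

A point sitting on a center gets a hidden response of 1 from that unit, and responses fade as the distance grows.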

Recurrent Neural Network:

The Recurrent Neural Network saves the output of a layer and feeds it back into the input to help predict the layer's next outcome.
The first layer of an RNN is similar to that of a feedforward neural network; the recurrent behavior starts once the output of the first layer has been computed.
After this, each unit remembers some information from the previous step, so it can act as a memory cell in later computations.
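The "memory" of an RNN can be sketched with a single scalar hidden state that is fed back at every step. Assumptions: the common update $h_t = \tanh(w_x x_t + w_h h_{t-1} + b)$, an initial state of 0, and hypothetical weights.

```python
import math

def rnn_forward(sequence, w_x, w_h, b):
    """Scalar RNN: h_t = tanh(w_x * x_t + w_h * h_{t-1} + b).
    The hidden state h carries information from earlier steps forward."""
    h = 0.0
    states = []
    for x in sequence:
        h = math.tanh(w_x * x + w_h * h + b)
        states.append(h)
    return states

# The same input (1.0) produces a different state at each step, because
# each step also sees the previous hidden state.
states = rnn_forward([1.0, 1.0, 1.0], w_x=0.5, w_h=0.8, b=0.0)
print(states)  # three increasing hidden states
```

Setting `w_h` to 0 removes the feedback, and the network then behaves like a feedforward unit applied independently at each step.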

APPLICATIONS OF ARTIFICIAL NEURAL NETWORKS:

Social Media
Marketing and Sales
Healthcare
Personal Assistants

REGRESSION

Regression in machine learning refers to a supervised learning technique where the goal is to predict a continuous numerical value based on one or more independent features.
It finds relationships between variables so that predictions can be made.
We have two types of variables in regression:
- Dependent Variable (Target): The variable we are trying to predict, e.g., house price.
- Independent Variables (Features): The input variables that influence the prediction, e.g., locality, number of rooms.
A problem is a regression problem when the output variable is a real or continuous value, such as “salary” or “weight”.
Many different regression models can be used, but the simplest of them is linear regression.

LINEAR REGRESSION

Linear regression is a statistical method used to model the relationship between a dependent variable and one or more independent variables.
It provides valuable insights for prediction and data analysis.
Linear regression is also a type of supervised machine learning algorithm that learns from labeled datasets and finds the linear function that best fits the data points; this function can then be used for prediction on new data.
It computes the linear relationship between the dependent variable and one or more independent features by fitting a linear equation to the observed data.
It predicts a continuous output variable based on the independent input variables.
For example, to predict a house's price, we might consider factors such as the house's age, its distance from the main road, its location, its area, and its number of rooms; linear regression uses all of these features to predict the price, assuming a linear relationship between them and the price.

Why Is Linear Regression Important?

The interpretability of linear regression is one of its greatest strengths.
The model’s equation offers clear coefficients that illustrate the influence of each independent variable on the dependent variable, enhancing our understanding of the underlying relationships.
Its simplicity is a significant advantage; linear regression is transparent, easy to implement, and serves as a foundational concept for more advanced algorithms.

What Is the Best-Fit Line?

Our primary objective while using linear regression is to locate the best-fit line: the line for which the error between the predicted and actual values is as small as possible.
The best-fit line equation provides a straight line that represents the relationship between the dependent and independent variables.
The slope of the line indicates how much the dependent variable changes for a unit change in the independent variable(s).
In this line's equation, Y is called the dependent or target variable and X is called the independent variable, also known as the predictor of Y.
There are many types of functions or models that can be used for regression.
A linear function is the simplest type of function.
Here, X may be a single feature or multiple features representing the problem.
Linear regression predicts the value of a dependent variable (y) based on a given independent variable (x); hence the name linear regression.
In linear regression, some assumptions are made to ensure the reliability of the model's results.

Hypothesis function in Linear Regression

Assumptions are:
- Linearity: It assumes that there is a linear relationship between the independent and dependent variables. This means that changes in the independent variable lead to proportional changes in the dependent variable.
- Independence: The observations should be independent of each other; that is, the errors of one observation should not influence another.
Suppose our independent feature is years of experience (X) and the corresponding salary (Y) is the dependent variable.
Assuming there is a linear relationship between X and Y, the salary can be predicted using:
$\hat{Y} = \theta_1 + \theta_2 X$
OR
$\hat{y}_i = \theta_1 + \theta_2 x_i$
Here,
- $y_i \in Y \ (i = 1, 2, \ldots, n)$ are the labels of the data (supervised learning)
- $x_i \in X \ (i = 1, 2, \ldots, n)$ are the input independent training data (univariate: one input variable)
- $\hat{y}_i \in \hat{Y} \ (i = 1, 2, \ldots, n)$ are the predicted values.
The model gets the best regression fit line by finding the best $\theta_1$ and $\theta_2$ values.
- $\theta_1$: intercept
- $\theta_2$: coefficient of x
Once we find the best $\theta_1$ and $\theta_2$ values, we get the best-fit line, so when we finally use our model for prediction, it will predict the value of y for the input value of x.
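One standard way to find the best values of theta1 (the intercept) and theta2 (the coefficient of x) is the ordinary least-squares closed form: the slope is the covariance of x and y divided by the variance of x, and the intercept makes the line pass through the means. The experience and salary figures below are hypothetical, and the function name is illustrative.

```python
def fit_simple_linear_regression(xs, ys):
    """Least-squares estimates for y_hat = theta1 + theta2 * x."""
    n = len(xs)
    x_mean, y_mean = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    theta2 = sxy / sxx                  # slope (coefficient of x)
    theta1 = y_mean - theta2 * x_mean   # intercept
    return theta1, theta2

# Hypothetical data: years of experience (X) and salary in thousands (Y).
experience = [1, 2, 3, 4, 5]
salary = [32, 37, 41, 47, 53]
theta1, theta2 = fit_simple_linear_regression(experience, salary)
print(theta1, theta2)        # about 26.4 and 5.2
print(theta1 + theta2 * 6)   # predicted salary for 6 years: about 57.6
```

This closed form exists only for the least-squares objective; iterative methods such as gradient descent reach the same values step by step.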

How to update $\theta_1$ and $\theta_2$ values to get the best-fit line?

To achieve the best-fit regression line, the model aims to predict the target value Ŷ such that the error between the predicted value Ŷ and the true value Y is minimal.
So it is very important to update the $\theta_1$ and $\theta_2$ values to reach the values that minimize the error between the predicted y value (pred) and the true y value (y):
$\text{minimize} \ \frac{1}{n} \sum_{i=1}^{n} (\hat{y}_i - y_i)^2$
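A common way to update theta1 and theta2 is gradient descent on the mean squared error: compute how the error changes with each parameter and move both parameters a small step in the opposite direction. The learning rate, number of iterations, and data below are assumptions for this sketch.

```python
def gradient_descent(xs, ys, lr=0.01, epochs=10000):
    """Repeatedly nudge theta1 and theta2 against the gradient of the
    mean squared error J = (1/n) * sum((y_hat_i - y_i)^2)."""
    n = len(xs)
    theta1, theta2 = 0.0, 0.0
    for _ in range(epochs):
        errors = [(theta1 + theta2 * x) - y for x, y in zip(xs, ys)]
        grad1 = (2 / n) * sum(errors)                             # dJ/dtheta1
        grad2 = (2 / n) * sum(e * x for e, x in zip(errors, xs))  # dJ/dtheta2
        theta1 -= lr * grad1
        theta2 -= lr * grad2
    return theta1, theta2

# Hypothetical data: years of experience and salary in thousands.
xs = [1, 2, 3, 4, 5]
ys = [32, 37, 41, 47, 53]
theta1, theta2 = gradient_descent(xs, ys)
print(theta1, theta2)  # converges to about 26.4 and 5.2
```

If the learning rate is too large the updates overshoot and diverge; if it is too small, many more iterations are needed.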

Types of Linear Regression

When there is only one independent feature, it is known as Simple Linear Regression or Univariate Linear Regression.
When there is more than one feature, it is known as Multiple Linear Regression or Multivariate Regression.

Assumptions of Simple Linear Regression

Linear regression is a powerful tool for understanding and predicting the behavior of a variable; however, it needs to meet a few conditions in order to give accurate and dependable results.

Linearity: The independent and dependent variables have a linear relationship with one another. This implies that changes in the dependent variable follow those in the independent variable(s) in a linear fashion. This means that there should be a straight line that can be drawn through the data points. If the relationship is not linear, then linear regression will not be an accurate model.
Independence: The observations in the dataset are independent of each other. This means that the value of the dependent variable for one observation does not depend on the value of the dependent variable for another observation. If the observations are not independent, then linear regression will not be an accurate model.
Homoscedasticity: Across all levels of the independent variable(s), the variance of the errors is constant. This indicates that the value of the independent variable(s) has no impact on the variance of the errors. If the variance of the residuals is not constant, then linear regression will not be an accurate model.
Normality: The residuals should be normally distributed, i.e., they should follow a bell-shaped curve. If the residuals are not normally distributed, confidence intervals and significance tests based on the model may be unreliable.

Use Case of Simple Linear Regression

In a case study evaluating student performance, analysts used simple linear regression to examine the relationship between study hours and exam scores.
By collecting data on the number of hours students studied and their corresponding exam results, the analysts developed a model revealing that, for each additional hour spent studying, students' exam scores increased by an average of 5 points.
This case highlights the utility of simple linear regression in understanding and improving academic performance.
Another case study focuses on marketing and sales, where businesses use simple linear regression to forecast sales based on historical data, particularly examining how factors like advertising expenditure influence revenue.
By collecting data on past advertising spending and the corresponding sales figures, analysts developed a regression model that describes the relationship between these variables.
For instance, the analysis might reveal that every additional dollar spent on advertising increases sales by 10 dollars. This predictive capability enables companies to optimize their advertising strategies and allocate resources effectively.

Multiple Linear Regression

Multiple linear regression involves more than one independent variable and one dependent variable.
The equation for multiple linear regression is:
$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_n X_n$
- where:
 - Y is the dependent variable
 - $X_1, X_2, \ldots, X_n$ are the independent variables
 - $\beta_0$ is the intercept
 - $\beta_1, \beta_2, \ldots, \beta_n$ are the slopes
The goal of the algorithm is to find the best-fit equation (a line for one feature, a plane or hyperplane for several) that can predict the values based on the independent variables.
In regression, a set of records with X and Y values is used to learn a function, so that if you want to predict Y for an unknown X, this learned function can be used.
Because regression predicts a continuous value of Y, the learned function must map the independent features X to a continuous output.
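The intercept and slopes of multiple linear regression can be obtained by least squares through the normal equations $(X^T X)\beta = X^T y$, where X has a leading column of 1s for the intercept. The sketch below solves that system with plain Gaussian elimination; the data are hypothetical and generated exactly from $Y = 1 + 2X_1 + 3X_2$, so the recovered coefficients can be checked. Real projects would normally use a numerical library instead.

```python
def solve(a, b):
    """Solve the linear system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b_i] for row, b_i in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def fit_multiple_linear_regression(rows, ys):
    """Return [beta_0, beta_1, ..., beta_n] from (X^T X) beta = X^T y."""
    X = [[1.0] + list(row) for row in rows]  # leading 1 for the intercept
    p = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * y for r, y in zip(X, ys)) for i in range(p)]
    return solve(xtx, xty)

# Hypothetical records (X1, X2) with Y generated from Y = 1 + 2*X1 + 3*X2.
rows = [(1, 0), (0, 1), (2, 1), (3, 2), (1, 3)]
ys = [1 + 2 * x1 + 3 * x2 for x1, x2 in rows]
coefs = fit_multiple_linear_regression(rows, ys)
print(coefs)  # approximately [1.0, 2.0, 3.0]
```

If two features were perfectly correlated, $X^T X$ would be singular and this solve would fail, which is the extreme case of the multicollinearity problem discussed next.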

Assumptions of Multiple Linear Regression

For Multiple Linear Regression, all four of the assumptions from Simple Linear Regression apply. In addition, there are a few more:

No multicollinearity: There is little or no correlation between the independent variables. Multicollinearity occurs when two or more independent variables are highly correlated with each other, which can make it difficult to determine the individual effect of each variable on the dependent variable. If there is multicollinearity, the estimated coefficients of multiple linear regression become unstable and hard to interpret.
Additivity: The model assumes that the effect of changes in a predictor variable on the response variable is consistent regardless of the values of the other variables. This assumption implies that there is no interaction between variables in their effects on the dependent variable.
Feature Selection: In multiple linear regression, it is essential to carefully select the independent variables that will be included in the model. Including irrelevant or redundant variables may lead to overfitting and complicate the interpretation of the model.
Overfitting: Overfitting occurs when the model fits the training data too closely, capturing noise or random fluctuations that do not represent the true underlying relationship between variables. This can lead to poor generalization performance on new, unseen data.

Multiple linear regression sometimes faces issues like multicollinearity.

Multicollinearity

Multicollinearity is a statistical phenomenon where two or more independent variables in a multiple regression model are highly correlated, making it difficult to assess the individual effects of each variable on the dependent variable.

Two techniques are commonly used to detect multicollinearity:

Correlation Matrix: Examining the correlation matrix among the independent variables is a common way to detect multicollinearity. High correlations (close to 1 or -1) indicate potential multicollinearity.
VIF (Variance Inflation Factor): VIF is a measure that quantifies how much the variance of an estimated regression coefficient increases if your predictors are correlated. A high VIF (typically above 10) suggests multicollinearity.
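Both detection techniques can be sketched in plain Python. The sketch computes the correlation matrix of the predictors and then each feature's VIF using the standard identity that VIF_j equals the j-th diagonal entry of the inverse correlation matrix, which is the same as $1 / (1 - R_j^2)$ when feature j is regressed on the others. The three predictors are hypothetical: x2 is nearly a multiple of x1, and x3 was chosen to be uncorrelated with both.

```python
import math

def pearson(a, b):
    """Pearson correlation coefficient between two equal-length lists."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (sa * sb)

def invert(matrix):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    m = [row[:] + [float(i == j) for j in range(n)] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        scale = m[col][col]
        m[col] = [v / scale for v in m[col]]
        for r in range(n):
            if r != col:
                factor = m[r][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [row[n:] for row in m]

def correlation_and_vif(features):
    """Return the correlation matrix and the VIF of each feature
    (the diagonal of the inverse correlation matrix)."""
    corr = [[pearson(a, b) for b in features] for a in features]
    inv = invert(corr)
    return corr, [inv[j][j] for j in range(len(features))]

# Hypothetical predictors.
x1 = [1, 2, 3, 4, 5]
x2 = [2, 4, 6, 8, 11]
x3 = [6, 4, 4, 6, 5]
corr, vifs = correlation_and_vif([x1, x2, x3])
print([[round(c, 3) for c in row] for row in corr])
print([round(v, 1) for v in vifs])  # [122.0, 122.0, 1.0]
```

Here the correlation between x1 and x2 is close to 1 and both of their VIFs are far above 10, so either technique flags the pair, while x3 has a VIF of 1 (no inflation).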

Use Cases of Multiple Linear Regression

Here are some use cases:

- Real Estate Pricing: In real estate, multiple linear regression (MLR) is used to predict property prices based on multiple factors such as location, size, number of bedrooms, etc. This helps buyers and sellers understand market trends and set competitive prices.
- Financial Forecasting: Financial analysts use MLR to predict stock prices or economic indicators based on multiple influencing factors such as interest rates, inflation rates and market trends. This enables better investment strategies and risk management.
- Agricultural Yield Prediction: Farmers can use MLR to estimate crop yields based on several variables like rainfall, temperature, soil quality and fertilizer usage. This information helps in planning agricultural practices for optimal productivity.
- E-commerce Sales Analysis: An e-commerce company can utilize MLR to assess how various factors such as product price, marketing promotions and seasonal trends impact sales.

Evaluation Metrics for Linear Regression

A variety of evaluation measures can be used to determine the strength of any linear regression model.
These assessment metrics indicate how well the model reproduces the observed outputs.
The most common measurements are:
- Mean Absolute Error (MAE)
- Root Mean Squared Error (RMSE)

Mean Absolute Error (MAE)

Mean Absolute Error is an evaluation metric used to calculate the accuracy of a regression model.
MAE measures the average absolute difference between the predicted values and actual values.
Mathematically, MAE is expressed as:
$MAE = \frac{1}{n} \sum_{i=1}^{n} |Y_i - \hat{Y}_i|$
- Here,
 - n is the number of observations
 - $Y_i$ represents the actual values.
 - $\hat{Y_i}$ represents the predicted values
Lower MAE value indicates better model performance.
It is less sensitive to outliers than squared-error metrics such as RMSE, because the errors are not squared.

Root Mean Squared Error (RMSE)

The Root Mean Squared Error is the square root of the average squared residual (the residuals' variance, when the residuals average to zero).
It describes how well the observed data points match the expected values, or the model's absolute fit to the data.
In mathematical notation, it can be expressed as:
$RMSE = \sqrt{\frac{RSS}{n}} = \sqrt{\frac{\sum_{i=1}^{n} (actual_i - predicted_i)^2}{n}}$, where RSS is the residual sum of squares and n is the number of data points.
To obtain an unbiased estimate of the error variance, one divides the sum of the squared residuals by the number of degrees of freedom (n - 2 for simple linear regression) rather than by the total number of data points.
The resulting figure is referred to as the Residual Standard Error (RSE).
In mathematical notation, it can be expressed as:
$RSE = \sqrt{\frac{RSS}{n-2}} = \sqrt{\frac{\sum_{i=1}^{n} (actual_i - predicted_i)^2}{n-2}}$
Unlike R-squared, RMSE is not a normalized measure.
Its value depends on the units of the variables, so it changes when those units change, which makes it harder to compare across problems.
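MAE, RMSE, and RSE follow directly from the formulas above. The actual and predicted values below are hypothetical; the second comparison adds a single large error to show that squaring makes RMSE react more strongly to an outlier than MAE does. The RSE uses n - 2 degrees of freedom, which assumes a simple linear regression with two fitted parameters.

```python
import math

def mae(actual, predicted):
    """Mean Absolute Error: average of |actual - predicted|."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Root Mean Squared Error: sqrt(RSS / n)."""
    rss = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(rss / len(actual))

def rse(actual, predicted):
    """Residual Standard Error for simple linear regression: sqrt(RSS / (n - 2))."""
    rss = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(rss / (len(actual) - 2))

actual = [3.0, 5.0, 7.0, 9.0]
predicted = [2.0, 6.0, 7.0, 9.0]
print(mae(actual, predicted), round(rmse(actual, predicted), 3), rse(actual, predicted))
# 0.5 0.707 1.0

# One prediction is now off by 10: RMSE grows more than MAE because errors are squared.
with_outlier = [2.0, 6.0, 7.0, 19.0]
print(mae(actual, with_outlier), round(rmse(actual, with_outlier), 3))
# 3.0 5.05
```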

CLUSTERING

The task of grouping data points based on their similarity with each other is called Clustering or Cluster Analysis.
This method belongs to the branch of unsupervised learning, which aims at gaining insights from unlabelled data points; that is, unlike supervised learning, we don't have a target variable.
Clustering aims at forming groups of homogeneous data points from a heterogeneous dataset.
It evaluates similarity based on a metric such as Euclidean distance, cosine similarity, or Manhattan distance, and then groups the points with the highest similarity together.
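The three similarity measures named above can be written in a few lines each. The two points are hypothetical. Note the difference in direction: for Euclidean and Manhattan distance, smaller values mean more similar points, while for cosine similarity, values closer to 1 mean more similar directions.

```python
import math

def euclidean(a, b):
    """Straight-line distance between two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    """Sum of absolute coordinate differences ("city block" distance)."""
    return sum(abs(x - y) for x, y in zip(a, b))

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

p, q = [1.0, 2.0], [4.0, 6.0]
print(euclidean(p, q), manhattan(p, q), round(cosine_similarity(p, q), 4))
# 5.0 7.0 0.9923
```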

Types of Clustering

Broadly speaking, there are 2 types of clustering that can be performed to group similar data points.

Hard Clustering:

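In this type of clustering, each data point either belongs to a cluster completely or does not belong to it at all.
For example, let's say there are 4 data points and we have to cluster them into 2 clusters. Each data point will then belong either to cluster 1 or to cluster 2.

Soft Clustering:

In this type of clustering, instead of assigning each data point to exactly one cluster, the probability or likelihood of the point belonging to each cluster is evaluated.
For example, with 4 data points and 2 clusters, each data point is given a probability of belonging to cluster 1 and a probability of belonging to cluster 2.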

What is clustering for?

Groups people of similar sizes together to make "small", "medium" and "large" T-Shirts
- Tailor-made for each person: too expensive
- One-size-fits-all: does not fit all
In marketing, segment customers according to their similarities
- To do targeted marketing
Given a collection of text documents, we want to organize them according to their content similarities
- To produce a topic hierarchy

Aspects of Clustering

A similarity (or distance, or dissimilarity) function
Clustering quality
- Inter-clusters distance → maximized
- Intra-clusters distance → minimized
The quality of a clustering result depends on the algorithm, the distance function, and the application.

What is Cluster Analysis?

Finding groups of objects in data such that the objects in a group will be similar (or related) to one another and different from (or unrelated to) the objects in other groups
Intra-cluster distances are minimized
Inter-cluster distances are maximized

Types of Clustering

A clustering is a set of clusters

Partitional Clustering

A division of data objects into non-overlapping subsets (clusters) such that each data object is in exactly one subset

Hierarchical Clustering

A set of nested clusters organized as a hierarchical tree
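To show how nested clusters arise, here is a minimal sketch of agglomerative (bottom-up) hierarchical clustering. It assumes single linkage, one of several possible linkage choices: every point starts as its own cluster, and the two closest clusters are merged repeatedly until one remains. Each recorded merge contains earlier merges, which is the tree structure described above. The function names and data are hypothetical.

```python
import math
from itertools import combinations

def single_linkage(c1, c2):
    """Cluster distance = distance of the closest pair of points across clusters."""
    return min(math.dist(a, b) for a in c1 for b in c2)

def agglomerative(points):
    """Repeatedly merge the two closest clusters; return the list of merges."""
    clusters = [[p] for p in points]  # start: every point is its own cluster
    history = []
    while len(clusters) > 1:
        # Pick the pair of clusters (i < j) with the smallest single-linkage distance
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: single_linkage(clusters[ij[0]], clusters[ij[1]]))
        merged = clusters[i] + clusters[j]
        history.append(merged)
        # Delete the higher index first so index i stays valid
        del clusters[j]
        del clusters[i]
        clusters.append(merged)
    return history

points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.5)]
for level, cluster in enumerate(agglomerative(points), start=1):
    print(level, cluster)
```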

Uses of Clustering

Before looking at the types of clustering algorithms, let us go through their use cases. Clustering algorithms are mainly used for:
- Market Segmentation - Businesses use clustering to group their customers and target advertisements to reach a larger audience.
- Market Basket Analysis - Shop owners analyze their sales to figure out which items customers most often buy together.
- Social Network Analysis - Social media sites use your data to understand your browsing behaviour and provide you with targeted friend recommendations or content recommendations.
- Medical Imaging - Doctors use Clustering to find out diseased areas in diagnostic images like X-rays.
- Anomaly Detection - Clustering can be used to find outliers in a real-time data stream or to detect fraudulent transactions.
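For the anomaly detection use case, one simple approach, sketched here under stated assumptions, is to flag any new point that lies far from every cluster center. The centers, the feature values (transaction amount, number of items), and the threshold are all hypothetical; in practice features are usually scaled first, because distance-based methods are sensitive to units.

```python
import math

def nearest_center_distance(point, centers):
    """Distance from a point to its closest cluster center."""
    return min(math.dist(point, c) for c in centers)

def flag_outliers(points, centers, threshold):
    """Return the points whose nearest-center distance exceeds the threshold."""
    return [p for p in points if nearest_center_distance(p, centers) > threshold]

# Hypothetical cluster centers, e.g. learned from normal transactions
centers = [(10.0, 1.0), (50.0, 3.0)]
# New observations as (amount, items)
new_points = [(11.0, 1.0), (49.0, 3.0), (300.0, 1.0)]
print(flag_outliers(new_points, centers, threshold=5.0))  # [(300.0, 1.0)]
```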

Types of Clustering Algorithms

Centroid-based Clustering (Partitioning methods)
Density-based Clustering
Connectivity-based Clustering (Hierarchical clustering)
Distribution-based Clustering

Centroid-based Clustering (Partitioning methods)

Partitioning methods are the simplest clustering algorithms.
They group data points on the basis of their closeness.
Generally, the similarity measure chosen for these algorithms is Euclidean distance, Manhattan distance, or Minkowski distance.
The dataset is split into a predetermined number of clusters, and each cluster is represented by a vector of values (its center).
Each input data point joins the cluster whose vector it is closest to.
The primary drawback of these algorithms is that the number of clusters, "k," must be set, either intuitively or scientifically (for example, using the Elbow Method), before the algorithm starts allocating the data points.
Despite this, it is still the most popular type of clustering.
K-means and K-medoids clustering are examples of this approach.
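A minimal sketch of K-means (Lloyd's algorithm) in plain Python, assuming random initialization from the data points with a fixed seed for repeatability. The assignment step puts each point into the cluster with the nearest centroid, and the update step moves each centroid to the mean of its points. The `wcss` helper computes the within-cluster sum of squares, the quantity usually plotted against k in the Elbow Method. Function names and data are illustrative, and K-medoids is not covered.

```python
import math
import random

def assign(points, centroids):
    """Assignment step: index of the nearest centroid for every point."""
    return [min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
            for p in points]

def kmeans(points, k, max_iter=100, seed=0):
    """Minimal Lloyd's k-means. Returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initial centroids: k distinct data points
    for _ in range(max_iter):
        labels = assign(points, centroids)
        # Update step: move each centroid to the mean of its assigned points
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(tuple(sum(v) / len(members) for v in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # keep the old centroid if a cluster empties
        if new_centroids == centroids:
            break  # centroids stopped moving: converged
        centroids = new_centroids
    return centroids, assign(points, centroids)

def wcss(points, centroids, labels):
    """Within-cluster sum of squared distances (plotted against k in the Elbow Method)."""
    return sum(math.dist(p, centroids[lab]) ** 2 for p, lab in zip(points, labels))

# Hypothetical 2-D data with two well-separated groups
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (8.5, 9.0), (9.0, 8.0)]

for k in range(1, 4):
    centroids, labels = kmeans(points, k)
    print(k, round(wcss(points, centroids, labels), 2))
```

For this data, WCSS falls sharply from k = 1 to k = 2 because the two groups are far apart; that bend is what the Elbow Method looks for.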
