Support Vector Machines (SVM)
Introduction
- SVM is a supervised machine learning algorithm used mainly for classification (a regression variant also exists).
- The goal is to find the best line (or hyperplane in higher dimensions) that separates different classes of data points.
- The "best" line is the one that maximizes the margin, which is the widest separation between the classes.
Core Concepts
- Margin: The distance between the separating hyperplane and the closest data points of each class, i.e., the widest region that separates the two classes. A larger margin indicates better separation.
- Support Vectors: The data points that lie on the boundary of the margin. These points are crucial for defining the separating hyperplane. Once these vectors are determined, other data points can be disregarded.
- Optimization Problem: Training an SVM means finding the hyperplane that maximizes this margin, i.e., the distance to the closest points of each class.
- These distance computations reduce to dot products between data points, which is what later makes the kernel trick possible (see the formulation below).
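For reference, the standard textbook form of this optimization (a sketch; w is the hyperplane's normal vector, b its offset, x_i the training points and y_i in {-1, +1} their labels):

```latex
% Maximize the margin 2 / ||w|| by minimizing ||w||^2, while keeping every
% training point on the correct side of the margin (hard-margin case).
\min_{w,\, b} \;\; \frac{1}{2}\, \lVert w \rVert^2
\quad \text{subject to} \quad
y_i \,(w \cdot x_i + b) \;\ge\; 1 \quad \text{for all } i
```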
Linear SVM
- The goal is to find the separating hyperplane with the widest possible margin between the classes.
- With a hard margin, no training point is allowed to fall inside the margin.
- If the data isn't perfectly separable, outliers cause issues.
- A strict (hard) margin hugs those outliers and can misclassify new observations, i.e., it generalizes poorly (a minimal linear example is sketched below).
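A minimal sketch of a linear SVM in scikit-learn; the toy points and the large C value (used to approximate a hard margin) are made up for illustration:

```python
import numpy as np
from sklearn.svm import SVC

# Toy 2D data: two roughly separated blobs (made-up values for illustration).
X = np.array([[1.0, 2.0], [1.5, 1.8], [2.0, 3.0],   # class 0
              [6.0, 7.0], [6.5, 6.5], [7.0, 8.0]])  # class 1
y = np.array([0, 0, 0, 1, 1, 1])

# A linear kernel with a very large C approximates a hard margin.
clf = SVC(kernel="linear", C=1e6)
clf.fit(X, y)

print("support vectors:\n", clf.support_vectors_)  # points that define the margin
print("w =", clf.coef_[0], " b =", clf.intercept_[0])
print("prediction for (3, 3):", clf.predict([[3.0, 3.0]]))
```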
Soft Margin
- Allows for misclassifications during training to improve generalization.
- Trades some errors on the training data for a wider margin.
- Regularization: A technique used to control the number of misclassifications allowed. This helps balance training accuracy with generalization performance.
Regularization
- Controlled by a parameter (C in scikit-learn) that sets how heavily misclassifications of training points are penalized.
- Balances fitting the training data well against penalizing errors so strictly that the model fails to generalize (see the sketch below).
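A rough sketch of the effect of C on overlapping data; the dataset and candidate C values are arbitrary:

```python
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Two overlapping blobs, so some misclassification is unavoidable.
X, y = make_blobs(n_samples=200, centers=2, cluster_std=3.0, random_state=0)

for C in (0.01, 1, 100):
    clf = SVC(kernel="linear", C=C).fit(X, y)
    # Training accuracy tends to rise and the margin to shrink as C grows,
    # but a large C may generalize worse on noisy data.
    print(f"C={C:>6}: train acc={clf.score(X, y):.2f}, "
          f"#support vectors={len(clf.support_)}")
```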
Non-Linear SVM and Kernel Trick
- Non-Linear Transformations: Applying a transformation to data points to make them linearly separable in a higher-dimensional space.
- If data is not linearly separable in its original space, it can be transformed into a higher-dimensional space where it is separable.
- Examples of transformations:
- Mapping a 1D dataset to 2D using x_1^2.
- Using the modulo operator (mod) to separate points.
- The linear boundary found in the higher-dimensional space corresponds to a non-linear boundary in the original space, which gives more accurate classifications (see the sketch below).
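A sketch of the 1D-to-2D idea: points that cannot be split by a single threshold on a line become separable once x² is added as a second coordinate (the numbers below are made up):

```python
import numpy as np
from sklearn.svm import SVC

# 1D data: the negative class sits in the middle, so no single threshold separates it.
x = np.array([-4.0, -3.0, -1.0, 0.0, 1.0, 3.0, 4.0])
y = np.array([ 1,    1,    0,   0,   0,   1,   1 ])

# Map each point to (x, x^2); in 2D the classes can be split by a horizontal line.
X2 = np.column_stack([x, x ** 2])

clf = SVC(kernel="linear").fit(X2, y)
print("training accuracy after the transform:", clf.score(X2, y))  # expect 1.0
```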
Kernel Functions
- Transform data to a higher dimension where it becomes linearly separable.
- Challenge: Finding the right transformation function and performing the transformation can be computationally expensive, especially with high-dimensional data (e.g., images).
The Kernel Trick
- A method to compute the dot product (relationship) between points in the transformed space without explicitly transforming the points.
- Avoids the computational cost of direct transformation.
- Kernel functions take vectors from the original dimension and provide the dot product of these vectors in the feature space.
- K(vi, vj) gives the relationship (dot product) between vectors vi and vj in the transformed (feature) space.
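In symbols (a sketch; φ denotes the implicit transformation into the feature space):

```latex
% A kernel returns the feature-space dot product without computing \varphi explicitly.
K(v_i, v_j) \;=\; \varphi(v_i) \cdot \varphi(v_j)

% Example: the degree-2 polynomial kernel
K(v_i, v_j) \;=\; (v_i \cdot v_j + 1)^2
```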
Example: Polynomial Kernel
- The polynomial kernel avoids the expensive calculations needed to transform every point explicitly (a small numerical check is sketched below).
- If the dataset is small enough, it is also possible to transform the data explicitly and apply a linear SVM.
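A small numerical check of that shortcut (a sketch: for scalars, the degree-2 polynomial kernel (ab + 1)² matches the dot product under the explicit feature map φ(x) = (1, √2·x, x²)):

```python
import numpy as np

def poly_kernel(a: float, b: float) -> float:
    """Degree-2 polynomial kernel on scalars: (a*b + 1)^2."""
    return (a * b + 1) ** 2

def phi(x: float) -> np.ndarray:
    """Explicit feature map whose dot product reproduces the kernel."""
    return np.array([1.0, np.sqrt(2) * x, x ** 2])

a, b = 2.0, 3.0
print(poly_kernel(a, b))   # (2*3 + 1)^2 = 49
print(phi(a) @ phi(b))     # 1 + 2*6 + 36 = 49 -> same value, no explicit transform needed
```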
Choosing Points for Comparison
- During optimization, training points are compared pair by pair (through their dot products or kernel values) to find the maximum-margin hyperplane; the dual form below makes this explicit.
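The standard (soft-margin) dual form of the SVM optimization, shown as a sketch, where every pair of training points enters only through a kernel value and C is the regularization parameter:

```latex
\max_{\alpha}\; \sum_i \alpha_i
  \;-\; \frac{1}{2} \sum_i \sum_j \alpha_i \alpha_j\, y_i y_j\, K(x_i, x_j)
\quad \text{subject to} \quad
0 \le \alpha_i \le C, \qquad \sum_i \alpha_i y_i = 0
```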
Practical Examples and Implementation
Iris Dataset
- A classic dataset with three species of iris flowers (setosa, versicolor, virginica).
- The goal is to classify the flowers based on their features.
Implementation Details
- SVC: Support Vector Classification (used for classification problems).
- SVR: Support Vector Regression (used for regression problems).
- Common Kernel Functions:
- Linear: For linearly separable data.
- RBF (Radial Basis Function): a common default choice for SVMs; radial basis functions are also used in neural networks as activation functions.
- Polynomial.
Gamma Parameter
- Controls the complexity of the decision boundary.
- A higher gamma gives each training point a smaller radius of influence, producing a more rugged, complex boundary that can overfit.
- Finding the right gamma involves experimentation using cross-validation (see the sketch below).
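A rough sketch of picking gamma by cross-validation on the iris data; the candidate values are arbitrary:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Try a few candidate gammas and compare 5-fold cross-validation accuracy.
for gamma in (0.001, 0.01, 0.1, 1, 10):
    scores = cross_val_score(SVC(kernel="rbf", gamma=gamma), X, y, cv=5)
    print(f"gamma={gamma:>6}: mean CV accuracy = {scores.mean():.3f}")
```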
Code Example
- Creating SVMs with different kernel functions produces different decision regions (see the sketch below).
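A minimal sketch of that idea on the iris dataset: fit an SVC with each kernel and compare held-out accuracy (the split and default parameters are arbitrary choices):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

# Same data, different kernels -> different decision regions and accuracies.
for kernel in ("linear", "rbf", "poly"):
    clf = SVC(kernel=kernel).fit(X_train, y_train)
    print(f"{kernel:>6} kernel: test accuracy = {clf.score(X_test, y_test):.3f}")
```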
Faces Dataset
- High-dimensional data set.
- Each image contains nearly 3,000 pixels.
- Dimension reduction techniques are used.
PCA (Principal Component Analysis)
- Reduces the number of dimensions (e.g., from 3,000 pixels to 150 components).
- Speeds up training and can improve results.
Pipeline
- Use a pipeline to reduce dimensions (PCA) before applying the SVM.
- Keep separate training and test sets (see the sketch below).
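A sketch of such a pipeline on scikit-learn's LFW faces dataset; the 150 components come from the notes above, while the other settings (min_faces_per_person, whitening, random seeds) are assumptions:

```python
from sklearn.datasets import fetch_lfw_people
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# Each face image has roughly 3,000 pixels; PCA compresses them to 150 components.
faces = fetch_lfw_people(min_faces_per_person=60)
X_train, X_test, y_train, y_test = train_test_split(
    faces.data, faces.target, random_state=42)

model = make_pipeline(
    PCA(n_components=150, whiten=True, random_state=42),
    SVC(kernel="rbf"),
)
model.fit(X_train, y_train)
print("test accuracy:", model.score(X_test, y_test))
```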
Model Selection
- Use tools like Grid Search.
- Specify candidate values for the regularization parameter and gamma (see the sketch below).
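A sketch of a grid search over the regularization parameter C and gamma; the candidate values are arbitrary, and `model`, `X_train`, `y_train` are the pipeline and split from the previous sketch:

```python
from sklearn.model_selection import GridSearchCV

# Parameter names are prefixed with the pipeline step name ("svc__").
param_grid = {
    "svc__C": [1, 5, 10, 50],
    "svc__gamma": [0.0001, 0.0005, 0.001, 0.005],
}
grid = GridSearchCV(model, param_grid, cv=5)
grid.fit(X_train, y_train)

print("best parameters:", grid.best_params_)
best_model = grid.best_estimator_
```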
Evaluation
- Evaluate the whole process, not just the final classification accuracy.
- Learn how to read the output (e.g., per-class metrics) and judge whether it is good or bad.
- Experiment with the parameters to see if you can get a better result (an evaluation sketch follows).
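One way to read the output beyond a single accuracy number (a sketch, continuing from the fitted `best_model` and data of the previous sketches):

```python
from sklearn.metrics import classification_report, confusion_matrix

y_pred = best_model.predict(X_test)

# Per-class precision/recall/F1 show which people are confused with each other.
print(classification_report(y_test, y_pred, target_names=faces.target_names))
print(confusion_matrix(y_test, y_pred))
```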
Key Takeaways
- SVMs use the kernel trick to work in high-dimensional spaces without explicitly transforming the data.
- They find maximum-margin boundaries that separate the classes.
- Important parameters to keep in mind:
- Regularization: Allows for misclassification, avoids overfitting, and improves generalization.
- Gamma: Controls the complexity of decision boundaries.