kNN (k nearest neighbors)
simple, effective algorithm based on the idea that similar things are near each other
K refers to the number of nearest neighbors
classification: look at the K nearest neighbors and use majority voting
regression: take the average of the K neighbors' values
kNN steps
calculate distance
find and rank neighbors
vote for class
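The three steps above can be sketched in pure Python; `knn_classify` and the toy data below are illustrative, not a production implementation.

```python
import math
from collections import Counter

def knn_classify(train, query, k):
    """Classify `query` by majority vote among its k nearest training points.

    `train` is a list of (features, label) pairs; distances are Euclidean.
    """
    # 1. calculate distance: from the query to every training sample
    dists = [(math.dist(x, query), label) for x, label in train]
    # 2. find and rank neighbors: sort by distance, keep the k closest
    neighbors = sorted(dists)[:k]
    # 3. vote for class: the most common label among the neighbors wins
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]
```

For regression, step 3 would instead average the neighbors' target values.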
parameters in ML
variables that are learned from the data during the training process
internal to the model and their values are estimated from the training data
hyperparameters of ML
the configuration variables that are external to the model and whose values cannot be estimated from the data
set before the learning process begins and control the learning process itself
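The distinction shows up clearly in a toy gradient-descent fit; `fit_line` and its defaults are illustrative assumptions, not from any particular library.

```python
def fit_line(xs, ys, lr=0.01, epochs=2000):
    """Fit y = w*x + b by gradient descent on mean squared error.

    Hyperparameters: lr and epochs are set *before* training and
    control the learning process itself.
    Parameters: w and b are *learned* from the data during training.
    """
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # gradients of the mean squared error with respect to w and b
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b
```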
advantages of kNN
simple to understand and implement
no assumptions about data distribution
can be used for classification and regression
disadvantages of kNN
computationally expensive for large datasets
sensitive to irrelevant features and the scale of the data
does not work well with high dimensional data
SVM (support vector machine)
to find a hyperplane in an N-dimensional space (N = number of features) that distinctly classifies the data points
samples on the margin are called the support vectors
plane
in SVM, hyperplane is a ____
model fitting for SVM
maximum margin hyperplane and margins are trained with samples from 2 classes
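A minimal sketch of that fitting process, using sub-gradient descent on the hinge loss for a *linear* SVM (function name, learning rate, and regularization strength are illustrative assumptions):

```python
def train_linear_svm(data, lr=0.01, lam=0.01, epochs=1000):
    """Learn w, b defining the separating hyperplane w.x + b = 0.

    data: list of (features, label) pairs with label in {-1, +1}.
    Minimizes hinge loss + L2 regularization by sub-gradient descent.
    """
    dim = len(data[0][0])
    w = [0.0] * dim
    b = 0.0
    for _ in range(epochs):
        for x, y in data:
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            if margin < 1:  # sample inside the margin or misclassified
                w = [wi - lr * (lam * wi - y * xi) for wi, xi in zip(w, x)]
                b += lr * y
            else:  # only the regularizer acts, pulling toward a wider margin
                w = [wi - lr * lam * wi for wi in w]
    return w, b
```

Samples with margin < 1 are the ones that shape the solution; at convergence these are the support vectors.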
kernel trick in SVM for non-linear and high-dimensional data
functions that transform the input data space into a higher-dimensional space where it becomes easier to separate the classes linearly
allows the model to create non-linear decision boundaries in the original input space
allows this transformation to be done efficiently without explicitly computing the coordinates in the higher-dimensional space
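The trick can be verified by hand for a degree-2 polynomial kernel: the kernel value computed directly in the input space equals a dot product in a higher-dimensional feature space that is never explicitly built (`poly_kernel` and `phi` are illustrative names).

```python
import math

def poly_kernel(x, z):
    """Degree-2 polynomial kernel (x.z + 1)^2, computed in the input space."""
    return (sum(a * b for a, b in zip(x, z)) + 1) ** 2

def phi(x):
    """Explicit degree-2 feature map for 2-D input (x1, x2).

    Never needed in practice; written out only to verify the trick.
    """
    x1, x2 = x
    r2 = math.sqrt(2)
    return [1, r2 * x1, r2 * x2, x1 * x1, x2 * x2, r2 * x1 * x2]
```

For any pair of 2-D points, `poly_kernel(x, z)` equals `dot(phi(x), phi(z))`, but it costs one dot product in 2 dimensions instead of one in 6.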
advantages of SVM
effective in high-dimensional spaces
memory efficient
works well even with a small number of samples
good generalization capabilities, which help prevent overfitting
can efficiently handle non-linear data
a small change to the data does not greatly affect the hyperplane
disadvantages of SVM
choosing the right kernel and tuning parameters can be challenging
not directly probabilistic (though extensions exist)
can be computationally intensive for large datasets
kNN vs SVM
often more appropriate for simpler, intuitive tasks where the concept of 'similarity' is straightforward and the dataset is not too large or high-dimensional
useful when interpretability is a priority
SVM vs kNN
more suitable for complex, high-dimensional problems where the relationship between features and outcomes may be non-linear and where achieving the highest possible predictive accuracy is crucial
naive bayes classifier
called this because it assumes that the occurrence of a certain feature is independent of the occurrence of other features
e.g., each feature individually contributes to identifying that a fruit is an apple, without depending on the other features
depends on bayes’ theorem
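A minimal categorical naive Bayes sketch, scoring each class via Bayes' theorem with the independence assumption (no smoothing; `nb_predict` and the toy data are illustrative):

```python
from collections import Counter

def nb_predict(train, query):
    """Pick the class maximizing P(class) * prod_i P(feature_i | class).

    train: list of (feature_tuple, label) pairs; query: a feature tuple.
    The product over individual features *is* the 'naive' independence
    assumption; P(features) is a constant and can be ignored.
    """
    class_counts = Counter(label for _, label in train)
    n = len(train)
    best, best_score = None, -1.0
    for label, count in class_counts.items():
        score = count / n  # prior P(class)
        for i, value in enumerate(query):
            match = sum(1 for x, y in train if y == label and x[i] == value)
            score *= match / count  # likelihood P(feature_i | class)
        if score > best_score:
            best, best_score = label, score
    return best
```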
advantages of naive bayes
simple and fast
easily interpretable results
works well with high-dimensional data
supports both binary and multi-class classification
performs well even with small training datasets
disadvantages of naive bayes
assumes feature independence (not true in real-world scenarios)
can be outperformed by more sophisticated models on complex tasks