kNN (k nearest neighbors)
simple, effective algorithm based on the idea that similar things are near each other
K refers to the number of nearest neighbors
classification: look at the K nearest neighbors and use majority voting
regression: take the average of the K neighbors' values
kNN steps
calculate distance
find and rank neighbors
vote for class
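The three steps above can be sketched in pure Python; `knn_classify` and the toy data below are illustrative, not a production implementation.

```python
import math
from collections import Counter

def knn_classify(train, query, k):
    """Classify `query` by majority vote among its k nearest training points.

    `train` is a list of (features, label) pairs; distances are Euclidean.
    """
    # 1. calculate distance: from the query to every training sample
    dists = [(math.dist(x, query), label) for x, label in train]
    # 2. find and rank neighbors: sort by distance, keep the k closest
    neighbors = sorted(dists)[:k]
    # 3. vote for class: the most common label among the neighbors wins
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]
```

For regression, step 3 would instead average the neighbors' target values.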
parameters in ML
variables that are learned from the data during the training process
internal to the model and their values are estimated from the training data
hyperparameters of ML
the configuration variables that are external to the model and whose values cannot be estimated from the data
set before the learning process begins and control the learning process itself
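The distinction shows up clearly in a toy gradient-descent fit; `fit_line` and its defaults are illustrative assumptions, not from any particular library.

```python
def fit_line(xs, ys, lr=0.01, epochs=2000):
    """Fit y = w*x + b by gradient descent on mean squared error.

    Hyperparameters: lr and epochs are set *before* training and
    control the learning process itself.
    Parameters: w and b are *learned* from the data during training.
    """
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # gradients of the mean squared error with respect to w and b
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b
```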
advantages of kNN
simple to understand and implement
no assumptions about data distribution
can be used for classification and regression
disadvantages of kNN
computationally expensive for large datasets
sensitive to irrelevant features and the scale of the data
does not work well with high dimensional data
SVM (support vector machine)
to find a hyperplane in an N-dimensional space (N = number of features) that distinctly classifies the data points
samples on the margin are called the support vectors
plane
in SVM, hyperplane is a ____
model fitting for SVM
maximum margin hyperplane and margins are trained with samples from 2 classes
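A minimal sketch of that fitting process, using sub-gradient descent on the hinge loss for a *linear* SVM (function name, learning rate, and regularization strength are illustrative assumptions):

```python
def train_linear_svm(data, lr=0.01, lam=0.01, epochs=1000):
    """Learn w, b defining the separating hyperplane w.x + b = 0.

    data: list of (features, label) pairs with label in {-1, +1}.
    Minimizes hinge loss + L2 regularization by sub-gradient descent.
    """
    dim = len(data[0][0])
    w = [0.0] * dim
    b = 0.0
    for _ in range(epochs):
        for x, y in data:
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            if margin < 1:  # sample inside the margin or misclassified
                w = [wi - lr * (lam * wi - y * xi) for wi, xi in zip(w, x)]
                b += lr * y
            else:  # only the regularizer acts, pulling toward a wider margin
                w = [wi - lr * lam * wi for wi in w]
    return w, b
```

Samples with margin < 1 are the ones that shape the solution; at convergence these are the support vectors.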
kernel trick in SVM for non-linear and high-dimensional data
functions that transform the input data space into a higher-dimensional space where it becomes easier to separate the classes linearly
allows the model to create non-linear decision boundaries in the original input space
allows this transformation to be done efficiently without explicitly computing the coordinates in the higher-dimensional space
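The trick can be verified by hand for a degree-2 polynomial kernel: the kernel value computed directly in the input space equals a dot product in a higher-dimensional feature space that is never explicitly built (`poly_kernel` and `phi` are illustrative names).

```python
import math

def poly_kernel(x, z):
    """Degree-2 polynomial kernel (x.z + 1)^2, computed in the input space."""
    return (sum(a * b for a, b in zip(x, z)) + 1) ** 2

def phi(x):
    """Explicit degree-2 feature map for 2-D input (x1, x2).

    Never needed in practice; written out only to verify the trick.
    """
    x1, x2 = x
    r2 = math.sqrt(2)
    return [1, r2 * x1, r2 * x2, x1 * x1, x2 * x2, r2 * x1 * x2]
```

For any pair of 2-D points, `poly_kernel(x, z)` equals `dot(phi(x), phi(z))`, but it costs one dot product in 2 dimensions instead of one in 6.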
advantages of SVM
effective in high-dimensional spaces
memory efficient
works well even with a small number of samples
good generalization capabilities, which help prevent overfitting
can efficiently handle non-linear data
a small change to the data does not greatly affect the hyperplane
disadvantages of SVM
choosing the right kernel and tuning parameters can be challenging
not directly probabilistic (though extensions exist)
can be computationally intensive for large datasets
kNN vs SVM
often more appropriate for simpler, intuitive tasks where the concept of 'similarity' is straightforward and the dataset is not too large or high-dimensional
useful when interpretability is a priority
SVM vs kNN
more suitable for complex, high-dimensional problems where the relationship between features and outcomes may be non-linear and where achieving the highest possible predictive accuracy is crucial
naive bayes classifier
called this because it assumes that the occurrence of a certain feature is independent of the occurrence of other features
e.g., each feature individually contributes to identifying that a fruit is an apple, without depending on the other features
depends on bayes’ theorem
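A minimal categorical naive Bayes sketch, scoring each class via Bayes' theorem with the independence assumption (no smoothing; `nb_predict` and the toy data are illustrative):

```python
from collections import Counter

def nb_predict(train, query):
    """Pick the class maximizing P(class) * prod_i P(feature_i | class).

    train: list of (feature_tuple, label) pairs; query: a feature tuple.
    The product over individual features *is* the 'naive' independence
    assumption; P(features) is a constant and can be ignored.
    """
    class_counts = Counter(label for _, label in train)
    n = len(train)
    best, best_score = None, -1.0
    for label, count in class_counts.items():
        score = count / n  # prior P(class)
        for i, value in enumerate(query):
            match = sum(1 for x, y in train if y == label and x[i] == value)
            score *= match / count  # likelihood P(feature_i | class)
        if score > best_score:
            best, best_score = label, score
    return best
```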
advantages of naive bayes
simple and fast
easily interpretable results
works well with high-dimensional data
supports both binary and multi-class classification
performs well even with small training datasets
disadvantages of naive bayes
assumes feature independence (not true in real-world scenarios)
can be outperformed by more sophisticated models on complex tasks