4-2: algorithms, kNN, SVM, naive Bayes


17 Terms

1. kNN (k nearest neighbors)

  • simple, effective algorithm based on the idea that similar things are near each other

  • K refers to the number of nearest neighbors 

  • classification: look at the classes of the K nearest neighbors and take a majority vote

  • regression: take the average of the K neighbors' values
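A minimal sketch of both modes using scikit-learn (the toy feature values and n_neighbors=3 are illustrative assumptions, not from the cards):

```python
from sklearn.neighbors import KNeighborsClassifier, KNeighborsRegressor

# toy training data: two features per sample (illustrative values)
X = [[1, 1], [1, 2], [2, 1], [8, 8], [8, 9], [9, 8]]
y_class = [0, 0, 0, 1, 1, 1]             # class labels
y_reg = [1.0, 1.1, 0.9, 8.0, 8.2, 7.9]   # continuous targets

# classification: majority vote among the K nearest neighbors
clf = KNeighborsClassifier(n_neighbors=3).fit(X, y_class)
print(clf.predict([[2, 2]]))  # -> [0]

# regression: average of the K neighbors' target values
reg = KNeighborsRegressor(n_neighbors=3).fit(X, y_reg)
print(reg.predict([[2, 2]]))  # -> [1.0] (mean of 1.0, 1.1, 0.9)
```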

2. kNN steps

  1. calculate distance

  2. find and rank neighbors

  3. vote for class
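A from-scratch sketch of these three steps in plain Python (the function and variable names are my own, for illustration):

```python
import math
from collections import Counter

def knn_classify(X_train, y_train, query, k=3):
    # 1. calculate distance from the query to every training sample
    distances = [math.dist(query, x) for x in X_train]
    # 2. find and rank neighbors: indices of the k smallest distances
    nearest = sorted(range(len(X_train)), key=lambda i: distances[i])[:k]
    # 3. vote for class: majority label among the k neighbors
    votes = Counter(y_train[i] for i in nearest)
    return votes.most_common(1)[0][0]

X_train = [[1, 1], [1, 2], [2, 1], [8, 8], [9, 8]]
y_train = ["a", "a", "a", "b", "b"]
print(knn_classify(X_train, y_train, [2, 2]))  # -> "a"
```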

3. parameters in ML

  • variables that are learned from the data during the training process

  • internal to the model and their values are estimated from the training data

4. hyperparameters in ML

  • the configuration variables that are external to the model and whose values cannot be estimated from the data

  • set before the learning process begins and control the learning process itself
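One way to see the parameter/hyperparameter distinction in scikit-learn (LogisticRegression is just an illustrative model choice):

```python
from sklearn.linear_model import LogisticRegression

X = [[0.0], [1.0], [2.0], [3.0]]
y = [0, 0, 1, 1]

# hyperparameters: set before training, control the learning process
model = LogisticRegression(C=1.0, max_iter=100)

# parameters: learned from the data during fit()
model.fit(X, y)
print(model.coef_, model.intercept_)  # values estimated from the training data
```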

5. advantages of kNN

  • simple to understand and implement

  • no assumptions about data distribution

  • can be used for classification and regression

6. disadvantages of kNN

  • computationally expensive for large datasets

  • sensitive to irrelevant features and the scale of the data

  • does not work well with high dimensional data
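The scale sensitivity is commonly countered by standardizing features first; a sketch with a scikit-learn pipeline (the mixed-unit toy data is an illustrative assumption):

```python
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier

# one feature in metres, one in grams: unscaled, grams dominate the distance
X = [[1.70, 60000], [1.60, 55000], [1.80, 90000], [1.75, 85000]]
y = [0, 0, 1, 1]

model = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=3))
model.fit(X, y)
print(model.predict([[1.72, 58000]]))
```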

7. SVM (support vector machine)

  • finds a hyperplane in an N-dimensional space (N = number of features) that distinctly separates the data points into classes

  • samples on the margin are called the support vectors
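A minimal linear SVM fit with scikit-learn, printing the fitted support vectors (the toy data is illustrative):

```python
from sklearn.svm import SVC

X = [[1, 1], [2, 1], [1, 2], [6, 6], [7, 6], [6, 7]]
y = [0, 0, 0, 1, 1, 1]

clf = SVC(kernel="linear").fit(X, y)
print(clf.support_vectors_)           # the samples on the margin
print(clf.predict([[2, 2], [6, 5]]))  # -> [0 1]
```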

8. plane

in SVM, a hyperplane is a ____

9. model fitting for SVM

the maximum-margin hyperplane and its margins are fitted using training samples from the 2 classes

10. kernel trick in SVM for non-linear and high-dimensional data

  • functions that transform the input data space into a higher-dimensional space where it becomes easier to separate the classes linearly

  • allows the model to create non-linear decision boundaries in the original input space

  • allows this transformation to be done efficiently without explicitly computing the coordinates in the higher-dimensional space
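A sketch of the kernel trick in practice: an RBF kernel separates data that no straight line can (make_moons is just a convenient non-linear toy dataset):

```python
from sklearn.datasets import make_moons
from sklearn.svm import SVC

# two interleaving half-moons: not linearly separable in the input space
X, y = make_moons(n_samples=200, noise=0.1, random_state=0)

linear = SVC(kernel="linear").fit(X, y)
rbf = SVC(kernel="rbf").fit(X, y)  # implicit mapping to a higher-dimensional space

print("linear accuracy:", linear.score(X, y))  # noticeably below 1.0
print("rbf accuracy:   ", rbf.score(X, y))     # typically near 1.0 on this data
```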

11. advantages of SVM

  • effective in high-dimensional spaces

  • memory efficient

  • works well even with a small number of samples

  • good generalization, which helps prevent overfitting

  • can efficiently handle non-linear data

  • a small change to the data does not greatly affect the hyperplane

12. disadvantages of SVM

  • choosing the right kernel and tuning hyperparameters can be challenging

  • not directly probabilistic (though extensions exist)

  • can be computationally intensive for large datasets

13. kNN vs SVM

  • kNN is often more appropriate for simpler, intuitive tasks where the concept of 'similarity' is straightforward and the dataset is not too large or high-dimensional

  • kNN is also useful when interpretability is a priority

14. SVM vs kNN

SVM is more suitable for complex, high-dimensional problems where the relationship between features and outcomes may be non-linear and where achieving the highest possible predictive accuracy is crucial

15. naive Bayes classifier

  • called 'naive' because it assumes that the occurrence of a certain feature is independent of the occurrence of other features

  • e.g., each feature individually contributes to identifying a fruit as an apple, without depending on the other features

  • based on Bayes' theorem
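Naive Bayes in a few lines with scikit-learn (GaussianNB models each feature as Gaussian per class; the apple/orange-style data is an illustrative assumption):

```python
from sklearn.naive_bayes import GaussianNB

# features: [weight_g, diameter_cm]; labels: 0 = apple, 1 = orange (illustrative)
X = [[150, 7.0], [160, 7.5], [170, 7.2], [190, 9.0], [200, 9.5], [210, 9.2]]
y = [0, 0, 0, 1, 1, 1]

# Bayes' theorem: P(class | features) is proportional to
# P(class) * product of P(feature_i | class) -- the product reflects the
# "naive" feature-independence assumption
clf = GaussianNB().fit(X, y)
print(clf.predict([[165, 7.3]]))        # -> [0] (apple-like)
print(clf.predict_proba([[165, 7.3]]))  # posterior probability per class
```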

16. advantages of naive Bayes

  • simple and fast

  • easily interpretable results

  • works well with high-dimensional data

  • handles both binary and multi-class classification

  • performs well even with small training datasets

17. disadvantages of naive Bayes

  • assumes feature independence (often not true in real-world scenarios)

  • can be outperformed by more sophisticated models on complex tasks