KNN Algorithm Type
Instance-based
Euclidean Distance Formula
d = sqrt(sum((xi - yi)^2))
Manhattan Distance Formula
| x1 - x2 | + | y1 - y2 |
Minkowski Distance Formula
d = (sum(|xi - yi|^p))^(1/p)
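A minimal sketch of the Minkowski distance in plain Python, showing that p = 1 and p = 2 recover the Manhattan and Euclidean distances. The function name `minkowski` and the sample points are illustrative assumptions, not part of the original cards.

```python
def minkowski(x, y, p):
    """Minkowski distance between two equal-length sequences of numbers."""
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1 / p)

# Illustrative points (assumed for this example)
a, b = (0, 0), (3, 4)

manhattan = minkowski(a, b, 1)  # p = 1: sum of absolute differences
euclidean = minkowski(a, b, 2)  # p = 2: straight-line distance
```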
KNN Steps
- Compute distance to all points
- Sort distances
- Select k nearest
- Majority vote based on k nearest
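The four steps above can be sketched as follows. This is a minimal illustration that assumes Euclidean distance and a tiny hand-made training set; `knn_predict` and the data are hypothetical names and values, not a reference implementation.

```python
import math
from collections import Counter

def knn_predict(train, query, k):
    """train: list of (features, label) pairs; query: a feature tuple."""
    # 1. Compute distance to all points (Euclidean is assumed here)
    distances = [(math.dist(features, query), label) for features, label in train]
    # 2. Sort distances, nearest first
    distances.sort(key=lambda pair: pair[0])
    # 3. Select the k nearest labels
    nearest = [label for _, label in distances[:k]]
    # 4. Majority vote among the k nearest
    return Counter(nearest).most_common(1)[0][0]

# Hypothetical training data: two clusters labelled "a" and "b"
train = [((1, 1), "a"), ((1, 2), "a"), ((2, 1), "a"), ((8, 8), "b"), ((9, 8), "b")]
prediction = knn_predict(train, (2, 2), k=3)
```

Every prediction scans the whole training set, which is one reason KNN slows down on large datasets.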
What is K in KNN?
Number of neighbors used for voting
Effect of small K in KNN
High Variance & Overfitting
Effect of large K in KNN
High bias & Underfitting
What is KNN's main flaw?
The curse of dimensionality
Curse of Dimensionality
High-dimensional data requires more samples
Normalization Formula
(x - min) / (max - min)
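A small sketch of min-max normalization applied to one feature column. The function name is an assumption, and so is the choice to return zeros when every value is identical, since the formula divides by zero in that case.

```python
def min_max_normalize(values):
    """Rescale a list of numbers to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumption: a constant column carries no information, so map it to 0.0
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

scaled = min_max_normalize([10, 20, 40])  # smallest maps to 0.0, largest to 1.0
```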
Cosine Similarity Formula
cos(theta) = (x · y) / (||x|| ||y||)
Jaccard Similarity Formula
J(A,B) = | A intersection B |/| A union B |
How to calculate Hamming Distance
Count the number of differing positions
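The cosine, Jaccard, and Hamming measures above can be sketched in a few lines each. The function names are illustrative assumptions. The sketch also assumes non-zero vectors for cosine similarity and at least one non-empty set for Jaccard similarity.

```python
import math

def cosine_similarity(x, y):
    """Dot product divided by the product of the vector lengths."""
    dot = sum(a * b for a, b in zip(x, y))
    return dot / (math.hypot(*x) * math.hypot(*y))

def jaccard_similarity(a, b):
    """Size of the intersection divided by size of the union."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b)

def hamming_distance(s, t):
    """Number of positions at which two equal-length sequences differ."""
    if len(s) != len(t):
        raise ValueError("sequences must have the same length")
    return sum(c1 != c2 for c1, c2 in zip(s, t))

cos_sim = cosine_similarity((1, 0), (0, 1))       # perpendicular vectors
jac_sim = jaccard_similarity({1, 2, 3}, {2, 3, 4})
ham = hamming_distance("karolin", "kathrin")
```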
In what type of sets will KNN perform poorly?
- High dimensional Data
- Large dataset
- Sets with unscaled features
What is the purpose of Support Vector Machines
Finding the best separating hyperplane / margin between classification classes in a model
SVM Decision Boundary Formula
(w^T)x + b = 0
What is the margin in SVM?
The distance between the boundary and closest points
SVM Margin Formula
2/||w||
What are the 3 SVM Optimization Formulas
- Hard Margin
- Soft Margin
- Hinge Loss
Hard Margin Formula
min (1/2)||w||^2
Soft Margin Formula
min (1/2)||w||^2 + C∑ξi
Hinge Loss Formula
L = max(0,1 − y((w^T)x + b))
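A minimal sketch of the hinge loss for a single sample, assuming labels y in {-1, +1}. The weights and sample points are made up for illustration.

```python
def hinge_loss(w, b, x, y):
    """Hinge loss max(0, 1 - y * (w . x + b)) for one sample with label y in {-1, +1}."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return max(0.0, 1 - y * score)

# Hypothetical weights and bias
w, b = (1.0, 1.0), 0.0

outside_margin = hinge_loss(w, b, (2.0, 2.0), 1)    # correct and beyond the margin: no loss
inside_margin = hinge_loss(w, b, (0.25, 0.25), 1)   # correct side but inside the margin
misclassified = hinge_loss(w, b, (1.0, 1.0), -1)    # wrong side of the boundary
```

The loss is zero only for points that are correctly classified and lie on or outside the margin.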
What does the parameter C control in SVM?
Margin vs misclassification tradeoff
What effects can you expect in SVM when parameter C is small?
- Large margin
- Large amount of errors
What effects can you expect in SVM when parameter C is large?
- Small margin
- Few errors
SVM Kernel Trick Purpose
Implicitly maps data to a higher-dimensional space by computing inner products there, without computing the transformation explicitly
SVM Kernel Trick Formula
K(x,z) = ϕ(x) ⋅ ϕ(z)
Gaussian RBF Formula
K(x,l) = exp(−γ∣∣x−l∣∣^2)
What is the expected outcome of a large gamma (γ) in Gaussian RBF?
Overfitting & Wiggly Boundary
What is the expected outcome of a small gamma (γ) in Gaussian RBF?
Underfitting & Smooth boundaries
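A small sketch of the Gaussian RBF kernel showing how gamma changes the similarity between a point and a landmark. The function name, points, and gamma values are assumptions chosen for illustration.

```python
import math

def rbf_kernel(x, l, gamma):
    """Gaussian RBF kernel: exp(-gamma * squared Euclidean distance)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, l))
    return math.exp(-gamma * sq_dist)

# Hypothetical point and landmark, squared distance = 5
x, landmark = (1.0, 2.0), (2.0, 4.0)

wide = rbf_kernel(x, landmark, gamma=0.1)     # small gamma: similarity decays slowly
narrow = rbf_kernel(x, landmark, gamma=10.0)  # large gamma: similarity drops to nearly 0
```

With a large gamma each landmark only influences points very close to it, which is what produces the wiggly, overfit boundary.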
What kind of algorithm is Naive Bayes
Probabilistic Classifier
Steps of Naive Bayes
- Compute prior P(c)
- Compute likelihoods
- Multiply
- Choose max
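The steps above can be sketched with hand-written probability tables. The priors, likelihoods, class names, and words below are hypothetical values invented for the example, not estimates from real data.

```python
# Hypothetical prior P(c) for each class
priors = {"spam": 0.4, "ham": 0.6}

# Hypothetical likelihoods P(word | c)
likelihoods = {
    "spam": {"free": 0.5, "meeting": 0.1},
    "ham": {"free": 0.1, "meeting": 0.4},
}

def naive_bayes_predict(words):
    """Return the highest-scoring class and the unnormalized score of every class."""
    scores = {}
    for c, prior in priors.items():
        score = prior                      # start from the prior P(c)
        for w in words:
            score *= likelihoods[c][w]     # multiply in each likelihood
        scores[c] = score
    best = max(scores, key=scores.get)     # choose the class with the max score
    return best, scores

label, scores = naive_bayes_predict(["free"])
```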
Gaussian NB Formula
P(x|c) = (1/sqrt(2πσ^2)) e^(-(x - μ)^2 / (2σ^2)), where μ and σ^2 are the mean and variance of the feature within class c
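A minimal sketch of the Gaussian likelihood for one continuous feature, where `mean` and `variance` stand for the per-class mean and variance of that feature. The function name is an assumption.

```python
import math

def gaussian_likelihood(x, mean, variance):
    """Normal density of x given a class's feature mean and variance."""
    exponent = -((x - mean) ** 2) / (2 * variance)
    return math.exp(exponent) / math.sqrt(2 * math.pi * variance)

at_mean = gaussian_likelihood(0.0, mean=0.0, variance=1.0)   # peak of the density
far_away = gaussian_likelihood(3.0, mean=0.0, variance=1.0)  # much lower in the tail
```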
What are the assumptions when using Gaussian NB?
- Independent Features
- Gaussian for continuous
What type of algorithm is SOFTMAX Regression?
Multi-Class Classification
What algorithm is SOFTMAX Regression a version of?
Logistic Regression
What does SOFTMAX produce?
- Outputs in the range [0, 1] that sum to 1
- Probabilities
What is the assumption when using SOFTMAX?
Classes are mutually exclusive
SOFTMAX Net Input Z Formula
Z = XW + b
Cross-Entropy Formula
-sum(ylog(yhat))
How to calculate total parameters in SOFTMAX Regression?
(features * classes) + classes
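A small sketch tying together the softmax output, the cross-entropy loss, and the parameter count. It assumes one-hot targets and subtracts the maximum logit for numerical stability, a common trick not mentioned in the cards. All names and sample values are illustrative.

```python
import math

def softmax(z):
    """Turn a list of logits into probabilities that sum to 1."""
    m = max(z)                                # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(y_true, y_hat):
    """-sum(y * log(yhat)) for a one-hot target; terms with y = 0 are skipped."""
    return -sum(t * math.log(p) for t, p in zip(y_true, y_hat) if t > 0)

def parameter_count(n_features, n_classes):
    """One weight per (feature, class) pair plus one bias per class."""
    return n_features * n_classes + n_classes

probs = softmax([2.0, 1.0, 0.1])
loss = cross_entropy([1, 0, 0], probs)
n_params = parameter_count(4, 3)
dominant = softmax([100.0, 0.0, 0.0])  # one logit much larger than the rest
```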
Why are odd values of K used in KNN?
To avoid ties
What is the most common distance formula in KNN?
Euclidean
In KNN what happens if one feature has much larger values than others?
The feature with the larger values dominates the distance
What technique in KNN is used to reduce dimensionality?
PCA
What is the angle-based distance metric in KNN?
Cosine
What property is not required for Minkowski Distance metric?
Linearity
In the Minkowski Distance formula, what value of p gives Manhattan Distance?
1
In the Minkowski Distance formula, what value of p gives Euclidean Distance?
2
What are support vectors in SVM?
Closest points to boundary that determine the hyperplane
What does hinge loss penalize?
Points that are inside margin or have been misclassified
What happens if Naive Bayes encounters a feature value not seen in training?
Probability becomes zero
Laplace Smoothing
Technique to handle zero probabilities in classification
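A minimal sketch of additive (Laplace) smoothing for a categorical likelihood. The parameter names are assumptions: `count` is how often the feature value appears in the class, `class_total` is the number of observations in the class, and `n_values` is the number of distinct values the feature can take.

```python
def smoothed_likelihood(count, class_total, n_values, alpha=1):
    """Estimate P(value | class) with additive smoothing; alpha=1 is Laplace smoothing."""
    return (count + alpha) / (class_total + alpha * n_values)

unsmoothed = 0 / 10                            # unseen value: probability is zero
smoothed = smoothed_likelihood(0, 10, 3)       # unseen value now gets a small probability
```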
What assumption is made in Naive Bayes
Features are independent
What happens if Naive Bayes assumption is violated?
Accuracy may decrease
In softmax, what happens if one logit is much larger?
That class gets probability ≈ 1
What models are most sensitive to unscaled features?
- KNN
- SVM
What happens when features are not scaled in SVM?
Margin becomes skewed
What kernel maps to infinite-dimensional space
RBF
What happens if a probability becomes zero in Naive Bayes?
Entire product becomes zero
What does Naive Bayes produce?
A probability for each class
What is the distribution assumption of GNB?
Normal Distribution
What are the required parameters of GNB
- mean
- variance
What type of features is GNB used on?
Continuous Features
NB Normalizing Term (Evidence)
P(X)
NB Prior Class
P(Y)
NB Likelihood
P(X|Y)
NB Posterior
P(Y|X)
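The four terms above fit together through Bayes' rule, P(Y|X) = P(X|Y) P(Y) / P(X). A small sketch with made-up numbers follows. The class names and probabilities are hypothetical, and P(X) is computed by summing P(X|Y) P(Y) over the classes.

```python
def posterior(priors, likelihoods):
    """Return P(Y|X) for each class from P(Y) and P(X|Y)."""
    joint = {c: likelihoods[c] * priors[c] for c in priors}  # P(X|Y) * P(Y)
    evidence = sum(joint.values())                           # P(X), the normalizing term
    return {c: joint[c] / evidence for c in joint}

# Hypothetical prior P(Y) and likelihood P(X|Y) for one observation X
post = posterior({"spam": 0.4, "ham": 0.6}, {"spam": 0.5, "ham": 0.1})
```

Because P(X) is the same for every class, it can be skipped when only the most likely class is needed.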