hard margin objective
Minimize one half of the squared L2 norm of w; since the margin width is two divided by the norm of w, keeping w small maximizes the margin.
hard margin constraint
For every training example i, y sub i times w transpose x sub i is at least one; this enforces correct classification with a margin.
soft margin objective
Minimize one half of the squared L2 norm of w plus C divided by N times the sum over i of xi sub i; this trades off margin size and violations.
soft margin constraint
For every training example i, y sub i times w transpose x sub i is at least one minus xi sub i, and xi sub i is at least zero; this allows controlled margin violations.
hinge loss
Max of zero and one minus y sub i times w transpose x sub i; penalizes points inside the margin or misclassified.
SVM loss
Minimize one half of the squared L2 norm of w plus C divided by N times the sum over i of max of zero and one minus y sub i times w transpose x sub i; this is regularized hinge-loss minimization.
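The hinge loss and the regularized SVM objective above can be evaluated directly; a minimal NumPy sketch (function names and the toy data are illustrative, not from the source):

```python
import numpy as np

def hinge_losses(w, X, y):
    """Per-example hinge loss: max(0, 1 - y_i * w^T x_i)."""
    margins = y * (X @ w)
    return np.maximum(0.0, 1.0 - margins)

def svm_objective(w, X, y, C):
    """Regularized hinge loss: (1/2)||w||^2 + (C/N) * sum_i hinge_i."""
    n = len(y)
    return 0.5 * np.dot(w, w) + (C / n) * hinge_losses(w, X, y).sum()
```

With w = (1, 0), a point at x = (0.5, 0) with label +1 has margin 0.5 and therefore hinge loss 0.5, while points at margin at least one contribute nothing.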
logistic loss
Log of one plus e to the negative y times w transpose x; a smooth alternative loss commonly used for probabilistic classification.
kernel function
Kernel of x and z equals the inner product of phi of x and phi of z; computes similarity in feature space without explicitly computing phi.
Gaussian kernel
Kernel of x and z equals e to the negative squared Euclidean distance between x and z divided by two sigma squared; emphasizes nearby points (radial basis behavior).
polynomial kernel
Kernel of x and z equals gamma times x transpose z plus constant c, all raised to the degree d; captures interactions up to the chosen degree.
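Both kernels above are one-liners in NumPy; a sketch with the standard default parameters (the defaults sigma = 1, gamma = 1, c = 1, d = 2 are assumptions for illustration):

```python
import numpy as np

def gaussian_kernel(x, z, sigma=1.0):
    """RBF kernel: exp(-||x - z||^2 / (2 sigma^2))."""
    diff = np.asarray(x, dtype=float) - np.asarray(z, dtype=float)
    return np.exp(-np.dot(diff, diff) / (2.0 * sigma**2))

def polynomial_kernel(x, z, gamma=1.0, c=1.0, d=2):
    """Polynomial kernel: (gamma * x^T z + c)^d."""
    return (gamma * np.dot(x, z) + c) ** d
```

Note the Gaussian kernel of a point with itself is always one, and decays toward zero as the points move apart, which is the radial-basis behavior the card describes.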
dual solution
w equals the sum over i of alpha sub i times y sub i times x sub i (for linear SVM after solving the dual problem); only points with nonzero alpha contribute.
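Recovering the primal w from dual variables is a single weighted sum; a minimal sketch (the toy alphas below are illustrative):

```python
import numpy as np

def primal_from_dual(alphas, y, X):
    """w = sum_i alpha_i * y_i * x_i; only points with alpha_i > 0 (support vectors) contribute."""
    return (alphas * y) @ X
```

In the example below the middle point has alpha = 0, so it drops out of w entirely.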
KNN classification formula
Prediction at x equals the sign of the sum over i in N sub k of x of y sub i; with labels in plus or minus one, this is a majority vote among the k nearest neighbors.
KNN regression formula
Prediction at x equals one over k times the sum over i in N sub k of x of y sub i; average of the k nearest neighbor outputs.
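The two k-NN formulas differ only in how the neighbor labels are combined; a minimal NumPy sketch using Euclidean distance (the function name and data are illustrative):

```python
import numpy as np

def knn_predict(X_train, y_train, x, k, regression=False):
    """k-NN: sign of the neighbor label sum (labels in {-1, +1}) or the neighbor average."""
    dists = np.linalg.norm(X_train - x, axis=1)  # Euclidean distance to every training point
    nearest = np.argsort(dists)[:k]              # indices of the k closest points
    if regression:
        return y_train[nearest].mean()
    return np.sign(y_train[nearest].sum())
```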
Euclidean distance
Distance between x and z equals the square root of the sum over coordinate j of (x sub j minus z sub j) squared; straight-line distance in d dimensions.
Jaccard distance
Distance between sets A and B equals one minus (size of A intersection B divided by size of A union B); measures dissimilarity based on overlap.
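Jaccard distance maps directly onto Python's set operations; a short sketch (treating two empty sets as identical is an assumed convention):

```python
def jaccard_distance(a, b):
    """1 - |A ∩ B| / |A ∪ B|: 0 for identical sets, 1 for disjoint sets."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0  # assumed convention: two empty sets have distance zero
    return 1.0 - len(a & b) / len(a | b)
```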
Gini impurity
One minus the sum over classes c of p sub c squared; higher values mean the node’s labels are more mixed.
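Gini impurity is easy to check on small label lists; a minimal sketch:

```python
import numpy as np

def gini_impurity(labels):
    """1 - sum_c p_c^2, where p_c is the proportion of class c among `labels`."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p**2)
```

A pure node scores zero; a perfectly mixed two-class node scores one half, matching the card's "more mixed means higher" reading.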
Bayes rule
Probability of B given A equals probability of A given B times probability of B divided by probability of A (assuming probability of A is not zero); connects inverse conditionals.
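A worked numeric instance of the rule, using hypothetical diagnostic-test numbers (1% prevalence, 90% sensitivity, 5% false-positive rate; all illustrative):

```python
def bayes_posterior(p_a_given_b, p_b, p_a):
    """P(B|A) = P(A|B) * P(B) / P(A), assuming P(A) > 0."""
    return p_a_given_b * p_b / p_a

# hypothetical numbers: P(disease) = 0.01, P(+|disease) = 0.9, P(+|healthy) = 0.05
p_pos = 0.9 * 0.01 + 0.05 * 0.99   # law of total probability: P(+)
posterior = bayes_posterior(0.9, 0.01, p_pos)
```

Despite the 90% sensitivity, the posterior P(disease | +) comes out near 0.15, because the prior is small.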
Naive Bayes formula
Probability of class y given features x is proportional to probability of y times the product over features i of probability of x sub i given y; for classification, the denominator probability of x is omitted because it is constant across y.
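The proportionality in the card becomes a per-class score; a minimal sketch for categorical features, working in log space to avoid underflow (the data structures and spam/ham numbers are illustrative assumptions):

```python
import numpy as np

def naive_bayes_scores(priors, cond_probs, x):
    """Unnormalized log-posterior per class: log P(y) + sum_i log P(x_i | y).

    priors:     {class: P(y)}
    cond_probs: {class: [dict per feature position mapping value -> P(x_i | y)]}
    """
    scores = {}
    for c, prior in priors.items():
        log_score = np.log(prior)
        for i, value in enumerate(x):
            log_score += np.log(cond_probs[c][i][value])
        scores[c] = log_score
    return scores

def naive_bayes_predict(priors, cond_probs, x):
    """Pick the class with the highest unnormalized score; P(x) never needs computing."""
    scores = naive_bayes_scores(priors, cond_probs, x)
    return max(scores, key=scores.get)
```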
K-means objective
Minimize the sum over data points i of the minimum over clusters k of the squared Euclidean distance between x sub i and mu sub k; this minimizes within-cluster sum of squares.
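The objective above can be evaluated for any candidate set of centroids without running the full algorithm; a minimal NumPy sketch:

```python
import numpy as np

def kmeans_objective(X, centroids):
    """Within-cluster sum of squares: sum_i min_k ||x_i - mu_k||^2."""
    # squared distance from every point to every centroid, shape (n_points, n_clusters)
    d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return d2.min(axis=1).sum()
```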
L2 norm
The L2 norm of w equals the square root of the sum over components j of w sub j squared; it measures vector length.
margin violation
A point violates the margin if y sub i times w transpose x sub i is less than one; equivalently, its hinge loss is positive.
eigenvector
A vector v such that A times v equals lambda times v; it is a direction preserved by the linear transformation up to scaling.
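The defining equation A v = lambda v can be verified numerically; a sketch using NumPy's eigendecomposition (the 2x2 matrix is an arbitrary example):

```python
import numpy as np

A = np.array([[2.0, 0.0],
              [0.0, 3.0]])
eigenvalues, eigenvectors = np.linalg.eig(A)  # columns of `eigenvectors` are the v's
v = eigenvectors[:, 0]
lam = eigenvalues[0]
# A v equals lambda v: the direction is preserved up to scaling by lambda
assert np.allclose(A @ v, lam * v)
```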
logistic function
One over one plus e to the negative input; maps any real number to a value between zero and one (often interpreted as a probability).
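A one-line sketch of the logistic function, confirming the squashing behavior the card describes:

```python
import numpy as np

def logistic(t):
    """Sigmoid: 1 / (1 + e^{-t}), mapping any real input into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-t))
```

It is exactly one half at zero, approaches one for large positive inputs, and approaches zero for large negative inputs.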