decision boundary
the line (or lines) separating classes
the boundary is a property of the classifier
i.e. different types of classifier will have different boundaries even when trained on the same data
what is the decision boundary in an N-D feature space formed of
(N - 1)-dimensional 'hypersurfaces'
finding the decision boundary
in many cases the decision boundary is where P(ω₁ | x) = P(ω₂ | x)
by Bayes' rule this is p(x | ω₁)P(ω₁) = p(x | ω₂)P(ω₂)
and if the priors are equal they cancel out, leaving
p(x = x₀ | ω₁) = p(x = x₀ | ω₂)
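a minimal sketch (not from the deck; the means and variances are made-up illustration values), assuming two 1-D Gaussian class likelihoods with equal priors:

```python
# Find the decision boundary x0 where the two class likelihoods are equal:
# p(x0 | omega_1) = p(x0 | omega_2), assuming equal priors.
import numpy as np
from scipy.optimize import brentq
from scipy.stats import norm

mu1, sigma1 = 0.0, 1.0   # class omega_1 (assumed values)
mu2, sigma2 = 3.0, 1.5   # class omega_2 (assumed values)

# difference of the two likelihoods changes sign at the boundary
f = lambda x: norm.pdf(x, mu1, sigma1) - norm.pdf(x, mu2, sigma2)
x0 = brentq(f, mu1, mu2)  # root-find between the two means
print(f"decision boundary x0 = {x0:.4f}")
```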
how to find the probability of an error
an error occurs wherever the class pdfs overlap. we can quantify this by
integrating each class pdf over the region past the decision boundary (where the other class is chosen), then adding the two areas together
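a minimal sketch of that tail-area sum (assumed values, not from the deck), for two equal-prior, equal-variance 1-D Gaussians where the boundary falls midway between the means:

```python
# P(error) = P(w1) * P(x > x0 | w1) + P(w2) * P(x < x0 | w2)
from scipy.stats import norm

mu1, mu2, sigma = 0.0, 3.0, 1.0   # equal variances -> boundary midway
x0 = (mu1 + mu2) / 2              # decision boundary

# tail of omega_1 past x0, plus tail of omega_2 before x0, equal priors 0.5
p_err = 0.5 * norm.sf(x0, mu1, sigma) + 0.5 * norm.cdf(x0, mu2, sigma)
print(f"P(error) = {p_err:.4f}")
```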
what is loss
this is the cost of misclassifying something from class i as belonging to class j
how is average risk defined in a two class problem?
r = λ₂₁ P(x ∈ R₁, ω₂) + λ₁₂ P(x ∈ R₂, ω₁)
note: when the losses are equal, the risk is the same as the probability of error. so in the default case, minimising the error also minimises the risk
how is average risk minimised
by selecting the partitioning regions Rᵢ so that each x is assigned to the class with the lowest expected loss:
x ∈ Rᵢ if lᵢ(x) < lⱼ(x) for all j ≠ i
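for reference, the per-class expected loss being compared here, written in the deck's λ notation (this is the standard Bayes-risk form, not spelled out on the card):

```latex
% expected (conditional) loss of deciding class \omega_i at point x,
% summing the cost \lambda_{ki} of each possible true class k:
l_i(x) = \sum_{k=1}^{M} \lambda_{ki}\, p(x \mid \omega_k)\, P(\omega_k)
% decide \omega_i when l_i(x) < l_j(x)\ \forall j \neq i
```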
in a two class classification problem, what are the two types of errors?
1. classifying an object from class w2, as belonging to w1
2. classifying an object from w1 as w2
what does λ₁₂ represent in classification
the loss (cost) of misclassifying something from class ω₁ as belonging to class ω₂
for the case of λ₁₁ = λ₂₂ = 0, λ₁₂ = λ₂₁ = 1
what does the risk minimisation rule simplify to
choose class ω₁ if
p(x | ω₂)P(ω₂) < p(x | ω₁)P(ω₁)
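equivalently, rearranged as a likelihood-ratio test (a standard rearrangement, not explicit in the deck):

```latex
% choose \omega_1 when the likelihood ratio exceeds the prior ratio:
\frac{p(x \mid \omega_1)}{p(x \mid \omega_2)} > \frac{P(\omega_2)}{P(\omega_1)}
```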
what is a feature vector and where does it exist?
a feature vector contains all features for a sample
it exists in an L-dimensional feature space where each sample is a point
what two components define a vector
magnitude (length) and direction
what is the inner/dot product of two vectors?
multiply corresponding components and add them all together: x·y = Σᵢ xᵢyᵢ
the result is a single scalar, equal to |x||y|cos θ
it measures the magnitude of the projection of one vector onto the other, scaled by the other's length
what does the dot product measure geometrically
how much one vector points in the direction of another
how do you find the projection of vector x onto vector y
divide the dot product of x and y by the dot product of y with itself, then multiply by y:
proj(x onto y) = ((x·y) / (y·y)) y
what is the outer product of two vectors
multiply each component of the first vector by each component of the second vector. this produces a matrix
if A is MxL, and B is Lx1, what is the size of the result of AB
Mx1 vector (M rows, 1 column)
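a minimal NumPy sketch of the last few cards (made-up vectors for illustration): dot product, projection, outer product, and the matrix-vector shape rule:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])
y = np.array([4.0, 0.0, 2.0])

dot = x @ y                      # inner product: scalar
proj = (x @ y) / (y @ y) * y     # projection of x onto y: vector along y
outer = np.outer(x, y)           # outer product: 3x3 matrix

A = np.ones((5, 3))              # M x L matrix (M=5, L=3)
b = np.ones((3, 1))              # L x 1 vector
print((A @ b).shape)             # (5, 1): an M x 1 column vector
print(dot, proj, outer.shape)
```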
what is the lp-norm distance between two vectors x and y?
raise each absolute difference to the power p, sum them, then take the p-th root of the sum:
d(x, y) = (Σᵢ |xᵢ − yᵢ|^p)^(1/p)
what is Euclidean distance?
this is the l2-norm distance, where p=2.
it is the square root of the sum of the squared differences.
what is Manhattan distance?
this is the l1-norm distance,
where p=1
it is the sum of the absolute differences
if you see ||x - y||, without a subscript, which norm is usually meant?
l2-norm, p=2
euclidean distance
what is the key difference between manhattan and euclidean distance visually
manhattan measures 'grid' distance, euclidean measures straight line distance
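a minimal sketch of the l1, l2, and general lp distances from the cards above (illustration values):

```python
import numpy as np

x = np.array([0.0, 0.0])
y = np.array([3.0, 4.0])

manhattan = np.sum(np.abs(x - y))              # l1: 3 + 4 = 7
euclidean = np.sqrt(np.sum((x - y) ** 2))      # l2: sqrt(9 + 16) = 5
p = 3
lp = np.sum(np.abs(x - y) ** p) ** (1 / p)     # general lp-norm distance

# the same via NumPy's norm; with no ord argument it defaults to l2,
# matching the ||x - y|| convention on the card above
assert np.isclose(euclidean, np.linalg.norm(x - y))
assert np.isclose(manhattan, np.linalg.norm(x - y, ord=1))
```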
what is cosine similarity between two vectors
cosine of angle between them = (dot product) / (product of lengths)
how is the cosine distance different from cosine similarity?
cosine distance = 1 - cosine similarity
increases as vectors become less similar
why use cosine distance instead of euclidean?
cosine distance is scale-invariant. it only considers angle, not magnitude.
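a minimal sketch of that scale-invariance (made-up vectors): scaling a vector changes the euclidean distance but not the cosine similarity:

```python
import numpy as np

def cos_sim(a, b):
    # cosine of the angle: dot product over product of lengths
    return (a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))

x = np.array([1.0, 2.0])
y = np.array([2.0, 4.0])          # same direction as x, twice the length

print(cos_sim(x, y))              # 1.0 -> cosine distance = 1 - 1.0 = 0.0
print(np.linalg.norm(x - y))      # nonzero euclidean distance
print(cos_sim(x, 10 * y))         # still 1.0: angle only, magnitude ignored
```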
what are two types of proximity measures
1. dissimilarity measures - larger value = further apart
2. similarity measures - larger value = closer
when is a function a valid dissimilarity measure
if
it returns the same value d0 when measuring the dissimilarity between a point and itself
d(x, x) = d0, ∀x ∈ X
2. dissimilarity between two points is never less than d0
d(x, y) ≥ d0, ∀x, y ∈ X
when is a function a valid similarity measure
if
1. it returns the same s0 value when measuring similarity between a point and itself
s(x, x) = s0 ∀x ∈ X
2. similarity between two points is never greater than s0
s(x, y) ≤ s0 ∀x, y ∈ X
core idea of one-dimensional gaussian probability
the probability decays exponentially with the squared distance from the mean (x−μ)^2
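for reference, the standard univariate Gaussian pdf (not written out on the card), showing that decay in (x − μ)²:

```latex
p(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left( -\frac{(x - \mu)^2}{2\sigma^2} \right)
```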
to extend the gaussian to L dimensions, what distance measure is naturally used?
the euclidean distance
main limitation of using simple euclidean distance for a multivariate gaussian
ignores any covariance (relationships) between features
it treats all dimensions as independent and equally scaled
what distance measure is used to account for feature scaling and correlation
the mahalanobis distance
mahalanobis distance
the distance between vectors measured in units of the covariance
it measures how many standard deviations apart the points are, accounting for correlations between features
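a minimal sketch (made-up mean and covariance) comparing the mahalanobis and euclidean distances to a class mean:

```python
import numpy as np

mu = np.array([0.0, 0.0])              # class mean (assumed)
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])         # covariance matrix (assumed)
x = np.array([1.0, 2.0])

d = x - mu
# d_M = sqrt( (x - mu)^T Sigma^{-1} (x - mu) )
maha = np.sqrt(d @ np.linalg.inv(Sigma) @ d)
eucl = np.linalg.norm(d)               # ignores correlation and scaling
print(maha, eucl)
```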
what are the two parameters of a univariate gaussian
mean
variance
linear classifier
simply a classifier that can only generate linear decision boundaries
for a gaussian classifier, what must we estimate for each class from the training data
the class mean vector and covariance matrix
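a minimal sketch of those estimates (random data for illustration, not from the deck):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))          # 100 samples, 2 features
y = rng.integers(0, 2, size=100)       # class labels 0 or 1

for c in (0, 1):
    Xc = X[y == c]
    mean_c = Xc.mean(axis=0)               # class mean vector
    cov_c = np.cov(Xc, rowvar=False)       # class covariance matrix (L x L)
    print(c, mean_c, cov_c.shape)
```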
in a discriminative classifier, what can we directly use as a discriminant function gi(x)?
the posterior probability
P(ωi∣x) itself
how can we find the decision boundary between two classes wi and wj using discriminant functions gi(x)?
solve the equation
gi(x) − gj(x) = 0
where
gi(x) ≡ f(P(ωi | x)) for some monotonically increasing function f
for bayesian classifiers, what specific form of the discriminant function gi(x) do we normally use
gi(x) = ln P(wi | x)
under what condition would a gaussian classifier produce a linear decision boundary
when all classes share the same covariance matrix
then the quadratic terms cancel, leaving a linear function
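for reference (the standard derivation, not spelled out on the card), with gi(x) = ln p(x | ωi) + ln P(ωi) and a shared covariance Σ:

```latex
% the log of a Gaussian likelihood gives, up to a class-independent constant c:
g_i(x) = -\tfrac{1}{2}(x - \mu_i)^{\top} \Sigma^{-1} (x - \mu_i) + \ln P(\omega_i) + c
% expanding, the quadratic term x^{\top}\Sigma^{-1}x is the same for every
% class, so it cancels in g_i(x) - g_j(x), leaving a function linear in x:
g_i(x) = \mu_i^{\top}\Sigma^{-1}x - \tfrac{1}{2}\mu_i^{\top}\Sigma^{-1}\mu_i + \ln P(\omega_i) + c'
```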