wk3 - minimising risk and linear classifiers

40 Terms

1
New cards

decision boundary

the line (or lines) separating classes

the boundary is a property of the classifier

i.e. different types of classifier will have different boundaries even when trained on the same data

2
New cards

what is the decision boundary in an N-D feature space formed of?

N - 1 dimensional 'hypersurfaces'

3
New cards

finding the decision boundary

in many cases the decision boundary is where P(ω1 | x) = P(ω2 | x)

and if the priors are equal they will cancel out leaving

p(x = x0 | ω1) = p(x = x0 | ω2)

4
New cards

how to find the probability of an error

an error occurs where the class pdfs overlap. we can find the probability of an error by:

calculating the area of each pdf on the wrong side of the decision boundary (weighted by the class priors) and adding them together
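
a minimal numerical sketch of this, assuming two illustrative univariate Gaussian class-conditional densities with equal priors (the means, variances and the use of scipy are assumptions, not from the notes):

```python
import numpy as np
from scipy.stats import norm
from scipy.optimize import brentq

# illustrative class-conditional densities, equal priors assumed
p1 = norm(loc=0.0, scale=1.0)   # p(x | w1)
p2 = norm(loc=2.0, scale=1.0)   # p(x | w2)

# decision boundary x0: where the pdfs are equal (equal priors cancel)
x0 = brentq(lambda x: p1.pdf(x) - p2.pdf(x), 0.0, 2.0)

# P(error) = 0.5 * (area of p(x|w1) past x0 + area of p(x|w2) before x0)
# the 0.5 weights are the equal priors
p_error = 0.5 * (p1.sf(x0) + p2.cdf(x0))
print(x0, p_error)   # x0 = 1.0, P(error) ≈ 0.159
```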

5
New cards

what is loss

this is the cost of misclassifying something from class i as belonging to class j

6
New cards

how is average risk defined in a two class problem?

r = λ₂₁ P(x ∈ R₁, ω₂) + λ₁₂ P(x ∈ R₂, ω₁)

note: when the loss is equal, the risk is the same as the probability of an error. so in default case, minimising error also minimises the risk

7
New cards

how is average risk minimised

by selecting the partitioning regions Ri so that each x is assigned to the class with the lowest loss:

for all x in Ri,

li(x) < lj(x) for all j ≠ i

8
New cards

in a two class classification problem, what are the two types of errors?

1. classifying an object from class w2 as belonging to w1

2. classifying an object from w1 as w2

9
New cards

what does λ₁₂ represent in classification

the loss (cost) of misclassifying something from class ω₁ as belonging to class ω₂

10
New cards

for the case of λ₁₁ = λ₂₂ = 0, λ₁₂ = λ₂₁ = 1

what does the risk minimisation rule simplify to

choose class ω₁ if

p(x | ω₂) P(ω₂) < p(x | ω₁) P(ω₁)
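
a sketch of the general two-class minimum-risk rule behind this simplification, using an asymmetric loss matrix; the densities, priors and λ values are made up for illustration:

```python
import numpy as np
from scipy.stats import norm

# hypothetical class-conditional densities and priors
p = [norm(0.0, 1.0), norm(2.0, 1.0)]       # p(x | w1), p(x | w2)
prior = np.array([0.5, 0.5])

# loss matrix: lam[i, j] = cost of deciding w_j when the true class is w_i
lam = np.array([[0.0, 1.0],
                [5.0, 0.0]])               # missing class w2 costs 5x more

def decide(x):
    post = np.array([prior[i] * p[i].pdf(x) for i in range(2)])  # unnormalised posteriors
    risk = lam.T @ post    # risk[j] = sum_i lam[i, j] * P(w_i) p(x | w_i)
    return np.argmin(risk) + 1             # choose the class with the lowest risk

print(decide(0.1))  # -> 1: well inside w1's region even with the skewed loss
print(decide(0.9))  # -> 2: under 0/1 loss this would be w1, but the high cost
                    #    of missing w2 shifts the boundary toward w1's mean
```

with λ₁₁ = λ₂₂ = 0 and λ₁₂ = λ₂₁ = 1 this reduces to the rule on the card: pick whichever class has the larger p(x | ωᵢ) P(ωᵢ).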

11
New cards

what is a feature vector and where does it exist?

a feature vector contains all features for a sample

it exists in an L-dimensional feature space where each sample is a point

12
New cards

what two components define a vector

magnitude (length) and direction

13
New cards

what is the inner/dot product of two vectors?

measures the magnitude of the projection of one vector onto another

multiply corresponding components and add them all together

the result is a single scalar number

14
New cards

what does the dot product measure geometrically

how much one vector points in the direction of another

15
New cards

how do you find the projection of vector x onto vector y

divide the dot product of x and y by the dot product of y with itself, then multiply by y
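
a small numpy check of the dot product and this projection formula (the vectors are arbitrary examples):

```python
import numpy as np

x = np.array([3.0, 1.0])
y = np.array([2.0, 0.0])

dot = x @ y                      # inner product: sum of elementwise products -> scalar
proj = (x @ y) / (y @ y) * y     # projection of x onto y, as in the card

print(dot)    # 6.0
print(proj)   # [3. 0.] -- the component of x along y
```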

16
New cards

what is the outer product of two vectors

multiply each component of the first vector by each component of the second vector. this produces a matrix

17
New cards

if A is MxL, and B is Lx1, what is the size of the result of AB

Mx1 vector (M rows, 1 column)
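
a quick numpy check of the shapes described in the last two cards (the vectors and matrix are arbitrary examples):

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])        # length-L vector (L = 3)
b = np.array([4.0, 5.0])             # length-M vector (M = 2)

outer = np.outer(b, a)               # M x L matrix: every component of b times every component of a
print(outer.shape)                   # (2, 3)

A = np.arange(6.0).reshape(2, 3)     # M x L matrix (2 x 3)
x = np.array([1.0, 0.0, -1.0])       # L x 1 vector
print((A @ x).shape)                 # (2,) -- an M x 1 result
```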

18
New cards

what is the lp-norm distance between two vectors x and y?

sum of absolute differences raised to the power p, then take the p-th root of the sum

19
New cards

what is Euclidean distance?

this is the l2-norm distance, where p=2.

it is the square root of the sum of the squared differences.

20
New cards

what is Manhattan distance?

this is the l1-norm distance,

where p=1

it is the sum of the absolute differences
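
a short sketch of the lp-norm distance from these cards for p = 1 (Manhattan) and p = 2 (Euclidean); the vectors are arbitrary examples:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])
y = np.array([4.0, 0.0, 3.0])

def lp_distance(x, y, p):
    # sum of absolute differences raised to the power p, then the p-th root
    return np.sum(np.abs(x - y) ** p) ** (1.0 / p)

print(lp_distance(x, y, 1))     # Manhattan: 3 + 2 + 0 = 5
print(lp_distance(x, y, 2))     # Euclidean: sqrt(9 + 4) ≈ 3.606

# same results via numpy's built-in norms
print(np.linalg.norm(x - y, ord=1), np.linalg.norm(x - y))
```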

21
New cards

if you see ||x - y||, without a subscript, which norm is usually meant?

l2-norm, p=2

euclidean distance

22
New cards

what is the key difference between manhattan and euclidean distance visually

manhattan measures 'grid' distance, euclidean measures straight line distance

23
New cards

what is cosine similarity between two vectors

cosine of angle between them = (dot product) / (product of lengths)

24
New cards

how is the cosine distance different from cosine similarity?

cosine distance = 1 - cosine similarity

increases as vectors become less similar

25
New cards

why use cosine distance instead of euclidean?

cosine distance is scale-invariant. it only considers angle, not magnitude.
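
a sketch of cosine similarity and cosine distance, including a check of the scale-invariance point above (vectors are arbitrary examples):

```python
import numpy as np

def cosine_similarity(x, y):
    # (dot product) / (product of lengths)
    return (x @ y) / (np.linalg.norm(x) * np.linalg.norm(y))

x = np.array([1.0, 1.0])
y = np.array([2.0, 0.0])

sim = cosine_similarity(x, y)        # cos(45 degrees) ≈ 0.707
dist = 1.0 - sim                     # cosine distance

print(sim, dist)
print(cosine_similarity(10 * x, y))  # same angle -> same similarity (scale-invariant)
```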

26
New cards

what are two types of proximity measures

1. dissimilarity measures - larger value = further apart

2. similarity measures - larger value = closer

27
New cards

when is a function a valid dissimilarity measure

if

1. it returns the same value d0 when measuring the dissimilarity between a point and itself

d(x, x) = d0, ∀x ∈ X

2. dissimilarity between two points is never less than d0

d(x, y) ≥ d0, ∀x, y ∈ X

28
New cards

when is a function a valid similarity measure

if

1. it returns the same s0 value when measuring similarity between a point and itself

s(x, x) = s0 ∀x ∈ X

2. similarity between two points is never greater than s0

s(x, y) ≤ s0 ∀x, y ∈ X

29
New cards

core idea of one-dimensional gaussian probability

the probability density decays exponentially with the squared distance from the mean, (x − μ)²
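
a tiny sketch of the 1-D Gaussian density written explicitly, checked against scipy's norm.pdf; the mean and standard deviation are arbitrary examples:

```python
import numpy as np
from scipy.stats import norm

mu, sigma = 1.0, 2.0          # illustrative mean and standard deviation
x = 2.5

# the density decays exponentially in the squared distance (x - mu)^2
p_manual = np.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / np.sqrt(2 * np.pi * sigma ** 2)
print(p_manual, norm(mu, sigma).pdf(x))   # both ≈ 0.1506
```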

30
New cards

to extend the gaussian to L dimensions, what distance measure is naturally used?

the euclidean distance

31
New cards

main limitation of using simple euclidean distance for a multivariate gaussian

ignores any covariance (relationships) between features

it treats all dimensions as independent and equally scaled

32
New cards

what distance measure is used to account for feature scaling and correlation

the mahalanobis distance

33
New cards

mahalanobis distance

the distance between vectors in units of the covariance

it measures how many standard deviations apart the points are, accounting for correlations between features
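
a sketch of the Mahalanobis distance d = sqrt((x − μ)ᵀ Σ⁻¹ (x − μ)) compared with the plain Euclidean distance; the mean and covariance here are illustrative assumptions:

```python
import numpy as np

mu = np.array([0.0, 0.0])                    # class mean (illustrative)
Sigma = np.array([[2.0, 0.8],                # class covariance (illustrative)
                  [0.8, 1.0]])

x = np.array([1.0, 1.0])

diff = x - mu
d_mahal = np.sqrt(diff @ np.linalg.inv(Sigma) @ diff)   # scaled by the covariance
d_eucl = np.linalg.norm(diff)                           # ignores the covariance

print(d_mahal, d_eucl)
```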

34
New cards

what are the two parameters of a univariate gaussian

mean

variance

35
New cards

linear classifier

simply a classifier that can only generate linear decision boundaries

36
New cards

for a gaussian classifier, what must we estimate for each class from the training data

the class mean vector and covariance matrix

37
New cards

in a discriminative classifier, what can we directly use as a discriminant function gi(x)?

the posterior probability

P(ωi | x) itself

38
New cards

how can we find the decision boundary between two classes wi and wj using discriminant functions gi(x)?

solve the equation

gi(x) - gj(x) = 0

where

gi(x) ≡ f(P(wi | x))

39
New cards

for bayesian classifiers, what specific form of the discriminant function gi(x) do we normally use

gi(x) = ln P(wi | x)

40
New cards

under what condition would a gaussian classifier produce a linear decision boundary

when all classes share the same covariance matrix

then the quadratic terms cancel, leaving a linear function
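
a sketch of this case: fitting a two-class Gaussian classifier with a shared (pooled) covariance on synthetic data, where the discriminant gi(x) = ln p(x | ωi) reduces to a linear function of x (equal priors assumed; all data and values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

# synthetic 2-D data: two classes with the same covariance, different means
Sigma = np.array([[1.0, 0.3], [0.3, 1.0]])
X1 = rng.multivariate_normal([0.0, 0.0], Sigma, size=200)
X2 = rng.multivariate_normal([2.0, 2.0], Sigma, size=200)

# estimate per-class mean vectors and a pooled (shared) covariance matrix
mu1, mu2 = X1.mean(axis=0), X2.mean(axis=0)
S = (np.cov(X1.T) + np.cov(X2.T)) / 2.0
S_inv = np.linalg.inv(S)

# with a shared covariance (and equal priors) the quadratic terms cancel:
# g_i(x) = mu_i^T S^-1 x - 0.5 * mu_i^T S^-1 mu_i, a linear function of x
def g(x, mu):
    return mu @ S_inv @ x - 0.5 * mu @ S_inv @ mu

x_test = np.array([1.0, 0.5])
print(1 if g(x_test, mu1) > g(x_test, mu2) else 2)   # predicted class for x_test
```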
