
univariate gaussian discriminant analysis
Setup (per class k)
Assume the feature x inside class k is Gaussian:
P(x∣y=k) = N(x; μk, σk²).
Prior of class k: πk=P(y=k)
What we want for prediction: the class posterior P(y=k∣x). By Bayes:
P(y=k∣x) ∝ P(x∣y=k)πk
For choosing the class, the denominator P(x) is the same for every class, so just compare the scores
scorek(x) = P(x∣y=k) πk
Log-discriminant (same thing, easier math)
gk(x) = log P(x∣y=k) + log πk
Plug the Gaussian in and drop the constant −1/2 log(2π) (it’s identical for all classes):
gk(x) = −log σk − (x − μk)² / (2σk²) + log πk
Predict:
ŷ = argmax k gk(x)
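A minimal sketch in Python (NumPy) of evaluating this log-discriminant and taking the argmax; the two classes and their parameters below are made up for illustration.

```python
import numpy as np

def g(x, mu, sigma, pi):
    # log-discriminant with the shared -1/2 log(2*pi) constant dropped
    return -np.log(sigma) - (x - mu) ** 2 / (2.0 * sigma ** 2) + np.log(pi)

# hypothetical per-class parameters (mu_k, sigma_k, pi_k)
params = [
    (0.0, 1.0, 0.7),   # class 0
    (3.0, 2.0, 0.3),   # class 1
]

x = 1.8
scores = [g(x, mu, sigma, pi) for mu, sigma, pi in params]
y_hat = int(np.argmax(scores))   # predicted class
```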


simplifications
GDA collapses to super simple rules under common assumptions
Simplification 1: Equal variances (σk=σ for all k)
The −logσk term is the same for every class, so it doesn’t affect which is largest
→ gk(x) = −(x − μk)² / (2σ²) + log πk
Simplification 2: Equal priors as well (πk same for all k)
Now logπk is also common and drops out:
→ gk(x) ∝ −(x − μk)² → choose the class with the closest mean
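A tiny sketch of this nearest-mean rule, with hypothetical class means:

```python
import numpy as np

# With equal variances and equal priors, argmax_k g_k(x) reduces to
# picking the class whose mean is closest to x (means below are made up).
mus = np.array([-1.0, 0.5, 4.0])           # one mean per class
x = 1.2
y_hat = int(np.argmin((x - mus) ** 2))     # closest mean wins -> class 1 here
```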

univariate gaussian discriminant analysis summary
gk(x) measures a distance to the class mean, scaled by the class variance (the Mahalanobis distance when the variances aren't equal). Class priors add a bias term, with a larger boost for more likely classes.
process:
1) choose distribution model (Gaussian)
2) compute distribution parameters from data (σk, μk , πk )
3) compute discriminant function gk(x)
4) choose ŷ = argmax j gj(x)
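A compact sketch of these four steps end to end on synthetic 1-D data; the data, the test point, and the parameter formulas (the MLE estimates covered in the estimation section below) are illustrative assumptions.

```python
import numpy as np

# step 1: model each class as a univariate Gaussian (toy data below)
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(0.0, 1.0, 60), rng.normal(3.0, 2.0, 40)])
y = np.concatenate([np.zeros(60, dtype=int), np.ones(40, dtype=int)])

classes = np.unique(y)
mu    = np.array([x[y == k].mean() for k in classes])   # step 2: class means
sigma = np.array([x[y == k].std()  for k in classes])   # step 2: class std devs (MLE)
pi    = np.array([np.mean(y == k)  for k in classes])   # step 2: class priors

def g_all(x_new):
    # step 3: discriminant of every class at x_new
    return -np.log(sigma) - (x_new - mu) ** 2 / (2 * sigma ** 2) + np.log(pi)

y_hat = int(np.argmax(g_all(1.5)))                       # step 4: argmax
```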

using a decision function
consider a 2-class case
→ 2 discriminant functions g1(x), g0(x)
ŷ = argmax j gj(x) → ŷ = 1 if g1(x) > g0(x), 0 otherwise
define a decision function
d(x) = g1(x) − g0(x) = log [ P(x∣y=1)P(y=1) / ( P(x∣y=0)P(y=0) ) ]  ← the log odds
ŷ = 1 if d(x) >0, 0 otherwise
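A small sketch of the same two-class rule via the decision function; the class parameters are hypothetical.

```python
import numpy as np

def g(x, mu, sigma, pi):
    return -np.log(sigma) - (x - mu) ** 2 / (2 * sigma ** 2) + np.log(pi)

def d(x):
    # d(x) = g1(x) - g0(x): the log odds (made-up parameters for classes 1 and 0)
    return g(x, mu=2.0, sigma=1.0, pi=0.4) - g(x, mu=0.0, sigma=1.0, pi=0.6)

x = 1.3
y_hat = 1 if d(x) > 0 else 0
```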


multivariate gaussian discriminant analysis
Data: {(x(i), y(i))}, i = 1 … m
What we assume (per class k)
Feature vector x ∈ ℝⁿ
Inside class k, x is Gaussian with mean μk and covariance Σk:
P(x∣y=k) = N(x; μk, Σk) = (2π)^(−n/2) |Σk|^(−1/2) exp( −1/2 (x−μk)⊤ Σk⁻¹ (x−μk) )
Class prior: π k = P ( y = k )
log-posterior up to a constant:
Take log and add the prior term:
gk(x) = −1/2 log|Σk| − 1/2 (x−μk)⊤ Σk⁻¹ (x−μk) + log πk   (+ a constant that is the same for all k)
then predict the label with the biggest gk(x)
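A minimal NumPy sketch of evaluating this multivariate gk(x); the means, covariances and priors below are made up.

```python
import numpy as np

def g(x, mu, Sigma, pi):
    # g_k(x) = -1/2 log|Sigma_k| - 1/2 (x-mu_k)^T Sigma_k^{-1} (x-mu_k) + log pi_k
    diff = x - mu
    _, logdet = np.linalg.slogdet(Sigma)           # numerically stable log-determinant
    maha = diff @ np.linalg.solve(Sigma, diff)     # squared Mahalanobis distance
    return -0.5 * logdet - 0.5 * maha + np.log(pi)

# hypothetical (mu_k, Sigma_k, pi_k) for two classes in 2-D
classes = [
    (np.array([0.0, 0.0]), np.array([[1.0, 0.3], [0.3, 1.0]]), 0.5),
    (np.array([2.0, 1.0]), np.array([[2.0, -0.5], [-0.5, 0.5]]), 0.5),
]

x = np.array([1.0, 0.5])
y_hat = int(np.argmax([g(x, mu, S, pi) for mu, S, pi in classes]))
```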


cases
General case ⇒ quadratic rule (QDA)
Expand the square:
gk(x) = x⊤Akx + x⊤bk + ck
with Ak = −1/2 Σk⁻¹,  bk = Σk⁻¹μk,  ck = −1/2 μk⊤Σk⁻¹μk − 1/2 log|Σk| + log πk
The difference between two classes a,b:
dab(x) = ga(x) − gb(x) = x⊤(Aa−Ab)x + x⊤(ba−bb) + (ca−cb)
Because of the x⊤(⋅)x term, the boundary dab(x)=0 is quadratic (curved).
That is why the quadratic case shows a bent (curved) boundary in the plots.
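A sketch of those coefficients and of the pairwise decision function dab(x); the two classes and their parameters are hypothetical.

```python
import numpy as np

def qda_coeffs(mu, Sigma, pi):
    # A_k = -1/2 Sigma^{-1}, b_k = Sigma^{-1} mu,
    # c_k = -1/2 mu^T Sigma^{-1} mu - 1/2 log|Sigma| + log pi
    Sinv = np.linalg.inv(Sigma)
    _, logdet = np.linalg.slogdet(Sigma)
    A = -0.5 * Sinv
    b = Sinv @ mu
    c = -0.5 * mu @ Sinv @ mu - 0.5 * logdet + np.log(pi)
    return A, b, c

# hypothetical classes a and b with different covariances -> quadratic boundary
Aa, ba, ca = qda_coeffs(np.array([0.0, 0.0]), np.eye(2), 0.5)
Ab, bb, cb = qda_coeffs(np.array([2.0, 1.0]), np.diag([2.0, 0.5]), 0.5)

def d_ab(x):
    # quadratic in x because (Aa - Ab) != 0 when the covariances differ
    return x @ (Aa - Ab) @ x + x @ (ba - bb) + (ca - cb)
```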
Special case: equal covariances (Σk=Σ for all k) ⇒ linear rule (LDA)
The quadratic term is the same for every class (Aa = Ab), so it drops out of the comparison:
gk(x) = x⊤bk + ck,
bk = Σ⁻¹μk,  ck = −1/2 μk⊤Σ⁻¹μk + log πk
Decision boundary between two classes is a hyperplane (a straight line in 2D)
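A short sketch of the shared-covariance (linear) case; Σ, the means and the priors are hypothetical.

```python
import numpy as np

# Linear discriminants g_k(x) = x^T b_k + c_k under one shared covariance
Sigma = np.array([[1.0, 0.2], [0.2, 1.5]])       # same Sigma for every class
mus   = [np.array([0.0, 0.0]), np.array([2.0, 1.0])]
pis   = [0.6, 0.4]

Sinv = np.linalg.inv(Sigma)
bs = [Sinv @ mu for mu in mus]                                         # b_k = Sigma^{-1} mu_k
cs = [-0.5 * mu @ Sinv @ mu + np.log(pi) for mu, pi in zip(mus, pis)]  # c_k

def predict(x):
    return int(np.argmax([x @ b + c for b, c in zip(bs, cs)]))  # linear boundary
```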
special case 2: equal covariances and equal priors
Then logπk is the same for all k and drops out.
gk(x) ∝ −(x − μk)⊤ Σ⁻¹ (x − μk)  ← choose the class with the closest mean (squared Mahalanobis distance; it becomes the Euclidean distance ‖x − μk‖² when Σ = σ²I)
How to read the plots
Linear panel: same Σ for both classes → straight boundary.
Quadratic panel: different Σk (e.g., one class more elongated/tilted) → boundary bends; you can even get multiple disjoint regions in 1D.
TL;DR
Different covariances → QDA → Quadratic boundaries.
Same covariance → LDA → Linear boundaries.
Same covariance + equal priors → pick the closest mean (Euclidean distance if the shared covariance is spherical).



estimating distribution parameters
maximum likelihood (MLE)
What we’re estimating (per class k)
Prior πk=P(y=k) → how common the class is
Mean μk → average feature vector in class k
Covariance Σk→ spread/shape of features in class k
Pick parameters θ that make the observed data most probable:
θ̂ = argmax θ L(θ),   L(θ) = P(all data ∣ θ).
With the standard i.i.d. assumption (examples are independent and identically distributed):
L(θ) = ∏_{i=1}^{m} P(x(i), y(i) ∣ θ).
Independence ⇒ product over i.
Identical ⇒ each term uses the same parameters θ
We almost always maximize the log-likelihood (sums are easier than products):
ℓ(θ) = log L(θ) = ∑_{i=1}^{m} log P(x(i), y(i) ∣ θ).
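A small sketch of evaluating this log-likelihood for labelled 1-D data under the class-conditional Gaussian model; the data points and the parameters θ = (μk, σk, πk) are made up, and scipy.stats.norm supplies the Gaussian log-density.

```python
import numpy as np
from scipy.stats import norm

def log_likelihood(x, y, mu, sigma, pi):
    # sum_i [ log N(x_i; mu_{y_i}, sigma_{y_i}^2) + log pi_{y_i} ],
    # since P(x, y | theta) = P(x | y, theta) P(y | theta)
    return np.sum(norm.logpdf(x, loc=mu[y], scale=sigma[y]) + np.log(pi[y]))

# toy labelled data and hypothetical parameters
x = np.array([0.2, -1.1, 3.4, 2.8])
y = np.array([0, 0, 1, 1])
mu, sigma, pi = np.array([0.0, 3.0]), np.array([1.0, 1.5]), np.array([0.5, 0.5])
ll = log_likelihood(x, y, mu, sigma, pi)
```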
Goal:
Estimate the Gaussian parameters for each class k: the mean μk and variance σk² (or covariance Σk), and the prior πk, by maximum likelihood (MLE).
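A sketch of the resulting closed-form MLE estimates on toy labelled 2-D data (the data itself is synthetic and illustrative): πk is the class frequency, μk the class mean, and Σk the class covariance normalised by the class count.

```python
import numpy as np

# synthetic labelled data: 50 points per class
rng = np.random.default_rng(1)
X = np.vstack([rng.normal([0.0, 0.0], 1.0, size=(50, 2)),
               rng.normal([3.0, 1.0], [2.0, 0.5], size=(50, 2))])
y = np.repeat([0, 1], 50)

m = len(y)
params = {}
for k in np.unique(y):
    Xk = X[y == k]
    pi_k    = len(Xk) / m                             # prior: class frequency
    mu_k    = Xk.mean(axis=0)                         # mean vector
    Sigma_k = np.cov(Xk, rowvar=False, bias=True)     # MLE covariance (divide by m_k)
    params[k] = (pi_k, mu_k, Sigma_k)
```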