Gaussian discriminant analysis

9 Terms

1. univariate Gaussian discriminant analysis

Setup (per class k)

  • Assume the feature x inside class k is Gaussian:

P(x | y=k) = N(x; μk, σk²)

Prior of class k: πk = P(y=k)

What we want for prediction: the class posterior

  • P(y=k | x). By Bayes:

P(y=k | x) ∝ P(x | y=k) πk

For choosing the class, the denominator P(x) is the same for every class, so just compare the scores

scorek(x) = P(x | y=k) πk

Log-discriminant (same thing, easier math)

gk(x) = log P(x | y=k) + log πk

Plug the Gaussian in and drop the constant −½ log(2π) (it’s identical for all classes):

gk(x) = −log σk − (x − μk)² / (2σk²) + log πk

Predict:

ŷ = argmaxk gk(x)

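A minimal sketch of this rule in Python (the per-class parameters μk, σk, πk below are made-up numbers, purely for illustration): evaluate gk(x) = −log σk − (x − μk)²/(2σk²) + log πk for each class and take the argmax.

```python
import math

# Hypothetical per-class parameters (mu_k, sigma_k, pi_k) -- illustration only.
params = {0: (0.0, 1.0, 0.6), 1: (3.0, 2.0, 0.4)}

def g(x, mu, sigma, pi):
    # Log-discriminant with the constant -1/2 log(2*pi) dropped.
    return -math.log(sigma) - (x - mu) ** 2 / (2 * sigma ** 2) + math.log(pi)

def predict(x):
    # y_hat = argmax_k g_k(x)
    return max(params, key=lambda k: g(x, *params[k]))

print(predict(1.0))  # compares g_0(1.0) and g_1(1.0)
```
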
2. simplifications

GDA collapses to super simple rules under common assumptions

Simplification 1: Equal variances (σk = σ for all k)

The −log σk term is the same for every class, so it doesn’t affect which gk is largest:

→ gk(x) = −(x − μk)² / (2σ²) + log πk

Simplification 2: Equal priors as well (πk same for all k)

Now log πk is also common and drops out:

→ gk(x) ∝ −(x − μk)² → choose the class with the closest mean
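
A quick numeric check of simplification 2 (the means, σ, and test point below are made up): with a shared variance and equal priors, the argmax of gk picks the same class as the closest mean.

```python
import math

means = [0.0, 3.0, 7.0]          # hypothetical class means
sigma, prior = 1.5, 1.0 / 3.0    # shared variance and equal priors

def g(x, mu):
    return -math.log(sigma) - (x - mu) ** 2 / (2 * sigma ** 2) + math.log(prior)

x = 4.8
by_discriminant = max(range(3), key=lambda k: g(x, means[k]))
by_closest_mean = min(range(3), key=lambda k: (x - means[k]) ** 2)
assert by_discriminant == by_closest_mean  # same decision
```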

3. univariate Gaussian discriminant analysis summary

gk(x) measures a distance to the class mean (a Mahalanobis distance when the variances aren’t equal), and class priors add a bias term (a higher bias for more likely classes).

process:

1) choose distribution model (Gaussian)

2) compute distribution parameters from data (σk, μk , πk )

3) compute discriminant function gk(x)

4) choose ŷ = argmax j gj(x)
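
A compact sketch of these four steps on a tiny, made-up 1-D dataset (all numbers and the class split are illustrative): estimate σk, μk, πk per class, form gk(x), and take the argmax.

```python
import math
from collections import defaultdict

# Step 0: toy labeled data (x, y) -- illustrative values only.
data = [(0.2, 0), (-0.5, 0), (0.9, 0), (2.8, 1), (3.5, 1)]

# Steps 1-2: assume a Gaussian per class and estimate (mu_k, sigma_k, pi_k).
by_class = defaultdict(list)
for x, y in data:
    by_class[y].append(x)

params = {}
for k, xs in by_class.items():
    mu = sum(xs) / len(xs)
    var = sum((x - mu) ** 2 for x in xs) / len(xs)   # MLE variance (divide by m_k)
    params[k] = (mu, math.sqrt(var), len(xs) / len(data))

# Step 3: discriminant g_k(x); Step 4: y_hat = argmax_k g_k(x).
def g(x, mu, sigma, pi):
    return -math.log(sigma) - (x - mu) ** 2 / (2 * sigma ** 2) + math.log(pi)

def predict(x):
    return max(params, key=lambda k: g(x, *params[k]))

print(predict(1.5))
```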

4. using a decision function

consider a 2-class case

→ 2 discriminant functions g1(x) and g0(x)

ŷ = argmaxj gj(x) → ŷ = 1 if g1(x) > g0(x), 0 otherwise

define a decision function

d(x) = g1(x) − g0(x) = log( P(x | y=1) P(y=1) / (P(x | y=0) P(y=0)) )   ← the log odds

ŷ = 1 if d(x) > 0, 0 otherwise

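A small sketch of the two-class decision function (the parameters are invented): d(x) = g1(x) − g0(x), predict 1 when d(x) > 0; the assert checks that this matches the argmax rule.

```python
import math

# Hypothetical parameters (mu, sigma, pi) for classes 0 and 1.
p0, p1 = (0.0, 1.0, 0.7), (2.0, 1.0, 0.3)

def g(x, mu, sigma, pi):
    return -math.log(sigma) - (x - mu) ** 2 / (2 * sigma ** 2) + math.log(pi)

def d(x):
    # log odds: log[P(x|y=1)P(y=1)] - log[P(x|y=0)P(y=0)]
    return g(x, *p1) - g(x, *p0)

for x in (0.5, 1.5, 2.5):
    y_hat = 1 if d(x) > 0 else 0
    assert y_hat == (1 if g(x, *p1) > g(x, *p0) else 0)  # same decision rule
    print(x, y_hat)
```
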
5. multivariate Gaussian discriminant analysis

Training data: {(x(i), y(i))}, i = 1, …, m

What we assume (per class k)

Feature vector x ∈ Rⁿ

Inside class k, x is Gaussian with mean μk and covariance Σk:

  • P(x | y=k) = N(x; μk, Σk) = (2π)^(−n/2) |Σk|^(−1/2) exp( −½ (x − μk)⊤ Σk⁻¹ (x − μk) )

Class prior: πk = P(y=k)

log-posterior up to a constant:

Take log and add the prior term:

gk(x) = −½ log|Σk| − ½ (x − μk)⊤ Σk⁻¹ (x − μk) + log πk   (+ the same constant for all k)

then predict the label with the biggest gk(x)

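A sketch of the multivariate discriminant with NumPy (the means, covariances, and priors are invented for illustration): it evaluates gk(x) = −½ log|Σk| − ½ (x − μk)⊤ Σk⁻¹ (x − μk) + log πk and picks the largest.

```python
import numpy as np

# Hypothetical class parameters in R^2 -- illustration only.
classes = {
    0: (np.array([0.0, 0.0]), np.array([[1.0, 0.3], [0.3, 1.0]]), 0.5),
    1: (np.array([2.0, 1.0]), np.array([[0.5, 0.0], [0.0, 2.0]]), 0.5),
}

def g(x, mu, Sigma, pi):
    diff = x - mu
    _, logdet = np.linalg.slogdet(Sigma)          # log |Sigma_k|, numerically stable
    maha = diff @ np.linalg.solve(Sigma, diff)    # (x - mu)^T Sigma^{-1} (x - mu)
    return -0.5 * logdet - 0.5 * maha + np.log(pi)

x = np.array([1.0, 0.5])
y_hat = max(classes, key=lambda k: g(x, *classes[k]))
print(y_hat)
```
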
6. cases

General case ⇒ quadratic rule (QDA)

Expand the square:

  • gk(x) = x⊤Ak x + x⊤bk + ck

with Ak = −½ Σk⁻¹,  bk = Σk⁻¹ μk,  ck = −½ μk⊤ Σk⁻¹ μk − ½ log|Σk| + log πk

The difference between two classes a, b:

dab(x) = ga(x) − gb(x) = x⊤A*x + x⊤b* + c*,  where A* = Aa − Ab, b* = ba − bb, c* = ca − cb

Because of the x⊤(⋅)x term, the boundary dab(x) = 0 is quadratic (curved).
That’s why the left plot is labelled “quadratic” and shows a bent boundary.

Special case: equal covariances (Σk = Σ for all k) ⇒ linear rule (LDA)

The quadratic terms cancel (Aa=Ab), so

  • gk(x) = x⊤bk + ck,

  • bk = Σ⁻¹μk,    ck = −½ μk⊤Σ⁻¹μk + log πk

Decision boundary between two classes is a hyperplane (a straight line in 2D)

special case 2: equal covariances and equal priors

Then log πk is the same for all k and drops out.

  • gk(x) ∝ −(x − μk)⊤Σ⁻¹(x − μk)   ← choose the class with the closest mean (Euclidean distance ||x − μk||² if Σ is spherical)

How to read your plots

  • Linear panel: same Σ for both classes → straight boundary.

  • Quadratic panel: different Σk (e.g., one class more elongated/tilted) → boundary bends; you can even get multiple disjoint regions in 1D.

TL;DR

  • Different covariances → QDA → Quadratic boundaries.

  • Same covariance → LDA → Linear boundaries.

  • Same covariance + priors → pick the closest mean (Euclidean if spherical).

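A sketch of the coefficients behind the two cases (the means and shared covariance are invented): Ak = −½ Σk⁻¹, bk = Σk⁻¹μk, ck = −½ μk⊤Σk⁻¹μk − ½ log|Σk| + log πk; when the classes share Σ the quadratic parts are identical, so the pairwise boundary dab(x) = 0 is linear.

```python
import numpy as np

def qda_coeffs(mu, Sigma, pi):
    # g_k(x) = x^T A_k x + b_k^T x + c_k
    Sinv = np.linalg.inv(Sigma)
    A = -0.5 * Sinv
    b = Sinv @ mu
    _, logdet = np.linalg.slogdet(Sigma)
    c = -0.5 * mu @ Sinv @ mu - 0.5 * logdet + np.log(pi)
    return A, b, c

mu_a, mu_b = np.array([0.0, 0.0]), np.array([2.0, 1.0])   # invented means
Sigma = np.array([[1.0, 0.2], [0.2, 0.8]])                 # shared covariance

A_a, b_a, c_a = qda_coeffs(mu_a, Sigma, 0.5)
A_b, b_b, c_b = qda_coeffs(mu_b, Sigma, 0.5)

# Equal covariances: quadratic terms cancel ->
# d_ab(x) = (b_a - b_b)^T x + (c_a - c_b) is linear in x.
print(np.allclose(A_a - A_b, 0))   # True
```
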
7. js image

8. estimating distribution parameters

maximum likelihood estimation (MLE)

What we’re estimating (per class k)

  • Prior πk=P(y=k) → how common the class is

  • Mean μk → average feature vector in class k

  • Covariance Σk → spread/shape of features in class k

Pick parameters θ that make the observed data most probable:

θ̂ = argmaxθ L(θ),  where L(θ) = P(all data | θ).

With the standard i.i.d. assumption (examples are independent and identically distributed):

L(θ) = ∏ (i=1 to m) P(x(i), y(i) | θ).

Independence ⇒ product over i.
Identical ⇒ each term uses the same parameters θ

We almost always maximize the log-likelihood (sums are easier than products):

ℓ(θ) = log L(θ) = Σ (i=1 to m) log P(x(i), y(i) | θ).
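
A sketch of this log-likelihood for the univariate case (the data and candidate parameters θ are made up): under the i.i.d. assumption the product becomes a sum, and each term splits as log P(x(i) | y(i)) + log π_y(i).

```python
import math

# Toy labeled data and candidate parameters theta = {k: (mu_k, sigma_k, pi_k)} -- illustrative.
data = [(0.1, 0), (-0.4, 0), (2.9, 1), (3.2, 1)]
theta = {0: (0.0, 1.0, 0.5), 1: (3.0, 1.0, 0.5)}

def log_gaussian(x, mu, sigma):
    return -0.5 * math.log(2 * math.pi) - math.log(sigma) - (x - mu) ** 2 / (2 * sigma ** 2)

def log_likelihood(data, theta):
    # l(theta) = sum_i [ log P(x_i | y_i, theta) + log pi_{y_i} ]
    total = 0.0
    for x, y in data:
        mu, sigma, pi = theta[y]
        total += log_gaussian(x, mu, sigma) + math.log(pi)
    return total

print(log_likelihood(data, theta))
```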

9. estimating distribution parameters

Goal:

Estimate the Gaussian parameters for each class k: the mean μk and variance σk² (and the prior πk) by maximum likelihood (MLE).