Pattern Classification: Gaussian Mixture Model
Recap: Kernel Density Estimator
- The kernel density estimator (KDE) is a nonparametric method for estimating the probability density function of a random variable.
- It places a kernel function (e.g., a Gaussian) at each data point and sums the kernels to approximate the true density.
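As a quick 1-D refresher, here is a minimal sketch of this idea; the function name `kde`, the bandwidth `h`, and the sample data are illustrative assumptions, not from the notes:

```python
import numpy as np

def kde(x, data, h=0.5):
    # Average of Gaussian kernels (bandwidth h) centered at each data point.
    z = (x - data) / h
    kernels = np.exp(-0.5 * z**2) / np.sqrt(2 * np.pi)
    return kernels.mean() / h

data = np.array([1.0, 1.2, 3.5, 3.8, 4.0])
print(kde(2.0, data))  # density estimate at x = 2.0
```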
Gaussian Mixture Model (GMM)
- Problem: With a small dataset, there are not enough samples for the kernel density estimator to give a reliable estimate of the class conditionals.
- Solution: Model each class conditional as a weighted sum of multivariate Gaussian densities.
The class conditional density $P(X \mid C_i)$ is modeled as a weighted sum of $K$ Gaussian densities:

$$P(X \mid C_i) = \sum_{j=1}^{K} w_j \, \mathcal{N}(X; \mu_j, \Sigma_j)$$

Where:
- $w_j$ represents the probability (mixture weight) of each mixture component.
- $\mathcal{N}(X; \mu_j, \Sigma_j)$ is a multivariate Gaussian density with mean $\mu_j$ and covariance $\Sigma_j$.
The parameters (mixture weights, mean vectors, and covariance matrices) are chosen so that this sum approximates the given class conditional density as closely as possible.
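As a sketch of how this density is evaluated in practice (the helper name `gmm_density` and all parameter values below are illustrative assumptions):

```python
import numpy as np
from scipy.stats import multivariate_normal

def gmm_density(x, weights, means, covs):
    # P(x | C_i) = sum_j w_j * N(x; mu_j, Sigma_j)
    return sum(w * multivariate_normal.pdf(x, mean=mu, cov=S)
               for w, mu, S in zip(weights, means, covs))

# Two-component example in 2-D; the weights sum to 1.
weights = [0.6, 0.4]
means = [np.zeros(2), np.array([3.0, 3.0])]
covs = [np.eye(2), 2.0 * np.eye(2)]
print(gmm_density(np.array([1.0, 1.0]), weights, means, covs))
```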
GMM Parameters and Conditions
Parameters: mixture weights $w_j$, means $\mu_j$, and covariances $\Sigma_j$.
Condition: The weights must be non-negative and sum to 1:

$$\sum_{j=1}^{K} w_j = 1$$
This is necessary to ensure that $P(X \mid C_i)$ is a valid probability density function:

$$\int P(X \mid C_i)\,dX = \int \sum_{j=1}^{K} w_j \mathcal{N}(X; \mu_j, \Sigma_j)\,dX = \sum_{j=1}^{K} w_j \int \mathcal{N}(X; \mu_j, \Sigma_j)\,dX = \sum_{j=1}^{K} w_j \cdot 1 = 1$$
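This can also be checked numerically; a small sketch using quadrature on a 1-D mixture with assumed parameter values:

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

# A 1-D mixture whose weights sum to 1; its integral should be 1.
w = [0.3, 0.7]
p = lambda x: w[0] * norm.pdf(x, 0.0, 1.0) + w[1] * norm.pdf(x, 4.0, 2.0)
total, _ = quad(p, -np.inf, np.inf)
print(total)  # ~1.0
```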
1-D Example
Consider an example with three Gaussian components:

- $w_1 \mathcal{N}(X; \mu_1, \sigma_1)$
- $w_2 \mathcal{N}(X; \mu_2, \sigma_2)$
- $w_3 \mathcal{N}(X; \mu_3, \sigma_3)$

The sum of these components approximates the class conditional density $P(X \mid C_i)$:

$$P(X \mid C_i) \approx \sum_{j=1}^{K} w_j \, \mathcal{N}(X; \mu_j, \sigma_j)$$
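A minimal sketch of this three-component case; all parameter values below are assumptions for demonstration:

```python
import numpy as np
from scipy.stats import norm

w = np.array([0.5, 0.3, 0.2])      # mixture weights, sum to 1
mu = np.array([-2.0, 0.0, 3.0])    # component means
sigma = np.array([0.5, 1.0, 1.5])  # component standard deviations

xs = np.linspace(-5.0, 7.0, 5)
mixture = sum(w[j] * norm.pdf(xs, mu[j], sigma[j]) for j in range(3))
print(mixture)  # approximate class conditional p(x | C_i) on the grid
```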
Expectation-Maximization (EM) Algorithm
The EM algorithm is an iterative procedure used to estimate the parameters of the GMM components.
For simplicity, consider a 2-component case:

$$P(X \mid C_i) = w\,\mathcal{N}(X; \mu_1, \Sigma_1) + (1-w)\,\mathcal{N}(X; \mu_2, \Sigma_2)$$
EM Algorithm Steps
Initialization: Take initial guesses for the parameters $w$, $\mu_1$, $\Sigma_1$, $\mu_2$, and $\Sigma_2$.
Expectation Step: For each sample $X^{(m)}$, compute the responsibility:

$$\gamma_m = \frac{w\,\mathcal{N}(X^{(m)}; \mu_1, \Sigma_1)}{w\,\mathcal{N}(X^{(m)}; \mu_1, \Sigma_1) + (1-w)\,\mathcal{N}(X^{(m)}; \mu_2, \Sigma_2)}$$

$\gamma_m$ represents the probability that $X^{(m)}$ was generated by component 1.
Maximization Step: Compute the weighted means, covariance matrices, and mixture weight:

$$\mu_1 = \frac{\sum_{m=1}^{M} \gamma_m X^{(m)}}{\sum_{m=1}^{M} \gamma_m}, \qquad \mu_2 = \frac{\sum_{m=1}^{M} (1-\gamma_m) X^{(m)}}{\sum_{m=1}^{M} (1-\gamma_m)}$$

$$\Sigma_1 = \frac{\sum_{m=1}^{M} \gamma_m (X^{(m)} - \mu_1)(X^{(m)} - \mu_1)^T}{\sum_{m=1}^{M} \gamma_m}, \qquad \Sigma_2 = \frac{\sum_{m=1}^{M} (1-\gamma_m)(X^{(m)} - \mu_2)(X^{(m)} - \mu_2)^T}{\sum_{m=1}^{M} (1-\gamma_m)}$$

$$w = \frac{1}{M} \sum_{m=1}^{M} \gamma_m$$
Iteration: Repeat the expectation and maximization steps until convergence (e.g., until the log-likelihood stops improving).
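Putting the steps together, here is a minimal NumPy/SciPy sketch of EM for the 2-component case; the random initialization, iteration count, and synthetic data are illustrative choices (a real implementation would add a convergence check and a K-Means initialization, as discussed below):

```python
import numpy as np
from scipy.stats import multivariate_normal

def em_2component(X, n_iter=100):
    """EM for P(x) = w N(x; mu1, S1) + (1 - w) N(x; mu2, S2); X is (M, d)."""
    M, d = X.shape
    rng = np.random.default_rng(0)
    # Initialization: crude guesses (two random samples as means).
    w = 0.5
    mu1, mu2 = X[rng.choice(M, size=2, replace=False)]
    S1, S2 = np.eye(d), np.eye(d)
    for _ in range(n_iter):
        # E-step: responsibility of component 1 for each sample.
        p1 = w * multivariate_normal.pdf(X, mu1, S1)
        p2 = (1 - w) * multivariate_normal.pdf(X, mu2, S2)
        gamma = p1 / (p1 + p2)
        # M-step: weighted means, covariances, and mixture weight.
        g1, g2 = gamma.sum(), (1 - gamma).sum()
        mu1 = (gamma[:, None] * X).sum(axis=0) / g1
        mu2 = ((1 - gamma)[:, None] * X).sum(axis=0) / g2
        D1, D2 = X - mu1, X - mu2
        S1 = (gamma[:, None] * D1).T @ D1 / g1
        S2 = ((1 - gamma)[:, None] * D2).T @ D2 / g2
        w = gamma.mean()
    return w, mu1, S1, mu2, S2

# Synthetic 2-D data drawn from two Gaussians to exercise the sketch.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 1.0, (100, 2)), rng.normal(4.0, 1.0, (60, 2))])
print(em_2component(X))
```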
Interpretation of Responsibilities
$\gamma_m$ is the posterior probability that $X^{(m)}$ belongs to component 1:

$$\gamma_m = P(\text{comp. } 1 \mid X^{(m)})$$

Using Bayes' rule:

$$\gamma_m = \frac{P(\text{comp. } 1)\,P(X^{(m)} \mid \text{comp. } 1)}{P(X^{(m)})} = \frac{P(\text{comp. } 1)\,P(X^{(m)} \mid \text{comp. } 1)}{P(\text{comp. } 1)\,P(X^{(m)} \mid \text{comp. } 1) + P(\text{comp. } 2)\,P(X^{(m)} \mid \text{comp. } 2)}$$

In the 2-component case, $P(\text{comp. } 1) = w$ and $P(\text{comp. } 2) = 1 - w$, which recovers the E-step formula above.
Issues with GMM
- Initialization:
  - EM is very sensitive to initial conditions; starting from a bad initial guess can lead to poor results.
  - Usually, K-Means is used to get a good initialization (as in the sketch below).
- Number of Gaussian Components:
  - Choosing the appropriate number of components $K$ is crucial.
  - Try different numbers of components and choose the best based on performance on a validation set, as shown in the sketch below.
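A minimal sketch of both remedies using scikit-learn, whose `GaussianMixture` defaults to K-Means initialization (`init_params="kmeans"`); the synthetic data and the range of $K$ values are illustrative assumptions:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Synthetic 2-D data from two well-separated Gaussians.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (200, 2)), rng.normal(5.0, 1.0, (200, 2))])
rng.shuffle(X)
X_train, X_val = X[:300], X[300:]

# Fit GMMs with different numbers of components and pick the one
# with the highest mean log-likelihood on the validation set.
best_k, best_ll = None, -np.inf
for k in range(1, 6):
    gmm = GaussianMixture(n_components=k, init_params="kmeans",
                          random_state=0).fit(X_train)
    ll = gmm.score(X_val)  # mean log-likelihood per validation sample
    if ll > best_ll:
        best_k, best_ll = k, ll
print(best_k, best_ll)
```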