Gaussian Mixture Model Notes
Pattern Classification: Gaussian Mixture Model
Recap: Kernel Density Estimator
- The Kernel Density Estimator (KDE) is a nonparametric method for estimating the probability density function of a random variable.
- It involves placing a kernel function (e.g., Gaussian) at each data point and summing them up to approximate the true density.
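The recap above can be sketched numerically. This is a minimal 1-D KDE with a Gaussian kernel; the toy data points and the bandwidth `h` are illustrative choices, not from the notes:

```python
import numpy as np

def kde(x, data, h=0.5):
    """Average of Gaussian kernels, one centred at each data point."""
    z = (x - data[:, None]) / h                         # shape (M, len(x))
    k = np.exp(-0.5 * z**2) / (h * np.sqrt(2 * np.pi))  # kernel N(x; x_m, h^2)
    return k.mean(axis=0)                               # sum up and normalize

data = np.array([-1.0, 0.0, 0.2, 1.5])   # toy sample
grid = np.linspace(-5, 5, 2001)
dens = kde(grid, data)
# Riemann-sum check: the estimate integrates to ~1, as a density should.
print(dens.sum() * (grid[1] - grid[0]))
```

Because each kernel is itself a density and they are averaged, the estimate is automatically a valid density.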
Gaussian Mixture Model (GMM)
- Problem: When dealing with a small dataset, the Kernel Density Estimator cannot reliably estimate the class conditional densities.
- Solution: Model each class conditional as a weighted sum of multivariate Gaussian densities.
GMM Formulation
The class conditional density P(X|C_i) is modeled as a weighted sum of K Gaussian densities:
P(X|C_i) = \sum_{j=1}^{K} w_j N(X, \mu_j, \Sigma_j)
Where:
- w_j represents the probability (weight) of mixture component j.
- N(X, \mu_j, \Sigma_j) is a multivariate Gaussian density with mean \mu_j and covariance \Sigma_j.
The parameters (mean vectors and covariance matrices) are determined so that this sum approximates the given class conditional density as closely as possible.
GMM Parameters and Conditions
Parameters: Mixture weights w_j, means \mu_j, and covariances \Sigma_j.
Condition: The sum of the weights must equal 1:
\sum_{j=1}^{K} w_j = 1
This is necessary to ensure that P(X|C_i) is a valid probability density function.
\int P(X|C_i) dX = \int \sum_{j=1}^{K} w_j N(X, \mu_j, \Sigma_j) dX = \sum_{j=1}^{K} w_j \int N(X, \mu_j, \Sigma_j) dX = \sum_{j=1}^{K} w_j \cdot 1 = 1
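The normalization argument above can be checked numerically for a 1-D mixture. The parameters below are illustrative, not from the notes:

```python
import numpy as np

w   = np.array([0.3, 0.5, 0.2])   # mixture weights, chosen to sum to 1
mu  = np.array([-2.0, 0.0, 3.0])  # component means
sig = np.array([0.5, 1.0, 0.8])   # component standard deviations

def gmm_pdf(x):
    # One Gaussian density per column, then the weighted sum over components.
    comps = np.exp(-0.5 * ((x[:, None] - mu) / sig)**2) / (sig * np.sqrt(2 * np.pi))
    return comps @ w

grid = np.linspace(-10, 10, 4001)
dens = gmm_pdf(grid)
print(w.sum())                              # weights sum to 1
print(dens.sum() * (grid[1] - grid[0]))     # density integrates to ~1
```

Each Gaussian integrates to 1, so the mixture integrates to the sum of the weights, which is why the weight constraint is required.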
1-D Example
Consider an example with three Gaussian components:
- w_1 N(X, \mu_1, \sigma_1)
- w_2 N(X, \mu_2, \sigma_2)
- w_3 N(X, \mu_3, \sigma_3)
The sum of these components approximates the class conditional probability P(X|C_i).
P(X|C_i) \approx \sum_{j=1}^{K} w_j N(X, \mu_j, \sigma_j)
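One way to read this mixture is generatively: pick component j with probability w_j, then draw from that component's Gaussian. This generative-sampling view is an addition to the notes, and the parameters below are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
w   = np.array([0.2, 0.5, 0.3])   # component probabilities
mu  = np.array([-3.0, 0.0, 4.0])
sig = np.array([0.6, 1.0, 0.8])

M = 100_000
j = rng.choice(3, size=M, p=w)    # choose a component label per sample
x = rng.normal(mu[j], sig[j])     # draw from the chosen Gaussian

# Sanity check: the empirical mean matches the mixture mean sum_j w_j mu_j.
print(x.mean(), (w * mu).sum())
```

Sampling this way produces exactly the density of the weighted sum, which is why the weights can be interpreted as component probabilities.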
Expectation-Maximization (EM) Algorithm
The EM algorithm is an iterative procedure used to estimate the parameters of the GMM components.
For simplicity, consider a 2-component case:
P(X|C_i) = w N(X, \mu_1, \Sigma_1) + (1 - w) N(X, \mu_2, \Sigma_2)
EM Algorithm Steps
Initialization: Take initial guesses for the parameters w, \mu_1, \Sigma_1, \mu_2, and \Sigma_2.
Expectation Step: For each data point X^{(m)}, compute the responsibility:
\gamma^{(m)} = \frac{w N(X^{(m)}, \mu_1, \Sigma_1)}{w N(X^{(m)}, \mu_1, \Sigma_1) + (1 - w) N(X^{(m)}, \mu_2, \Sigma_2)}
\gamma^{(m)} represents the probability that X^{(m)} was generated from component 1.
Maximization Step: Compute the weighted means and covariance matrices:
\mu_1 = \frac{\sum_{m=1}^{M} \gamma^{(m)} X^{(m)}}{\sum_{m=1}^{M} \gamma^{(m)}}
\mu_2 = \frac{\sum_{m=1}^{M} (1 - \gamma^{(m)}) X^{(m)}}{\sum_{m=1}^{M} (1 - \gamma^{(m)})}
\Sigma_1 = \frac{\sum_{m=1}^{M} \gamma^{(m)} (X^{(m)} - \mu_1)(X^{(m)} - \mu_1)^T}{\sum_{m=1}^{M} \gamma^{(m)}}
\Sigma_2 = \frac{\sum_{m=1}^{M} (1 - \gamma^{(m)}) (X^{(m)} - \mu_2)(X^{(m)} - \mu_2)^T}{\sum_{m=1}^{M} (1 - \gamma^{(m)})}
w = \frac{\sum_{m=1}^{M} \gamma^{(m)}}{M}
Iteration: Repeat the Expectation and Maximization steps until the parameters converge.
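The steps above can be sketched for the two-component 1-D case. The toy data, initial guesses, and variable names are illustrative assumptions, not the notes' prescription:

```python
import numpy as np

rng = np.random.default_rng(1)
# Toy data: two well-separated Gaussian clusters.
X = np.concatenate([rng.normal(-2.0, 0.5, 300), rng.normal(3.0, 1.0, 700)])

def normal_pdf(x, mu, var):
    return np.exp(-0.5 * (x - mu)**2 / var) / np.sqrt(2 * np.pi * var)

# Initialization: rough guesses for w, mu_1, mu_2 and the (1-D) variances.
w, mu1, mu2, var1, var2 = 0.5, -1.0, 1.0, 1.0, 1.0

for _ in range(100):
    # Expectation step: responsibility of component 1 for each point.
    p1 = w * normal_pdf(X, mu1, var1)
    p2 = (1 - w) * normal_pdf(X, mu2, var2)
    g = p1 / (p1 + p2)
    # Maximization step: responsibility-weighted means, variances, weight.
    mu1 = (g * X).sum() / g.sum()
    mu2 = ((1 - g) * X).sum() / (1 - g).sum()
    var1 = (g * (X - mu1)**2).sum() / g.sum()
    var2 = ((1 - g) * (X - mu2)**2).sum() / (1 - g).sum()
    w = g.mean()

print(round(w, 2), round(mu1, 2), round(mu2, 2))
```

On this toy data the estimates settle near the true cluster parameters (w close to 0.3, means near -2 and 3); a fixed iteration count stands in for a proper convergence test.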
Interpretation of Responsibilities
\gamma^{(m)} = P(X^{(m)} \in \text{component 1})
Using Bayes' rule:
\gamma^{(m)} = \frac{P(\text{comp. 1}) P(X^{(m)} | \text{comp. 1})}{P(X^{(m)})} = \frac{P(\text{comp. 1}) P(X^{(m)} | \text{comp. 1})}{P(\text{comp. 1}) P(X^{(m)} | \text{comp. 1}) + P(\text{comp. 2}) P(X^{(m)} | \text{comp. 2})}
Issues with GMM
- Initialization:
- EM is very sensitive to initial conditions.
- Starting from a bad initial guess can lead to poor results.
- Usually, K-Means is used to get a good initialization.
- Number of Gaussian Components:
- Choosing the appropriate number of Gaussian components is crucial.
  - Try different numbers of components and choose the one that performs best on a validation set.
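The K-Means initialization idea can be sketched as follows. The tiny two-centroid Lloyd's iteration and toy data below are illustrative assumptions, not the notes' exact recipe:

```python
import numpy as np

rng = np.random.default_rng(2)
X = np.concatenate([rng.normal(-2.0, 0.5, 300), rng.normal(3.0, 1.0, 700)])

# Tiny 1-D K-Means (K = 2): assign each point to its nearest centroid,
# then move each centroid to the mean of its cluster.
c = np.array([X.min(), X.max()])   # start centroids at the data extremes
for _ in range(20):
    label = np.abs(X[:, None] - c).argmin(axis=1)
    c = np.array([X[label == k].mean() for k in range(2)])

# Turn the hard clusters into initial GMM parameters for EM.
w0 = (label == 0).mean()           # initial weight of component 1
mu_a, mu_b = c                     # initial means
var_a = X[label == 0].var()        # initial variances
var_b = X[label == 1].var()
print(round(mu_a, 1), round(mu_b, 1))
```

Starting EM from these cluster statistics rather than arbitrary guesses usually puts each component near a real mode of the data, which mitigates the sensitivity to initialization noted above.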