Gaussian Mixture Model Notes

Pattern Classification: Gaussian Mixture Model

Recap: Kernel Density Estimator

  • The Kernel Density Estimator (KDE) is a method for estimating the probability density function of a random variable.
  • It involves placing a kernel function (e.g., Gaussian) at each data point and summing them up to approximate the true density.
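As a quick illustration, a 1-D Gaussian KDE can be sketched in a few lines of NumPy. The toy data and the bandwidth `h = 0.5` below are arbitrary choices for the example, not values from the notes:

```python
import numpy as np

def kde(x, data, h=0.5):
    """Gaussian-kernel density estimate at point x (bandwidth h)."""
    z = (x - data) / h                               # scaled distance to each sample
    k = np.exp(-0.5 * z ** 2) / np.sqrt(2 * np.pi)   # Gaussian kernel at each data point
    return k.mean() / h                              # average of the scaled kernels

# Toy 1-D dataset; each sample contributes one Gaussian bump.
data = np.array([-1.0, 0.0, 0.2, 1.5])
print(kde(0.1, data))
```

Because each kernel integrates to 1 and the estimate is their average, the resulting density also integrates to 1.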

Gaussian Mixture Model (GMM)

  • Problem: With only a small dataset, the Kernel Density Estimator cannot produce a reliable estimate of the class-conditional densities.
  • Solution: Model each class conditional as a sum of multivariate Gaussian densities.

GMM Formulation

  • The class conditional density P(X|C_i) is modeled as a weighted sum of Gaussian densities:

    P(X|C_i) = \sum_{j=1}^{K} w_j N(X, \mu_j, \Sigma_j)

    Where:

    • w_j represents the probability (mixing weight) of mixture component j.
    • N(X, \mu_j, \Sigma_j) is a multivariate Gaussian density with mean \mu_j and covariance \Sigma_j.
  • The parameters (weights, mean vectors, and covariance matrices) are determined so that this sum approximates the given class conditional density as closely as possible.
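A direct translation of this formula into code might look as follows; the two-component 2-D weights, means, and covariances below are made-up illustration values:

```python
import numpy as np

def gaussian_pdf(x, mu, sigma):
    """Multivariate Gaussian density N(x, mu, sigma)."""
    d = len(mu)
    diff = x - mu
    norm = np.sqrt((2 * np.pi) ** d * np.linalg.det(sigma))
    return np.exp(-0.5 * diff @ np.linalg.inv(sigma) @ diff) / norm

def gmm_pdf(x, weights, mus, sigmas):
    """Class-conditional density as a weighted sum of Gaussians."""
    return sum(w * gaussian_pdf(x, mu, s)
               for w, mu, s in zip(weights, mus, sigmas))

# Example parameters (made up): a two-component mixture in 2-D.
weights = [0.3, 0.7]
mus = [np.zeros(2), np.ones(2)]
sigmas = [np.eye(2), 0.5 * np.eye(2)]
print(gmm_pdf(np.array([0.5, 0.5]), weights, mus, sigmas))
```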

GMM Parameters and Conditions

  • Parameters: Mixture weights w_j, means \mu_j, and covariances \Sigma_j.

  • Condition: The weights must be non-negative and sum to 1:

    \sum_{j=1}^{K} w_j = 1

    This is necessary to ensure that P(X|C_i) is a valid probability density function.

    \int P(X) dX = \int \sum_{j=1}^{K} w_j N(X, \mu_j, \Sigma_j) dX = \sum_{j=1}^{K} w_j \int N(X, \mu_j, \Sigma_j) dX = \sum_{j=1}^{K} w_j \cdot 1 = 1
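This normalization can be checked numerically for a 1-D two-component mixture (the weights, means, and standard deviations below are arbitrary example values): a Riemann sum over a wide grid should come out close to 1.

```python
import numpy as np

def normal(x, mu, sigma):
    """1-D Gaussian density N(x, mu, sigma)."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

w = np.array([0.2, 0.8])                      # weights sum to 1
grid = np.linspace(-15.0, 15.0, 30001)
mix = w[0] * normal(grid, -2.0, 1.0) + w[1] * normal(grid, 3.0, 1.5)
integral = mix.sum() * (grid[1] - grid[0])    # Riemann approximation of \int P(x) dx
print(round(integral, 4))
```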

1-D Example

  • Consider an example with three Gaussian components:

    • w_1 N(X, \mu_1, \sigma_1)
    • w_2 N(X, \mu_2, \sigma_2)
    • w_3 N(X, \mu_3, \sigma_3)
  • The sum of these components approximates the class conditional probability P(X|C_i).

    P(X|C_i) \approx \sum_{j=1}^{K} w_j N(X, \mu_j, \sigma_j)
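One way to see what such a mixture represents is to sample from it: pick a component j with probability w_j, then draw from that component's Gaussian (ancestral sampling). The three-component weights and parameters below are example values, not from the notes:

```python
import numpy as np

rng = np.random.default_rng(0)
w = np.array([0.5, 0.3, 0.2])        # mixture weights, summing to 1
mu = np.array([-3.0, 0.0, 4.0])      # component means
sigma = np.array([1.0, 0.5, 1.5])    # component standard deviations

# Ancestral sampling: pick component j with probability w_j,
# then draw from N(mu_j, sigma_j^2).
comp = rng.choice(3, size=10000, p=w)
samples = rng.normal(mu[comp], sigma[comp])
print(samples.mean())   # close to sum_j w_j mu_j = -0.7
```

A histogram of `samples` would show the three bumps that the weighted sum of Gaussians describes.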

Expectation-Maximization (EM) Algorithm

  • The EM algorithm is used to estimate the parameters of the GMM components.

  • It is an iterative algorithm.

  • For simplicity, consider a 2-component case:

    P(X|C_i) = w N(X, \mu_1, \Sigma_1) + (1 - w) N(X, \mu_2, \Sigma_2)

EM Algorithm Steps

  1. Initialization: Take initial guesses for the parameters w, \mu_1, \Sigma_1, \mu_2, and \Sigma_2.

  2. Expectation Step: For each sample X^{(m)}, compute the responsibility:

    \gamma_m = \frac{w N(X^{(m)}, \mu_1, \Sigma_1)}{w N(X^{(m)}, \mu_1, \Sigma_1) + (1 - w) N(X^{(m)}, \mu_2, \Sigma_2)}

    \gamma_m represents the probability that X^{(m)} was generated by component 1.

  3. Maximization Step: Compute the responsibility-weighted means, covariance matrices, and weight:

    \mu_1 = \frac{\sum_{m=1}^{M} \gamma_m X^{(m)}}{\sum_{m=1}^{M} \gamma_m}

    \mu_2 = \frac{\sum_{m=1}^{M} (1-\gamma_m) X^{(m)}}{\sum_{m=1}^{M} (1-\gamma_m)}

    \Sigma_1 = \frac{\sum_{m=1}^{M} \gamma_m ( X^{(m)} - \mu_1 )( X^{(m)} - \mu_1 )^T}{\sum_{m=1}^{M} \gamma_m}

    \Sigma_2 = \frac{\sum_{m=1}^{M} (1-\gamma_m) ( X^{(m)} - \mu_2 )( X^{(m)} - \mu_2 )^T}{\sum_{m=1}^{M} (1-\gamma_m)}

    w = \frac{\sum_{m=1}^{M} \gamma_m}{M}

  4. Iteration: Iterate steps 2 & 3 until convergence.
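The four steps above can be sketched for the 1-D two-component case. The synthetic data and the initialization (means at the sample extremes) are illustrative choices, not prescribed by the notes:

```python
import numpy as np

def normal(x, mu, sigma):
    """1-D Gaussian density N(x, mu, sigma)."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

def em_two_component(x, iters=100):
    """EM for a two-component 1-D Gaussian mixture, following steps 1-4."""
    # Step 1: initial guesses (sample extremes for the means is one simple choice).
    w, mu1, mu2 = 0.5, x.min(), x.max()
    s1 = s2 = x.std()
    for _ in range(iters):
        # Step 2 (Expectation): responsibility of component 1 for each sample.
        n1 = w * normal(x, mu1, s1)
        n2 = (1 - w) * normal(x, mu2, s2)
        g = n1 / (n1 + n2)
        # Step 3 (Maximization): responsibility-weighted updates.
        mu1 = (g * x).sum() / g.sum()
        mu2 = ((1 - g) * x).sum() / (1 - g).sum()
        s1 = np.sqrt((g * (x - mu1) ** 2).sum() / g.sum())
        s2 = np.sqrt(((1 - g) * (x - mu2) ** 2).sum() / (1 - g).sum())
        w = g.mean()
    return w, mu1, s1, mu2, s2

# Synthetic data: 30% of samples from N(-2, 0.5^2), 70% from N(3, 1^2).
rng = np.random.default_rng(1)
x = np.concatenate([rng.normal(-2, 0.5, 300), rng.normal(3, 1.0, 700)])
w, mu1, s1, mu2, s2 = em_two_component(x)
print(w, mu1, mu2)   # w near 0.3, mu1 near -2, mu2 near 3
```

A fixed iteration count stands in for a proper convergence test; in practice one would stop when the log-likelihood stops improving.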

Interpretation of Responsibilities

  • \gamma_m = P(X^{(m)} \in \text{component 1})

  • Using Bayes' rule:

    \gamma_m = \frac{P(\text{comp. 1}) P(X^{(m)} | \text{comp. 1})}{P(X^{(m)})} = \frac{P(\text{comp. 1}) P(X^{(m)} | \text{comp. 1})}{P(\text{comp. 1}) P(X^{(m)} | \text{comp. 1}) + P(\text{comp. 2}) P(X^{(m)} | \text{comp. 2})}

Issues with GMM

  • Initialization:
    • EM is very sensitive to initial conditions.
    • Starting from a bad initial guess can make EM converge to a poor local maximum of the likelihood.
    • Usually, K-Means is used to get a good initialization.
  • Number of Gaussian Components:
    • Choosing the appropriate number of Gaussian components is crucial.
    • Train models with different numbers of components and pick the one that scores best on a held-out validation set.
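A minimal sketch of this selection procedure, assuming a hand-rolled 1-D EM fit (random initialization rather than K-Means, for brevity) and held-out log-likelihood as the score:

```python
import numpy as np

def normal(x, mu, s):
    return np.exp(-0.5 * ((x - mu) / s) ** 2) / (s * np.sqrt(2 * np.pi))

def fit_gmm_1d(x, K, iters=200, seed=0):
    """Fit a K-component 1-D GMM with EM (random init for brevity)."""
    rng = np.random.default_rng(seed)
    mu = rng.choice(x, K, replace=False)            # K data points as initial means
    s = np.full(K, x.std())
    w = np.full(K, 1.0 / K)
    for _ in range(iters):
        dens = w * normal(x[:, None], mu, s)        # shape (M, K)
        g = dens / dens.sum(axis=1, keepdims=True)  # responsibilities
        nk = g.sum(axis=0)
        mu = (g * x[:, None]).sum(axis=0) / nk
        s = np.sqrt((g * (x[:, None] - mu) ** 2).sum(axis=0) / nk)
        s = np.maximum(s, 1e-3)                     # guard against collapsing components
        w = nk / len(x)
    return w, mu, s

def log_likelihood(x, w, mu, s):
    return np.log((w * normal(x[:, None], mu, s)).sum(axis=1)).sum()

# Bimodal synthetic data, shuffled, then split into training and validation sets.
rng = np.random.default_rng(2)
data = np.concatenate([rng.normal(-3, 1, 500), rng.normal(2, 0.7, 500)])
rng.shuffle(data)
train, val = data[:800], data[800:]
scores = {K: log_likelihood(val, *fit_gmm_1d(train, K)) for K in (1, 2, 3)}
best = max(scores, key=scores.get)
print(scores, best)
```

Scoring on held-out data rather than the training set penalizes models that overfit with too many components.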