Gaussian Mixture Model Notes

Machine Learning Methods

  • Gaussian Mixture Model (GMM) is a probabilistic clustering method in machine learning.
  • The model describes data as a mixture of clusters (e.g., Cluster 1, Cluster 2, Cluster 3).

Recap of Conditional Probabilities and Bayes Rule

  • Conditional probabilities:
    p(A, B) = p(A|B)p(B) = p(B|A)p(A)
  • Bayes rule components:
    • Posterior Probability: p(A|B)
    • Likelihood: p(B|A)
    • Prior Probability: p(A)
    • Marginal Probability: p(B)
  • Marginal Probability formula:
    p(A = 1) = \sum_{i} p(A = 1, B_{i}) = \sum_{i} p(A = 1|B_{i})p(B_{i})
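The rule above can be sketched numerically. The prior and likelihood values below are made up for illustration, not from the notes:

```python
# A minimal sketch of Bayes' rule and marginalization with made-up numbers.
priors = {"A0": 0.6, "A1": 0.4}          # p(A)
likelihoods = {"A0": 0.2, "A1": 0.9}     # p(B = 1 | A)

# Marginal: p(B = 1) = sum_i p(B = 1 | A_i) p(A_i)
marginal = sum(likelihoods[a] * priors[a] for a in priors)

# Posterior: p(A = A1 | B = 1) = p(B = 1 | A1) p(A1) / p(B = 1)
posterior = likelihoods["A1"] * priors["A1"] / marginal
print(marginal, posterior)  # 0.48 0.75
```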

Example of Probabilities Calculation

  • Given joint probabilities for today's and tomorrow's weather (Rainy/Cold):
    • P(Tomorrow=Rainy, Today=Rainy) = 4/9
    • P(Tomorrow=Cold, Today=Rainy) = 2/9
    • P(Tomorrow=Rainy, Today=Cold) = 2/9
    • P(Tomorrow=Cold, Today=Cold) = 1/9
  • Marginal probabilities for Today:
    • P(Today=Rainy) = 4/9 + 2/9 = 2/3
    • P(Today=Cold) = 2/9 + 1/9 = 1/3
  • Marginal probabilities for Tomorrow:
    • P(Tomorrow=Rainy) = 4/9 + 2/9 = 2/3
    • P(Tomorrow=Cold) = 2/9 + 1/9 = 1/3
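The marginals above can be checked by summing the joint table over the other variable; a small sketch using exact fractions:

```python
from fractions import Fraction as F

# Joint probabilities p(Today, Tomorrow) from the table above.
joint = {
    ("Rainy", "Rainy"): F(4, 9),
    ("Rainy", "Cold"):  F(2, 9),
    ("Cold",  "Rainy"): F(2, 9),
    ("Cold",  "Cold"):  F(1, 9),
}

# Marginalize by summing the joint over the other variable.
p_today = {t: sum(p for (today, _), p in joint.items() if today == t)
           for t in ("Rainy", "Cold")}
p_tomorrow = {t: sum(p for (_, tom), p in joint.items() if tom == t)
              for t in ("Rainy", "Cold")}
print(p_today, p_tomorrow)  # both give Rainy: 2/3, Cold: 1/3
```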

Hard Clustering Challenges

  • Hard Clustering methods include K-Means, Hierarchical Clustering, and DBSCAN.
  • These methods struggle in scenarios where data points do not clearly belong to a single cluster.

Towards Soft Clustering

  • K-means:
    • Performs hard assignment, where each object belongs to only one cluster.
  • Mixture modeling:
    • Performs soft assignment, providing a probability that an object belongs to a cluster.
    • o_{i} \in \{o_{1}, …, o_{K}\}
    • Example probabilities: p=0.315, p=0.287

Gaussian Mixture Model (GMM)

  • GMM provides a soft assignment of data points to clusters.

Gaussian Distribution

  • 1-d Gaussian distribution:
    N(x|\mu, \sigma^{2}) = \frac{1}{\sqrt{2\pi\sigma^{2}}} e^{-\frac{(x-\mu)^{2}}{2\sigma^{2}}}
  • μ is the mean.
  • σ is the standard deviation.
  • GMM uses Gaussian distributions to model clusters.
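The 1-d density above translates directly into code; a minimal sketch:

```python
import math

def gaussian_pdf(x, mu, sigma):
    """1-d Gaussian density N(x | mu, sigma^2)."""
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / math.sqrt(2 * math.pi * sigma ** 2)

# The density peaks at the mean; for mu = 0, sigma = 1 the peak is 1/sqrt(2*pi).
print(gaussian_pdf(0.0, 0.0, 1.0))  # ≈ 0.3989
```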

Gaussian Distribution in d Dimensions

  • For d dimensions, the Gaussian distribution of a vector x = (x^{1}, x^{2}, …, x^{d})^{T} is defined by:
    N(x|\mu, \Sigma) = \frac{1}{(2\pi)^{d/2}|\Sigma|^{1/2}} \exp(-\frac{1}{2}(x-\mu)^{T}\Sigma^{-1}(x-\mu))
    • where \mu is the mean vector.
    • \Sigma is the covariance matrix.
  • Example:
    • \mu = (0, 0)^{T}
    • \Sigma = \begin{bmatrix} 0.25 & 0.30 \\ 0.30 & 1.00 \end{bmatrix}
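The d-dimensional density can be evaluated directly from the formula; a sketch using the example mean and covariance above:

```python
import numpy as np

def mvn_pdf(x, mu, Sigma):
    """d-dimensional Gaussian density N(x | mu, Sigma)."""
    d = len(mu)
    diff = x - mu
    norm = (2 * np.pi) ** (d / 2) * np.sqrt(np.linalg.det(Sigma))
    return float(np.exp(-0.5 * diff @ np.linalg.inv(Sigma) @ diff) / norm)

mu = np.array([0.0, 0.0])
Sigma = np.array([[0.25, 0.30], [0.30, 1.00]])
# At x = mu the exponent is 0, so the density equals 1 / ((2*pi) * sqrt(det(Sigma))).
print(mvn_pdf(np.array([0.0, 0.0]), mu, Sigma))
```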

Mixture Models

  • A Mixture Model is the weighted sum of a number of PDFs.
  • Weights are determined by a distribution \pi_{0}, \pi_{1}, \pi_{2}.

Mixture Model Equation

  • The probability density function p(x) is given by:
    p(x) = \pi_{0}f_{0}(x) + \pi_{1}f_{1}(x) + \pi_{2}f_{2}(x)
    • \pi_{0}, \pi_{1}, \pi_{2} are the mixing proportions.
    • f_{0}(x), f_{1}(x), f_{2}(x) are the component densities.
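A weighted sum of component densities is easy to sketch in code. The three components below (Gaussians with made-up means and standard deviations) are illustrative assumptions, not values from the notes:

```python
import math

def gaussian(x, mu, sigma):
    """1-d Gaussian density N(x | mu, sigma^2)."""
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / math.sqrt(2 * math.pi * sigma ** 2)

# Hypothetical three-component mixture: weights are non-negative and sum to 1.
weights = [0.5, 0.3, 0.2]                        # pi_0, pi_1, pi_2
params = [(-2.0, 1.0), (0.0, 0.5), (3.0, 1.5)]   # (mean, std) per component

def mixture_pdf(x):
    """p(x) = sum_k pi_k f_k(x)."""
    return sum(w * gaussian(x, mu, s) for w, (mu, s) in zip(weights, params))

print(mixture_pdf(0.0))
```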

Why GMM?

  • GMM defines a new PDF from which random variables can be generated.
  • It is a generative model.
  • It models each cluster with its own Gaussian distribution.
  • It supports posterior inference.
  • It gives a soft assignment of data points to clusters.

Intuition Behind GMM

  • Consider a histogram of temperature readings.
  • Two distributions may indicate data from two different cities.
  • Formulate as mathematics with:
    • Observable variable: temperature.
    • Latent variable: cities (z).

Gaussian Mixture Models: Mathematical Notation

  • For the general case, π is a vector of probabilities (non-negative values which sum to 1).
  • π is known as the mixing proportions.

Gaussian Mixture Models: PDF

  • The probability density function (PDF) over x is computed by marginalizing out (summing out) z:
    p(x) = \sum_{k=1}^{K} \pi_{k} N(x|\mu_{k}, \Sigma_{k})
  • This PDF is a convex combination, or weighted average, of the PDFs of the component distributions.

Posterior Inference

  • Assume parameters of the model have been chosen.
  • Goal: Infer, given a data point x, which component it likely belongs to.
  • In mathematical terms, infer the posterior distribution p(z|x).
  • Posterior distribution can be inferred using Bayes’ Rule.
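The Bayes' rule step can be sketched for a 1-d, two-component model. The mixing proportions and component parameters below are assumed for illustration only:

```python
import math

def gaussian(x, mu, sigma):
    """1-d Gaussian density N(x | mu, sigma^2)."""
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / math.sqrt(2 * math.pi * sigma ** 2)

# Hypothetical two-component model (parameters are illustrative assumptions).
pi = [0.4, 0.6]                    # Pr(z = k)
params = [(0.0, 1.0), (4.0, 2.0)]  # (mu_k, sigma_k)

def posterior(x):
    """Pr(z = k | x) = pi_k N(x | mu_k, sigma_k) / p(x), by Bayes' rule."""
    joint = [p * gaussian(x, mu, s) for p, (mu, s) in zip(pi, params)]
    total = sum(joint)  # p(x): marginalize out z
    return [j / total for j in joint]

print(posterior(1.0))  # x = 1 is closer to component 0, so its posterior is larger
```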

Example I

  • Using a previous example, suppose we observe x=2 in the model.
  • We want to compute the posterior probability Pr(z = 1|x).

Example II

  • Observations are two-dimensional (x1, x2).
  • We observe x1 and want to predict x2 using the posterior predictive distribution.

Example II: Two-Dimensional Mixture of Gaussian Model

  • Have a two-dimensional mixture of Gaussian model where x1 and x2 are conditionally independent given z.
  • Suppose we observe x1=3.
  • We can compute the posterior distribution just like in the previous example.

Example II: Posterior Predictive Distribution

  • Compute the posterior predictive distribution using the posterior:
    p(x_{2}|x_{1}) = Pr(z = 1|x_{1})\,p(x_{2}|z = 1) + Pr(z = 2|x_{1})\,p(x_{2}|z = 2)
    = 0.213\,\text{Gaussian}(x_{2}; 6, 1) + 0.787\,\text{Gaussian}(x_{2}; 3, 2)
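The predictive distribution above can be evaluated in code. One assumption: the second parameter of Gaussian(x; mu, s) is read here as the standard deviation; the predictive mean is unaffected by that choice either way:

```python
import math

def gaussian(x, mu, sigma):
    """1-d Gaussian density N(x | mu, sigma^2)."""
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / math.sqrt(2 * math.pi * sigma ** 2)

# Posterior weights and component parameters from the example above.
weights = [0.213, 0.787]
components = [(6.0, 1.0), (3.0, 2.0)]  # (mean, std), std assumed

def predictive_density(x2):
    """p(x2 | x1) as a weighted sum of the component densities."""
    return sum(w * gaussian(x2, mu, s) for w, (mu, s) in zip(weights, components))

# The predictive mean is the posterior-weighted average of the component means.
predictive_mean = sum(w * mu for w, (mu, _) in zip(weights, components))
print(predictive_mean)  # 0.213 * 6 + 0.787 * 3 ≈ 3.639
```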

Learning

  • Parameters of GMM:
    • Mean \mu_{k} and standard deviation \sigma_{k} associated with each component k.
    • Mixing proportions \pi_{k}, defined as Pr(z = k).
  • Idea:
    • Repeatedly alternate between inference (computing cluster responsibilities) and updating the parameters.

Learning: Maximum Likelihood Estimation (MLE)

  • Use Maximum Likelihood Estimation (MLE) to solve the parameter learning problem.

Learning: Log-Likelihood Derivatives

  • Compute log-likelihood derivatives and set them to 0, or use gradient descent.
  • θ can be any parameter to learn, such as a mixing proportion, mean, or standard deviation of a component.
  • The derivative works out to the expected derivative of the joint log probability.

Learning: Optimization of Means

  • Optimization of means:
    \ln p(x|\pi, \mu, \Sigma) = \sum_{n=1}^{N} \sum_{k=1}^{K} T_{nk} \ln(\pi_{k} N(x_{n}|\mu_{k}, \Sigma_{k}))
    \frac{\partial}{\partial \mu_{k}} \ln p(x|\pi, \mu, \Sigma) = \sum_{n=1}^{N} T_{nk} \Sigma_{k}^{-1} (x_{n} - \mu_{k}) = 0
    \mu_{k} = \frac{\sum_{n=1}^{N} T_{nk} x_{n}}{\sum_{n=1}^{N} T_{nk}}
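The mean update is a responsibility-weighted average; a minimal sketch with made-up responsibilities T_nk and 1-d data:

```python
import numpy as np

# Responsibilities T[n, k] = Pr(z_n = k | x_n) and 1-d data x[n].
# Values are made up for illustration.
T = np.array([[0.9, 0.1],
              [0.2, 0.8],
              [0.7, 0.3]])
x = np.array([1.0, 5.0, 2.0])

# Weighted-mean update: mu_k = sum_n T_nk x_n / sum_n T_nk
mu = (T * x[:, None]).sum(axis=0) / T.sum(axis=0)
print(mu)  # each entry is a responsibility-weighted average of the data
```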

Learning: Optimization of Covariance

  • Optimization of covariance:
    \Sigma_{k} = \frac{\sum_{n=1}^{N} T_{nk} (x_{n} - \mu_{k})(x_{n} - \mu_{k})^{T}}{\sum_{n=1}^{N} T_{nk}}

Learning: Optimization of Mixing Term

  • Optimization of the mixing term uses a Lagrange multiplier to enforce that the proportions sum to 1:
    \ln p(x|\pi, \mu, \Sigma) + \lambda (\sum_{k=1}^{K} \pi_{k} - 1)
  • \pi_{k} = \frac{N_{k}}{N}
    where
  • N_{k} = \sum_{n=1}^{N} T_{nk}

MLE of a GMM

  • MLE of a GMM:
    \mu_{k} = \frac{1}{N_{k}} \sum_{n=1}^{N} T(z_{nk})\, x_{n}
    \Sigma_{k} = \frac{1}{N_{k}} \sum_{n=1}^{N} T(z_{nk}) (x_{n} - \mu_{k})(x_{n} - \mu_{k})^{T}
    \pi_{k} = \frac{N_{k}}{N}
    N_{k} = \sum_{n=1}^{N} T(z_{nk})
  • This is not a closed-form solution: the responsibilities T(z_{nk}) themselves depend on the parameters.
  • Use the Expectation-Maximization (EM) algorithm instead.
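The alternation between inference and the update equations can be sketched as a bare-bones 1-d EM loop. This is an illustrative sketch (deterministic initialization, no numerical safeguards, assumes K ≥ 2), not a production implementation:

```python
import math

def gaussian(x, mu, sigma):
    """1-d Gaussian density N(x | mu, sigma^2)."""
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / math.sqrt(2 * math.pi * sigma ** 2)

def em_gmm_1d(xs, K=2, iters=30):
    """Bare-bones EM for a 1-d GMM (illustrative; assumes K >= 2)."""
    xs_sorted = sorted(xs)
    # Initialize means spread across the data range, unit variances, uniform weights.
    mu = [xs_sorted[i * (len(xs) - 1) // (K - 1)] for i in range(K)]
    sigma = [1.0] * K
    pi = [1.0 / K] * K
    for _ in range(iters):
        # E-step: responsibilities T[n][k] = Pr(z_n = k | x_n) via Bayes' rule.
        T = []
        for x in xs:
            joint = [pi[k] * gaussian(x, mu[k], sigma[k]) for k in range(K)]
            s = sum(joint)
            T.append([j / s for j in joint])
        # M-step: the weighted-average updates derived above.
        for k in range(K):
            Nk = sum(T[n][k] for n in range(len(xs)))
            mu[k] = sum(T[n][k] * xs[n] for n in range(len(xs))) / Nk
            var = sum(T[n][k] * (xs[n] - mu[k]) ** 2 for n in range(len(xs))) / Nk
            sigma[k] = math.sqrt(max(var, 1e-6))  # floor the variance for stability
            pi[k] = Nk / len(xs)
    return pi, mu, sigma

# Two well-separated groups; EM should recover means near 0.05 and 10.0.
data = [0.1, -0.2, 0.3, 0.0, 10.1, 9.8, 10.2, 9.9]
pi, mu, sigma = em_gmm_1d(data)
print(sorted(mu))
```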

Take-Home Messages

  • The generative process of the Gaussian Mixture Model
  • Inferring cluster membership based on a learned GMM