Gaussian Mixture Model Notes
Machine Learning Methods
- The Gaussian Mixture Model (GMM) is a probabilistic clustering method used in machine learning.
- The model assumes the data arises from several clusters (e.g., Cluster 1, Cluster 2, and Cluster 3), each modeled by a Gaussian distribution.
Recap of Conditional Probabilities and Bayes Rule
- Conditional probabilities:
p(A, B) = p(A|B)p(B) = p(B|A)p(A)
- Bayes rule components:
- Conditional Probability: P(A|B)
- Likelihood: p(B|A)
- Prior Probability: p(A)
- Posterior Probability: p(A|B)
- Marginal Probability: p(B)
- Marginal Probability formula:
p(A = 1) = \sum_{i} p(A = 1, B_{i}) = \sum_{i} p(A = 1|B_{i})p(B_{i})
Example of Probabilities Calculation
- Given joint probabilities for today's and tomorrow's weather (Rainy/Cold):
- P(Tomorrow=Rainy, Today=Rainy) = 4/9
- P(Tomorrow=Cold, Today=Rainy) = 2/9
- P(Tomorrow=Rainy, Today=Cold) = 2/9
- P(Tomorrow=Cold, Today=Cold) = 1/9
- Marginal probabilities for Today:
- P(Today=Rainy) = 4/9 + 2/9 = 2/3
- P(Today=Cold) = 2/9 + 1/9 = 1/3
- Marginal probabilities for Tomorrow:
- P(Tomorrow=Rainy) = 4/9 + 2/9 = 2/3
- P(Tomorrow=Cold) = 2/9 + 1/9 = 1/3
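As a quick check, the joint table above can be turned into marginals and conditionals in a few lines of Python (variable names are illustrative):

```python
# Joint distribution P(Tomorrow, Today) from the weather table above.
joint = {
    ("Rainy", "Rainy"): 4/9,
    ("Cold",  "Rainy"): 2/9,
    ("Rainy", "Cold"):  2/9,
    ("Cold",  "Cold"):  1/9,
}

# Marginal over Today: sum the joint over Tomorrow's outcomes.
p_today = {}
for (tomorrow, today), p in joint.items():
    p_today[today] = p_today.get(today, 0.0) + p

# Conditional P(Tomorrow | Today) = P(Tomorrow, Today) / P(Today).
p_cond = {k: p / p_today[k[1]] for k, p in joint.items()}

print(p_today["Rainy"])            # 2/3, matching the notes
print(p_cond[("Rainy", "Rainy")])  # (4/9) / (2/3) = 2/3
```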
Hard Clustering Challenges
- Hard Clustering methods include K-Means, Hierarchical Clustering, and DBSCAN.
- These methods can be difficult in scenarios where data points do not clearly belong to a single cluster.
Towards Soft Clustering
- K-means:
- Performs hard assignment, where each object belongs to only one cluster.
- Mixture modeling:
- Performs soft assignment, providing a probability that an object belongs to a cluster.
- o_{i} \in \{o_{1}, …, o_{K}\}
- Example probabilities: p=0.315, p=0.287
Gaussian Mixture Model (GMM)
- GMM provides a soft assignment of data points to clusters.
Gaussian Distribution
- 1-d Gaussian distribution:
N(\mu, \sigma) = \frac{1}{\sqrt{2\pi\sigma^{2}}} e^{-\frac{(x-\mu)^{2}}{2\sigma^{2}}}
- μ is the mean.
- σ is the standard deviation.
- GMM uses Gaussian distributions to model clusters.
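The 1-d density above translates directly into code; a minimal sketch:

```python
import math

def gaussian_pdf(x, mu, sigma):
    """1-d Gaussian density N(x; mu, sigma), following the formula above."""
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / math.sqrt(2 * math.pi * sigma ** 2)

# The density peaks at the mean: N(0; 0, 1) = 1/sqrt(2*pi) ≈ 0.3989.
print(gaussian_pdf(0.0, 0.0, 1.0))
```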
Gaussian Distribution in d Dimensions
- For d dimensions, the Gaussian distribution of a vector x = (x^{1}, x^{2}, …,x^{d})^{T} is defined by:
N(x|\mu,\Sigma) = \frac{1}{(2\pi)^{d/2}|\Sigma|^{1/2}} \exp(-\frac{1}{2}(x-\mu)^{T}\Sigma^{-1}(x-\mu))
- where \mu is the mean vector.
- \Sigma is the covariance matrix.
- Example:
- \mu = (0, 0)
- \Sigma = \begin{bmatrix} 0.25 & 0.30 \\ 0.30 & 1.00 \end{bmatrix}
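The d-dimensional density can be evaluated with NumPy on the example parameters above (a sketch, assuming NumPy is available):

```python
import numpy as np

def mvn_pdf(x, mu, Sigma):
    """d-dimensional Gaussian density N(x | mu, Sigma), per the formula above."""
    d = len(mu)
    diff = x - mu
    norm = (2 * np.pi) ** (d / 2) * np.sqrt(np.linalg.det(Sigma))
    return float(np.exp(-0.5 * diff @ np.linalg.inv(Sigma) @ diff) / norm)

mu = np.array([0.0, 0.0])
Sigma = np.array([[0.25, 0.30],
                  [0.30, 1.00]])

# At the mean the exponent vanishes, so the density is 1 / (2*pi * sqrt(det(Sigma))).
print(mvn_pdf(mu, mu, Sigma))
```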
Mixture Models
- A Mixture Model is the weighted sum of a number of PDFs.
- Weights are determined by a distribution \pi_{0}, \pi_{1}, \pi_{2}
Mixture Model Equation
- The probability density function p(x) is given by:
p(x) = \pi_{0}f_{0}(x) + \pi_{1}f_{1}(x) + \pi_{2}f_{2}(x)
- \pi_{0}, \pi_{1}, \pi_{2} are the mixing proportions.
- f_{0}(x), f_{1}(x), f_{2}(x) are the component densities.
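The mixture equation can be sketched directly in Python; the three components below are illustrative parameters, not values from the notes:

```python
import math

def gaussian_pdf(x, mu, sigma):
    """1-d Gaussian density N(x; mu, sigma)."""
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / math.sqrt(2 * math.pi * sigma ** 2)

def mixture_pdf(x, pis, mus, sigmas):
    """p(x) = sum_k pi_k * f_k(x), with Gaussian components assumed here."""
    return sum(pi * gaussian_pdf(x, mu, s) for pi, mu, s in zip(pis, mus, sigmas))

# Illustrative three-component mixture; the weights sum to 1,
# so the mixture density integrates to 1 as well.
pis = [0.5, 0.3, 0.2]
mus = [0.0, 4.0, 8.0]
sigmas = [1.0, 1.0, 2.0]

print(mixture_pdf(0.0, pis, mus, sigmas))
```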
Why GMM?
- GMM defines a new PDF that can generate random variables.
- It is a generative model.
- It models each cluster component with a Gaussian distribution.
- It supports posterior inference.
- It gives a soft assignment of data points to clusters.
Intuition Behind GMM
- Consider a histogram of temperature readings.
- Two distributions may indicate data from two different cities.
- Formulate as mathematics with:
- Observable variable: temperature.
- Latent variable: cities (z).
Gaussian Mixture Models: Mathematical Notation
- For the general case, π is a vector of probabilities (non-negative values which sum to 1).
- π is known as the mixing proportions.
Gaussian Mixture Models: PDF
- The probability density function (PDF) over x is computed by marginalizing out (summing out) z:
p(x) = \sum_{k=1}^{K} \pi_{k} N(x|\mu_{k}, \Sigma_{k})
- This PDF is a convex combination, or weighted average, of the PDFs of the component distributions.
Posterior Inference
- Assume parameters of the model have been chosen.
- Goal: Infer, given a data point x, which component it likely belongs to.
- In mathematical terms, infer the posterior distribution p(z|x).
- The posterior distribution can be inferred using Bayes' Rule:
Pr(z = k|x) = \frac{\pi_{k} N(x|\mu_{k}, \Sigma_{k})}{\sum_{j=1}^{K} \pi_{j} N(x|\mu_{j}, \Sigma_{j})}
Example I
- Using a previous example, suppose we observe x=2 in the model.
- We want to compute the posterior probability Pr(z = 1|x).
Example II
- Observations are two-dimensional (x1, x2).
- We observe x1 and want to predict x2 using the posterior predictive distribution.
Example II: Two-Dimensional Mixture of Gaussian Model
- Have a two-dimensional mixture of Gaussian model where x1 and x2 are conditionally independent given z.
- Suppose we observe x1=3.
- We can compute the posterior distribution just like in the previous example.
Example II: Posterior Predictive Distribution
- Compute the posterior predictive distribution using the posterior:
p(x_{2}|x_{1}) = Pr(z = 1|x_{1})\,p(x_{2}|z = 1) + Pr(z = 2|x_{1})\,p(x_{2}|z = 2)
= 0.213\,\text{Gaussian}(x_{2}; 6, 1) + 0.787\,\text{Gaussian}(x_{2}; 3, 2)
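The posterior predictive above can be evaluated numerically; the weights 0.213/0.787 and the component parameters are taken from the notes:

```python
import math

def gaussian_pdf(x, mu, sigma):
    """1-d Gaussian density N(x; mu, sigma)."""
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / math.sqrt(2 * math.pi * sigma ** 2)

def posterior_predictive(x2):
    """p(x2 | x1 = 3): posterior-weighted mixture of the two components."""
    return (0.213 * gaussian_pdf(x2, 6.0, 1.0)
            + 0.787 * gaussian_pdf(x2, 3.0, 2.0))

# The prediction is itself a valid density (the weights sum to 1).
print(posterior_predictive(3.0))
```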
Learning
- Parameters of GMM:
- Mean \mu_{k} and standard deviation \sigma_{k} associated with each component k.
- Mixing proportions \pi_{k}, defined as Pr(z = k).
- Idea:
- Repeatedly alternate between inferring cluster responsibilities and updating the parameters.
Learning: Maximum Likelihood Estimation (MLE)
- Use Maximum Likelihood Estimation (MLE) to solve the parameter learning problem.
Learning: Log-Likelihood Derivatives
- Set the derivatives of the log-likelihood to 0 and solve, or use gradient descent.
- θ can be any parameter to learn, such as a component's mixing proportion, mean, or standard deviation.
- The derivative takes the form of an expected derivative of the joint log probability.
Learning: Optimization of Means
- Optimization of means:
\ln p(x|\pi,\mu,\Sigma) = \sum_{n=1}^{N} \sum_{k=1}^{K} T_{nk} \ln(\pi_{k} N(x_{n}|\mu_{k}, \Sigma_{k}))
\frac{\partial}{\partial \mu_{k}} \ln p(x|\pi,\mu,\Sigma) = \sum_{n=1}^{N} T_{nk} \Sigma_{k}^{-1}(x_{n} - \mu_{k}) = 0
\mu_{k} = \frac{\sum_{n=1}^{N} T_{nk} x_{n}}{\sum_{n=1}^{N} T_{nk}}
Learning: Optimization of Covariance
- Optimization of covariance:
\Sigma_{k} = \frac{\sum_{n=1}^{N} T_{nk} (x_{n} - \mu_{k})(x_{n} - \mu_{k})^{T}}{\sum_{n=1}^{N} T_{nk}}
Learning: Optimization of Mixing Term
- Optimization of mixing term:
\ln p(x|\pi,\mu,\Sigma) + \lambda \left(\sum_{k=1}^{K} \pi_{k} - 1\right) \quad\Rightarrow\quad \pi_{k} = \frac{N_{k}}{N}
- where N_{k} = \sum_{n=1}^{N} T_{nk}
MLE of a GMM
- MLE of a GMM:
\mu_{k} = \frac{1}{N_{k}} \sum_{n=1}^{N} T(z_{nk}) x_{n}
\Sigma_{k} = \frac{1}{N_{k}} \sum_{n=1}^{N} T(z_{nk}) (x_{n} - \mu_{k}) (x_{n} - \mu_{k})^{T}
\pi_{k} = \frac{N_{k}}{N}
N_{k} = \sum_{n=1}^{N} T(z_{nk})
- Not a closed-form solution, since the responsibilities T(z_{nk}) themselves depend on the parameters!
- Use the Expectation-Maximization (EM) algorithm.
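The EM loop for a 1-d GMM can be sketched as follows: the E-step computes the responsibilities T_{nk} = Pr(z_n = k | x_n) via Bayes' rule, and the M-step applies the closed-form updates derived above. This is a minimal illustration, not a production implementation (initialization and stopping criteria are simplified):

```python
import math
import random

def gaussian_pdf(x, mu, sigma):
    """1-d Gaussian density N(x; mu, sigma)."""
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / math.sqrt(2 * math.pi * sigma ** 2)

def em_gmm_1d(xs, K=2, iters=50):
    """EM for a 1-d Gaussian mixture with K components."""
    N = len(xs)
    xs_sorted = sorted(xs)
    # Initialize means at data quantiles (a simple deterministic choice).
    mus = [xs_sorted[(2 * k + 1) * N // (2 * K)] for k in range(K)]
    sigmas = [1.0] * K
    pis = [1.0 / K] * K
    for _ in range(iters):
        # E-step: responsibilities T_nk via Bayes' rule.
        T = []
        for x in xs:
            w = [pis[k] * gaussian_pdf(x, mus[k], sigmas[k]) for k in range(K)]
            s = sum(w)
            T.append([wk / s for wk in w])
        # M-step: closed-form updates given the responsibilities.
        for k in range(K):
            Nk = sum(T[n][k] for n in range(N))
            mus[k] = sum(T[n][k] * xs[n] for n in range(N)) / Nk
            var = sum(T[n][k] * (xs[n] - mus[k]) ** 2 for n in range(N)) / Nk
            sigmas[k] = max(math.sqrt(var), 1e-6)  # guard against collapse
            pis[k] = Nk / N
    return pis, mus, sigmas

# Two well-separated synthetic clusters; EM should recover means near 0 and 10.
rng = random.Random(0)
data = [rng.gauss(0, 1) for _ in range(100)] + [rng.gauss(10, 1) for _ in range(100)]
pis, mus, sigmas = em_gmm_1d(data, K=2)
print(sorted(mus))
```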
Take-Home Messages
- The generative process of Gaussian Mixture Model
- Inferring cluster membership based on a learned GMM