Hierarchical Clustering

Agglomerative Hierarchical Clustering

  • A bottom-up approach to hierarchical clustering.
  • Each data point starts as its own cluster; clusters are iteratively merged until a stopping condition is met.
  • Alternative: Divisive (top-down) approach starts with all points in one cluster and recursively splits into smaller clusters.

Algorithm Overview

  • Initialization: Each data point is its own cluster (N clusters for N points).
  • Repeat until stopping criterion:
    • Find the two most similar (nearest) clusters.
    • Merge them into a new (parent) cluster.
  • Stopping criteria:
    • Stop when a predefined number of clusters k is reached.
    • Or stop when there is only one cluster.
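The merge loop above can be sketched in plain Python (a minimal illustration, not an optimized implementation). Single linkage with Euclidean distance is assumed here as the cluster-similarity measure; the linkage criteria themselves are defined later in these notes.

```python
import math

def agglomerative(points, k):
    """Merge the two nearest clusters until only k clusters remain.

    Single linkage with Euclidean distance is assumed as the
    cluster-similarity measure (linkage criteria are defined later).
    """
    # Initialization: each point index is its own cluster.
    clusters = [[i] for i in range(len(points))]

    def cluster_dist(c1, c2):
        # Single linkage: minimum pairwise distance across the two clusters.
        return min(math.dist(points[x], points[y]) for x in c1 for y in c2)

    while len(clusters) > k:
        # Find the two most similar (nearest) clusters ...
        i, j = min(
            ((a, b) for a in range(len(clusters)) for b in range(a + 1, len(clusters))),
            key=lambda p: cluster_dist(clusters[p[0]], clusters[p[1]]),
        )
        # ... and merge them into one new (parent) cluster.
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters

# The eight example points used below in these notes:
pts = [(2.25, 6.48), (5.49, 3.32), (4.45, 7.65), (7.66, 1.73),
       (3.83, 2.89), (0.94, 4.41), (4.62, 4.28), (1.06, 4.03)]
print(agglomerative(pts, 2))  # two clusters: {0, 2, 5, 7} and {1, 3, 4, 6}
```

With k = 2, this reproduces the two-cluster split shown in the step-wise progression below.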

Data Example (8 points)

  • Data points (index: [x, y]):

    • 0: [2.25, 6.48]
    • 1: [5.49, 3.32]
    • 2: [4.45, 7.65]
    • 3: [7.66, 1.73]
    • 4: [3.83, 2.89]
    • 5: [0.94, 4.41]
    • 6: [4.62, 4.28]
    • 7: [1.06, 4.03]
  • Initial state (each point its own cluster):

    • Cluster 1: [0]
    • Cluster 2: [1]
    • Cluster 3: [2]
    • Cluster 4: [3]
    • Cluster 5: [4]
    • Cluster 6: [5]
    • Cluster 7: [6]
    • Cluster 8: [7]
  • First merge: two nearest clusters merged into a new cluster

    • Example after first merge: Cluster 6: [5, 7]
    • Remaining clusters: [0], [1], [2], [3], [4], [5,7], [6]
    • Dendrogram is used to describe the clustering process as a tree
  • Second merge: next nearest clusters merged

    • Example after second merge: Cluster 2: [1, 6]
  • Third merge: continue merging nearest clusters

    • Example after third merge: Cluster 2 becomes [1, 4, 6]
  • Step-wise clustering progression (illustrative sequence):

    • Step 5 (4 clusters): Cluster 1: [0, 5, 7], Cluster 2: [1, 4, 6], Cluster 3: [2], Cluster 4: [3]
    • Step 6 (3 clusters): Cluster 1: [0, 2, 5, 7], Cluster 2: [1, 4, 6], Cluster 3: [3]
    • Step 7 (2 clusters): Cluster 1: [0, 2, 5, 7], Cluster 2: [1, 3, 4, 6]
    • Step 8 (1 cluster): Cluster 1: [0, 1, 2, 3, 4, 5, 6, 7]
  • Dendrogram

    • A tree-like diagram showing hierarchical relationships between data points and clusters.
    • Used to describe the clustering process in a convenient visual form.
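As a sketch of how this progression and its dendrogram can be reproduced with a standard library (SciPy here is my choice, not part of the original notes), `scipy.cluster.hierarchy.linkage` builds the full merge history in one call:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage

# The eight example points from above.
X = np.array([[2.25, 6.48], [5.49, 3.32], [4.45, 7.65], [7.66, 1.73],
              [3.83, 2.89], [0.94, 4.41], [4.62, 4.28], [1.06, 4.03]])

# Each row of Z records one merge: [cluster_a, cluster_b, distance, size].
# Newly formed clusters get ids 8, 9, ... in merge order.
Z = linkage(X, method="single")  # single linkage, Euclidean by default
print(Z[0])  # the first merge joins points 5 and 7, matching the notes

# To draw the tree (requires matplotlib):
# from scipy.cluster.hierarchy import dendrogram; dendrogram(Z)
```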

Question: How do we define similarity between clusters?

  • To run the algorithm, a point-to-point distance is not enough: we must also define the similarity (distance) between two clusters.
  • In hierarchical clustering, cluster similarity is defined via a linkage criterion. Common options are:
    • Single linkage
    • Complete linkage
    • Average linkage
    • Centroid method

Linkage methods (cluster distance definitions)

  • Single linkage

    • Definition: D(C_1, C_2) = \min_{x \in C_1,\, y \in C_2} D(x, y)
    • Interpretation: distance between clusters is the minimum distance between any pair of points, one from each cluster.
  • Complete linkage

    • Definition: D(C_1, C_2) = \max_{x \in C_1,\, y \in C_2} D(x, y)
    • Interpretation: distance between clusters is the maximum distance between any pair of points, one from each cluster.
  • Average linkage

    • Definition: D(C_1, C_2) = \frac{1}{|C_1|\,|C_2|} \sum_{x \in C_1} \sum_{y \in C_2} D(x, y)
    • Interpretation: average distance over all cross-cluster point pairs.
  • Centroid method

    • Concept: combine clusters based on the distance between their centroids.
    • Definition (distance between centroids): D(C_1, C_2) = \left\| \frac{1}{|C_1|} \sum_{x \in C_1} x - \frac{1}{|C_2|} \sum_{y \in C_2} y \right\|
    • Interpretation: distance between cluster centroids, using the Euclidean norm; clusters merged by closest centroids.
  • Note: All of the above require a base distance measure D(x, y) between points (often Euclidean distance).
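The four criteria can be written directly from the definitions above. In this minimal sketch, a cluster is represented as a list of point coordinates, and the base distance D(x, y) is Euclidean:

```python
import math

def d(x, y):
    """Base point-to-point distance D(x, y): Euclidean."""
    return math.dist(x, y)

def single_linkage(c1, c2):
    # Minimum distance over all cross-cluster pairs.
    return min(d(x, y) for x in c1 for y in c2)

def complete_linkage(c1, c2):
    # Maximum distance over all cross-cluster pairs.
    return max(d(x, y) for x in c1 for y in c2)

def average_linkage(c1, c2):
    # Mean distance over all |C1| * |C2| cross-cluster pairs.
    return sum(d(x, y) for x in c1 for y in c2) / (len(c1) * len(c2))

def centroid_distance(c1, c2):
    # Euclidean distance between the two cluster centroids (means).
    m1 = [sum(coords) / len(c1) for coords in zip(*c1)]
    m2 = [sum(coords) / len(c2) for coords in zip(*c2)]
    return math.dist(m1, m2)
```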

Worked example: Single linkage distance calculation (Illustrative Iteration 4)

  • Data points (index: [x, y]):

    • 0: [2.25, 6.48], 1: [5.49, 3.32], 2: [4.45, 7.65], 3: [7.66, 1.73], 4: [3.83, 2.89], 5: [0.94, 4.41], 6: [4.62, 4.28], 7: [1.06, 4.03]
  • Iteration 4: distance between Cluster [1, 4, 6] and Cluster [5, 7]

    • Distances between all pairs (indices shown):
    • dist(1, 5) ≈ 4.679
    • dist(1, 7) ≈ 4.487
    • dist(6, 5) ≈ 3.682
    • dist(6, 7) ≈ 3.569
    • dist(4, 5) ≈ 3.265
    • dist(4, 7) ≈ 2.995
    • Single linkage chooses the minimum of these: cluster distance = dist(4, 7) ≈ 2.995
  • The base distance D(x, y) is the point-to-point distance used in the calculation above.
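The calculation above can be checked numerically. This small sketch uses the example coordinates and Euclidean distance as D(x, y):

```python
import math

points = {0: (2.25, 6.48), 1: (5.49, 3.32), 2: (4.45, 7.65), 3: (7.66, 1.73),
          4: (3.83, 2.89), 5: (0.94, 4.41), 6: (4.62, 4.28), 7: (1.06, 4.03)}

c1, c2 = [1, 4, 6], [5, 7]  # the two clusters compared in iteration 4

# All cross-cluster pair distances, then the single-linkage minimum.
pairs = {(i, j): math.dist(points[i], points[j]) for i in c1 for j in c2}
closest = min(pairs, key=pairs.get)
print(closest, round(pairs[closest], 3))  # (4, 7) is the closest pair, ~2.995
```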

Stopping criteria

  • We stop merging when we have a predefined number of clusters k.
  • Alternatively, we may continue merging until there is only one cluster.
  • Practical note: In many analyses, a dendrogram is inspected to choose a reasonable k by cutting the tree at a chosen level.
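Both stopping rules correspond to cutting the full tree, which SciPy (an assumed library choice, not part of the original notes) exposes via `fcluster`. A sketch using the example points from above:

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

X = np.array([[2.25, 6.48], [5.49, 3.32], [4.45, 7.65], [7.66, 1.73],
              [3.83, 2.89], [0.94, 4.41], [4.62, 4.28], [1.06, 4.03]])
Z = linkage(X, method="single")  # full merge history, down to one cluster

# Criterion 1: stop at a predefined number of clusters k.
labels_k = fcluster(Z, t=2, criterion="maxclust")

# Criterion 2 (dendrogram cut): keep only merges below a distance threshold.
labels_d = fcluster(Z, t=2.8, criterion="distance")

print(labels_k, labels_d)
```

With this data, both cuts yield the same two clusters, {0, 2, 5, 7} and {1, 3, 4, 6}.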

Pros and Cons

  • Pros

    • No need to fix the number of clusters k in advance: the full tree can be built first and cut at any level afterwards
    • No assumption about cluster shape (can capture irregular shapes)
    • Simple and intuitive
  • Cons

    • The algorithm cannot undo a merge; early mistakes persist
    • Different distance metrics can produce very different results
    • Single linkage is sensitive to noise and outliers (can lead to chaining)
    • Complete linkage tends to break large clusters (prefers compact clusters)
  • Notes on linkage behavior (illustrative effects)

    • Single linkage: may create long chain-like clusters by linking through near neighbors; outliers can bridge clusters
    • Complete linkage: tends to produce compact clusters and can fail to join large loosely connected groups
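These effects can be seen on a tiny 1-D example (the coordinates are made up for illustration; SciPy is an assumed library choice): six points whose gaps grow slowly.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

# Six 1-D points with slowly growing gaps: 1.0, 1.1, 1.2, 1.3, 1.4.
X = np.array([[0.0], [1.0], [2.1], [3.3], [4.6], [6.0]])

# Single linkage chains through near neighbors: cut at two clusters,
# it leaves the last point alone against one long 5-point chain.
single = fcluster(linkage(X, method="single"), t=2, criterion="maxclust")

# Complete linkage prefers compact groups and splits nearer the middle,
# giving clusters of sizes 4 and 2 on the same data.
complete = fcluster(linkage(X, method="complete"), t=2, criterion="maxclust")

print(single, complete)
```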

Additional contextual points

  • Notation used in the example:
    • Clusters C1, C2, … represent sets of data point indices
    • Distances D(x, y) are pairwise distances between points x and y
  • Real-world relevance:
    • Hierarchical clustering provides a multi-scale view of data structure via the dendrogram
    • Useful when the number of clusters is not known a priori; allows exploration at different levels of granularity

Summary of key points from the transcript

  • Dendrogram as a tree-structured representation of hierarchical clustering
  • Stepwise cluster evolution (8 data points) illustrating how clusters merge over iterations
  • Distance-based linkage criteria (Single, Complete, Average, Centroid)
  • Practical issues: stopping criteria, pros/cons, and sensitivity to noise/outliers