Vocabulary flashcards covering key clustering concepts and algorithms from the notes.
K-Means Clustering
A partitioning unsupervised algorithm that divides data into K non-overlapping clusters by minimizing the within-cluster sum of squares (inertia).
Inertia (Within-cluster Sum of Squares)
The sum of squared distances between data points and their cluster centroids; K-Means aims to minimize this value.
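The definition above can be checked by hand on a tiny data set. The following is a minimal sketch, assuming points have already been assigned to clusters; the function name `inertia` and the sample data are illustrative and not taken from the notes.

```python
def inertia(points, labels, centroids):
    """Sum of squared Euclidean distances from each point to its assigned centroid."""
    total = 0.0
    for point, label in zip(points, labels):
        centroid = centroids[label]
        # Squared distance, so no square root is taken.
        total += sum((p - c) ** 2 for p, c in zip(point, centroid))
    return total


# Illustrative data: two tight groups of two points each.
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
centroids = [(0.0, 1.0), (10.0, 1.0)]

good = inertia(points, [0, 0, 1, 1], centroids)  # each point joins its nearby centroid
bad = inertia(points, [0, 1, 0, 1], centroids)   # two points sent to the far centroid
print(good, bad)
```

The second assignment gives a much larger value, which is why K-Means prefers the first.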
Hierarchical Clustering
Builds a hierarchy of clusters by recursively merging or splitting clusters; can be agglomerative (bottom-up) or divisive (top-down).
Agglomerative Clustering
A hierarchical approach that starts with each data point as its own cluster and merges clusters based on a linkage criterion (e.g., single, complete, average).
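The three linkage criteria named above differ only in how they summarize the pairwise distances between two clusters. A small sketch, assuming Euclidean distance; `linkage_distance` is a hypothetical helper, not a library function.

```python
from itertools import product
from math import dist


def linkage_distance(cluster_a, cluster_b, method="single"):
    """Distance between two clusters under a given linkage criterion."""
    pairwise = [dist(a, b) for a, b in product(cluster_a, cluster_b)]
    if method == "single":      # closest pair of points
        return min(pairwise)
    if method == "complete":    # farthest pair of points
        return max(pairwise)
    if method == "average":     # mean over all cross-cluster pairs
        return sum(pairwise) / len(pairwise)
    raise ValueError(f"unknown linkage method: {method}")


# Illustrative clusters on a line.
a = [(0.0, 0.0), (1.0, 0.0)]
b = [(3.0, 0.0), (5.0, 0.0)]
single = linkage_distance(a, b, "single")
complete = linkage_distance(a, b, "complete")
average = linkage_distance(a, b, "average")
print(single, complete, average)
```

At each step, agglomerative clustering merges the pair of clusters for which this distance is smallest.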
DBSCAN
Density-Based Spatial Clustering of Applications with Noise; groups densely packed points using two parameters, ε (epsilon) and MinPts. It handles noise and discovers clusters of arbitrary shape without the number of clusters being specified in advance.
ε (epsilon) in DBSCAN
Maximum distance between two points for them to be considered neighbors.
MinPts
Minimum number of points required in a point's ε-neighborhood to form a dense region.
Core Point
A point with at least MinPts points within its ε-neighborhood (including itself).
Border Point
A point within the ε-neighborhood of a core point but not itself a core point.
Noise Point (Outlier)
A point that is neither a core point nor a border point and is not assigned to a cluster.
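The core, border, and noise definitions above can be applied directly to a small point set. This is a rough sketch of the labelling step only, not a full DBSCAN implementation; `classify_points` and the sample coordinates are assumptions made for illustration, and Euclidean distance is assumed.

```python
from math import dist


def classify_points(points, eps, min_pts):
    """Label each point as 'core', 'border', or 'noise'."""
    # The eps-neighborhood of a point includes the point itself.
    neighbors = [
        [j for j, q in enumerate(points) if dist(p, q) <= eps]
        for p in points
    ]
    core = {i for i, nbrs in enumerate(neighbors) if len(nbrs) >= min_pts}

    roles = []
    for i, nbrs in enumerate(neighbors):
        if i in core:
            roles.append("core")
        elif any(j in core for j in nbrs):
            roles.append("border")  # near a core point, but not dense itself
        else:
            roles.append("noise")
    return roles


# Illustrative data: a small group plus one far-away point.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (2.0, 0.0), (10.0, 10.0)]
roles = classify_points(points, eps=1.0, min_pts=3)
print(roles)
```

A full DBSCAN would then connect core points that lie within ε of each other into clusters and attach each border point to a neighboring core point's cluster.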
OPTICS
Ordering Points To Identify the Clustering Structure; an extension of DBSCAN that orders points by reachability, producing a hierarchical clustering structure that is more robust to clusters of varying density.
Mean Shift Clustering
Non-parametric algorithm that iteratively shifts candidate centroids toward areas of higher data density; the points where the centroids converge (density modes) identify the clusters.
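A one-dimensional sketch of the shifting step, assuming a flat (uniform) kernel in which every point within the bandwidth counts equally; the function `mean_shift_1d`, the bandwidth value, and the data are illustrative assumptions.

```python
def mean_shift_1d(start, data, bandwidth, max_iter=100, tol=1e-6):
    """Move a candidate center to the mean of its window until it stops moving."""
    center = start
    for _ in range(max_iter):
        window = [x for x in data if abs(x - center) <= bandwidth]
        if not window:
            break  # no points nearby, nothing to shift toward
        new_center = sum(window) / len(window)
        if abs(new_center - center) < tol:
            return new_center
        center = new_center
    return center


# Illustrative data with two dense regions, near 1 and near 5.
data = [1.0, 1.2, 0.8, 5.0, 5.1, 4.9]
modes = [round(mean_shift_1d(x, data, bandwidth=1.0), 3) for x in data]
print(sorted(set(modes)))
```

Points whose centers converge to the same mode are placed in the same cluster, so the number of clusters follows from the data and the bandwidth rather than being set in advance.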
Gaussian Mixture Models (GMM)
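Model-based clustering that assumes the data are generated from a mixture of Gaussian distributions; estimates each component's weight, mean, and covariance (typically with expectation-maximization) and assigns points to clusters by probability.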
Spectral Clustering
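Graph-based clustering technique that uses the eigenvectors of a similarity matrix (or of the graph Laplacian derived from it) to embed the data in a lower-dimensional space, where it is then partitioned into clusters.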