These flashcards cover key vocabulary and concepts in data clustering, with a short definition for each term.
Clustering
The process of organizing objects into groups whose members are more similar to each other than to objects in other groups.
Cluster Analysis
A statistical method used to classify objects into distinct subgroups based on similarity.
Data Clustering
An unsupervised learning problem where the goal is to group examples into K partitions based on similarity.
High within-cluster similarity
Indicates that objects in the same cluster are very similar to each other.
Low inter-cluster similarity
Indicates that objects in different clusters are very dissimilar to each other.
Similarity
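The quality or state of being alike. In clustering, a numerical measure of how alike two data points are, often defined as a decreasing function of the distance between them.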
Distance Metrics
Functions that define a measure of distance between two data points, such as Euclidean or Manhattan distance.
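As an illustration of the two metrics named above, here is a minimal sketch in plain Python. The function names `euclidean` and `manhattan` are hypothetical, and points are assumed to be equal-length sequences of numbers.

```python
import math
from typing import Sequence


def euclidean(p: Sequence[float], q: Sequence[float]) -> float:
    """Straight-line distance: square root of the summed squared differences."""
    if len(p) != len(q):
        raise ValueError("points must have the same dimension")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def manhattan(p: Sequence[float], q: Sequence[float]) -> float:
    """City-block distance: sum of the absolute coordinate differences."""
    if len(p) != len(q):
        raise ValueError("points must have the same dimension")
    return sum(abs(a - b) for a, b in zip(p, q))


print(euclidean((0, 0), (3, 4)))  # 5.0
print(manhattan((0, 0), (3, 4)))  # 7
```

The same pair of points is 5 apart in a straight line but 7 apart when movement is restricted to the axes, which is why the choice of metric can change which points end up in the same cluster.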
Hierarchical Clustering
A clustering method that builds a hierarchy (tree) of clusters using either an agglomerative (bottom-up) or a divisive (top-down) approach.
Agglomerative Clustering
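A bottom-up approach to hierarchical clustering in which each data point starts as its own cluster and the two closest clusters are repeatedly merged until a single cluster, or a chosen number of clusters, remains.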
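A naive sketch of the bottom-up procedure, assuming single linkage (cluster distance = distance between the closest pair of points, one from each cluster). Single linkage is only one of several linkage criteria (complete, average, and Ward linkage are common alternatives), and the names `agglomerative` and `single_linkage` are hypothetical. The repeated all-pairs search makes this far too slow for large datasets.

```python
import math
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, ...]


def single_linkage(a: List[Point], b: List[Point]) -> float:
    # Distance between the closest pair of points, one from each cluster.
    return min(math.dist(p, q) for p in a for q in b)


def agglomerative(points: Sequence[Point], n_clusters: int) -> List[List[Point]]:
    if n_clusters < 1:
        raise ValueError("n_clusters must be at least 1")
    # Bottom-up start: every point is its own cluster.
    clusters: List[List[Point]] = [[p] for p in points]
    while len(clusters) > n_clusters:
        # Search every pair (i < j) for the two closest clusters.
        best: Optional[Tuple[float, int, int]] = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = single_linkage(clusters[i], clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        # Merge cluster j into cluster i (j > i, so index i stays valid).
        clusters[i].extend(clusters.pop(j))
    return clusters


pts = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0), (20.0, 20.0)]
print(agglomerative(pts, 3))
# [[(0.0, 0.0), (0.0, 1.0)], [(5.0, 5.0), (5.0, 6.0)], [(20.0, 20.0)]]
```

Stopping at a target number of clusters is one way to cut the hierarchy; recording the merge order and distances instead would give the full tree (dendrogram).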
K-means Clustering
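A method that partitions data into K distinct clusters by assigning each point to the cluster with the nearest mean (centroid), then recomputing each mean from its assigned points, repeating until the assignments stop changing.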
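The assign-then-update loop can be sketched as Lloyd's algorithm in plain Python. This assumes Euclidean distance and initializes the centroids by sampling K input points at random; practical implementations often use smarter seeding such as k-means++. The names `kmeans` and `nearest` are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def nearest(p: Point, centroids: List[Point]) -> int:
    # Index of the centroid closest to p (Euclidean distance).
    return min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))


def kmeans(points: Sequence[Point], k: int, max_iter: int = 100,
           seed: int = 0) -> Tuple[List[int], List[Point]]:
    rng = random.Random(seed)
    # Initialize by picking k input points as the starting centroids.
    centroids: List[Point] = [tuple(p) for p in rng.sample(list(points), k)]
    labels: List[int] = []
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster of its nearest centroid.
        new_labels = [nearest(p, centroids) for p in points]
        if new_labels == labels:
            break  # assignments unchanged: converged
        labels = new_labels
        # Update step: move each centroid to the mean of its assigned points.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = tuple(sum(xs) / len(members) for xs in zip(*members))
    return labels, centroids


pts = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
labels, centroids = kmeans(pts, 2)
print(labels, centroids)
```

Because the result depends on the initial centroids, it is common to run K-means several times with different seeds and keep the run with the lowest within-cluster sum of squares.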
Gaussian Mixture Models (GMM)
A probabilistic model that assumes all data points are generated from a mixture of K Gaussian distributions.
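GMMs are usually fitted with the expectation-maximization (EM) algorithm. The sketch below is a minimal one-dimensional version, assuming the caller supplies initial means, every component starts with the overall data variance and equal weight, and a small variance floor (`min_var`, a hypothetical parameter) prevents a component from collapsing onto a single point. The names `gmm_em_1d` and `normal_pdf` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def normal_pdf(x: float, mean: float, var: float) -> float:
    # Density of a 1-D Gaussian with the given mean and variance.
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def gmm_em_1d(xs: Sequence[float], init_means: Sequence[float], iters: int = 100,
              min_var: float = 1e-6) -> Tuple[List[float], List[float], List[float]]:
    n, k = len(xs), len(init_means)
    means = list(init_means)
    # Start every component with the overall data variance and equal weight.
    mu = sum(xs) / n
    var0 = max(sum((x - mu) ** 2 for x in xs) / n, min_var)
    variances = [var0] * k
    weights = [1.0 / k] * k
    for _ in range(iters):
        # E-step: responsibility = posterior probability that component j produced x.
        resp = []
        for x in xs:
            dens = [w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
            total = sum(dens)
            resp.append([d / total for d in dens] if total > 0 else [1.0 / k] * k)
        # M-step: re-estimate weights, means, and variances from the responsibilities.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / n
            if nj == 0.0:
                continue  # component lost all support; keep its mean and variance
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            variances[j] = max(
                sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, xs)) / nj, min_var)
    return weights, means, variances


xs = [0.8, 1.0, 1.1, 1.2, 8.8, 9.0, 9.1, 9.2]
weights, means, variances = gmm_em_1d(xs, [0.0, 10.0])
print(weights, means, variances)
```

Unlike K-means, each point gets a soft responsibility for every component; a hard clustering can be read off by assigning each point to the component with the highest responsibility.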
Spectral Clustering
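A clustering technique that uses the eigenvectors of a similarity matrix, usually through its graph Laplacian, to embed the data in a low-dimensional space where the groups are easier to separate.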
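A minimal two-cluster sketch, assuming a Gaussian (RBF) similarity with bandwidth `sigma` and the unnormalized Laplacian L = D - W. Instead of a full eigendecomposition, it uses power iteration to find the second-smallest eigenvector of L (the Fiedler vector) and splits points by its sign. Common variants instead use a normalized Laplacian and run K-means on several eigenvectors; the function name and parameters here are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def spectral_bipartition(points: Sequence[Point], sigma: float = 1.0,
                         iters: int = 1000, seed: int = 0) -> List[int]:
    n = len(points)
    if n < 2:
        raise ValueError("need at least two points")
    # Gaussian (RBF) similarity matrix W with a zero diagonal.
    w = [[0.0 if i == j else
          math.exp(-math.dist(points[i], points[j]) ** 2 / (2 * sigma ** 2))
          for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in w]
    # Unnormalized graph Laplacian L = D - W.
    lap = [[(deg[i] if i == j else 0.0) - w[i][j] for j in range(n)] for i in range(n)]
    # Eigenvalues of L lie in [0, 2 * max degree], so M = c*I - L turns the
    # smallest eigenvalues of L into the largest eigenvalues of M.
    c = 2 * max(deg)
    if c == 0.0:
        raise ValueError("similarity graph has no edges; try a larger sigma")
    rng = random.Random(seed)
    v = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    for _ in range(iters):
        # Remove the component along the constant vector (L's eigenvalue-0
        # eigenvector) so the iteration converges to the Fiedler vector.
        avg = sum(v) / n
        v = [x - avg for x in v]
        # Multiply by M = c*I - L and renormalize.
        v = [c * v[i] - sum(lap[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
    # The sign of each Fiedler-vector entry assigns the point to one of two groups.
    return [0 if x < 0 else 1 for x in v]


pts = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
print(spectral_bipartition(pts))
```

Which group is labeled 0 and which 1 depends on the sign the iteration settles on, so only the partition itself is meaningful.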
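DBSCAN (Density-Based Spatial Clustering of Applications with Noise)
A density-based clustering algorithm that defines clusters as regions of high point density separated by regions of low density; points in sparse regions are labeled as noise.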
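A compact sketch of the standard DBSCAN procedure, assuming Euclidean distance and a brute-force neighborhood search. Here `eps` is the neighborhood radius and `min_pts` the minimum neighborhood size (counting the point itself) for a point to be a core point; the helper `region_query` and the label `NOISE = -1` are illustrative choices.

```python
import math
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, ...]
NOISE = -1


def region_query(points: Sequence[Point], idx: int, eps: float) -> List[int]:
    # Indices of all points within eps of points[idx], including idx itself.
    return [j for j, q in enumerate(points) if math.dist(points[idx], q) <= eps]


def dbscan(points: Sequence[Point], eps: float, min_pts: int) -> List[int]:
    labels: List[Optional[int]] = [None] * len(points)  # None = unvisited
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        cluster += 1  # i is a core point: start a new cluster
        labels[i] = cluster
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: reachable but not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is also a core point: keep growing
    return labels


pts = [(0.0, 0.0), (0.0, 0.5), (0.5, 0.0),
       (10.0, 10.0), (10.0, 10.5), (10.5, 10.0), (50.0, 50.0)]
print(dbscan(pts, eps=1.0, min_pts=3))  # [0, 0, 0, 1, 1, 1, -1]
```

Unlike K-means, DBSCAN does not need the number of clusters in advance, and the isolated point is reported as noise rather than forced into a cluster.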
Silhouette Score
A metric, ranging from -1 to 1, that quantifies how well each point lies within its own cluster compared with the next closest cluster; values near 1 indicate compact, well-separated clusters.
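For each point, the silhouette is s = (b - a) / max(a, b), where a is its mean distance to the other members of its own cluster and b is its mean distance to the members of the nearest other cluster. The score is the average of s over all points. The sketch below assumes Euclidean distance and the common convention that points in singleton clusters score 0; the function name is hypothetical.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, ...]


def silhouette_score(points: Sequence[Point], labels: Sequence[int]) -> float:
    labs = list(labels)
    clusters = set(labs)
    if len(clusters) < 2:
        raise ValueError("silhouette score needs at least two clusters")
    scores = []
    for i, p in enumerate(points):
        # a: mean distance from p to the other members of its own cluster.
        own = [math.dist(p, q) for j, q in enumerate(points)
               if j != i and labs[j] == labs[i]]
        if not own:
            scores.append(0.0)  # convention: a point in a singleton cluster scores 0
            continue
        a = sum(own) / len(own)
        # b: mean distance from p to the points of the nearest other cluster.
        b = min(
            sum(math.dist(p, q) for q, lab in zip(points, labs) if lab == c)
            / labs.count(c)
            for c in clusters if c != labs[i]
        )
        scores.append(0.0 if max(a, b) == 0 else (b - a) / max(a, b))
    return sum(scores) / len(scores)


pts = [(0.0,), (1.0,), (10.0,), (11.0,)]
print(silhouette_score(pts, [0, 0, 1, 1]))
```

For these two tight, distant groups the score is about 0.90; moving the groups closer together, or mixing their labels, pulls it toward 0 or below.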
Within-Cluster Sum of Squares (WCSS)
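The sum of squared distances between each point and the centroid of its cluster; lower values indicate more compact clusters. It is often used to evaluate clustering quality, for example when choosing K with the elbow method.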
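A direct computation of WCSS, assuming Euclidean distance and that each cluster's centroid is the coordinate-wise mean of its points. The function name `wcss` is hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, ...]


def wcss(points: Sequence[Point], labels: Sequence[int]) -> float:
    # Group the points by cluster label.
    groups: Dict[int, List[Point]] = {}
    for p, lab in zip(points, labels):
        groups.setdefault(lab, []).append(p)
    total = 0.0
    for members in groups.values():
        # Centroid = coordinate-wise mean of the cluster's points.
        centroid = [sum(xs) / len(members) for xs in zip(*members)]
        # Add the squared Euclidean distance of every member to that centroid.
        total += sum(sum((a - b) ** 2 for a, b in zip(p, centroid)) for p in members)
    return total


print(wcss([(0.0, 0.0), (2.0, 0.0), (5.0, 5.0)], [0, 0, 1]))  # 2.0
```

WCSS always decreases as K grows (it reaches 0 when every point is its own cluster), which is why the elbow method looks for the K beyond which further decreases become small, rather than for the minimum.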
Chebyshev Distance
A distance metric defined as the maximum absolute difference in any dimension between two points.
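A one-line sketch of the definition, assuming equal-length, non-empty coordinate sequences; the function name `chebyshev` is hypothetical. Chebyshev distance is also the limiting case of the Minkowski distance as its exponent goes to infinity.

```python
from typing import Sequence


def chebyshev(p: Sequence[float], q: Sequence[float]) -> float:
    if len(p) != len(q) or not p:
        raise ValueError("points must be non-empty and have the same dimension")
    # Largest absolute coordinate difference across all dimensions.
    return max(abs(a - b) for a, b in zip(p, q))


print(chebyshev((1, 2, 3), (4, 0, 3)))  # 3
```

Only the single most different coordinate matters here, so two points that differ slightly in many dimensions can be closer under Chebyshev distance than under Euclidean or Manhattan distance.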
Cosine Similarity
A similarity measure that calculates the cosine of the angle between two non-zero vectors, often used in high-dimensional spaces.
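A minimal sketch of cos(theta) = (u . v) / (|u| |v|), assuming plain numeric sequences of equal length; the function name is hypothetical. The result lies in [-1, 1] and depends only on the vectors' directions, not their lengths; for clustering it is often converted to a cosine distance as 1 - similarity.

```python
import math
from typing import Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimension")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    # Dot product divided by the product of the vector lengths.
    return dot / (norm_u * norm_v)


print(cosine_similarity((1, 0), (0, 1)))  # 0.0
print(cosine_similarity((1, 2), (2, 4)))  # approximately 1.0: same direction
```

Because scaling a vector does not change its direction, (1, 2) and (2, 4) are maximally similar even though their Euclidean distance is not zero.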