These flashcards cover the fundamental concepts, methods, and metrics related to clustering techniques in data analysis.
Clustering
The use of unsupervised learning techniques to group similar objects (records) without predefined labels.
Cluster
A collection of records that are similar to each other within the cluster, but dissimilar to records in other clusters.
K-Means Clustering
A clustering method that partitions a dataset into k clusters by assigning each record to the cluster whose mean (centroid) is nearest, then recomputing the centroids and repeating until the assignments stop changing.
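The assign-then-update loop described above can be sketched in pure Python as follows. This is a minimal sketch of the usual iterative (Lloyd's) procedure, assuming points are equal-length numeric tuples; the function name `kmeans`, its parameters, and the random-sample initialization are illustrative choices, not any particular library's API.

```python
import math
import random

def kmeans(points, k, n_iter=100, seed=0):
    """Lloyd's-style K-Means on a list of equal-length numeric tuples (illustrative sketch)."""
    rng = random.Random(seed)
    # Initialization (one simple choice): k distinct records sampled at random.
    centroids = rng.sample(points, k)

    def nearest(p):
        # Index of the centroid closest to p in Euclidean distance.
        return min(range(k), key=lambda j: math.dist(p, centroids[j]))

    for _ in range(n_iter):
        # Assignment step: each point joins the cluster of its nearest centroid.
        labels = [nearest(p) for p in points]
        # Update step: move each centroid to the mean of its assigned points.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centroids.append(centroids[j])  # leave an empty cluster's centroid in place
        if new_centroids == centroids:
            break  # converged: no centroid moved
        centroids = new_centroids
    # Final assignment so the labels match the returned centroids.
    labels = [nearest(p) for p in points]
    return labels, centroids

pts = [(1, 1), (1.5, 2), (1, 1.5), (8, 8), (8.5, 9), (9, 8)]
labels, centroids = kmeans(pts, 2)
```

With these two well-separated groups, the first three points and the last three points end up in different clusters. Library implementations such as scikit-learn's `KMeans` add smarter initialization (k-means++) and several restarts, because a single random start can converge to a poor local optimum.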
Centroid
The center of a cluster, computed in K-Means as the mean of the records assigned to it.
Euclidean Distance
A measure of the straight-line distance between two points in Euclidean space.
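For points p and q with n coordinates, the Euclidean distance is d(p, q) = √(Σᵢ (pᵢ − qᵢ)²). A minimal sketch, assuming points are equal-length numeric tuples (the function name `euclidean` is illustrative; Python's standard library also offers `math.dist` for the same quantity):

```python
import math

def euclidean(p, q):
    """Straight-line distance: square root of the summed squared coordinate differences."""
    if len(p) != len(q):
        raise ValueError("points must have the same dimension")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

print(euclidean((0, 0), (3, 4)))  # 5.0
```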
Inertia
A metric used in K-Means to quantify how tightly a dataset is clustered, calculated as the sum of squared distances from each point to its closest cluster center (lower values mean more compact clusters).
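The inertia definition translates directly into code. A minimal sketch, assuming points and centroids are numeric tuples (the function name `inertia` is illustrative):

```python
import math

def inertia(points, centroids):
    """Sum of squared Euclidean distances from each point to its closest centroid."""
    return sum(min(math.dist(p, c) ** 2 for c in centroids) for p in points)

pts = [(0, 0), (2, 0), (10, 0), (12, 0)]
print(inertia(pts, [(1, 0), (11, 0)]))  # 4.0
print(inertia(pts, [(6, 0)]))           # 104.0
```

Adding centroids never increases the best achievable inertia, so a lower value on its own does not show that a larger k is a better choice.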
Silhouette Coefficient
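A measure of clustering quality that compares how close a point is to its own cluster with how close it is to the nearest other cluster. Values range from -1 to 1: near 1 means the point is well placed, near 0 means it lies on a boundary between clusters, and negative means it is probably in the wrong cluster.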
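For a point i, let a(i) be its mean distance to the other members of its own cluster and b(i) its mean distance to the members of the nearest other cluster; the silhouette is s(i) = (b(i) − a(i)) / max(a(i), b(i)). A minimal sketch, assuming at least two clusters, distinct points, and labels given as a list; the function name and the score of 0 for singleton clusters (a common convention) are illustrative choices:

```python
import math

def silhouette_scores(points, labels):
    """Per-point silhouette s(i) = (b - a) / max(a, b); assumes >= 2 clusters and distinct points."""
    scores = []
    for i, (p, own) in enumerate(zip(points, labels)):
        # a: mean distance from p to the other members of its own cluster.
        same = [math.dist(p, q) for j, (q, lab) in enumerate(zip(points, labels))
                if lab == own and j != i]
        if not same:
            scores.append(0.0)  # common convention: a singleton cluster scores 0
            continue
        a = sum(same) / len(same)
        # b: mean distance from p to the members of the nearest other cluster.
        b = min(
            sum(math.dist(p, q) for q, lab in zip(points, labels) if lab == other)
            / labels.count(other)
            for other in set(labels) if other != own
        )
        scores.append((b - a) / max(a, b))
    return scores

pts = [(0, 0), (0, 1), (10, 0), (10, 1)]
good = silhouette_scores(pts, [0, 0, 1, 1])  # compact, well-separated clusters
bad = silhouette_scores(pts, [0, 1, 0, 1])   # each cluster spans the gap
print(sum(good) / len(good), sum(bad) / len(bad))
```

The sensible labeling gives every point a score close to 1, while the labeling that splits the nearby pairs gives every point a negative score. The silhouette coefficient for a whole clustering is usually reported as the mean of these per-point scores.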
Elbow Method
A technique used to choose the number of clusters in K-Means by plotting the Within-Cluster Sum of Squared Errors (WSS) against k and picking the "elbow": the point where WSS stops decreasing sharply, so that adding more clusters yields only small improvements.
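The elbow procedure can be sketched by computing WSS for k = 1, 2, ... and locating the bend. This is a minimal, self-contained sketch under stated assumptions. It uses a deterministic farthest-first initialization for reproducibility, where real tools typically use random or k-means++ starts. It also locates the elbow by the largest second difference of the WSS curve, which is only one simple heuristic; the elbow is often read off a plot instead. All function names are illustrative.

```python
import math

def farthest_first(points, k):
    # Deterministic initialization (an assumption, chosen for reproducibility): start at the
    # first point, then keep adding the point farthest from its nearest chosen centroid.
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(math.dist(p, c) for c in centroids)))
    return centroids

def kmeans_centroids(points, k, n_iter=100):
    # Compact Lloyd's iteration: assign points to the nearest centroid, then recompute means.
    centroids = farthest_first(points, k)
    for _ in range(n_iter):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[min(range(k), key=lambda j: math.dist(p, centroids[j]))].append(p)
        new = [tuple(sum(c) / len(g) for c in zip(*g)) if g else centroids[j]
               for j, g in enumerate(groups)]
        if new == centroids:
            break
        centroids = new
    return centroids

def wss(points, centroids):
    # Within-cluster sum of squares: squared distance from each point to its nearest centroid.
    return sum(min(math.dist(p, c) ** 2 for c in centroids) for p in points)

def wss_curve(points, k_max):
    # WSS for k = 1..k_max (index 0 holds k = 1).
    return [wss(points, kmeans_centroids(points, k)) for k in range(1, k_max + 1)]

def elbow_k(curve):
    # Heuristic: the elbow is the k where the per-step decrease in WSS shrinks the most
    # (largest second difference). Needs at least three values of k.
    drops = [curve[i] - curve[i + 1] for i in range(len(curve) - 1)]
    bends = [drops[i] - drops[i + 1] for i in range(len(drops) - 1)]
    return bends.index(max(bends)) + 2  # bends[i] sits at k = i + 2

pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (20, 0), (20, 1), (21, 0)]
curve = wss_curve(pts, 5)
print(elbow_k(curve))
```

For these three tight groups, WSS falls steeply from k = 1 to k = 3 and then barely changes, so the heuristic picks k = 3.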
Statistical Distance (Mahalanobis distance)
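A distance measure that accounts for the variance of each measurement and the correlations between measurements (through the inverse covariance matrix), so distances are judged relative to the spread and shape of the data rather than in raw units.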
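Formally, the Mahalanobis distance is D(x, y) = √((x − y)ᵀ S⁻¹ (x − y)), where S is the covariance matrix of the data. A minimal sketch restricted to 2-D data so the matrix inverse can be written in closed form. The 2-D restriction, the sample-covariance estimate, and the function names are assumptions made for brevity. For general dimensions, libraries such as SciPy provide `scipy.spatial.distance.mahalanobis`, which takes the inverse covariance matrix as an argument.

```python
import math

def covariance_2d(points):
    # Sample covariance matrix (n - 1 denominator) of a list of 2-D points.
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    return [[sxx, sxy], [sxy, syy]]

def mahalanobis_2d(u, v, cov):
    # sqrt(diff^T S^-1 diff) with diff = u - v, using the closed-form inverse of a 2x2 matrix S.
    (s11, s12), (s21, s22) = cov
    det = s11 * s22 - s12 * s21
    if det == 0:
        raise ValueError("covariance matrix is singular")
    inv = [[s22 / det, -s12 / det], [-s21 / det, s11 / det]]
    dx, dy = u[0] - v[0], u[1] - v[1]
    q = dx * (inv[0][0] * dx + inv[0][1] * dy) + dy * (inv[1][0] * dx + inv[1][1] * dy)
    return math.sqrt(q)

# Strongly correlated measurements: moving "with" the correlation is cheap, against it is costly.
S = [[1.0, 0.9], [0.9, 1.0]]
along = mahalanobis_2d((1, 1), (0, 0), S)
across = mahalanobis_2d((1, -1), (0, 0), S)
```

Both test points are the same Euclidean distance (√2) from the origin, yet the point that follows the positive correlation is much closer in Mahalanobis terms. With an identity covariance matrix the measure reduces to Euclidean distance.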
Manhattan Distance
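A distance measure computed as the sum of the absolute differences between the coordinates of two points, i.e., the length of a path that moves only along grid lines (also called city-block or L1 distance).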
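In formula form, the Manhattan distance is d(p, q) = Σᵢ |pᵢ − qᵢ|. A minimal sketch, assuming equal-length numeric tuples (the function name `manhattan` is illustrative):

```python
def manhattan(p, q):
    """Sum of absolute coordinate differences (city-block / L1 distance)."""
    if len(p) != len(q):
        raise ValueError("points must have the same dimension")
    return sum(abs(a - b) for a, b in zip(p, q))

print(manhattan((0, 0), (3, 4)))  # 7
```

For the same pair of points the Euclidean distance is 5, because the straight line cuts across the grid rather than following it.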