What is clustering?
Clustering is the organization of unlabeled data into similarity groups called clusters.
What are the three key components needed for clustering?
A proximity measure (how similar or dissimilar two data points are), a criterion function (how good a clustering is), and an algorithm that computes the clustering.
What historic application of clustering is mentioned in the notes?
John Snow's map of cholera deaths during an outbreak in the 1850s.
What does K-means clustering involve?
K-means clustering partitions the data into k clusters, each represented by a centroid (its cluster center).
How does the K-means algorithm compute clusters?
It chooses k initial centroids, assigns each point to its closest centroid, re-computes the centroids from the current assignments, and repeats the assignment and re-computation steps until convergence.
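The loop described in this answer can be sketched in a few lines of plain Python. This is a minimal illustration, not the notes' own implementation: the function names (`kmeans`, `squared_distance`, `mean_point`), the choice of squared Euclidean distance, picking the initial centroids as random data points, and the `max_iters` and `seed` parameters are all assumptions made for the example.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two points (assumed proximity measure)."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def mean_point(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def kmeans(points, k, max_iters=100, seed=0):
    rng = random.Random(seed)
    # Step 1: choose k data points as the initial centroids (an assumption;
    # other initialization schemes exist).
    centroids = rng.sample(points, k)
    assignments = None
    for _ in range(max_iters):
        # Step 2: assign every point to its closest centroid.
        new_assignments = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        # Convergence: stop when no point changed cluster.
        if new_assignments == assignments:
            break
        assignments = new_assignments
        # Step 3: re-compute each centroid as the mean of its members.
        for j in range(k):
            members = [p for p, a in zip(points, assignments) if a == j]
            if members:  # keep the old centroid if a cluster became empty
                centroids[j] = mean_point(members)
    return centroids, assignments


# Hypothetical toy data: two well-separated groups of three points each.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
centroids, labels = kmeans(points, k=2)
print(labels)
print(centroids)
```

On this toy data the first three points should end up sharing one label and the last three the other; which group gets label 0 depends on the random initialization.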
What is the convergence criterion in K-means?
Convergence is reached when no points are re-assigned to a different cluster, or when the centroids change only minimally between iterations.
What are some strengths of K-means clustering?
It is simple, efficient, and has a time complexity of O(tkn), where n is the number of data points, k is the number of clusters, and t is the number of iterations.
What are weaknesses of the K-means algorithm?
It requires pre-specifying k, is sensitive to outliers, and is not applicable to categorical data without modifications.
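The outlier sensitivity follows from the centroid being a mean. A small sketch with made-up one-dimensional values (the numbers are illustrative assumptions, not data from the notes) shows how a single extreme point drags the centroid away from the rest of the cluster:

```python
from statistics import mean

# Hypothetical 1-D cluster and the same cluster with one outlier added.
cluster = [1.0, 2.0, 3.0, 4.0]
with_outlier = cluster + [100.0]

centroid_clean = mean(cluster)         # 2.5
centroid_outlier = mean(with_outlier)  # 22.0

print(centroid_clean, centroid_outlier)
```

With the outlier included, the centroid (22.0) no longer lies near any of the original four points, which in turn can distort the assignments in the next iteration.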
What are the two types of hierarchical clustering mentioned?
Divisive (top-down) and agglomerative (bottom-up) clustering.
How is agglomerative hierarchical clustering performed?
It starts with each data point as its own cluster, then iteratively merges the two nearest clusters until all are combined into a single cluster.
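A bottom-up merge loop of this kind might look like the sketch below. The notes do not say how the distance between two clusters is measured, so this example assumes single-link distance (the distance between the closest pair of points, one from each cluster) with Euclidean distance between points; the function names are likewise hypothetical.

```python
def euclidean(p, q):
    """Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5


def single_link(c1, c2):
    """Cluster distance = distance of the closest cross-cluster pair (assumed linkage)."""
    return min(euclidean(p, q) for p in c1 for q in c2)


def agglomerative(points):
    # Start with every point in its own cluster.
    clusters = [[p] for p in points]
    merges = []  # history of merged pairs, in merge order
    while len(clusters) > 1:
        # Find the indices of the two nearest clusters.
        i, j = min(
            ((a, b) for a in range(len(clusters)) for b in range(a + 1, len(clusters))),
            key=lambda ab: single_link(clusters[ab[0]], clusters[ab[1]]),
        )
        merges.append((clusters[i], clusters[j]))
        merged = clusters[i] + clusters[j]
        # Replace the two merged clusters with their union.
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)] + [merged]
    return merges


# Hypothetical toy data: two tight pairs far apart from each other.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.5)]
merge_history = agglomerative(points)
for left, right in merge_history:
    print(left, "+", right)
```

The merge history is what a dendrogram draws: n points always produce n - 1 merges, and cutting the history at an earlier step yields a clustering with more clusters. This brute-force version re-scans all cluster pairs on every step, so it is only meant for small inputs.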