Clustering-Intro

10 Terms

What is clustering?

Clustering is the organization of unlabeled data into similarity groups called clusters: points in the same cluster are similar to one another and dissimilar to points in other clusters.

What are the three key components needed for clustering?

A proximity measure (a similarity or distance function), a criterion function for evaluating a clustering, and an algorithm that computes the clustering.
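A minimal sketch of the first two components, assuming points are tuples of numbers, Euclidean distance as the proximity measure, and the sum of squared error (SSE) as the criterion function. These are common choices, not necessarily the ones the notes have in mind, and the example points are made up.

```python
import math

def squared_euclidean(p, q):
    """Sum of squared coordinate differences between points p and q."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def euclidean(p, q):
    """Proximity measure: Euclidean distance (smaller means more similar)."""
    return math.sqrt(squared_euclidean(p, q))

def sse(clusters, centroids):
    """Criterion function: total squared distance of each point to its
    cluster's centroid. Lower SSE means tighter clusters."""
    return sum(
        squared_euclidean(x, c)
        for members, c in zip(clusters, centroids)
        for x in members
    )

clusters = [[(0.0, 0.0), (2.0, 0.0)], [(10.0, 10.0)]]
centroids = [(1.0, 0.0), (10.0, 10.0)]
print(euclidean((0.0, 0.0), (3.0, 4.0)))  # 5.0
print(sse(clusters, centroids))           # 2.0
```

A clustering algorithm then searches for a grouping that scores well under the criterion; K-means, for instance, tries to reduce SSE.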

What historic application of clustering is mentioned in the notes?

John Snow's mapping of cholera deaths during a London outbreak in the 1850s; the deaths clustered around a single water pump.

What does K-means clustering involve?

K-means partitions the data into k clusters, where k is specified by the user; each cluster is represented by its centroid, the mean of the points assigned to it.

How does the K-means algorithm compute clusters?

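It chooses k initial centroids (e.g., k randomly selected data points), assigns each point to its closest centroid, re-computes each centroid as the mean of the points assigned to it, and repeats the assign and update steps until convergence.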
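The steps above can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: it assumes points are tuples of numbers, uses squared Euclidean distance, picks the initial centroids at random from the data (the `seed` parameter is a hypothetical addition for repeatability), and stops when no point changes cluster or after a hypothetical `max_iter` cap. Keeping the old centroid when a cluster becomes empty is one arbitrary choice among several.

```python
import random

def kmeans(points, k, max_iter=100, seed=0):
    """Minimal K-means on a list of numeric tuples. Returns (centroids, labels)."""
    rng = random.Random(seed)
    # Step 1: choose k data points as the initial centroids.
    centroids = rng.sample(points, k)
    labels = None
    for _ in range(max_iter):
        # Step 2: assign each point to its closest centroid.
        new_labels = [
            min(range(k),
                key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centroids[j])))
            for p in points
        ]
        # Converged: no point was re-assigned to a different cluster.
        if new_labels == labels:
            break
        labels = new_labels
        # Step 3: re-compute each centroid as the mean of its assigned points.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if the cluster is empty
                centroids[j] = tuple(sum(coord) / len(members)
                                     for coord in zip(*members))
    return centroids, labels
```

For example, `kmeans([(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)], 2)` should put the first three points in one cluster and the last three in the other; which group is labeled 0 depends on the initial centroids.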

What is the convergence criterion in K-means?

Iteration stops when no points (or only a minimal number of points) are re-assigned to different clusters, or when the centroids no longer change (or change only minimally).
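A minimal sketch of this stopping test, assuming cluster labels are lists of integers and centroids are tuples of numbers. The thresholds `min_changes` and `tol` are hypothetical parameters standing in for "minimal"; real implementations choose their own defaults.

```python
import math

def has_converged(old_labels, new_labels, old_centroids, new_centroids,
                  min_changes=0, tol=1e-6):
    """Return True if at most min_changes points switched clusters,
    or if every centroid moved less than tol."""
    # Criterion 1: count points re-assigned to a different cluster.
    changes = sum(1 for a, b in zip(old_labels, new_labels) if a != b)
    if changes <= min_changes:
        return True
    # Criterion 2: largest distance any centroid moved in this iteration.
    max_shift = max(math.dist(c_old, c_new)
                    for c_old, c_new in zip(old_centroids, new_centroids))
    return max_shift < tol
```

Checking either criterion is enough to stop; the two usually agree, since unchanged assignments produce unchanged centroids.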

What are some strengths of K-means clustering?

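It is simple to understand and implement, and it is efficient: its time complexity is O(tkn), where n is the number of data points, k is the number of clusters, and t is the number of iterations. Because k and t are usually small, K-means is often considered linear in n.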

What are weaknesses of the K-means algorithm?

The number of clusters k must be specified in advance; the algorithm is sensitive to outliers; and it works only when a mean is defined, so categorical data requires a modification (e.g., the k-modes variant).
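To illustrate the outlier weakness: a centroid is a mean, so one distant point can drag it away from the bulk of its cluster. A minimal sketch with made-up points:

```python
def mean(points):
    """Centroid of a list of numeric tuples (coordinate-wise mean)."""
    return tuple(sum(coord) / len(points) for coord in zip(*points))

cluster = [(1.0, 1.0), (2.0, 1.0), (1.0, 2.0), (2.0, 2.0)]
print(mean(cluster))                     # (1.5, 1.5)
print(mean(cluster + [(100.0, 100.0)]))  # (21.2, 21.2)
```

Adding a single outlier moves the centroid from the middle of the four points to a location far from all of them, which in turn can change how other points are assigned.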

What are the two types of hierarchical clustering mentioned?

Divisive (top-down) and agglomerative (bottom-up) clustering.

How is agglomerative hierarchical clustering performed?

It starts with every data point as its own cluster, then repeatedly merges the two nearest clusters until all points are combined into a single cluster.
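A deliberately naive sketch of the bottom-up procedure. It assumes points are tuples of numbers and uses single-link distance (the distance between the two closest members) to decide which clusters are "nearest"; complete-link and average-link are equally common choices, and the notes do not specify one. The loop is roughly cubic in the number of points, so it only serves to show the steps.

```python
import math
from itertools import combinations

def single_link(a, b):
    """Cluster distance = distance between their closest pair of members."""
    return min(math.dist(p, q) for p in a for q in b)

def agglomerative(points):
    """Bottom-up clustering. Returns the merges in the order they happen."""
    clusters = [[p] for p in points]  # start: each point is its own cluster
    merges = []
    while len(clusters) > 1:
        # Find the pair of clusters that are nearest to each other.
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: single_link(clusters[ij[0]], clusters[ij[1]]))
        merges.append((clusters[i], clusters[j]))
        merged = clusters[i] + clusters[j]
        # combinations yields i < j, so delete j first to keep index i valid.
        del clusters[j]
        del clusters[i]
        clusters.append(merged)
    return merges
```

For n points this performs exactly n - 1 merges; with `[(0, 0), (0, 1), (5, 5), (5, 6), (20, 20)]`, the isolated point `(20, 20)` is the last one merged in.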