Ch9: Clustering


13 Terms

1

What is clustering in data mining?

A: An unsupervised learning method used to automatically group data points into clusters based on similarity, without using labeled data.

2

What are common use cases of clustering?

A:

  • Organizing documents

  • Customer segmentation

  • Web search result grouping

  • Image segmentation

  • Shopping pattern discovery

3

How is an instance typically represented in clustering?

A: As a point in n-dimensional space:

x = ⟨ a₁(x), a₂(x), ..., aₙ(x) ⟩
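
As a concrete illustration (a sketch assuming NumPy; the feature values are invented), an instance is simply an n-dimensional vector, and a dataset is a matrix with one row per instance:

  import numpy as np

  # One instance x = ⟨a₁(x), a₂(x), ..., aₙ(x)⟩ as an n-dimensional vector (n = 4 here)
  x = np.array([5.1, 3.5, 1.4, 0.2])

  # A dataset of m instances is an (m, n) array, one row per point
  X = np.array([[5.1, 3.5, 1.4, 0.2],
                [4.9, 3.0, 1.4, 0.2],
                [6.3, 3.3, 6.0, 2.5]])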

4

What is a common distance metric for measuring similarity in clustering?

A: Euclidean distance, defined as:

d(x, y) = √( (a₁(x) − a₁(y))² + (a₂(x) − a₂(y))² + ... + (aₙ(x) − aₙ(y))² )
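
A quick check of this formula in Python (a sketch assuming NumPy; the helper name and example points are purely illustrative):

  import numpy as np

  def euclidean_distance(x, y):
      """Square root of the summed squared per-attribute differences."""
      x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
      return np.sqrt(np.sum((x - y) ** 2))

  print(euclidean_distance([0.0, 0.0], [3.0, 4.0]))  # 5.0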

5

What is the K-means clustering algorithm?

A: An iterative algorithm that partitions data into K clusters by minimizing the distance between data points and their assigned cluster centroids.
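
In practice K-means is usually run through a library; a minimal sketch using scikit-learn's KMeans (the synthetic 2-D blobs and K = 3 are chosen only for illustration):

  import numpy as np
  from sklearn.cluster import KMeans

  rng = np.random.default_rng(0)
  # Three synthetic 2-D blobs of 50 points each
  X = np.vstack([rng.normal(loc=c, scale=0.5, size=(50, 2))
                 for c in ([0, 0], [5, 5], [0, 5])])

  kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
  print(kmeans.cluster_centers_)  # the K cluster centroids
  print(kmeans.labels_[:10])      # cluster index assigned to the first 10 points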

6

What are the steps of the K-means algorithm?

A:

  1. Initialize K cluster centers

  2. Assign each point to the nearest cluster

  3. Update each cluster center to be the mean of its assigned points

  4. Repeat steps 2–3 until assignments no longer change
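
A from-scratch sketch of these four steps (assuming NumPy; initializing the centers by sampling K data points at random is just one common choice, and the function name is illustrative):

  import numpy as np

  def kmeans(X, k, max_iter=100, seed=0):
      """Plain K-means following the four steps above."""
      rng = np.random.default_rng(seed)
      X = np.asarray(X, dtype=float)

      # Step 1: initialize K cluster centers by picking K distinct data points
      centers = X[rng.choice(len(X), size=k, replace=False)].copy()
      labels = np.full(len(X), -1)

      for _ in range(max_iter):
          # Step 2: assign each point to its nearest center (squared Euclidean distance)
          dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
          new_labels = dists.argmin(axis=1)

          # Step 4: stop once assignments no longer change
          if np.array_equal(new_labels, labels):
              break
          labels = new_labels

          # Step 3: move each center to the mean of the points assigned to it
          for j in range(k):
              if np.any(labels == j):      # leave a center alone if its cluster is empty
                  centers[j] = X[labels == j].mean(axis=0)

      return centers, labels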

7

What does K-means minimize during training?

A: The sum of squared distances from each data point to its assigned cluster center (within-cluster variance).
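
Given centers and assignments from a run like the sketch above, this objective (scikit-learn exposes it as inertia_) can be computed directly; the helper below is purely illustrative:

  import numpy as np

  def within_cluster_sse(X, centers, labels):
      """Sum of squared distances from each point to its assigned cluster center."""
      X = np.asarray(X, dtype=float)
      diffs = X - centers[labels]      # each point minus its own cluster's center
      return float((diffs ** 2).sum())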

8

Is K-means guaranteed to converge?

A: Yes — it converges in a finite number of steps, though not necessarily to a global optimum.

9

What is the computational complexity (running time) of K-means?

A:

  • O(k⋅n) per iteration for assigning points to clusters

  • O(n) per iteration for updating the cluster means

10

What distance properties must the similarity measure satisfy?

A:

  1. Symmetry: D(A,B)=D(B,A)

  2. Positivity: D(A,B) ≥ 0, and D(A,B) = 0 if and only if A = B

  3. Triangle inequality: D(A,C)≤D(A,B)+D(B,C)
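
A quick, purely illustrative check that Euclidean distance satisfies all three properties on random points (assuming NumPy):

  import numpy as np

  rng = np.random.default_rng(0)
  A, B, C = rng.normal(size=(3, 4))    # three random 4-dimensional points

  def d(p, q):
      return np.linalg.norm(p - q)     # Euclidean distance

  assert np.isclose(d(A, B), d(B, A))                 # 1. symmetry
  assert d(A, B) >= 0 and np.isclose(d(A, A), 0.0)    # 2. positivity
  assert d(A, C) <= d(A, B) + d(B, C) + 1e-12         # 3. triangle inequality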

11

How is the number of clusters K determined?

A:

  • It is not well defined and often problem-dependent.

  • A common heuristic: look for a "kink" (elbow) in a plot of the objective function value against K (see the sketch below)
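
A sketch of that heuristic with scikit-learn: fit K-means for a range of candidate K values, record the within-cluster objective (inertia_), and look for where it stops dropping sharply (the helper name is illustrative):

  from sklearn.cluster import KMeans

  def inertia_curve(X, k_values):
      """Within-cluster sum of squares for each candidate K; eyeball the elbow."""
      return {k: KMeans(n_clusters=k, n_init=10, random_state=0).fit(X).inertia_
              for k in k_values}

  # e.g. inertia_curve(X, range(1, 11)) → plot the values against K and look for the kink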

12

What are the main advantages of K-means clustering?

A:

  • Simple and intuitive

  • Efficient on large datasets

  • Guaranteed to converge

13

What are the main disadvantages of K-means?

A:

  • Sensitive to outliers and noise

  • Struggles with non-spherical or non-convex cluster shapes

  • May converge to local optimum

  • Requires pre-specifying K