DM Unit 5: Types of Clustering & DBSCAN

Description and Tags

Data Mining

22 Terms

1

Clustering

The process of grouping data points into clusters where points in the same group are more similar to each other than to those in other groups.

2

Partitional Clustering

Divides the data into non-overlapping subsets so that each point belongs to exactly one cluster, with no hierarchy among the clusters. Example: K-means.
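
As an illustration of the K-means example named on this card, here is a minimal sketch in plain Python. The function name `kmeans`, the fixed iteration count, and seeding the centroids with the first k points are all assumptions made for the sketch; practical implementations usually pick the initial centroids randomly and stop when assignments no longer change.

```python
import math

def kmeans(points, k, iterations=10):
    """Toy K-means. Seeds centroids with the first k points (an assumption)."""
    centroids = [points[i] for i in range(k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(
                    sum(coord) / len(members) for coord in zip(*members)
                )
    return labels, centroids

# Hypothetical data: two well-separated groups of three points each.
points = [(1.0, 1.0), (8.0, 8.0), (1.5, 2.0), (9.0, 8.0), (1.0, 0.5), (8.5, 9.0)]
labels, centroids = kmeans(points, k=2)
print(labels)  # every point gets exactly one label: a partition
```

Each point ends up with exactly one label, which is what makes the result a partition rather than a hierarchy.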

3

Hierarchical Clustering

Builds a nested tree of clusters (dendrogram) using bottom-up (agglomerative) or top-down (divisive) approaches.

4

Dendrogram

A tree-like diagram that visualizes how clusters are formed or split in hierarchical clustering.

5

Agglomerative Clustering

A bottom-up method that starts with each point as a cluster and merges them step-by-step.
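
The bottom-up procedure can be sketched in a few lines of plain Python. This is only an illustration: the function names are hypothetical, the sketch measures cluster distance by the closest pair of points (single link), and it recomputes distances from scratch at every step instead of maintaining a proximity matrix.

```python
import math

def single_link(a, b):
    """Distance between two clusters: their closest pair of points."""
    return min(math.dist(p, q) for p in a for q in b)

def agglomerative(points, target_clusters):
    # Start with every point in its own cluster.
    clusters = [[p] for p in points]
    merges = []  # record of merges, in order (the raw material of a dendrogram)
    while len(clusters) > target_clusters:
        # Find the pair of clusters that are currently closest.
        i, j = min(
            ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
            key=lambda ij: single_link(clusters[ij[0]], clusters[ij[1]]),
        )
        merges.append((clusters[i], clusters[j]))
        merged = clusters[i] + clusters[j]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)] + [merged]
    return clusters, merges

# Hypothetical data: two close pairs and one isolated point.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.5), (10.0, 0.0)]
clusters, merges = agglomerative(points, target_clusters=2)
print(clusters)
```

Once two clusters are merged they are never separated again, so the list of merges fully describes the nested tree.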

6

Divisive Clustering

A top-down method that starts with one cluster and splits it recursively.

7

Proximity Matrix

A matrix that stores the pairwise distances or similarities between data points or clusters.
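
A small sketch of a proximity matrix for three hypothetical 2-D points, assuming Euclidean distance as the proximity measure (any other distance or similarity could be substituted):

```python
import math

# Hypothetical points chosen so the distances come out as round numbers.
points = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]

# matrix[i][j] holds the Euclidean distance between points i and j.
matrix = [[math.dist(p, q) for q in points] for p in points]

for row in matrix:
    print(row)
```

For a distance measure the matrix is symmetric with zeros on the diagonal, so only one triangle of it actually needs to be stored.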

8

Single Link (MIN)

Defines the proximity of two clusters by their closest pair of points, one taken from each cluster.

9

Complete Link (MAX)

Defines the proximity of two clusters by their farthest pair of points, one taken from each cluster.

10

Group Average Link

Defines the proximity of two clusters as the average distance over all pairs of points with one point taken from each cluster.
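
The single-link, complete-link, and group-average measures can be compared side by side on two tiny clusters. This is a sketch with hypothetical data and names, assuming Euclidean distance between points:

```python
import math

def pair_distances(a, b):
    """All distances between a point of cluster a and a point of cluster b."""
    return [math.dist(p, q) for p in a for q in b]

# Hypothetical clusters.
cluster_a = [(0.0, 0.0), (0.0, 2.0)]
cluster_b = [(3.0, 0.0), (3.0, 4.0)]

d = pair_distances(cluster_a, cluster_b)
single = min(d)             # single link (MIN): closest pair
complete = max(d)           # complete link (MAX): farthest pair
average = sum(d) / len(d)   # group average: mean over all cross-cluster pairs

print(single, complete, average)
```

The group average always lies between the single-link and complete-link values, since it is a mean of the same set of pairwise distances.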

11

Centroid Distance

Distance between the centroids (mean points) of two clusters.
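
A short sketch of the centroid distance between two hypothetical clusters, assuming Euclidean distance and a helper named `centroid` introduced here for illustration:

```python
import math

def centroid(cluster):
    """Mean point of a cluster, computed coordinate by coordinate."""
    return tuple(sum(coord) / len(cluster) for coord in zip(*cluster))

# Hypothetical clusters.
cluster_a = [(0.0, 0.0), (0.0, 2.0)]   # centroid (0, 1)
cluster_b = [(3.0, 0.0), (3.0, 4.0)]   # centroid (3, 2)

centroid_distance = math.dist(centroid(cluster_a), centroid(cluster_b))
print(centroid_distance)
```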

12

Ward’s Method

Defines the proximity of two clusters as the increase in the sum of squared errors (SSE) that results from merging them; the pair whose merge gives the smallest increase is merged first.
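
The quantity used by Ward’s method can be computed directly as SSE(merged) minus the SSE of the two clusters taken separately. The sketch below uses hypothetical data and helper names, and takes the squared error of a point to be its squared Euclidean distance to the cluster centroid:

```python
import math

def centroid(cluster):
    return tuple(sum(coord) / len(cluster) for coord in zip(*cluster))

def sse(cluster):
    """Sum of squared distances from each point to the cluster centroid."""
    c = centroid(cluster)
    return sum(math.dist(p, c) ** 2 for p in cluster)

# Hypothetical clusters.
cluster_a = [(0.0, 0.0), (0.0, 2.0)]
cluster_b = [(3.0, 0.0), (3.0, 4.0)]

# Ward's proximity: how much the total SSE grows if the two are merged.
increase = sse(cluster_a + cluster_b) - (sse(cluster_a) + sse(cluster_b))
print(increase)
```

For these clusters the separate SSEs are 2 and 8, the merged SSE is 20, and so the increase is 10.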

13

DBSCAN

A density-based clustering algorithm that groups points based on dense regions separated by low-density regions.

14

Eps (ε)

Radius parameter in DBSCAN used to define the neighborhood around a point.

15

MinPts

Minimum number of points required in an Eps-neighborhood to consider a point a core point in DBSCAN.

16

Core Point

A point with MinPts or more points within radius Eps of it (a common convention counts the point itself).

17

Border Point

A point that is not a core point but lies within Eps of a core point.

18

Noise Point

A point that is neither core nor border; considered an outlier.
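
The three DBSCAN point types can be illustrated with a short classification sketch. The function name and data are hypothetical, distance is assumed to be Euclidean, and the Eps-neighborhood is assumed to include the point itself:

```python
import math

def classify(points, eps, min_pts):
    # Eps-neighborhood of each point (indices), the point itself included.
    neighbors = [
        [j for j, q in enumerate(points) if math.dist(p, q) <= eps]
        for p in points
    ]
    core = {i for i, n in enumerate(neighbors) if len(n) >= min_pts}
    kinds = []
    for i, n in enumerate(neighbors):
        if i in core:
            kinds.append("core")
        elif any(j in core for j in n):
            kinds.append("border")   # not core, but within Eps of a core point
        else:
            kinds.append("noise")    # neither core nor border
    return kinds

# Hypothetical data: a small dense group plus one far-away point.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (2.0, 0.0), (8.0, 8.0)]
kinds = classify(points, eps=1.0, min_pts=3)
print(kinds)
```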

19

k-distance Plot

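A plot used to help choose Eps in DBSCAN: the distance from each point to its k-th nearest neighbor is computed, the distances are sorted, and the sharp bend (knee) in the resulting curve suggests a suitable Eps, with k typically set to MinPts.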
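
The values behind a k-distance plot can be computed as below. This is a sketch with hypothetical data and a hypothetical function name; it assumes Euclidean distance, does not count a point as its own neighbor, and only prints the sorted values rather than drawing the plot:

```python
import math

def k_distances(points, k):
    """Sorted distance from each point to its k-th nearest other point."""
    result = []
    for i, p in enumerate(points):
        d = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        result.append(d[k - 1])
    return sorted(result)

# Hypothetical data: four points on a unit square plus one outlier.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (10.0, 10.0)]
kd = k_distances(points, k=2)
print(kd)
```

The values stay flat for the points in the dense region and jump for the outlier; an Eps just above the flat part keeps the dense points together while leaving the outlier as noise.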

20

Cluster Validity

Evaluation of how meaningful, compact, and well-separated the resulting clusters are.
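
Compactness and separation can each be given a simple number. The sketch below is one possible choice, not the only one: it assumes the within-cluster sum of squared errors as the compactness measure and the smallest distance between cluster centroids as the separation measure. The data and helper names are hypothetical.

```python
import math

def centroid(cluster):
    return tuple(sum(coord) / len(cluster) for coord in zip(*cluster))

def cohesion_sse(clusters):
    """Total squared distance of points to their own centroid (lower = more compact)."""
    return sum(
        math.dist(p, centroid(c)) ** 2 for c in clusters for p in c
    )

def min_centroid_separation(clusters):
    """Smallest distance between any two cluster centroids (higher = better separated)."""
    cents = [centroid(c) for c in clusters]
    return min(
        math.dist(cents[i], cents[j])
        for i in range(len(cents)) for j in range(i + 1, len(cents))
    )

# Hypothetical clustering result with two clusters.
clusters = [[(0.0, 0.0), (0.0, 2.0)], [(10.0, 0.0), (10.0, 2.0)]]
print(cohesion_sse(clusters), min_centroid_separation(clusters))
```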

21

Limitations of Hierarchical Clustering

Merges and splits are irreversible once made; no global objective function is directly optimized; results are sensitive to noise and to cluster shape.

22

Strength of DBSCAN

Can handle arbitrary-shaped clusters and detect noise without needing to define the number of clusters.
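
To show that the number of clusters is found rather than supplied, here is a compact DBSCAN sketch in plain Python. It is an illustration under stated assumptions, not a reference implementation: the function name, the `NOISE` marker, and the data are hypothetical, distance is Euclidean, the neighborhood includes the point itself, and neighborhoods are found by brute force.

```python
import math

NOISE = -1  # label for points that end up in no cluster

def dbscan(points, eps, min_pts):
    neighbors = [
        [j for j, q in enumerate(points) if math.dist(p, q) <= eps]
        for p in points
    ]
    labels = [None] * len(points)
    cluster_id = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        if len(neighbors[i]) < min_pts:
            labels[i] = NOISE          # may be relabeled later as a border point
            continue
        labels[i] = cluster_id         # i is a core point: start a new cluster
        frontier = list(neighbors[i])
        while frontier:
            j = frontier.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster_id
            if len(neighbors[j]) >= min_pts:
                frontier.extend(neighbors[j])  # only core points expand the cluster
        cluster_id += 1
    return labels

# Hypothetical data: two dense squares and one isolated point.
points = [
    (0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0),
    (5.0, 5.0), (5.0, 6.0), (6.0, 5.0), (6.0, 6.0),
    (20.0, 20.0),
]
labels = dbscan(points, eps=1.0, min_pts=3)
print(labels)
```

Only Eps and MinPts are passed in; the two clusters and the single noise point fall out of the density structure of the data.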