Data Mining
Clustering
The process of grouping data points into clusters where points in the same group are more similar to each other than to those in other groups.
Partitional Clustering
Divides data into non-overlapping subsets without any hierarchy. Example: K-means.
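To make the K-means example concrete, here is a minimal sketch of the usual assign-then-update loop. The function names, the fixed iteration count, and the choice of the first k points as starting centroids are assumptions made for illustration, not part of the definition above.

```python
import math

def mean(points):
    """Component-wise mean of a non-empty list of points."""
    dims = len(points[0])
    return tuple(sum(p[d] for p in points) / len(points) for d in range(dims))

def k_means(points, k, iterations=10):
    # Assumption: naive initialization using the first k points as centroids.
    centroids = list(points[:k])
    groups = []
    for _ in range(iterations):
        # Assignment step: each point joins exactly one group (no overlap).
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            groups[nearest].append(p)
        # Update step: move each centroid to the mean of its group.
        centroids = [mean(g) if g else centroids[c] for c, g in enumerate(groups)]
    return groups

points = [(0, 0), (0, 1), (9, 9), (9, 10)]
print(k_means(points, k=2))
# [[(0, 0), (0, 1)], [(9, 9), (9, 10)]]
```

Every point ends up in exactly one group and no group contains another, which is what makes the result partitional rather than hierarchical.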
Hierarchical Clustering
Builds a nested tree of clusters (dendrogram) using bottom-up (agglomerative) or top-down (divisive) approaches.
Dendrogram
A tree-like diagram that visualizes how clusters are formed or split in hierarchical clustering.
Agglomerative Clustering
A bottom-up method that starts with each point as a cluster and merges them step-by-step.
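The bottom-up procedure can be sketched as a loop that keeps merging the two closest clusters. This is a naive illustration: the closest-pair distance used as the merge criterion, the stopping rule (a target number of clusters), and all names are assumptions chosen for the sketch.

```python
import math
from itertools import combinations, product

def closest_pair_distance(a, b):
    """Distance between the closest pair of points, one from each cluster."""
    return min(math.dist(p, q) for p, q in product(a, b))

def agglomerate(points, num_clusters):
    # Start with every point in its own cluster.
    clusters = [[p] for p in points]
    merges = []  # record of merges, in the order they happened
    while len(clusters) > num_clusters:
        # Find the pair of clusters that are currently closest.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda pair: closest_pair_distance(clusters[pair[0]], clusters[pair[1]]),
        )
        merges.append((clusters[i], clusters[j]))
        merged = clusters[i] + clusters[j]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)]
        clusters.append(merged)
    return clusters, merges

points = [(0, 0), (0, 1), (5, 5), (5, 6), (10, 0)]
clusters, merges = agglomerate(points, num_clusters=2)
print(clusters)
for left, right in merges:
    print("merged", left, "with", right)
```

The recorded merge order is the information a dendrogram draws.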
Divisive Clustering
A top-down method that starts with one cluster and splits it recursively.
Proximity Matrix
A matrix that stores the pairwise distances or similarities between data points or clusters.
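As a small illustration, the sketch below builds a proximity matrix of Euclidean distances between points. Euclidean distance and the function name are assumptions; any distance or similarity measure could fill the matrix.

```python
import math

def proximity_matrix(points):
    """Return the symmetric matrix of pairwise Euclidean distances."""
    n = len(points)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = math.dist(points[i], points[j])
            matrix[i][j] = d  # distance is symmetric, so fill both halves
            matrix[j][i] = d
    return matrix

points = [(0, 0), (3, 4), (6, 8)]
for row in proximity_matrix(points):
    print(row)
# [0.0, 5.0, 10.0]
# [5.0, 0.0, 5.0]
# [10.0, 5.0, 0.0]
```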
Single Link (MIN)
Measures cluster similarity based on the closest pair of points between two clusters.
Complete Link (MAX)
Measures similarity based on the farthest pair of points between two clusters.
Group Average Link
Measures similarity based on the average distance over all pairs of points, taking one point from each of the two clusters.
Centroid Distance
Distance between the centroids (mean points) of two clusters.
Ward’s Method
Measures similarity by the increase in the sum of squared errors (SSE) that results from merging two clusters; the smaller the increase, the more similar the clusters.
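The five inter-cluster measures (single link, complete link, group average, centroid distance, Ward's method) can be compared side by side on two tiny clusters. This is a sketch assuming Euclidean distance; the function names are illustrative.

```python
import math
from itertools import product

def single_link(a, b):
    return min(math.dist(p, q) for p, q in product(a, b))

def complete_link(a, b):
    return max(math.dist(p, q) for p, q in product(a, b))

def group_average(a, b):
    # Average over all pairs with one point from each cluster.
    return sum(math.dist(p, q) for p, q in product(a, b)) / (len(a) * len(b))

def centroid(cluster):
    dims = len(cluster[0])
    return tuple(sum(p[d] for p in cluster) / len(cluster) for d in range(dims))

def centroid_distance(a, b):
    return math.dist(centroid(a), centroid(b))

def sse(cluster):
    """Sum of squared distances from each point to the cluster centroid."""
    c = centroid(cluster)
    return sum(sum((x - m) ** 2 for x, m in zip(p, c)) for p in cluster)

def ward_increase(a, b):
    # Increase in squared error caused by merging a and b.
    return sse(a + b) - sse(a) - sse(b)

a = [(0, 0), (0, 2)]
b = [(3, 0), (3, 2)]
print(round(single_link(a, b), 3))        # 3.0
print(round(complete_link(a, b), 3))      # 3.606
print(round(group_average(a, b), 3))      # 3.303
print(round(centroid_distance(a, b), 3))  # 3.0
print(ward_increase(a, b))                # 9.0
```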
DBSCAN
A density-based clustering algorithm that groups points based on dense regions separated by low-density regions.
Eps (ε)
Radius parameter in DBSCAN used to define the neighborhood around a point.
MinPts
Minimum number of points required in an Eps-neighborhood to consider a point a core point in DBSCAN.
Core Point
A point with MinPts or more neighbors within radius Eps.
Border Point
A point that is not a core point but lies within Eps of a core point.
Noise Point
A point that is neither core nor border; considered an outlier.
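The three point types can be illustrated with a short classifier. This sketch covers only the labeling step, not the full DBSCAN cluster expansion. It assumes Euclidean distance and counts a point as a member of its own Eps-neighborhood, which is a common convention but an assumption here.

```python
import math

def classify_points(points, eps, min_pts):
    # Eps-neighborhood of each point (assumption: includes the point itself).
    neighbors = [
        [j for j, q in enumerate(points) if math.dist(p, q) <= eps]
        for p in points
    ]
    core = {i for i, nb in enumerate(neighbors) if len(nb) >= min_pts}
    labels = []
    for i, nb in enumerate(neighbors):
        if i in core:
            labels.append("core")
        elif any(j in core for j in nb):
            labels.append("border")  # not core, but within Eps of a core point
        else:
            labels.append("noise")   # neither core nor border
    return labels

points = [(0, 0), (0, 1), (1, 0), (1, 1), (2, 1), (8, 8)]
print(classify_points(points, eps=1.1, min_pts=3))
# ['core', 'core', 'core', 'core', 'border', 'noise']
```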
k-distance Plot
A plot used to help choose a good value for Eps in DBSCAN: each point’s distance to its k-th nearest neighbor is plotted in sorted order, and a sharp bend (knee) in the curve suggests a suitable Eps.
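The values behind a k-distance plot can be computed as below. The sketch assumes Euclidean distance, excludes each point from its own neighbor list, and prints the sorted values instead of drawing them; the function name is illustrative.

```python
import math

def k_distances(points, k):
    """Sorted distance from every point to its k-th nearest neighbor."""
    result = []
    for i, p in enumerate(points):
        # Distances to all other points, nearest first.
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        result.append(dists[k - 1])
    return sorted(result)

points = [(0, 0), (0, 1), (1, 0), (1, 1), (2, 1), (8, 8)]
for d in k_distances(points, k=2):
    print(round(d, 3))
```

Most values here are small and the last one is far larger; a jump like that in the sorted curve is where one would look for Eps, with the isolated point falling out as noise.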
Cluster Validity
Evaluation of how meaningful, compact, and well-separated the resulting clusters are.
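Compactness and separation can each be given a simple number. The two measures below (within-cluster sum of squared errors, and the smallest distance between cluster centroids) are illustrative choices assumed for this sketch, not the only validity measures in use.

```python
import math
from itertools import combinations

def centroid(cluster):
    dims = len(cluster[0])
    return tuple(sum(p[d] for p in cluster) / len(cluster) for d in range(dims))

def within_cluster_sse(clusters):
    """Compactness: total squared distance of points to their own centroid (lower is tighter)."""
    total = 0.0
    for cluster in clusters:
        c = centroid(cluster)
        total += sum(sum((x - m) ** 2 for x, m in zip(p, c)) for p in cluster)
    return total

def min_centroid_separation(clusters):
    """Separation: smallest distance between any two cluster centroids (higher is better)."""
    centers = [centroid(c) for c in clusters]
    return min(math.dist(a, b) for a, b in combinations(centers, 2))

clusters = [[(0, 0), (0, 2)], [(10, 0), (10, 2)]]
print(within_cluster_sse(clusters))       # 4.0
print(min_centroid_separation(clusters))  # 10.0
```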
Limitations of Hierarchical Clustering
Merges and splits are irreversible once made; there is no global objective function being optimized; results are sensitive to noise and cluster shape.
Strength of DBSCAN
Can find arbitrarily shaped clusters and detect noise without requiring the number of clusters to be specified in advance.