Clustering Tendency
Before running a clustering algorithm, evaluate whether the data set has any cluster-like structure at all. For low-dimensional data in Euclidean space, this is typically done with statistical tests for spatial randomness.
Hopkins Statistic
A measure of clustering tendency that checks whether the data is more clustered than random. First, m points are generated at random in the data space and m real points are sampled from the data. Then, for each random point and each sampled real point, find the distance to its nearest data point; the two sets of distances are compared.
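The procedure above can be sketched in plain Python. This is a minimal illustration, not a reference implementation. The function names, the bounding-box sampling, and the seed are assumptions. The card does not say how the distances are combined, so the sketch assumes the convention H = Σu / (Σu + Σw), where u are the distances from the random points and w the distances from the sampled real points. Under that convention values near 1 suggest clustered data and values near 0.5 suggest random data; some texts use the complementary ratio instead.

```python
import math
import random


def nearest_distance(point, data, skip_index=None):
    """Distance from `point` to its nearest neighbour in `data`."""
    return min(
        math.dist(point, other)
        for i, other in enumerate(data)
        if i != skip_index
    )


def hopkins(data, m, seed=0):
    """Hopkins statistic, assuming the convention sum(u) / (sum(u) + sum(w))."""
    rng = random.Random(seed)
    dims = len(data[0])
    lows = [min(p[d] for p in data) for d in range(dims)]
    highs = [max(p[d] for p in data) for d in range(dims)]

    # u: distances from m uniformly generated points to the nearest real point.
    # Sampling inside the bounding box of the data is an assumption.
    u = [
        nearest_distance(
            [rng.uniform(lows[d], highs[d]) for d in range(dims)], data
        )
        for _ in range(m)
    ]

    # w: distances from m sampled real points to their nearest *other* real point.
    sample = rng.sample(range(len(data)), m)
    w = [nearest_distance(data[i], data, skip_index=i) for i in sample]

    return sum(u) / (sum(u) + sum(w))


# Illustrative data: two tight blobs far apart.
_rng = random.Random(1)
clustered = (
    [(_rng.gauss(0, 0.1), _rng.gauss(0, 0.1)) for _ in range(50)]
    + [(_rng.gauss(10, 0.1), _rng.gauss(10, 0.1)) for _ in range(50)]
)
h_clustered = hopkins(clustered, m=10)
```

For strongly clustered data like the two blobs here, the random points tend to land far from any real point while the sampled real points have very close neighbours, so the ratio is pushed toward 1.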
Similarity Matrix
A matrix of pairwise similarities between data points. For validation, its rows and columns are sorted according to the points' cluster assignments so that the cluster structure can be inspected.
Internal Measures
A way to evaluate how good a clustering is without using any external labels (unsupervised validation). The quality of the clustering is judged solely from the data and the cluster assignments, by checking how compact each cluster is and how well separated the clusters are from one another.
Cluster Cohesion
How closely related the objects within a cluster are, measured by the within-cluster sum of squares (WSS).
Cluster Separation
How distinct a cluster is from the other clusters, measured by the between-cluster sum of squares (BSS).
Total Sum of Squares
A measure of the overall spread of the data around the global centroid. It equals the sum of WSS and BSS.
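A small sketch can make the relationship TSS = WSS + BSS concrete. The function names and the four-point, one-dimensional data set are illustrative assumptions. The sketch assumes the usual centroid-based definitions: WSS sums squared distances from each point to its cluster centroid, and BSS sums, per cluster, the cluster size times the squared distance from the cluster centroid to the global centroid.

```python
def centroid(points):
    """Coordinate-wise mean of a list of points."""
    n = len(points)
    return [sum(p[d] for p in points) / n for d in range(len(points[0]))]


def sq_dist(a, b):
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def sums_of_squares(points, labels):
    """Return (WSS, BSS, TSS) for a labelled data set."""
    global_c = centroid(points)

    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)

    wss = 0.0
    bss = 0.0
    for members in clusters.values():
        c = centroid(members)
        wss += sum(sq_dist(p, c) for p in members)      # cohesion
        bss += len(members) * sq_dist(c, global_c)      # separation

    tss = sum(sq_dist(p, global_c) for p in points)     # total spread
    return wss, bss, tss


# Illustrative data: two clusters on a line, {1, 2} and {4, 5}.
points = [(1.0,), (2.0,), (4.0,), (5.0,)]
labels = [0, 0, 1, 1]
wss, bss, tss = sums_of_squares(points, labels)
print(wss, bss, tss)  # 1.0 9.0 10.0
```

Here the cluster centroids are 1.5 and 4.5 and the global centroid is 3, so WSS = 4 × 0.25 = 1, BSS = 2 × 2.25 + 2 × 2.25 = 9, and TSS = 4 + 1 + 1 + 4 = 10.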
Silhouette Coefficient
An internal measure that combines cohesion and separation metrics to evaluate how well each point fits within its assigned cluster.
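As a rough illustration, the per-point coefficient can be computed with the standard formula s = (b - a) / max(a, b), where a is the mean distance from a point to the other members of its own cluster (cohesion) and b is the smallest mean distance from the point to the members of any other cluster (separation). The function name, the zero score for singleton clusters, and the sample data are assumptions made for this sketch; it also assumes at least two clusters.

```python
import math


def silhouette(points, labels):
    """Per-point silhouette coefficients (requires at least two clusters)."""
    scores = []
    for i, (p, lab) in enumerate(zip(points, labels)):
        # a: mean distance to the other points in the same cluster.
        own = [
            math.dist(p, q)
            for j, (q, other_lab) in enumerate(zip(points, labels))
            if other_lab == lab and j != i
        ]
        if not own:
            scores.append(0.0)  # assumed convention for singleton clusters
            continue
        a = sum(own) / len(own)

        # b: smallest mean distance to the points of any other cluster.
        b = math.inf
        for other in set(labels) - {lab}:
            dists = [
                math.dist(p, q)
                for q, other_lab in zip(points, labels)
                if other_lab == other
            ]
            b = min(b, sum(dists) / len(dists))

        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return scores


# Illustrative data: two well-separated pairs of points.
points = [(0, 0), (0, 1), (5, 0), (5, 1)]
good = silhouette(points, [0, 0, 1, 1])   # labels match the geometry
bad = silhouette(points, [0, 1, 0, 1])    # labels cut across the geometry
```

Scores close to 1 mean a point sits well inside its cluster, while negative scores, as with the mismatched labelling, mean the point is on average closer to another cluster.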
Density Based Cluster Validation (DBCV)
A cluster validity index for density-based clusterings such as DBSCAN. It rests on the idea that the sparsest region inside a cluster should still be denser than the densest region between clusters. A high DBCV value indicates that the clusters are properly separated by low-density regions.
Cophenetic Distance
For a pair of points, the proximity at which an agglomerative hierarchical clustering first puts them in the same cluster.
Cophenetic Correlation Coefficient (CPCC)
The correlation between the cophenetic distance matrix and the proximity matrix of the original data points.
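The two cards above can be tied together with a small sketch. It assumes single-link agglomerative clustering, Euclidean distance as the proximity, and Pearson correlation over the pairwise entries; the cards do not specify any of these, and the function names and sample points are illustrative. The naive merge loop is meant for tiny examples only.

```python
import math
from itertools import combinations


def single_link_cophenetic(points):
    """Return (original distances, cophenetic distances) keyed by point pair."""
    n = len(points)
    dist = {
        (i, j): math.dist(points[i], points[j])
        for i, j in combinations(range(n), 2)
    }
    clusters = [{i} for i in range(n)]
    coph = {}
    while len(clusters) > 1:
        # Find the two clusters with the smallest single-link distance.
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            d = min(
                dist[min(i, j), max(i, j)]
                for i in clusters[a]
                for j in clusters[b]
            )
            if best is None or d < best[0]:
                best = (d, a, b)
        d, a, b = best
        # Every cross pair first shares a cluster at this merge height.
        for i in clusters[a]:
            for j in clusters[b]:
                coph[min(i, j), max(i, j)] = d
        merged = clusters[a] | clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append(merged)
    return dist, coph


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def cpcc(points):
    """Cophenetic correlation coefficient under the assumptions above."""
    dist, coph = single_link_cophenetic(points)
    pairs = sorted(dist)
    return pearson([dist[p] for p in pairs], [coph[p] for p in pairs])


# Illustrative data: two close pairs and one distant point.
points = [(0, 0), (0, 1), (5, 0), (5, 1), (10, 10)]
dist, coph = single_link_cophenetic(points)
score = cpcc(points)
```

In this data the points (0, 0) and (0, 1) merge at distance 1, so their cophenetic distance is 1, while (0, 0) and (5, 0) only end up together when the two pairs merge at distance 5. A CPCC close to 1 suggests the hierarchy preserves the original pairwise distances well.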
External Validation
Methods that evaluate a clustering when class labels are available, by comparing the cluster assignments to the true class labels. Classification-oriented measures such as impurity, precision, recall, and F-measure are used.
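A short sketch of how such measures can be computed from cluster assignments and class labels. It assumes the common per-pair definitions: for cluster i and class j, precision is the fraction of cluster i that belongs to class j, recall is the fraction of class j that falls in cluster i, and the F-measure is their harmonic mean. Purity, the fraction of points that belong to the majority class of their cluster, is used here as one common way to quantify how mixed the clusters are. The function names and the sample labels are illustrative.

```python
from collections import Counter


def purity(clusters, classes):
    """Fraction of points that belong to the majority class of their cluster."""
    counts = Counter(zip(clusters, classes))
    majority = {}
    for (clu, _cls), n in counts.items():
        majority[clu] = max(majority.get(clu, 0), n)
    return sum(majority.values()) / len(clusters)


def precision_recall_f(clusters, classes, cluster, cls):
    """Precision, recall and F-measure of one cluster with respect to one class."""
    m_ij = sum(
        1 for c, k in zip(clusters, classes) if c == cluster and k == cls
    )
    m_i = clusters.count(cluster)   # size of the cluster
    m_j = classes.count(cls)        # size of the class
    precision = m_ij / m_i
    recall = m_ij / m_j
    f = 2 * precision * recall / (precision + recall) if m_ij else 0.0
    return precision, recall, f


# Illustrative data: six points, two clusters, two true classes.
clusters = [0, 0, 0, 1, 1, 1]
classes = ["a", "a", "b", "b", "b", "b"]

overall_purity = purity(clusters, classes)
p, r, f = precision_recall_f(clusters, classes, cluster=0, cls="a")
```

In this example cluster 0 holds both points of class "a" plus one stray "b", so its precision for "a" is 2/3, its recall is 1, and the F-measure is 0.8; the overall purity is 5/6.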
Relative Cluster Validation
Using a validity measure to compare two or more clustering solutions and decide which is better. Examples include comparing different clustering algorithms, choosing the best number of clusters, comparing two specific clusters, and evaluating individual points.