ML Exam 2: Module 15 - Cluster Validation

Last updated 5:30 AM on 4/1/26

13 Terms

Clustering Tendency

Assessing, before running a clustering algorithm, whether a data set has any cluster-like (non-random) structure. This typically relies on statistical tests for spatial randomness and works best for low-dimensional data in Euclidean space.

Hopkins Statistic

A measure of clustering tendency that checks whether the data are more clustered than spatially random data. m points are generated at random in the data space, and m real points are sampled from the data. Then, for each generated or sampled point, the distance to its nearest neighbor in the data set is found, and the statistic compares the two sets of distances.
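
The procedure can be sketched in plain Python. This is a minimal illustration, not a reference implementation: the function names, the bounding-box sampling, and the test data are assumptions, and the convention used here is H = sum(w) / (sum(u) + sum(w)), under which H near 0 suggests clustered data and H near 0.5 suggests random data. Some references define H the other way around, so that values near 1 suggest clustering.

```python
import math
import random

def nearest_distance(point, data, skip_index=None):
    """Distance from `point` to its nearest neighbor in `data`."""
    return min(
        math.dist(point, other)
        for j, other in enumerate(data)
        if j != skip_index
    )

def hopkins(data, m, seed=0):
    """Hopkins statistic, H = sum(w) / (sum(u) + sum(w)) (assumed convention)."""
    rng = random.Random(seed)
    dims = len(data[0])
    lows = [min(p[d] for p in data) for d in range(dims)]
    highs = [max(p[d] for p in data) for d in range(dims)]

    # u: m artificial points drawn uniformly from the bounding box of the data.
    u = []
    for _ in range(m):
        fake = [rng.uniform(lows[d], highs[d]) for d in range(dims)]
        u.append(nearest_distance(fake, data))

    # w: m sampled real points, each compared to the rest of the data.
    w = []
    for i in rng.sample(range(len(data)), m):
        w.append(nearest_distance(data[i], data, skip_index=i))

    return sum(w) / (sum(u) + sum(w))

# Hypothetical data: two tight blobs around (0, 0) and (10, 10).
gen = random.Random(1)
blobs = [(gen.gauss(0, 0.1), gen.gauss(0, 0.1)) for _ in range(30)]
blobs += [(gen.gauss(10, 0.1), gen.gauss(10, 0.1)) for _ in range(30)]

h = hopkins(blobs, m=10)
print(f"Hopkins statistic: {h:.3f}")
```

For strongly clustered data like this, the real points sit much closer to their neighbors than the artificial points do, so H comes out small under the assumed convention.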

Similarity Matrix

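A matrix of pairwise similarities between data points, used to validate a clustering visually. Its rows and columns are sorted according to the points' cluster assignments; well-separated clusters then appear as high-similarity blocks along the diagonal.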
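
A small sketch of the sorting step, assuming Euclidean distance rescaled to a similarity in [0, 1]. The points and labels are made up for illustration.

```python
import math

points = [(0.0, 0.1), (5.0, 5.2), (0.2, 0.0), (5.1, 4.9), (0.1, 0.3), (4.8, 5.0)]
labels = [0, 1, 0, 1, 0, 1]  # hypothetical cluster assignments

# Reorder the points so that members of the same cluster are adjacent.
order = sorted(range(len(points)), key=lambda i: labels[i])
sorted_points = [points[i] for i in order]
sorted_labels = [labels[i] for i in order]

# Pairwise distances, rescaled so that 1 = identical and 0 = farthest pair.
dist = [[math.dist(p, q) for q in sorted_points] for p in sorted_points]
d_max = max(max(row) for row in dist)
similarity = [[1 - d / d_max for d in row] for row in dist]

for row in similarity:
    print(" ".join(f"{s:.2f}" for s in row))
```

With a good clustering, the printed matrix shows two blocks of high values on the diagonal and low values elsewhere.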

Internal Measures

A way to evaluate how good a clustering is without using any external labels (unsupervised validation). The quality of the clustering is evaluated solely from the data and the cluster assignments, by checking how compact each cluster is and how well separated the clusters are from one another.

Cluster Cohesion

How closely related the objects within a cluster are, measured by the within-cluster sum of squares (WSS). A lower WSS means more compact clusters.

Cluster Separation

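How distinct or well separated a cluster is from other clusters, measured by the between-cluster sum of squares (BSS). A higher BSS means better-separated clusters.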

Total Sum of Squares

A measure of the overall spread of the data around the global centroid. It equals WSS + BSS, so for a fixed data set it stays constant no matter how the points are clustered.
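
A worked example of the identity TSS = WSS + BSS, using a made-up one-dimensional data set (1, 2, 4, 5) split into two clusters. The helper names are illustrative.

```python
def centroid(points):
    """Coordinate-wise mean of a list of points."""
    n = len(points)
    return [sum(p[d] for p in points) / n for d in range(len(points[0]))]

def sq_dist(p, q):
    """Squared Euclidean distance."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

clusters = {  # hypothetical clustering
    "A": [(1.0,), (2.0,)],
    "B": [(4.0,), (5.0,)],
}
all_points = [p for members in clusters.values() for p in members]
global_centroid = centroid(all_points)

# WSS: squared distance of each point to its own cluster centroid.
wss = sum(
    sq_dist(p, centroid(members))
    for members in clusters.values()
    for p in members
)

# BSS: size-weighted squared distance of each cluster centroid to the global centroid.
bss = sum(
    len(members) * sq_dist(centroid(members), global_centroid)
    for members in clusters.values()
)

# TSS: squared distance of each point to the global centroid.
tss = sum(sq_dist(p, global_centroid) for p in all_points)

print(wss, bss, tss)  # 1.0 9.0 10.0
```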

Silhouette Coefficient

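An internal measure that combines cohesion and separation to evaluate how well each point fits within its assigned cluster. It ranges from -1 to 1; values near 1 indicate a point that is close to its own cluster and far from the others.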
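
A minimal sketch of the per-point computation, using the standard form s = (b - a) / max(a, b), where a is the mean distance to the other points in the same cluster and b is the smallest mean distance to the points of any other cluster. The data are hypothetical, and the code assumes at least two clusters and distinct points.

```python
import math

def silhouette(i, points, labels):
    """Silhouette coefficient of point i."""
    def mean_dist(members):
        return sum(math.dist(points[i], points[j]) for j in members) / len(members)

    own = [j for j, l in enumerate(labels) if l == labels[i] and j != i]
    if not own:
        return 0.0  # common convention for a singleton cluster

    a = mean_dist(own)  # cohesion: mean distance within the own cluster
    b = min(            # separation: nearest other cluster, on average
        mean_dist([j for j, l in enumerate(labels) if l == other])
        for other in set(labels) - {labels[i]}
    )
    return (b - a) / max(a, b)

points = [(1.0,), (2.0,), (4.0,), (5.0,)]  # hypothetical 1-D data
labels = [0, 0, 1, 1]

scores = [silhouette(i, points, labels) for i in range(len(points))]
average = sum(scores) / len(scores)
print([round(s, 3) for s in scores], round(average, 3))
```

Averaging the per-point scores gives a single value for a cluster or for the whole clustering.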

Density Based Cluster Validation (DBCV)

A cluster validity index for density-based clusterings such as DBSCAN. It rests on the idea that the sparsest region inside a cluster should still be denser than the densest region between clusters. A high DBCV value indicates that clusters are properly separated by low-density regions.

Cophenetic Distance

For a pair of points, the proximity at which an agglomerative hierarchical clustering first puts them in the same cluster.

Cophenetic Correlation Coefficient (CPCC)

The correlation between the cophenetic distance matrix and the proximity matrix of the original data points. A higher value means the hierarchy preserves the original proximities more faithfully.
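
A small sketch of both quantities, assuming single-linkage agglomerative clustering. Under that assumption the cophenetic distance between two points equals the smallest possible "largest hop" over all paths connecting them, which avoids building the dendrogram explicitly; other linkages would need the actual merge heights. The data and names are illustrative.

```python
import math
from itertools import combinations

points = [(0.0,), (1.0,), (3.0,), (7.0,)]  # hypothetical 1-D data
n = len(points)

# Proximity (distance) matrix of the original points.
dist = [[math.dist(p, q) for q in points] for p in points]

# Single-link cophenetic distances via a minimax-path update.
coph = [row[:] for row in dist]
for k in range(n):
    for i in range(n):
        for j in range(n):
            coph[i][j] = min(coph[i][j], max(coph[i][k], coph[k][j]))

def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Correlate the two matrices over each unordered pair of points.
pairs = list(combinations(range(n), 2))
cpcc = pearson([dist[i][j] for i, j in pairs], [coph[i][j] for i, j in pairs])
print(f"CPCC: {cpcc:.3f}")
```

Here the points 0 and 1 merge at distance 1, the point 3 joins at distance 2, and the point 7 joins at distance 4, so every pair involving 7 has cophenetic distance 4.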

External Validation

Methods that evaluate a clustering when class labels are available, by comparing the cluster assignments to the true class labels. Impurity measures (such as entropy and purity), precision, recall, and F-measure are used as classification-oriented measures.
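
A minimal sketch of purity and of precision, recall, and F-measure for one cluster with respect to one class. The assignments and labels are made up, and the function names are illustrative.

```python
from collections import Counter

clusters = [0, 0, 0, 0, 1, 1, 1, 1]                  # hypothetical cluster assignments
classes = ["a", "a", "a", "b", "b", "b", "b", "a"]   # hypothetical true labels

def purity(clusters, classes):
    """Fraction of points that belong to the majority class of their cluster."""
    total = 0
    for c in set(clusters):
        members = [k for cl, k in zip(clusters, classes) if cl == c]
        total += Counter(members).most_common(1)[0][1]
    return total / len(clusters)

def precision_recall_f(cluster, cls, clusters, classes):
    """Precision, recall, and F-measure of `cluster` with respect to class `cls`."""
    both = sum(1 for cl, k in zip(clusters, classes) if cl == cluster and k == cls)
    precision = both / clusters.count(cluster)  # share of the cluster that is `cls`
    recall = both / classes.count(cls)          # share of `cls` captured by the cluster
    f = 2 * precision * recall / (precision + recall) if both else 0.0
    return precision, recall, f

overall_purity = purity(clusters, classes)
p, r, f = precision_recall_f(0, "a", clusters, classes)
print(overall_purity, p, r, f)
```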

Relative Cluster Validation

Using a validity measure to compare two or more clustering results and decide which is better. Examples include comparing different clustering algorithms, choosing the best number of clusters, comparing two specific clusters, and evaluating individual points.
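
As a simple illustration, two candidate clusterings of the same data can be ranked by WSS. This is only a sketch: the data and candidate labelings are hypothetical, and WSS is a fair basis for comparison only when the candidates have the same number of clusters, since it tends to shrink as clusters are added.

```python
def wss(values, labels):
    """Within-cluster sum of squares for 1-D values."""
    total = 0.0
    for label in set(labels):
        members = [v for v, l in zip(values, labels) if l == label]
        center = sum(members) / len(members)
        total += sum((v - center) ** 2 for v in members)
    return total

values = [1.0, 2.0, 4.0, 5.0]  # hypothetical 1-D data
candidates = {                 # two hypothetical clusterings with k = 2
    "A": [0, 0, 1, 1],
    "B": [0, 1, 0, 1],
}

scores = {name: wss(values, labels) for name, labels in candidates.items()}
best = min(scores, key=scores.get)  # lower WSS = more compact clusters
print(scores, "best:", best)
```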