CAP 4770 - Lecture 5: Clustering

Description

Flashcards covering key concepts, algorithms (K-means, hierarchical clustering, DBSCAN), applications, and limitations of cluster analysis, based on the lecture notes.

27 Terms

1

Cluster Analysis

Finding groups of objects such that objects in a group are similar to one another and different from objects in other groups, maximizing inter-cluster distances and minimizing intra-cluster distances.

2

Inter-cluster distances

Distances between different clusters, which are maximized in cluster analysis.

3

Intra-cluster distances

Distances between objects within the same cluster, which are minimized in cluster analysis.
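To make the two distance notions concrete, here is a small Python sketch that averages pairwise distances within clusters and between clusters. Euclidean distance and plain averaging over point pairs are assumptions for illustration; the cards do not fix a proximity measure, and the helper names are hypothetical. A good clustering shows a small intra-cluster value and a large inter-cluster value.

```python
from itertools import combinations
from math import dist


def mean_intra_cluster_distance(clusters):
    """Average distance over all pairs of points that share a cluster."""
    pairs = [(p, q) for c in clusters for p, q in combinations(c, 2)]
    return sum(dist(p, q) for p, q in pairs) / len(pairs)


def mean_inter_cluster_distance(clusters):
    """Average distance over all pairs of points in different clusters."""
    pairs = [(p, q) for a, b in combinations(clusters, 2) for p in a for q in b]
    return sum(dist(p, q) for p, q in pairs) / len(pairs)


clusters = [[(0.0, 0.0), (1.0, 0.0)], [(10.0, 0.0), (11.0, 0.0)]]
print(mean_intra_cluster_distance(clusters))  # 1.0
print(mean_inter_cluster_distance(clusters))  # 10.0
```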

4

Partitional Clustering

A type of clustering that divides data objects into non-overlapping subsets (clusters) such that each data object is in exactly one subset.

5

Hierarchical Clustering

A type of clustering that produces a set of nested clusters organized as a hierarchical tree.

6

Center-Based Cluster

A set of objects where an object in the cluster is closer (more similar) to the 'center' of its cluster than to the center of any other cluster.

7

Centroid

The average (mean) of all the points in a cluster; typically used as the cluster center when the data has continuous attributes.
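A minimal Python sketch of the centroid, assuming each point is a tuple of numbers of the same length (the helper name `centroid` is hypothetical):

```python
def centroid(points):
    """Component-wise mean of equal-length numeric tuples."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


print(centroid([(0.0, 0.0), (2.0, 0.0), (1.0, 3.0)]))  # (1.0, 1.0)
```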

8

Medoid

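The most 'representative' point of a cluster, chosen from the cluster's actual members; typically used as the cluster center when the data is categorical and a mean is not meaningful.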
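For categorical records a mean is undefined, so the medoid is picked from the cluster's own members. The sketch below assumes a simple mismatch count (the number of attributes on which two records differ) as the distance and chooses the member with the smallest total distance to all members; both the distance and the helper names are illustrative assumptions.

```python
def mismatch_distance(a, b):
    """Number of attributes on which two categorical records differ."""
    return sum(x != y for x, y in zip(a, b))


def medoid(points, distance=mismatch_distance):
    """Cluster member with the smallest total distance to all members."""
    return min(points, key=lambda p: sum(distance(p, q) for q in points))


cluster = [("red", "small"), ("red", "large"), ("red", "small"), ("blue", "small")]
print(medoid(cluster))  # ('red', 'small')
```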

9

K-means Clustering

A partitional clustering approach where each cluster is associated with a centroid, and each point is assigned to the cluster with the closest centroid. The number of clusters, K, must be specified as an input parameter.
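A minimal sketch of the basic K-means loop in plain Python. Assumptions not stated on the card: Euclidean distance, k data points sampled at random as the initial centroids, and stopping once no assignment changes; the function name `kmeans` is hypothetical.

```python
import random
from math import dist


def kmeans(points, k, max_iters=100, seed=0):
    """Basic K-means; returns (centroids, labels) with each label in range(k)."""
    rng = random.Random(seed)
    # Initial centroids: k data points sampled without replacement.
    centroids = rng.sample(points, k)
    labels = None
    for _ in range(max_iters):
        # Assignment step: each point joins the cluster with the closest centroid.
        new_labels = [
            min(range(k), key=lambda j: dist(p, centroids[j])) for p in points
        ]
        if new_labels == labels:
            break  # no point changed cluster, so the centroids are stable
        labels = new_labels
        # Update step: recompute each centroid as the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the previous centroid if a cluster is empty
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids, labels


points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centroids, labels = kmeans(points, k=2)
print(centroids, labels)
```

On these two well-separated groups, the three nearby points share one label and the three far points share the other; which group is called 0 depends on the random seed.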

10

Initial Centroids (K-means)

The starting centers for the K-means clusters, often chosen as random data points; different initial centroids can lead to different final clusterings.
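The effect of the starting centers shows up even in a tiny 1-D example. The sketch below (the helper `kmeans_1d` and the data are hypothetical) runs the same K-means updates from two different sets of initial centroids: one start recovers the three obvious groups, while the other settles with two centroids inside the first group and a much larger SSE.

```python
def kmeans_1d(xs, centroids, max_iters=100):
    """1-D K-means from the given initial centroids; returns (centroids, sse)."""
    centroids = list(centroids)
    for _ in range(max_iters):
        # Assign each value to the index of its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda j: abs(x - centroids[j])) for x in xs
        ]
        # Recompute each centroid as the mean of its members (keep it if empty).
        new = []
        for j, c in enumerate(centroids):
            members = [x for x, lab in zip(xs, labels) if lab == j]
            new.append(sum(members) / len(members) if members else c)
        if new == centroids:
            break
        centroids = new
    # SSE: squared distance of each value to its nearest centroid, summed.
    sse = sum(min((x - c) ** 2 for c in centroids) for x in xs)
    return centroids, sse


xs = [1, 2, 10, 11, 20, 21]
print(kmeans_1d(xs, [1, 10, 20]))  # ([1.5, 10.5, 20.5], 1.5)
print(kmeans_1d(xs, [1, 2, 10]))   # ([1.0, 2.0, 15.5], 101.0)
```

A common remedy is to run K-means several times from different random starts and keep the result with the lowest SSE.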

11

Sum of Squared Error (SSE)

The most common measure for evaluating K-means clusters: each point's error is its distance to the nearest cluster centroid, and SSE is the sum of these squared errors over all points. K-means aims to minimize SSE.
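Written as a formula, SSE = Σ_i Σ_{x ∈ C_i} dist(m_i, x)², where m_i is the centroid of cluster C_i and each x belongs to the cluster of its nearest centroid. A minimal Python sketch, assuming Euclidean distance and points given as numeric tuples (`sse` is a hypothetical helper name):

```python
from math import dist


def sse(points, centroids):
    """Sum over points of the squared distance to the nearest centroid."""
    return sum(min(dist(p, c) for c in centroids) ** 2 for p in points)


points = [(0.0, 0.0), (2.0, 0.0), (10.0, 0.0)]
centroids = [(1.0, 0.0), (10.0, 0.0)]
print(sse(points, centroids))  # 2.0  (1 + 1 + 0)
```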

12

K-means convergence

K-means converges for common proximity measures, with most of the convergence happening in the first few iterations; the stopping condition is therefore often relaxed to "until relatively few points change clusters."
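A relaxed stopping rule can be sketched as a check on how many points changed cluster between two iterations. The 1% threshold and the helper name `should_stop` are hypothetical choices for illustration.

```python
def should_stop(old_labels, new_labels, tolerance=0.01):
    """True when at most a `tolerance` fraction of points changed cluster."""
    if old_labels is None:
        return False  # first iteration: nothing to compare against yet
    changed = sum(a != b for a, b in zip(old_labels, new_labels))
    return changed <= tolerance * len(new_labels)


print(should_stop([0, 0, 1, 1], [0, 1, 1, 1]))  # False: 1 of 4 points changed
print(should_stop([0, 0, 1, 1], [0, 0, 1, 1]))  # True: no point changed
```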

13

Vector quantization

An application of K-means clustering used for lossy data compression, such as clustering colors in an image.
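A minimal sketch of the encode/decode step, assuming the palette (codebook) has already been produced by running K-means on the image's pixel colors; the palette values and helper names below are hypothetical. Each pixel is stored as the index of its nearest centroid, and decoding replaces the index with that centroid's color, which is where the loss comes from.

```python
from math import dist


def quantize(pixels, palette):
    """Encode: replace each RGB pixel by the index of its nearest palette color."""
    return [min(range(len(palette)), key=lambda j: dist(px, palette[j])) for px in pixels]


def reconstruct(indices, palette):
    """Decode: map each stored index back to its palette color (lossy)."""
    return [palette[i] for i in indices]


palette = [(0, 0, 0), (255, 255, 255), (200, 30, 30)]
pixels = [(10, 5, 0), (250, 240, 245), (190, 40, 20)]
codes = quantize(pixels, palette)
print(codes)                        # [0, 1, 2]
print(reconstruct(codes, palette))  # [(0, 0, 0), (255, 255, 255), (200, 30, 30)]
```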

14

K-means limitations

Challenges for K-means when clusters have differing sizes, densities, non-globular shapes, or when the data contains outliers.

15

Pre-processing (Clustering)

Steps taken before clustering, such as normalizing data and eliminating outliers, to improve clustering results.
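Normalization matters because distance-based methods such as K-means let attributes with large ranges dominate the distance. A minimal sketch of one common choice, z-score standardization per attribute (the helper `standardize` is hypothetical; min-max scaling is an equally common alternative):

```python
from statistics import mean, pstdev


def standardize(rows):
    """Z-score each column: subtract its mean, divide by its population std."""
    stats = [(mean(col), pstdev(col)) for col in zip(*rows)]
    return [
        # A constant column (std 0) is mapped to 0.0 instead of dividing by zero.
        tuple((x - m) / s if s > 0 else 0.0 for x, (m, s) in zip(row, stats))
        for row in rows
    ]


rows = [(1.0, 100.0), (2.0, 200.0), (3.0, 300.0)]
print(standardize(rows))
```

After standardizing, both attributes in this example carry equal weight, even though the second one originally spanned a range 100 times larger.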

16

Post-processing (Clustering)

Refining steps taken after clustering, such as eliminating small clusters, splitting 'loose' clusters, or merging 'close' clusters.

17

Dendrogram

A tree-like diagram used to visualize hierarchical clustering, recording the sequences of merges or splits.

18

Agglomerative Clustering

A hierarchical clustering technique that starts with individual points as clusters and iteratively merges the two closest clusters until a single cluster remains.
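A naive sketch of agglomerative clustering in Python. It assumes single-link (MIN) proximity between clusters and Euclidean distance, neither of which the card specifies; another linkage function can be passed in as `linkage`. The helper names are hypothetical. The returned merge history, with the distance at which each merge happened, is the sequence of merges that a dendrogram records.

```python
from itertools import combinations
from math import dist


def single_link(a, b):
    """Cluster proximity = distance between the closest pair of members (MIN)."""
    return min(dist(p, q) for p in a for q in b)


def agglomerative(points, linkage=single_link):
    """Merge the two closest clusters until one remains; return the merge history."""
    clusters = [[p] for p in points]  # start: every point is its own cluster
    merges = []
    while len(clusters) > 1:
        # Find the pair of clusters (i < j) with the smallest linkage distance.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]),
        )
        merges.append((clusters[i], clusters[j], linkage(clusters[i], clusters[j])))
        merged = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is still valid afterwards
        clusters[i] = merged
    return merges


for a, b, d in agglomerative([(0.0, 0.0), (1.0, 0.0), (5.0, 0.0), (7.0, 0.0)]):
    print(a, "+", b, "at distance", d)
```

This recomputes every cluster-pair distance at each step, so it is much slower than practical implementations and is meant only to show the logic.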

19

Divisive Clustering

A hierarchical clustering technique that starts with one all-inclusive cluster and iteratively splits clusters until each cluster contains a single point.

20

Density-Based Cluster

A cluster defined as a dense region of points, separated from other high-density regions by regions of low density; this definition is useful when clusters are irregular or intertwined and when noise and outliers are present.

21

DBSCAN

Density-Based Spatial Clustering of Applications with Noise: an algorithm that defines a cluster as a maximal set of density-connected points, controlled by the parameters Eps (radius) and MinPts (minimum number of points).
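A compact, unoptimized DBSCAN sketch in Python. Assumptions: Euclidean distance, a brute-force neighborhood search, and the convention that a point's Eps-neighborhood includes the point itself, so a point is core when that neighborhood holds at least MinPts points. Labels are cluster ids starting at 0, with -1 for noise; all names are hypothetical.

```python
from math import dist

NOISE = -1


def region_query(points, i, eps):
    """Indices of all points within distance eps of points[i] (including i)."""
    return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Return one label per point: a cluster id, or NOISE."""
    labels = [None] * len(points)  # None = not visited yet
    cluster_id = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later be relabelled as a border point
            continue
        # points[i] is a core point: start a new cluster and grow it.
        labels[i] = cluster_id
        seeds = list(neighbors)
        while seeds:
            j = seeds.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point reached from a core point
            if labels[j] is not None:
                continue  # already assigned; do not expand again
            labels[j] = cluster_id
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                seeds.extend(j_neighbors)  # j is also core: keep expanding
        cluster_id += 1
    return labels


pts = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11), (50, 50)]
print(dbscan(pts, eps=1.5, min_pts=3))  # [0, 0, 0, 0, 1, 1, 1, 1, -1]
```

A border point that lies within Eps of core points from two different clusters is assigned to whichever cluster reaches it first, so border labels can depend on processing order.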

22

Eps (DBSCAN parameter)

The specified radius used in DBSCAN to determine the neighborhood of a point, within which other points are counted for density.

23

MinPts (DBSCAN parameter)

The specified minimum number of points required within Eps for a point to be considered a core point in DBSCAN.

24

Core point (DBSCAN)

A data point that has at least MinPts points (conventionally counting itself) within its Eps neighborhood; core points lie in the interior of a cluster.

25

Border point (DBSCAN)

A data point that has fewer than MinPts within its Eps neighborhood but is within the Eps neighborhood of a core point.

26

Noise point (DBSCAN)

Any data point that is neither a core point nor a border point in DBSCAN.
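The three point types can be computed directly from the definitions on these cards. A brute-force Python sketch, assuming Euclidean distance, that a point's Eps-neighborhood includes the point itself, and that "core" means at least MinPts points in that neighborhood; `classify_points` is a hypothetical name.

```python
from math import dist


def classify_points(points, eps, min_pts):
    """Label each point 'core', 'border', or 'noise' per the DBSCAN definitions."""
    # Eps-neighborhood (as index lists) of every point, including the point itself.
    neighborhoods = [
        [j for j, q in enumerate(points) if dist(p, q) <= eps] for p in points
    ]
    is_core = [len(nb) >= min_pts for nb in neighborhoods]
    kinds = []
    for i, nb in enumerate(neighborhoods):
        if is_core[i]:
            kinds.append("core")
        elif any(is_core[j] for j in nb):
            kinds.append("border")  # not core, but within Eps of some core point
        else:
            kinds.append("noise")
    return kinds


pts = [(0, 0), (1, 0), (2, 0), (3, 0), (10, 0)]
print(classify_points(pts, eps=1.0, min_pts=3))
# ['border', 'core', 'core', 'border', 'noise']
```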

27

Data structures for clustering efficiency

Spatial index structures such as k-d trees and R-trees, proposed to speed up the distance computations (for example, nearest-neighbor and range queries) that clustering algorithms perform repeatedly.