What is clustering in data mining?
The process of grouping similar items (data points) in a dataset without predefined labels.
How is clustering different from classification?
Clustering is an unsupervised learning technique without predefined labels, while classification is supervised learning with labeled data.
What is the primary goal of clustering?
To group data points so that objects within the same cluster are highly similar, while objects in different clusters are as dissimilar as possible.
What are the two main types of clustering approaches?
Hard clustering and soft/fuzzy clustering.
What is hard clustering?
Each data point belongs to exactly one cluster.
What is soft/fuzzy clustering?
Data points may belong to multiple clusters, with varying degrees of membership in each.
What is partitional clustering?
A clustering approach that divides data into non-overlapping subsets, usually with a fixed number of clusters.
What is hierarchical clustering?
A clustering approach that creates a nested hierarchy of clusters rather than a single flat partition.
What is agglomerative hierarchical clustering?
A bottom-up approach: it starts with each data point as its own cluster and repeatedly merges the closest clusters.
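The bottom-up merging can be sketched in a few lines. This is only an illustration: it assumes single linkage (cluster distance = distance between the two closest members), which the card does not specify, and the name `single_linkage` is hypothetical.

```python
import math

def single_linkage(points, target_clusters):
    """Merge the two closest clusters until target_clusters remain."""
    # Start with every point (by index) in its own cluster.
    clusters = [[i] for i in range(len(points))]
    merges = []  # record of (cluster_a, cluster_b, distance) in merge order
    while len(clusters) > target_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Single linkage (an assumption): closest pair of members.
                d = min(math.dist(points[i], points[j])
                        for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((clusters[a], clusters[b], d))
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters, merges

points = [(0.0,), (1.0,), (10.0,), (11.0,), (30.0,)]
clusters, merges = single_linkage(points, target_clusters=2)
print(clusters)  # [[0, 1, 2, 3], [4]]
```

The recorded merge order and distances are the information a dendrogram would draw.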
What is a dendrogram?
A tree-like diagram that shows the hierarchical relationship between clusters in hierarchical clustering.
What is K-means clustering?
A partitioning method that divides data into k distinct clusters based on distance to the centroid of each cluster.
What is the objective function of K-means?
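To minimize the within-cluster sum of squared distances between data points and their cluster centers: J(V) = Σ_j Σ_{x_i ∈ C_j} ||x_i - μ_j||², where j runs over the k clusters and μ_j is the center of cluster C_j.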
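As a small worked example, the objective can be evaluated directly for a given assignment. The function name `kmeans_objective` and the sample data are made up for illustration.

```python
def kmeans_objective(points, centers, labels):
    """Sum of squared Euclidean distances from each point to its assigned center."""
    total = 0.0
    for point, label in zip(points, labels):
        center = centers[label]
        total += sum((p - c) ** 2 for p, c in zip(point, center))
    return total

# Four points, two clusters; each point is exactly 1 unit from its center.
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
centers = [(0.0, 1.0), (10.0, 1.0)]
labels = [0, 0, 1, 1]
print(kmeans_objective(points, centers, labels))  # 4.0
```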
Describe the K-means process.
1) Select k initial cluster centers,
2) Allocate each data point to the nearest cluster center,
3) Recompute each cluster center as the average of its assigned points,
4) Repeat steps 2 and 3 until convergence (assignments no longer change).
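The four steps above might look like this in plain Python. It is a minimal sketch, not a production implementation: it assumes random initialization from the data points and Euclidean distance, and the helper names are hypothetical.

```python
import random

def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean_point(points):
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))

def kmeans(points, k, max_iter=100, seed=0):
    rng = random.Random(seed)
    # Step 1: pick k distinct data points as initial centers (an assumption;
    # other initialization schemes exist).
    centers = rng.sample(points, k)
    labels = None
    for _ in range(max_iter):
        # Step 2: assign each point to its nearest center.
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centers[j]))
            for p in points
        ]
        # Step 4: stop once the assignments no longer change.
        if new_labels == labels:
            break
        labels = new_labels
        # Step 3: move each center to the mean of its assigned points.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster ends up empty
                centers[j] = mean_point(members)
    return centers, labels

points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
centers, labels = kmeans(points, k=2)
print(labels)  # first three points share one label, last three the other
```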
How do you determine the optimal number of clusters?
By using validity indices that assess how good the clusters are based on data dispersion within and between clusters.
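One widely used validity index is the silhouette score, which compares each point's average distance to its own cluster with its average distance to the nearest other cluster. The card does not name a specific index, so treat this as one possible choice. The sketch assumes at least two clusters and scores singleton clusters as 0.

```python
import math

def silhouette_score(points, labels):
    """Mean silhouette over all points; higher means tighter, better-separated clusters."""
    cluster_ids = set(labels)
    scores = []
    for i, p in enumerate(points):
        own = [math.dist(p, q) for j, q in enumerate(points)
               if labels[j] == labels[i] and j != i]
        if not own:
            scores.append(0.0)  # convention for a singleton cluster
            continue
        a = sum(own) / len(own)  # within-cluster dispersion for this point
        b = min(                 # mean distance to the nearest other cluster
            sum(math.dist(p, q) for j, q in enumerate(points) if labels[j] == c)
            / labels.count(c)
            for c in cluster_ids if c != labels[i]
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)

points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
two_clusters = [0, 0, 0, 1, 1, 1]
three_clusters = [0, 0, 2, 1, 1, 1]  # splits off one point unnecessarily
print(silhouette_score(points, two_clusters) > silhouette_score(points, three_clusters))  # True
```

In practice the index is computed for several candidate values of k, and the k with the best score is kept.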
Give an example of how clustering might be used in healthcare.
Clustering could be used to identify groups of patients with similar symptoms or disease progression patterns, helping with personalized treatment planning.
What is divisive hierarchical clustering?
A top-down approach: it starts with all data in one cluster and splits it recursively.
What is PAM (Partitioning Around Medoids) and how does it differ from K-means?
PAM is similar to K-means but uses actual data points (medoids) as cluster centers instead of computed averages.
This makes it more robust to outliers than K-means.
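A quick way to see the robustness claim is to compare a mean with a medoid on data containing one outlier. This sketch covers only the center computation, not the full PAM algorithm, and the function names are hypothetical.

```python
import math

def mean_point(points):
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))

def medoid(points):
    """The actual data point with the smallest total distance to all other points."""
    return min(points, key=lambda p: sum(math.dist(p, q) for q in points))

points = [(1.0,), (2.0,), (3.0,), (4.0,), (100.0,)]  # 100.0 is an outlier
print(mean_point(points))  # (22.0,)  pulled far toward the outlier
print(medoid(points))      # (3.0,)   stays with the bulk of the data
```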
What is fuzzy c-means?
A soft clustering method that allows data points to belong to multiple clusters, each with a degree of membership.
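Degrees of membership can be illustrated with the standard fuzzy c-means membership formula, u_ij = 1 / Σ_k (d_ij / d_ik)^(2/(m-1)). The card does not give this formula, and the fuzzifier `m = 2` used here is an assumed common default. The sketch shows only the membership step for fixed centers, not the full iterative algorithm.

```python
import math

def fuzzy_memberships(point, centers, m=2.0):
    """Membership of one point in each cluster; the values sum to 1."""
    distances = [math.dist(point, c) for c in centers]
    # A point sitting exactly on a center belongs fully to that cluster.
    if 0.0 in distances:
        return [1.0 if d == 0.0 else 0.0 for d in distances]
    exponent = 2.0 / (m - 1.0)
    return [
        1.0 / sum((d_j / d_k) ** exponent for d_k in distances)
        for d_j in distances
    ]

centers = [(0.0,), (10.0,)]
print(fuzzy_memberships((5.0,), centers))  # [0.5, 0.5]
print(fuzzy_memberships((2.0,), centers))  # mostly cluster 0, a little cluster 1
```

A point halfway between two centers is shared equally, while a point near one center gets most of its membership there.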