6 Clustering


11 Terms

1

What is clustering, and how is it used in unsupervised learning? Provide examples of its applications.

Clustering is an unsupervised learning technique that groups data points based on their similarity, without using labeled data. Ex: a retail company segments its customers into distinct groups based on their shopping behavior to design targeted marketing campaigns.

2

Explain the primary goals of clustering. What is meant by maximizing intra-cluster similarity and minimizing inter-cluster similarity?

Clustering aims to group data points into clusters that are internally coherent and well separated from one another.

Maximizing Intra-Cluster Similarity: The goal is to ensure that points within a cluster are homogeneous or closely related.

Minimizing Inter-Cluster Similarity: The goal is to maximize the distance or dissimilarity between clusters, ensuring that each cluster is distinct.
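As a rough illustration (not from the original card), the two objectives can be quantified. The sketch below, assuming NumPy and scikit-learn are available and using synthetic toy data, computes the mean within-cluster distance (to be minimized) and the mean distance between cluster centroids (to be maximized).

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Toy data: three well-separated blobs (hypothetical example data)
X, _ = make_blobs(n_samples=300, centers=3, random_state=0)
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

# Intra-cluster similarity: mean distance from points to their own cluster mean (smaller is better)
intra = np.mean([
    np.linalg.norm(X[labels == k] - X[labels == k].mean(axis=0), axis=1).mean()
    for k in range(3)
])

# Inter-cluster dissimilarity: mean distance between cluster means (larger is better)
centers = np.array([X[labels == k].mean(axis=0) for k in range(3)])
inter = np.mean([np.linalg.norm(centers[i] - centers[j])
                 for i in range(3) for j in range(i + 1, 3)])

print(f"mean intra-cluster distance: {intra:.2f}, mean inter-centroid distance: {inter:.2f}")
```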

3

Describe the key differences between hierarchical and partitional clustering. Provide an example of each.

Hierarchical: builds a tree (hierarchy) of nested clusters. Ex: a biologist groups genes based on their expression levels across different conditions to understand genetic relationships.

Partitional: divides the data into a fixed number of k non-overlapping clusters (e.g., K-Means). Ex: a retail company divides its customer base into distinct groups to design targeted marketing campaigns.

4

Name and explain three common distance metrics used in clustering algorithms. In what scenarios would each metric be preferred?

Euclidean distance: straight-line (geometric) distance between points; preferred for continuous, low-dimensional numeric data.

Manhattan distance: sum of absolute coordinate differences; preferred for grid-like data or when large differences on a single feature should not dominate.

Cosine similarity: measures the angle between vectors, ignoring magnitude; preferred for textual or other high-dimensional sparse data.
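A minimal sketch of the three metrics, assuming SciPy is available; the vectors a and b are hypothetical examples, not values from the cards.

```python
import numpy as np
from scipy.spatial.distance import euclidean, cityblock, cosine

# Two hypothetical feature vectors
a = np.array([1.0, 2.0, 3.0])
b = np.array([2.0, 0.0, 4.0])

print("Euclidean distance:", euclidean(a, b))  # square root of summed squared differences
print("Manhattan distance:", cityblock(a, b))  # sum of absolute differences
print("Cosine distance:   ", cosine(a, b))     # 1 - cosine similarity of the two vectors
```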

5

What is hierarchical clustering? Differentiate between agglomerative (bottom-up) and divisive (top-down) approaches.

Hierarchical clustering builds a nested hierarchy of clusters instead of a single flat partition.

Agglomerative (bottom-up): starts with each point as its own cluster and repeatedly merges the closest clusters.

Divisive (top-down): starts with all points in one cluster and recursively splits it.

6

What is a dendrogram, and how is it used in hierarchical clustering? How can you determine the number of clusters from a dendrogram?

A dendrogram is a tree diagram that records the sequence of cluster merges and the distance at which each merge occurs. Cutting the tree horizontally at a chosen height defines the clusters; the number of branches crossed by the cut gives the number of clusters.
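One possible way to produce and cut a dendrogram, assuming SciPy and matplotlib are installed; the synthetic data and the cut height t=10 are illustrative choices, not values from the cards.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import linkage, dendrogram, fcluster

# Hypothetical 2-D data: three groups of points
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc, 0.5, size=(20, 2)) for loc in (0, 5, 10)])

# Agglomerative clustering with Ward linkage
Z = linkage(X, method="ward")

# Plot the dendrogram; cutting at a chosen height defines the clusters
dendrogram(Z)
plt.axhline(y=10, linestyle="--")
plt.show()

labels = fcluster(Z, t=10, criterion="distance")  # cluster labels implied by the cut
print(np.unique(labels))
```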

7

Outline the steps of the K-Means clustering algorithm. Provide an example to illustrate the process.

Initialize centroids.

Assign points to nearest centroid.

Update centroids.

Repeat until convergence.

Ex: A retail company wants to segment its customers into distinct groups based on their purchasing behavior to design targeted marketing strategies.
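The four steps above can be written directly in NumPy. The sketch below is a simplified illustration (random initialization, no empty-cluster handling), and the input data is hypothetical.

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Plain NumPy K-Means sketch: initialize, assign, update, repeat until convergence."""
    rng = np.random.default_rng(seed)
    # 1. Initialize centroids by picking k distinct random points
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # 2. Assign each point to its nearest centroid
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # 3. Update each centroid to the mean of its assigned points
        new_centroids = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        # 4. Stop when the centroids no longer move (convergence)
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids

# Hypothetical customer features (e.g., spend and visit frequency), already scaled
X = np.random.default_rng(1).random((200, 2))
labels, centroids = kmeans(X, k=3)
```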

8

How does the Elbow Method help determine the optimal number of clusters in K-Means clustering? Explain with a diagram.

Run K-Means for a range of k values, plot the clustering cost (within-cluster sum of squares) against k, and select the "elbow" point, where increasing k stops producing a large drop in cost.
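One common way to produce the elbow plot, assuming scikit-learn and matplotlib are available; the synthetic data and the range of k are illustrative.

```python
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=500, centers=4, random_state=0)  # hypothetical data

ks = range(1, 10)
inertias = [KMeans(n_clusters=k, n_init=10, random_state=0).fit(X).inertia_ for k in ks]

plt.plot(ks, inertias, marker="o")
plt.xlabel("number of clusters k")
plt.ylabel("within-cluster sum of squares (inertia)")
plt.show()  # the bend ("elbow") in this curve suggests the k to choose
```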

9

How do clustering algorithms handle outliers? Discuss the strengths and weaknesses of K-Means and DBSCAN in this context.

K-Means: sensitive to outliers, since every point must be assigned to a cluster and outliers pull centroids toward themselves.

DBSCAN: robust to outliers, since points in low-density regions are labeled as noise rather than forced into a cluster.
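A small comparison sketch, assuming scikit-learn is available; the synthetic data, the added outlier points, and the DBSCAN parameters (eps, min_samples) are illustrative choices.

```python
import numpy as np
from sklearn.cluster import DBSCAN, KMeans
from sklearn.datasets import make_blobs

# Hypothetical data: two blobs plus a few extreme outliers
X, _ = make_blobs(n_samples=200, centers=2, random_state=0)
X = np.vstack([X, [[50, 50], [-40, 60]]])

# K-Means still assigns the outliers to a cluster and they pull its centroid
km_labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

# DBSCAN labels low-density points as noise (-1) instead
db_labels = DBSCAN(eps=1.5, min_samples=5).fit_predict(X)
print("DBSCAN noise points:", np.sum(db_labels == -1))
```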

10

Explain the curse of dimensionality and its impact on clustering algorithms. What techniques can be used to address this issue?

In high-dimensional spaces, distances between points become increasingly similar, which dilutes cluster structure and degrades clustering quality. Dimensionality-reduction techniques such as PCA or t-SNE can be applied before clustering to address this.
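A sketch of reducing dimensionality before clustering, assuming scikit-learn; the 50-feature synthetic data and the choice of 2 components are illustrative.

```python
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Hypothetical high-dimensional data (50 features)
X, _ = make_blobs(n_samples=300, centers=3, n_features=50, random_state=0)

# Project to 2 principal components first, then cluster in the reduced space
X_reduced = PCA(n_components=2, random_state=0).fit_transform(X)
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X_reduced)
```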

11

You are tasked with grouping customers based on purchasing behavior. How would you apply clustering to solve this problem? Discuss the preprocessing steps and the choice of algorithm.

For customer segmentation: preprocess the data (handle missing values, remove noise and outliers, normalize or standardize features so no single feature dominates the distance metric), then apply K-Means when a fixed number of roughly spherical segments is expected, or DBSCAN when the number of clusters is unknown and outliers should be flagged as noise.
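A possible end-to-end sketch, assuming pandas and scikit-learn are available; the column names (annual_spend, visits_per_month) and values are hypothetical, not from the cards.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans

# Hypothetical customer table with two behavioral features
df = pd.DataFrame({
    "annual_spend": [200, 5000, 4800, 150, 300, 5200],
    "visits_per_month": [1, 8, 7, 2, 1, 9],
})

# Standardize so both features contribute equally to the distance metric
X = StandardScaler().fit_transform(df)

# Assign each customer to a segment
df["segment"] = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
print(df)
```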