lecture 9

18 Terms

What is clustering in data mining?

The process of grouping similar items (data points) in a dataset without predefined labels.

How is clustering different from classification?

Clustering is an unsupervised learning technique without predefined labels, while classification is supervised learning with labeled data.

What is the primary goal of clustering?

To group data points so that objects within the same cluster are highly similar, while objects in different clusters are as dissimilar as possible.

What are the two main types of clustering approaches?

Hard clustering and soft/fuzzy clustering.

What is hard clustering?

Each data point belongs to exactly one cluster.

What is soft/fuzzy clustering?

Data points may belong to multiple clusters with varying degrees of membership.

What is partitional clustering?

A clustering approach that divides data into non-overlapping subsets, with a fixed number of clusters.

What is hierarchical clustering?

A clustering approach that creates a hierarchy of nested clusters.

What is agglomerative hierarchical clustering?

A bottom-up approach: starts with each individual point as its own cluster and repeatedly merges clusters.

What is a dendrogram?

A tree-like diagram that shows the hierarchical relationship between clusters in hierarchical clustering.
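
The bottom-up merging that a dendrogram records can be sketched as follows. This is a minimal illustration, not necessarily the variant used in the lecture: it assumes single linkage (cluster distance = distance between the two closest members) and Euclidean distance, neither of which is stated in the cards. The function name `single_linkage_merges` is hypothetical.

```python
import math

def single_linkage_merges(points):
    """Return the merge history of agglomerative clustering.

    Assumption: single linkage with Euclidean distance.
    Each entry is (members_of_cluster_a, members_of_cluster_b, merge_distance).
    """
    # Start with every point (by index) as its own cluster.
    clusters = [[i] for i in range(len(points))]
    merges = []
    while len(clusters) > 1:
        best = None
        # Find the pair of clusters whose closest members are nearest.
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = min(
                    math.dist(points[i], points[j])
                    for i in clusters[a]
                    for j in clusters[b]
                )
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((sorted(clusters[a]), sorted(clusters[b]), d))
        # Merge cluster b into cluster a and drop b.
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return merges

# Four 1-D points: two tight pairs that are far apart.
points = [(0.0,), (1.0,), (5.0,), (6.5,)]
merges = single_linkage_merges(points)
```

Each entry in `merges` corresponds to one join in the dendrogram, and the merge distance is the height at which that join would be drawn.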

What is K-means clustering?

A partitioning method that divides data into k distinct clusters based on distance to the centroid of each cluster.

What is the objective function of K-means?

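To minimize the sum of squared distances between data points and their cluster centers: $J(V) = \sum_{j=1}^{k} \sum_{x_i \in C_j} \lVert x_i - \mu_j \rVert^2$, where $C_j$ is the set of points assigned to cluster $j$ and $\mu_j$ is that cluster's center.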
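
As a small worked example, the objective can be evaluated directly for a given assignment. The sketch below assumes Euclidean distance and represents the assignment as a list of cluster indices; the function name `kmeans_objective` and the sample data are made up for illustration.

```python
def kmeans_objective(points, centers, labels):
    """Sum of squared Euclidean distances from each point to its assigned center."""
    total = 0.0
    for point, label in zip(points, labels):
        center = centers[label]
        total += sum((p - c) ** 2 for p, c in zip(point, center))
    return total

# Two clusters of two points each; every point is exactly 1 unit from its center.
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
centers = [(0.0, 1.0), (10.0, 1.0)]
labels = [0, 0, 1, 1]

J = kmeans_objective(points, centers, labels)  # 1 + 1 + 1 + 1 = 4.0
```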

Describe the K-means process.

1) Select initial k cluster centers,

2) Allocate each data point to the nearest cluster center,

3) Recompute cluster centers as the average of assigned points,

4) Repeat until convergence.
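
The four steps above can be sketched in plain Python. This is a minimal illustration under stated assumptions: Euclidean distance, initial centers supplied by the caller, and convergence defined as "the centers stopped changing". The names `kmeans` and `sq_dist` and the `max_iter` cap are hypothetical choices, not taken from the lecture.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, initial_centers, max_iter=100):
    # Step 1: the caller supplies the initial k cluster centers.
    centers = [tuple(c) for c in initial_centers]
    labels = []
    for _ in range(max_iter):
        # Step 2: allocate each point to the nearest center.
        labels = [
            min(range(len(centers)), key=lambda j: sq_dist(p, centers[j]))
            for p in points
        ]
        # Step 3: recompute each center as the average of its assigned points.
        new_centers = []
        for j in range(len(centers)):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:
                new_centers.append(
                    tuple(sum(v) / len(members) for v in zip(*members))
                )
            else:
                # Assumption: an empty cluster keeps its previous center.
                new_centers.append(centers[j])
        # Step 4: stop once the centers no longer move.
        if new_centers == centers:
            break
        centers = new_centers
    return centers, labels

points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (8.0, 8.0), (9.0, 8.0), (8.0, 9.0)]
centers, labels = kmeans(points, initial_centers=[(1.0, 1.0), (8.0, 8.0)])
```

The result depends on the initial centers, so in practice the procedure is often run from several different starting points.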

How do you determine the optimal number of clusters?

By using validity indices that assess how good the clusters are based on data dispersion within and between clusters.
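
One such index, chosen here only as an example since the cards do not name a specific one, is the Calinski-Harabasz index: the ratio of between-cluster dispersion to within-cluster dispersion, each scaled by its degrees of freedom. Higher values suggest tighter, better-separated clusters. The sketch assumes at least two clusters and non-zero within-cluster dispersion; helper names are hypothetical.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def centroid(points):
    """Coordinate-wise mean of a list of points."""
    return tuple(sum(v) / len(points) for v in zip(*points))

def calinski_harabasz(points, labels):
    """(between-cluster dispersion / (k - 1)) / (within-cluster dispersion / (n - k))."""
    n = len(points)
    cluster_ids = sorted(set(labels))
    k = len(cluster_ids)
    overall = centroid(points)
    within = 0.0
    between = 0.0
    for cid in cluster_ids:
        members = [p for p, label in zip(points, labels) if label == cid]
        center = centroid(members)
        within += sum(sq_dist(p, center) for p in members)
        between += len(members) * sq_dist(center, overall)
    # Assumes k >= 2 and within > 0; no guard against division by zero.
    return (between / (k - 1)) / (within / (n - k))

points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
good = calinski_harabasz(points, [0, 0, 1, 1])  # groups nearby points together
bad = calinski_harabasz(points, [0, 1, 0, 1])   # groups distant points together
```

To choose the number of clusters, one would typically cluster the data for each candidate k, compute the index for each result, and prefer the k with the best score.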

Give an example of how clustering might be used in healthcare.

Clustering could be used to identify groups of patients with similar symptoms or disease progression patterns, helping with personalized treatment planning.

What is divisive hierarchical clustering?

A top-down approach: starts with all data in one cluster and splits it recursively.

What is PAM (Partitioning Around Medoids) and how does it differ from K-means?

Similar to K-means, but uses actual data points (medoids) as cluster centers instead of computed averages.
More robust to outliers than K-means.
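
The difference in center choice can be illustrated on a single cluster containing one outlier. This sketch shows only how a medoid is picked (the member with the smallest total distance to the other members), not the full PAM procedure; it assumes Euclidean distance, and the function names and data are made up for illustration.

```python
import math

def mean_center(points):
    """K-means style center: the coordinate-wise average (need not be a data point)."""
    return tuple(sum(v) / len(points) for v in zip(*points))

def medoid(points):
    """PAM style center: the data point with the smallest total distance to all others."""
    return min(points, key=lambda c: sum(math.dist(c, p) for p in points))

# Four points close together plus one far-away outlier.
cluster = [(1.0, 1.0), (2.0, 1.0), (1.0, 2.0), (2.0, 2.0), (50.0, 50.0)]

mean_c = mean_center(cluster)   # pulled toward the outlier
medoid_c = medoid(cluster)      # stays on one of the four nearby points
```

Here the average lands at roughly (11.2, 11.2), far from every actual point, while the medoid remains inside the dense group.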

What is fuzzy c-means?

A soft clustering method that allows data points to belong to multiple clusters with degrees of membership.
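
Degrees of membership can be sketched with the standard fuzzy c-means membership formula, u_j = 1 / Σ_k (d_j / d_k)^(2/(m-1)), where d_j is the distance from the point to center j. The cards do not give this formula or the fuzzifier m, so both are assumptions here (m = 2 is a common default), as is the function name. The centers are held fixed; a full algorithm would alternate this step with recomputing the centers.

```python
import math

def fcm_memberships(point, centers, m=2.0):
    """Membership of one point in each cluster, for fixed centers.

    Assumptions: Euclidean distance and fuzzifier m > 1.
    """
    dists = [math.dist(point, c) for c in centers]
    if 0.0 in dists:
        # The point sits exactly on a center: give it full membership there.
        zero_count = dists.count(0.0)
        return [1.0 / zero_count if d == 0.0 else 0.0 for d in dists]
    exponent = 2.0 / (m - 1.0)
    return [
        1.0 / sum((d_j / d_k) ** exponent for d_k in dists)
        for d_j in dists
    ]

centers = [(0.0, 0.0), (10.0, 0.0)]
u = fcm_memberships((2.0, 0.0), centers)  # point is much closer to the first center
```

The memberships sum to 1, and the point belongs mostly, but not exclusively, to the nearer cluster. Hard clustering would instead assign it to the first cluster only.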