K-means clustering

Last updated 5:58 AM on 5/1/26

12 Terms

Characteristics of the Input Data are Important

• Dimensionality

• Noise and Outliers

• Type of Distribution

• Type of Data / Attributes – dictates the type of similarity measure

Data: Pre-processing

– Normalize Data

– Eliminate Outliers
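
One possible way to carry out both pre-processing steps is sketched below using only the Python standard library. The card does not say which normalization or outlier rule to use; z-score scaling and a per-column cutoff are assumptions made for illustration, and the function names, the sample data and the threshold value are all hypothetical.

```python
import statistics

def zscore_columns(rows):
    """Rescale each column to mean 0 and standard deviation 1."""
    columns = list(zip(*rows))
    means = [statistics.fmean(col) for col in columns]
    # Population standard deviation; fall back to 1.0 for a constant
    # column so we never divide by zero.
    stdevs = [statistics.pstdev(col) or 1.0 for col in columns]
    return [
        [(value - m) / s for value, m, s in zip(row, means, stdevs)]
        for row in rows
    ]

def drop_outliers(scaled_rows, threshold):
    """Keep rows whose every standardized value lies within +/- threshold."""
    return [row for row in scaled_rows if all(abs(v) <= threshold for v in row)]

# Hypothetical observations: (age, income). The last income is extreme.
data = [[25, 40_000], [30, 42_000], [35, 45_000], [40, 47_000], [32, 400_000]]

scaled = zscore_columns(data)
# With only 5 rows a z-score cannot exceed 2, so the usual cutoff of 3
# would never trigger; 1.8 is an illustrative choice for this tiny sample.
kept = drop_outliers(scaled, threshold=1.8)
print(len(data), "rows before,", len(kept), "rows after outlier removal")
```

Normalizing first matters because k-means relies on distances: without it, a column measured in large units such as income would dominate a column such as age.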

Data: Post-processing

– Eliminate small clusters that may represent outliers

– Split “loose” clusters; i.e., clusters with relatively high SSE

– Merge clusters that are “close” and that have relatively low SSE

– These steps can also be applied during the clustering process
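
The checks above can be sketched as a small review pass over finished clusters. This is only an illustration: the card gives no numeric rule for "small" or "relatively high SSE", so min_size and max_mean_sse are hypothetical thresholds, and the per-point average SSE is an assumed way of comparing clusters of different sizes. The merge step is not shown.

```python
def centroid_of(points):
    """Component-wise mean of a non-empty list of points."""
    return [sum(col) / len(points) for col in zip(*points)]

def cluster_sse(points, centroid):
    """Sum of squared Euclidean distances from each point to the centroid."""
    return sum(
        sum((p - c) ** 2 for p, c in zip(point, centroid))
        for point in points
    )

def review_clusters(clusters, min_size=2, max_mean_sse=1.0):
    """Label each cluster 'drop' (too small), 'split' (loose) or 'keep'.

    min_size and max_mean_sse are hypothetical thresholds.
    """
    labels = {}
    for name, points in clusters.items():
        sse = cluster_sse(points, centroid_of(points))
        if len(points) < min_size:
            labels[name] = "drop"
        elif sse / len(points) > max_mean_sse:
            labels[name] = "split"
        else:
            labels[name] = "keep"
    return labels

# Hypothetical clustering result.
clusters = {
    "a": [[0.0, 0.0], [0.2, 0.1], [0.1, 0.3]],  # tight cluster
    "b": [[5.0, 5.0], [9.0, 9.0], [5.0, 9.0]],  # loose cluster
    "c": [[20.0, 20.0]],                        # single point, possible outlier
}
labels = review_clusters(clusters)
print(labels)
```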

K-Means Clustering

• Given a value of k, the k-means algorithm randomly assigns each observation to one of the k clusters.

• After all observations have been assigned to a cluster, the resulting cluster centroids are calculated.

• Using the updated cluster centroids, all observations are reassigned to the cluster with the closest centroid.

• The centroid calculation and reassignment steps repeat until the cluster assignments stop changing.
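
The procedure on this card can be sketched in plain Python as follows. It is a minimal illustration, not a production implementation: squared Euclidean distance, the max_iter cap, the fixed seed, and the rule of reseeding an empty cluster with a random observation are all assumptions that the card does not specify, and the sample points are hypothetical.

```python
import random

def squared_distance(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean_point(points):
    """Component-wise mean (centroid) of a non-empty list of points."""
    return [sum(col) / len(points) for col in zip(*points)]

def k_means(points, k, seed=0, max_iter=100):
    rng = random.Random(seed)
    # Step 1: randomly assign each observation to one of the k clusters.
    assignment = [rng.randrange(k) for _ in points]
    centroids = []
    for _ in range(max_iter):
        # Step 2: calculate the centroid of each cluster.
        centroids = []
        for cluster in range(k):
            members = [p for p, a in zip(points, assignment) if a == cluster]
            if members:
                centroids.append(mean_point(members))
            else:
                # Assumption: reseed an empty cluster with a random observation.
                centroids.append(list(rng.choice(points)))
        # Step 3: reassign every observation to the closest centroid.
        new_assignment = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Stop once no observation changes cluster.
        if new_assignment == assignment:
            break
        assignment = new_assignment
    return assignment, centroids

# Hypothetical two-dimensional observations.
points = [[1.0, 1.0], [1.2, 0.8], [0.9, 1.1], [8.0, 8.0], [8.2, 7.9], [7.9, 8.1]]
labels, centroids = k_means(points, k=2)
print(labels)
print(centroids)
```

Because the starting assignment is random, different seeds can give different final clusters, so results are usually compared across several runs.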

How to choose “k”?

• Choose “k” based on how the results will be used

– Example: “How many market segments do we want?”

• Also, experiment with slightly different values of k

• If the number of clusters, k, is not clearly established by the context of the business problem, the k-means algorithm can be repeated for several values of k to identify promising values.
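
Repeating the algorithm for several values of k might look like the sketch below, which records the total SSE for each k so the values can be compared. The card does not name a comparison criterion; using total SSE is an assumption, as are the compact k_means helper, the seed, and the sample points.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def k_means(points, k, seed=0, max_iter=100):
    """Compact k-means with a random initial assignment (illustrative only)."""
    rng = random.Random(seed)
    labels = [rng.randrange(k) for _ in points]
    centroids = []
    for _ in range(max_iter):
        centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids.append([sum(col) / len(members) for col in zip(*members)])
            else:
                # Assumption: reseed an empty cluster with a random observation.
                centroids.append(list(rng.choice(points)))
        new_labels = [
            min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points
        ]
        if new_labels == labels:
            break
        labels = new_labels
    return labels, centroids

def total_sse(points, labels, centroids):
    """Sum of squared distances from each point to its assigned centroid."""
    return sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))

def sse_for_each_k(points, k_values, seed=0):
    """Run k-means once per candidate k and record the resulting total SSE."""
    results = {}
    for k in k_values:
        labels, centroids = k_means(points, k, seed=seed)
        results[k] = total_sse(points, labels, centroids)
    return results

# Hypothetical observations forming two visible groups.
points = [[1.0, 1.0], [1.2, 0.8], [0.9, 1.1], [8.0, 8.0], [8.2, 7.9], [7.9, 8.1]]
sse_by_k = sse_for_each_k(points, k_values=[1, 2, 3, 4])
for k, sse in sse_by_k.items():
    print(f"k={k}: SSE={sse:.2f}")
```

A value of k after which the SSE stops dropping sharply is one sign of a promising choice, though the final decision still depends on how the clusters will be used.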

Suitability of k-Means Clustering

• Suitable when you know how many clusters you want and you have a larger data set (e.g., more than 500 observations).

• This method is appropriate for larger tables, up to millions of rows, and works only with numerical data.

• Partitions the observations, which is appropriate if trying to summarize the data with k “average” observations that describe the data with the minimum amount of error.

Clustering should result in groups….

made of observations that are more similar to each other than they are to observations in other clusters.

Cluster cohesion

relates to the distance between observations within the same cluster.

Cluster separation

relates to the distance between observations in different clusters.
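
Cohesion and separation can each be turned into a number in several ways. The sketch below uses the average pairwise Euclidean distance within a cluster for cohesion and the average pairwise distance across two clusters for separation. Both definitions are assumptions chosen for illustration, and the function names and sample clusters are hypothetical.

```python
import math
from itertools import combinations, product

def mean_within_distance(cluster):
    """Average distance over all pairs inside one cluster (cohesion)."""
    pairs = list(combinations(cluster, 2))
    if not pairs:
        return 0.0  # a single-point cluster has no pairs
    return sum(math.dist(a, b) for a, b in pairs) / len(pairs)

def mean_between_distance(cluster_a, cluster_b):
    """Average distance over all cross-cluster pairs (separation)."""
    pairs = list(product(cluster_a, cluster_b))
    return sum(math.dist(a, b) for a, b in pairs) / len(pairs)

# Hypothetical clusters.
cluster_a = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0)]
cluster_b = [(10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]

cohesion_a = mean_within_distance(cluster_a)
separation_ab = mean_between_distance(cluster_a, cluster_b)
print(f"cohesion of a: {cohesion_a:.3f}")
print(f"separation a-b: {separation_ab:.3f}")
```

Under these definitions a good clustering shows small within-cluster distances and large between-cluster distances.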

Cluster interpretability

relates to how much insight the clusters provide.

Cluster stability

refers to how robust the set of clusters is with respect to slight changes in the data.

Clustering is an…

UNSUPERVISED TECHNIQUE