Clustering

17 Terms

PCA and Clustering similarities

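both are unsupervised methods that simplify the data: PCA summarizes it with a few derived variables, clustering with a few groups of observations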

classification

assigns data to predefined categories using a labeled dataset (supervised)

clustering

groups similar data instances together using an unlabeled dataset (unsupervised); the resulting clusters do not necessarily align with known class labels

PCA vs clustering

PCA maximizes the variance of the PC scores, whereas clustering minimizes the within-cluster variance

clustering

minimizes within-cluster variance

PCA

maximizes the variance of the PC scores (the projections of the data onto the principal components)
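A minimal sketch of the variance-maximizing idea, restricted to two features so the leading eigenvalue and direction of the covariance matrix can be written in closed form. The function name `first_pc_2d` and the sample data are made up for illustration; a real analysis would normally use a linear-algebra library.

```python
import math
import statistics

def first_pc_2d(points):
    """Return (direction, scores, variance) of the first PC for 2-D data."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    centered = [(x - mx, y - my) for x, y in points]
    # Sample covariance matrix [[sxx, sxy], [sxy, syy]].
    sxx = sum(x * x for x, _ in centered) / (n - 1)
    syy = sum(y * y for _, y in centered) / (n - 1)
    sxy = sum(x * y for x, y in centered) / (n - 1)
    # Largest eigenvalue of a symmetric 2x2 matrix (closed form).
    lam = (sxx + syy) / 2 + math.sqrt(((sxx - syy) / 2) ** 2 + sxy ** 2)
    # Angle of the matching eigenvector, i.e. the direction of maximum variance.
    theta = 0.5 * math.atan2(2 * sxy, sxx - syy)
    direction = (math.cos(theta), math.sin(theta))
    # PC scores: projections of the centered data onto that direction.
    scores = [x * direction[0] + y * direction[1] for x, y in centered]
    return direction, scores, lam

# Hypothetical data lying roughly along a line.
data = [(1.0, 2.0), (2.0, 3.9), (3.0, 6.1), (4.0, 8.0)]
direction, scores, lam = first_pc_2d(data)
print(statistics.variance(scores), lam)  # the two values agree
```

The variance of the PC1 scores equals the largest eigenvalue, and it is at least as large as the variance of either original feature.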

Euclidean distance

straight-line distance between two points; used for continuous (numeric) data

Mahalanobis distance

adjusts for correlations between variables (and their scales) by using the covariance matrix
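The two distance cards can be compared with a small sketch. It assumes 2-D points so the covariance matrix can be inverted by hand; the helper names and the example covariance values are illustrative only.

```python
import math

def euclidean(x, y):
    """Straight-line distance between two numeric vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def mahalanobis_2d(x, y, cov):
    """Mahalanobis distance between 2-D points for a given 2x2 covariance."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    # Inverse of a 2x2 matrix, written out to avoid external dependencies.
    inv = [[d / det, -b / det], [-c / det, a / det]]
    diff = [x[0] - y[0], x[1] - y[1]]
    # Quadratic form diff^T * inv * diff.
    quad = sum(diff[i] * inv[i][j] * diff[j] for i in range(2) for j in range(2))
    return math.sqrt(quad)

u, v = (0.0, 0.0), (1.0, 1.0)
identity = [[1.0, 0.0], [0.0, 1.0]]      # uncorrelated, unit variances
correlated = [[1.0, 0.8], [0.8, 1.0]]    # hypothetical strong positive correlation

print(euclidean(u, v))
print(mahalanobis_2d(u, v, identity))    # reduces to the Euclidean distance
print(mahalanobis_2d(u, v, correlated))  # smaller: the move follows the correlation
```

With an identity covariance the two distances coincide; with correlated variables, a displacement along the direction of correlation counts as less unusual.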

k-means clustering

partitions the observations into k clusters so as to minimize the within-cluster variation

centroid

the vector of the p feature means for the observations in a cluster
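A minimal sketch of k-means using the usual alternating assign/update iteration (Lloyd's algorithm). The cards do not specify an algorithm or an initialization, so the choices here are assumptions: centers start at the first k observations purely for reproducibility, and the data are made up.

```python
def centroid(points):
    """Vector of feature means for the observations in one cluster."""
    n = len(points)
    return tuple(sum(col) / n for col in zip(*points))

def sq_dist(x, y):
    """Squared Euclidean distance."""
    return sum((a - b) ** 2 for a, b in zip(x, y))

def k_means(points, k, n_iter=20):
    centers = list(points[:k])  # naive start (assumption): first k observations
    clusters = []
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: sq_dist(p, centers[j]))
            clusters[nearest].append(p)
        # Update step: move each center to its cluster's centroid
        # (an empty cluster keeps its previous center).
        centers = [centroid(c) if c else centers[j] for j, c in enumerate(clusters)]
    return centers, clusters

def within_cluster_ss(centers, clusters):
    """Total within-cluster sum of squares, the quantity k-means drives down."""
    return sum(sq_dist(p, c) for c, pts in zip(centers, clusters) for p in pts)

data = [(1.0, 1.0), (1.2, 0.8), (0.8, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
centers, clusters = k_means(data, k=2)
print(centers)
print(within_cluster_ss(centers, clusters))
```

Each step can only lower or keep the within-cluster sum of squares, but the result may be a local optimum, which is why practical implementations try several starting points.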

k-medoids

chooses actual data points as the cluster centers (medoids)

medoid

the most centrally located point in a cluster
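One common way to make "most centrally located" concrete is the member with the smallest total distance to all other members. The sketch below assumes that definition with Euclidean distance; the cluster values are invented to include an outlier.

```python
import math

def medoid(points):
    """Member of `points` with the smallest total distance to the others."""
    return min(points, key=lambda p: sum(math.dist(p, q) for q in points))

def mean_point(points):
    """Ordinary centroid, shown for comparison."""
    n = len(points)
    return tuple(sum(col) / n for col in zip(*points))

cluster = [(1.0, 1.0), (2.0, 2.0), (3.0, 3.0), (4.0, 4.0), (20.0, 20.0)]
print(medoid(cluster))      # (3.0, 3.0), an actual observation
print(mean_point(cluster))  # (6.0, 6.0), pulled toward the outlier
```

Because the medoid must be an observed point, it is less sensitive to outliers than the centroid and works with any dissimilarity measure.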

k-modes clustering: optimal k

plot the total dissimilarity against k (a scree plot) and choose the k at the elbow point

k-modes procedure

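count dissimilarities: the distance between two observations is the number of categorical attributes on which they differ, and each cluster center is the mode of its members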
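Counting dissimilarities can be sketched with simple matching: compare two categorical records attribute by attribute and count the mismatches. The helper names and the toy records below are illustrative.

```python
from collections import Counter

def matching_dissimilarity(a, b):
    """Number of attributes on which two categorical records differ."""
    return sum(x != y for x, y in zip(a, b))

def cluster_mode(records):
    """Attribute-wise most frequent category; the k-modes analogue of a centroid."""
    return tuple(Counter(col).most_common(1)[0][0] for col in zip(*records))

records = [
    ("red", "small", "round"),
    ("red", "large", "round"),
    ("blue", "small", "round"),
]
mode = cluster_mode(records)
print(mode)                                                # ('red', 'small', 'round')
print([matching_dissimilarity(r, mode) for r in records])  # [0, 1, 1]
```

Summing these counts over all records and their cluster modes gives the total dissimilarity that is plotted against k when looking for an elbow.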

k-prototypes

designed to handle mixed datasets (both numerical and categorical); combines k-means and k-modes
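A sketch of how the two ideas are typically combined: squared Euclidean distance on the numeric part plus a weighted mismatch count on the categorical part. The weight `gamma` and the function name are assumptions for illustration; the card does not say how the two parts are balanced.

```python
def prototype_dissimilarity(x_num, x_cat, proto_num, proto_cat, gamma=1.0):
    """Mixed-type dissimilarity between one record and one cluster prototype."""
    # k-means part: squared Euclidean distance on numeric features.
    numeric_part = sum((a - b) ** 2 for a, b in zip(x_num, proto_num))
    # k-modes part: number of mismatching categorical features.
    categorical_part = sum(a != b for a, b in zip(x_cat, proto_cat))
    # gamma (assumed tuning weight) balances the two contributions.
    return numeric_part + gamma * categorical_part

d = prototype_dissimilarity(
    x_num=(1.0, 2.0), x_cat=("red", "small"),
    proto_num=(0.0, 0.0), proto_cat=("red", "large"),
    gamma=0.5,
)
print(d)  # 5.5  (1 + 4 from the numeric part, 0.5 * 1 from the categorical part)
```

A larger `gamma` makes categorical mismatches count more; numeric features are usually standardized first so that neither part dominates by scale alone.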

Silhouette

measures how similar an object is to its own cluster compared to other clusters.

silhouette range

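-1 to +1; values near +1 mean the object fits its own cluster well, values near 0 mean it lies between clusters, and negative values suggest it may belong to another cluster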
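The silhouette of a single object can be sketched from its standard definition, s = (b - a) / max(a, b), where a is the mean distance to the other members of its own cluster and b is the smallest mean distance to any other cluster. Euclidean distance and the toy clusters are assumptions here.

```python
import math

def silhouette(point, own_cluster, other_clusters):
    """Silhouette value of `point`, which must be a member of `own_cluster`."""
    # Note: drops every copy of `point`, so duplicates are not handled.
    others = [p for p in own_cluster if p != point]
    if not others:
        return 0.0  # usual convention for a single-member cluster
    # a: mean distance to the rest of the point's own cluster.
    a = sum(math.dist(point, p) for p in others) / len(others)
    # b: mean distance to the nearest other cluster.
    b = min(sum(math.dist(point, p) for p in c) / len(c) for c in other_clusters)
    return (b - a) / max(a, b)

left = [(0.0, 0.0), (0.0, 1.0)]
right = [(5.0, 0.0), (5.0, 1.0)]

# A well-placed point: close to its own cluster, far from the other.
print(silhouette((0.0, 0.0), left, [right]))

# A misassigned point: labeled "left" but sitting inside "right".
bad_left = left + [(5.0, 0.5)]
print(silhouette((5.0, 0.5), bad_left, [right]))
```

The first value is close to +1 and the second is negative, matching the interpretation of the range.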