PCA and Clustering similarities
both aim to simplify the data: PCA by reducing the number of dimensions, clustering by grouping similar observations
classification
groups data into predefined categories using a labeled dataset
clustering
groups similar data instances together using an unlabeled dataset; the resulting clusters do not necessarily align with classification labels
PCA vs clustering
PCA maximizes the variance of the PC scores, while clustering minimizes the within-cluster variance
clustering
minimizes the within-cluster variance
PCA
maximizes the variance of the PC scores
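
To make "maximizes the variance of the PC scores" concrete, here is a minimal sketch for two features only. It assumes a small made-up dataset and uses the closed-form angle of the first principal axis of a 2x2 covariance matrix; the helper names (`first_pc_angle`, `project`, `sample_variance`) are illustrative, not from any library.

```python
import math

def sample_variance(values):
    # Unbiased sample variance (divides by n - 1).
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / (len(values) - 1)

def project(data, angle):
    # Center the data, then project each point onto the unit vector at `angle`.
    n = len(data)
    mx = sum(x for x, _ in data) / n
    my = sum(y for _, y in data) / n
    ux, uy = math.cos(angle), math.sin(angle)
    return [(x - mx) * ux + (y - my) * uy for x, y in data]

def first_pc_angle(data):
    # Closed-form direction of the first principal axis for 2-D data,
    # derived from the sample covariance matrix [[sxx, sxy], [sxy, syy]].
    n = len(data)
    mx = sum(x for x, _ in data) / n
    my = sum(y for _, y in data) / n
    sxx = sum((x - mx) ** 2 for x, _ in data) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in data) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in data) / (n - 1)
    return 0.5 * math.atan2(2 * sxy, sxx - syy)

# Made-up example data (assumption, for illustration only).
data = [(2.0, 1.0), (3.0, 3.0), (4.0, 2.0), (5.0, 5.0), (6.0, 4.0)]
pc1_scores = project(data, first_pc_angle(data))
pc1_variance = sample_variance(pc1_scores)
```

Projecting the same data onto any other direction should give scores with a variance no larger than `pc1_variance`.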
Euclidean distance
straight-line distance between two points; used for continuous (numeric) data
Mahalanobis distance
distance measure that adjusts for correlations between variables
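
A small sketch comparing the two distances may help. It assumes only two features so the covariance matrix can be inverted by hand; `euclidean` and `mahalanobis_2d` are illustrative helper names, and the covariance values are made up.

```python
import math

def euclidean(a, b):
    # Straight-line distance between two numeric points.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def mahalanobis_2d(a, b, cov):
    # cov is a 2x2 covariance matrix [[s11, s12], [s21, s22]].
    (s11, s12), (s21, s22) = cov
    det = s11 * s22 - s12 * s21
    # Inverse of a 2x2 matrix, written out explicitly.
    inv = [[s22 / det, -s12 / det], [-s21 / det, s11 / det]]
    d0, d1 = a[0] - b[0], a[1] - b[1]
    # Quadratic form d^T * inv(cov) * d.
    q = d0 * (inv[0][0] * d0 + inv[0][1] * d1) + d1 * (inv[1][0] * d0 + inv[1][1] * d1)
    return math.sqrt(q)

# Made-up covariance with strongly correlated features (assumption).
cov = [[1.0, 0.9], [0.9, 1.0]]
along = mahalanobis_2d((1.0, 1.0), (0.0, 0.0), cov)     # follows the correlation
against = mahalanobis_2d((1.0, -1.0), (0.0, 0.0), cov)  # goes against it
```

Both points are the same Euclidean distance from the origin, but the point that goes against the correlation is much farther away in Mahalanobis terms. With an identity covariance matrix the two distances coincide.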
k-means clustering
minimizes the within-cluster variation
centroid
the vector of the p feature means for the observations in a cluster
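
The centroid and the within-cluster variation can be sketched in a few lines. This is a bare-bones version of the usual assign-then-update loop, assuming fixed starting centers and made-up points; it is not a substitute for a library implementation.

```python
def centroid(points):
    # Vector of the p feature means for the points in one cluster.
    n = len(points)
    return tuple(sum(pt[j] for pt in points) / n for j in range(len(points[0])))

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def assign(points, centers):
    # Put each point into the cluster of its nearest center.
    clusters = [[] for _ in centers]
    for pt in points:
        nearest = min(range(len(centers)), key=lambda i: sq_dist(pt, centers[i]))
        clusters[nearest].append(pt)
    return clusters

def kmeans(points, centers, n_iter=10):
    for _ in range(n_iter):
        clusters = assign(points, centers)
        # Recompute each centroid; keep the old center if a cluster is empty.
        centers = [centroid(c) if c else centers[i] for i, c in enumerate(clusters)]
    return centers, assign(points, centers)

def within_cluster_ss(clusters, centers):
    # Total squared distance of points to their own centroid.
    return sum(sq_dist(pt, ctr) for c, ctr in zip(clusters, centers) for pt in c)

# Made-up points and starting centers (assumptions).
points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (8.0, 8.0), (9.0, 8.0), (8.0, 9.0)]
centers, clusters = kmeans(points, [(1.0, 1.0), (8.0, 8.0)])
wss = within_cluster_ss(clusters, centers)
```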
k-medoids
chooses actual data points as the cluster centers (medoids)
medoid
the most centrally located point in the cluster
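
One common way to read "most centrally located" is the point with the smallest total distance to the other points in its cluster. A minimal sketch under that assumption, with made-up 2-D points:

```python
import math

def medoid(points):
    # The data point whose summed distance to all points in the cluster is smallest.
    return min(points, key=lambda c: sum(math.dist(c, p) for p in points))

# Made-up cluster with one outlier (assumption).
cluster = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0), (20.0, 0.0)]
center = medoid(cluster)
```

Unlike a centroid, the medoid is always one of the observations, so the outlier at `(20.0, 0.0)` pulls it far less than it would pull the mean.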
k-modes clustering: optimal k
make a scree plot of the total dissimilarity against k and find the elbow point
k-modes procedure
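count the dissimilarities (number of mismatched categories) between each observation and each cluster mode, then assign the observation to the closest mode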
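
Counting dissimilarities for categorical data can be sketched as a simple mismatch count, with the cluster center taken as the most frequent category per feature. The helper names and example rows are made up for illustration.

```python
from collections import Counter

def matching_dissimilarity(a, b):
    # Number of features on which two categorical observations differ.
    return sum(x != y for x, y in zip(a, b))

def cluster_mode(rows):
    # Most frequent category in each feature (column) of the cluster.
    return tuple(Counter(col).most_common(1)[0][0] for col in zip(*rows))

# Made-up categorical observations (assumption).
rows = [("red", "small"), ("red", "large"), ("blue", "small")]
mode = cluster_mode(rows)
mismatches = matching_dissimilarity(("red", "small", "round"), ("red", "large", "round"))
```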
k-prototypes
designed to handle mixed datasets (both numerical and categorical); combines k-means and k-modes
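
A common way to combine the two ideas is to add the squared Euclidean distance on the numeric features to a weighted mismatch count on the categorical features. The sketch below assumes that form; the weight `gamma` and the function name are illustrative choices, not taken from the cards.

```python
def mixed_dissimilarity(num_a, cat_a, num_b, cat_b, gamma=1.0):
    # Numeric part: squared Euclidean distance (as in k-means).
    numeric = sum((x - y) ** 2 for x, y in zip(num_a, num_b))
    # Categorical part: number of mismatches (as in k-modes).
    categorical = sum(x != y for x, y in zip(cat_a, cat_b))
    # gamma (assumed weight) balances the two parts.
    return numeric + gamma * categorical

# Made-up observations with two numeric and two categorical features (assumption).
d = mixed_dissimilarity((1.0, 2.0), ("a", "b"), (2.0, 4.0), ("a", "c"), gamma=0.5)
```

A larger `gamma` gives the categorical features more influence; `gamma=0` reduces the measure to the k-means distance.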
Silhouette
measures how similar an object is to its own cluster compared to other clusters.
silhouette range
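-1 to +1; values near +1 mean the object fits its own cluster well, values near -1 suggest it may belong to a different cluster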
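
The silhouette of one object is usually written as s = (b - a) / max(a, b), where a is its mean distance to the other members of its own cluster and b is the smallest mean distance to the members of any other cluster. A minimal sketch, assuming Euclidean distance, at least two clusters, and the convention that a single-member cluster scores 0:

```python
import math

def silhouette_scores(points, labels):
    scores = []
    for i, (p, lab) in enumerate(zip(points, labels)):
        # a: mean distance to the other points in the same cluster.
        same = [math.dist(p, q)
                for j, (q, l) in enumerate(zip(points, labels))
                if l == lab and j != i]
        if not same:
            scores.append(0.0)  # convention for single-member clusters
            continue
        a = sum(same) / len(same)
        # b: smallest mean distance to the points of any other cluster.
        b = min(
            sum(math.dist(p, q) for q, l in zip(points, labels) if l == other)
            / sum(1 for l in labels if l == other)
            for other in set(labels) - {lab}
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return scores

# Made-up 1-D points in two well-separated clusters (assumption).
points = [(0.0,), (1.0,), (10.0,), (11.0,)]
labels = [0, 0, 1, 1]
scores = silhouette_scores(points, labels)
```

Every score falls between -1 and +1, and well-separated clusters like these give values close to +1.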