Uses of unsupervised learning techniques
Exploratory data analysis, especially on high-dimensional datasets
Feature generation
PCA
To transform a high-dimensional dataset into a smaller, much more manageable set of representative (“principal”) variables that are easier to explore and visualize
PCs are…
linear combinations of the existing variables that capture most of the information in the original dataset
PCA is especially useful for
highly correlated data, for which a few PCs are enough to represent most of the information in the full dataset
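As a concrete illustration of the cards above, here is a minimal sketch of fitting PCA with scikit-learn; the simulated matrix X and the choice of two components are assumptions made for the example.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))                      # 100 observations, 5 features
X[:, 1] = X[:, 0] + 0.1 * rng.normal(size=100)     # make two features highly correlated

pca = PCA(n_components=2)
scores = pca.fit_transform(X)        # PC scores: the new, smaller set of variables
loadings = pca.components_           # each row is a loading vector (a linear combination)
print(scores.shape, loadings.shape)  # (100, 2) (2, 5)
```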
PCA score formula
$z_{i1} = \phi_{11} x_{i1} + \phi_{21} x_{i2} + \cdots + \phi_{p1} x_{ip}$ (the score of observation $i$ on the first PC)
Choose PC loadings to…
capture as much information in the original dataset as possible
Goal of PCA for calculations
to maximize the sample variance of Z1
Sample variance formula
$\frac{1}{n}\sum_{i=1}^{n} z_{i1}^2 = \frac{1}{n}\sum_{i=1}^{n}\left(\sum_{j=1}^{p} \phi_{j1} x_{ij}\right)^2$, maximized subject to $\sum_{j=1}^{p} \phi_{j1}^2 = 1$ (features assumed centered)
Orthogonality constraints
$\sum_{j=1}^{p} \phi_{jm}\phi_{jm'} = 0$ for $m \neq m'$, i.e., each loading vector is perpendicular to the earlier ones
Why are orthogonality constraints needed
so the PCs measure different aspects of the variables in the dataset
PCA analysis is constrained by
the line that is as close as possible to the observations, i.e., the line that minimizes the sum of the squared perpendicular distances between each data point and the line
First and second PCs are…
Mutually perpendicular
Scores formula with notation
$z_{im} = \phi_{1m} x_{i1} + \phi_{2m} x_{i2} + \cdots + \phi_{pm} x_{ip}$, where $z_{im}$ is the score of observation $i$ on the $m$th PC, $\phi_{jm}$ is the loading of the $j$th variable on the $m$th PC, and $x_{ij}$ is the (centered) value of the $j$th variable for observation $i$
Total variance for a feature
$\frac{1}{n}\sum_{i=1}^{n} x_{ij}^2$ for the $j$th (centered) feature; summing over all $p$ features gives the total variance
Variance explained by the mth PC
$\frac{1}{n}\sum_{i=1}^{n} z_{im}^2$
PVE =
$\dfrac{\frac{1}{n}\sum_{i=1}^{n} z_{im}^2}{\sum_{j=1}^{p} \frac{1}{n}\sum_{i=1}^{n} x_{ij}^2}$, the variance explained by the $m$th PC divided by the total variance
Does Centering have an effect on PC loadings? Why?
No. Centering has no effect on PC loadings because the variance of each variable is unchanged by centering, and variance maximization is how PC loadings are defined
Does scaling have an effect on PC loadings? Why?
Yes. If variables are of vastly different orders of magnitude, then a variable with an unusually large variance on its scale will receive a large PC loading and dominate the corresponding PC
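A quick numerical check of the two cards above (a sketch with made-up data): pre-centering leaves the loadings unchanged, while standardizing the variables changes them.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))
X[:, 2] *= 1000                                    # one variable on a much larger scale

pca = PCA(n_components=2)
load_raw = pca.fit(X).components_.copy()
load_centered = pca.fit(X - X.mean(axis=0)).components_.copy()
print(np.allclose(load_raw, load_centered))        # True: centering has no effect

X_scaled = (X - X.mean(axis=0)) / X.std(axis=0)    # standardize each variable
load_scaled = pca.fit(X_scaled).components_
print(np.allclose(load_raw, load_scaled))          # False: scaling changes the loadings
```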
Drawbacks of PCA
Interpretability: Cannot make sense of the PCs because they are complicated linear combinations of the original features
Not good for non-linear relationships
Uses linear transformations to summarize and visualize high-dimensional datasets in which the variables are highly linearly correlated
PCA does not perform feature selection: all variables go into the components, so no operational efficiency is gained
Target variable is ignored: PCA assumes that the directions in which the features exhibit the most variation are also the directions most associated with the target variable, but there is no guarantee this is true
Scree plot
PVEs against PC index
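One possible way to draw a scree plot in Python (the simulated data and the sklearn/matplotlib choices are assumptions for the sketch):

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA

X = np.random.default_rng(2).normal(size=(150, 6))   # stand-in feature matrix
pve = PCA().fit(X).explained_variance_ratio_         # PVE of each PC

plt.plot(range(1, len(pve) + 1), pve, marker="o")
plt.xlabel("Principal component index")
plt.ylabel("Proportion of variance explained")
plt.title("Scree plot")
plt.show()
```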
Are PC loadings unique
Yes, up to a sign flip
Biplot shows what?
the PC scores of the observations and the PC loadings of the variables, overlaid on a single plot
How does having categorical variables with high dimensionality hurt a data set?
High dimensionality leads to sparse factor levels (levels with very few observations), which dilutes the predictive power of the model
Suggest 2 ways to transform categorical variables with high dimensionality to retain them
Combine categories into smaller groups
Binarize the two factor variables, run a PCA on each set of dummy variables and use the first few PCs to summarize most of the information
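A sketch of the second idea with a single made-up factor column ("region" is a hypothetical name): binarize into dummy variables, then let a few PCs summarize them.

```python
import pandas as pd
from sklearn.decomposition import PCA

df = pd.DataFrame({"region": ["north", "south", "east", "west", "north", "east"]})
dummies = pd.get_dummies(df["region"], dtype=int)     # one 0/1 column per factor level
pcs = PCA(n_components=2).fit_transform(dummies)      # first few PCs summarize the dummies
print(pcs.shape)                                      # (6, 2)
```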
Total SS =
within cluster SS + between cluster SS
Total SS is
the total variation of all the observations in the data without any clustering (essentially there is one large cluster containing all observations)
Between cluster SS
Can be thought of as the SS explained by the K clusters
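A numerical check of this decomposition (a sketch with simulated data and K-means clusters): the total SS equals the within-cluster SS (K-means inertia_) plus the between-cluster SS.

```python
import numpy as np
from sklearn.cluster import KMeans

X = np.random.default_rng(3).normal(size=(100, 2))
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)

grand_mean = X.mean(axis=0)
total_ss = ((X - grand_mean) ** 2).sum()
within_ss = km.inertia_                               # within-cluster SS
between_ss = sum(
    (km.labels_ == k).sum() * ((km.cluster_centers_[k] - grand_mean) ** 2).sum()
    for k in range(3)
)
print(np.isclose(total_ss, within_ss + between_ss))   # True
```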
Two idealistic goals of cluster analysis
Homogeneity: Want observations within each cluster to share characteristics while observations in different clusters are different from one another
Interpretability: Characteristics of the clusters are typically interpretable and meaningful within the context of the business problem
PCA/Clustering Similarities
Unsupervised Learning
Simplify the data by a small number of summaries
PCA/Clustering Difference
PCA finds a low-dimensional representation of the observations, whereas clustering finds homogeneous subgroups among the observations
K-means clustering algorithm
Randomly select k points in the feature space as the initial cluster centers
Assign each observation to the closest cluster center in terms of Euclidean distance
Recalculate the center of each cluster
Repeat the assignment and re-centering steps until the cluster assignments no longer change
Why we run K means algorithm multiple times
To mitigate the randomness associated with the initial cluster centers and increase the chance of identifying a global optimum and getting more representative cluster groups
Is K means clustering a global optimum
No, local
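A sketch of running the algorithm above with scikit-learn (simulated data); n_init re-runs the algorithm from many random initial centers and keeps the best run, which mitigates the local-optimum issue.

```python
import numpy as np
from sklearn.cluster import KMeans

X = np.random.default_rng(4).normal(size=(200, 2))
km = KMeans(n_clusters=4, n_init=20, random_state=0).fit(X)

print(km.labels_[:10])    # cluster assignments of the first 10 observations
print(km.inertia_)        # within-cluster SS of the best of the 20 runs
```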
Hierarchical Clustering
Series of fusions of observations
Hierarchical clustering algorithm
Start with all separate clusters
Fuse closest pair one at a time
Repeat until all clusters are fused into a single cluster containing all obs
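A minimal sketch of these fusions with SciPy (simulated data); the method argument selects the linkage rule described in the cards further down.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage

X = np.random.default_rng(5).normal(size=(30, 2))
Z = linkage(X, method="complete")   # each row records one fusion and its height
print(Z.shape)                      # (29, 4): n - 1 fusions for n = 30 observations
```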
Within cluster variation vs Euclidean distance
Within-cluster variation sums squared Euclidean distances (no square root), whereas Euclidean distance takes the square root of the sum of squared differences
Elbow method
Choose the value of k at which the proportion of variance explained by the k clusters reaches the elbow of the graph (i.e., levels off)
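One possible elbow plot (a sketch with simulated data, using K-means and the proportion-of-variance-explained measure from the earlier SS cards):

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans

X = np.random.default_rng(6).normal(size=(200, 2))
total_ss = ((X - X.mean(axis=0)) ** 2).sum()

ks = range(1, 11)
pve = [1 - KMeans(n_clusters=k, n_init=10, random_state=0).fit(X).inertia_ / total_ss
       for k in ks]

plt.plot(list(ks), pve, marker="o")            # look for where the curve levels off
plt.xlabel("Number of clusters k")
plt.ylabel("Proportion of variance explained")
plt.show()
```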
Linkage choices
Complete, single, average, centroid
Complete linkage
Maximal pairwise distance
Single linkage
Minimal pairwise distance
Average linkage
Average of all pairwise distances
Centroid linkage
Distance between the two cluster centroids (or centers)
Most common linkage methods? Why?
Complete and average because they result in more balanced and visually appealing clusters
Dendrogram
an upside down tree that shows dissimilarity at each fusion
Lower cut dendrogram results in _____ clusters
More
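A sketch of drawing and cutting a dendrogram with SciPy (simulated data); a lower cut height yields at least as many clusters as a higher one.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import dendrogram, fcluster, linkage

X = np.random.default_rng(5).normal(size=(30, 2))
Z = linkage(X, method="complete")

dendrogram(Z)                                  # upside-down tree of fusions
plt.ylabel("Dissimilarity at fusion")
plt.show()

labels_low = fcluster(Z, t=1.0, criterion="distance")     # cut low on the tree
labels_high = fcluster(Z, t=3.0, criterion="distance")    # cut high on the tree
print(len(set(labels_low)) >= len(set(labels_high)))      # True: lower cut -> more clusters
```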
Differences of K means and hierarchical
Randomization
Pre-specified number of clusters
Nested clusters
Which of K means and hierarchical need randomization?
K means
Which of K means and hierarchical need pre-specified clusters?
K means
Which of K means and hierarchical need nested clusters?
Hierarchical
Similarities of K means and hierarchical
Both unsupervised
Objective is to uncover homogeneous subgroups among the observations
Both are sensitive to scaling of variables
Both are sensitive to outliers
Solution for observations with largely different scales
correlation based distance
Ways to generate features from cluster analysis
Cluster groups
Cluster centers can replace the original variables for interpretation and prediction purposes
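A sketch of both feature-generation ideas with simulated data: attach the cluster label as a new factor, or replace the original variables by the corresponding cluster center.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans

X = pd.DataFrame(np.random.default_rng(7).normal(size=(100, 2)), columns=["x1", "x2"])
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)

X["cluster"] = km.labels_                                    # cluster group as a new feature
centers = pd.DataFrame(km.cluster_centers_, columns=["x1_center", "x2_center"])
X_centers = centers.loc[km.labels_].reset_index(drop=True)   # centers replacing x1, x2
print(X.head())
print(X_centers.head())
```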
Two impacts of Curse of dimensionality for clustering
Harder to visualize data
Notion of closeness becomes more fuzzy when there are more and more variables
Which linkage can result in inversions
Centroid linkage
Considerations for number of clusters to choose for hierarchical clustering
Balance
Height differences
Explain two reasons why unsupervised learning is often more challenging than supervised learning
Less clearly defined objectives
Less objective evaluation
Why are PC loading vectors unique up to a sign flip
The line defined by a PC loading vector extends in both directions, so flipping the signs of all the loadings gives rise to another equally valid PC loading vector
Explain how scaling the variables will affect the results of clustering
Without scaling, a variable measured on a much larger scale can dominate the distance calculations and exert a disproportionate impact on the cluster arrangements; scaling adjusts for this
Explain how principal components analysis can be used as a pre-processing step before applying clustering to a high-dimensional dataset
PCA can compress the data into two dimensions without losing much information, and the cluster assignments can then be visualized in a two-dimensional scatterplot of the scores of the first two PCs
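A sketch of that workflow (simulated high-dimensional data; the scaler, two components, and three clusters are illustrative choices): compress to the first two PC scores, cluster, and plot the assignments.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X = np.random.default_rng(8).normal(size=(200, 10))          # high-dimensional stand-in
scores = PCA(n_components=2).fit_transform(StandardScaler().fit_transform(X))
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(scores)

plt.scatter(scores[:, 0], scores[:, 1], c=labels)            # clusters in the PC1-PC2 plane
plt.xlabel("PC1 score")
plt.ylabel("PC2 score")
plt.show()
```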
Large variance for PC1 and small others imply what?
Strong correlation among the variables
What is the name of the plot for K means clustering
Elbow plot
When to select for an elbow plot
When it levels off
State the difference between dissimilarity and linkage
Dissimilarity measures the proximity of two observations in the data set, while linkage measures the proximity of two clusters of observations
Describe the steps to calculate the within cluster sum of squares using latitude and longitude
Calculate the centroid of each cluster
Calculate the squared Euclidean distance between each city and its cluster's centroid
Sum all the squared distances
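A minimal worked version of these steps for one cluster of cities (the latitude/longitude values are made up for illustration):

```python
import numpy as np

cities = np.array([[41.9, -87.6],     # [latitude, longitude] for each city
                   [42.3, -83.0],
                   [39.1, -84.5]])

centroid = cities.mean(axis=0)                          # step 1: the centroid
sq_dists = ((cities - centroid) ** 2).sum(axis=1)       # step 2: squared Euclidean distances
within_ss = sq_dists.sum()                              # step 3: within-cluster SS
print(within_ss)
```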
What distance measure do K-means clustering and hierarchical clustering typically use
Euclidean