SRM Exam Chapter 6: Unsupervised Learning

61 Terms

1
New cards

Principal Components Analysis (PCA):
z, Z

Principal component (score)

2
New cards

Principal Components Analysis (PCA):
Subscript m

Index for principal components

3
New cards

Principal Components Analysis (PCA):
𝜙

Principal component loading

4
New cards

Principal Components Analysis (PCA):
x, X

Centered explanatory variable

5
New cards

Unsupervised Learning

PCA
Clustering

6
New cards

PCA

Reduces the dimensions of a dataset

7
New cards

PCs are _______________ of all predictors in the dataset.

weighted averages

8
New cards

The first PC, z_1, explains the ______________ amount of variability in the dataset.

largest

9
New cards

The weights of PCs are called _____________, 𝜙.

loadings

10
New cards

The second PC, z_2, explains the ______________ amount of variability not explained by ________________ in the dataset.

largest
z_1

11
New cards

The first principal component is the direction along which the data ________________. Then the second principal component must be _________________ to the first.

varies the most
perpendicular

12
New cards

Biplot:
The horizontal axis is for the ____________ PC.
The vertical axis is for the ______________ PC.

first
second

13
New cards

Scree plot shows the __________________________ by each PC. The proportion should ________________ from one PC to the next since each PC should explain a smaller amount of variance than the previous PC. Use this plot to decide how many PCs explain a ______________________ in the data.

proportion of variance explained
decrease
sufficient amount of variability

14
New cards

Principal Components:
𝑧_m =

= ∑ (j=1 to p) {𝜙_(j,m) • 𝑥_(j) }

15
New cards

Principal Components:
𝑧_(i,m) =

= ∑ (j=1 to p) {𝜙_(j,m) • 𝑥_(i,j) }

16
New cards

Principal Components:
∑ (j=1 to p) {𝜙_(j,m)^(2) } =

= 1

17
New cards

Principal Components:
∑ (j=1 to p) {𝜙_(j,m) • 𝜙_(j,u)} =
m _____ u

= 0
≠
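
A minimal sketch (not part of the source deck) of how the definitions in cards 14-17 can be checked numerically; NumPy is assumed to be available, and the names X, phi, and Z are illustrative.

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))        # n = 100 observations, p = 3 variables
X = X - X.mean(axis=0)               # center each column (the x in the cards)

# The loadings phi are the right singular vectors of the centered data matrix.
U, s, Vt = np.linalg.svd(X, full_matrices=False)
phi = Vt.T                           # column m holds the loadings of PC m

Z = X @ phi                          # scores: z_(i,m) = sum over j of phi_(j,m) * x_(i,j)

print(np.allclose((phi ** 2).sum(axis=0), 1.0))   # card 16: each loading vector has unit length
print(np.allclose(phi.T @ phi, np.eye(3)))        # card 17: loadings of different PCs are orthogonal (= 0 for m ≠ u)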

18
New cards

Proportion of Variance Explained (PVE):
∑ (j=1 to p) 𝑠_(x_ j)^(2) =

= ∑ (j=1 to p) [ {1 / (𝑛 − 1)} • ∑ (i=1 to n) {𝑥_(i,j)^(2)} ]

19
New cards

Proportion of Variance Explained (PVE):
𝑠_(z_m)^(2) =

= [1 / (𝑛 − 1)] ∑ (i=1 to n) 𝑧_(i,m)^(2)

20
New cards

Proportion of Variance Explained (PVE):
PVE =

= 𝑠_(z_m)^(2) / ∑ (j=1 to p) 𝑠_(x_ j)^(2)
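
A sketch (not from the deck) verifying the PVE formulas in cards 18-20 on illustrative data; it assumes NumPy and recomputes the scores so it runs on its own.

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
X = X - X.mean(axis=0)                       # centered explanatory variables
U, s, Vt = np.linalg.svd(X, full_matrices=False)
Z = X @ Vt.T                                 # principal component scores

n = X.shape[0]
total_var = (X ** 2).sum() / (n - 1)         # card 18: sum over j of s^2_(x_j)
var_z = (Z ** 2).sum(axis=0) / (n - 1)       # card 19: s^2_(z_m) for each PC
pve = var_z / total_var                      # card 20: PVE of each PC
print(pve, pve.sum())                        # the PVEs sum to 1 when all p PCs are kept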

21
New cards

Proportion of Variance Explained (PVE):
The variance explained by each subsequent principal component is always ______________ than the variance explained by the previous principal component.

less

22
New cards

Proportion of Variance Explained (PVE):
All principal components are ____________ with one another.
A dataset has ________________ distinct principal components.

uncorrelated
min(𝑛 − 1, 𝑝)

23
New cards

Proportion of Variance Explained (PVE):
The first 𝑘 principal component scores and loadings approximate the original dataset, 𝑥_(i,j) ≈ _________________.

∑ (m=1 to k) {𝑧_(i,m) • 𝜙_(j,m)}
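
A sketch of the approximation in card 23: keeping only the first k score and loading vectors approximately reconstructs the centered data, and exactly when k = p. Names and data are illustrative; NumPy is assumed.

import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(50, 4))
X = X - X.mean(axis=0)
U, s, Vt = np.linalg.svd(X, full_matrices=False)
phi, Z = Vt.T, X @ Vt.T

k = 2
X_approx = Z[:, :k] @ phi[:, :k].T      # x_(i,j) ≈ sum over m = 1..k of z_(i,m) * phi_(j,m)
print(np.abs(X - X_approx).max())       # approximation error; it is 0 (up to rounding) when k = p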

24
New cards

Principal Components Regression (PCR)

Applies the dimension reduction property of PCA in a regression setting.

25
New cards

When k=p, PCR=_____________.

OLS

26
New cards

PCR uses ______________ instead of p variables as predictors.

k principal components (PCs)

27
New cards

The number of PCs is a ______________ measure in PCR.

flexibility

28
New cards

PCR:
As the number of PCs increases, bias ___________, and variance _______________.

decreases
increases

29
New cards

Variable Selection

Some variables are omitted

30
New cards

Dimension Reduction

All variables are used to construct the PCs

31
New cards

PCR:
PCs are _______________ by definition, so using PCs instead of the original predictors is one way to address __________________.

orthogonal
multicollinearity

32
New cards

PCR:
Optimal Number of PCs = ______________

Lowest Test MSE

33
New cards

Principal Components Regression:
𝑌 =
If 𝑘 = 𝑝, then 𝛽_ j =

= 𝜃_0 + 𝜃_(1)𝑧_(1) + ⋯ + 𝜃_(k)𝑧_(k) + 𝜀
= ∑ (m=1 to k) 𝜃_(m) 𝜙_(j,m)
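
A sketch of PCR as summarized in cards 24-33: regress the response on the first k PC scores, and with k = p the fit coincides with OLS. scikit-learn is assumed to be available; the data and names are illustrative.

import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(2)
X = rng.normal(size=(200, 5))
y = X @ np.array([1.0, -2.0, 0.5, 0.0, 3.0]) + rng.normal(size=200)

k = 3                                        # number of PCs kept (the flexibility measure)
Z = PCA(n_components=k).fit_transform(X)     # PC scores of the (centered) predictors
pcr = LinearRegression().fit(Z, y)           # y = theta_0 + theta_1 z_1 + ... + theta_k z_k + eps

# With k = p, PCR reproduces the OLS fit on the original predictors (card 25).
Z_full = PCA(n_components=X.shape[1]).fit_transform(X)
pcr_full = LinearRegression().fit(Z_full, y)
ols = LinearRegression().fit(X, y)
print(np.allclose(pcr_full.predict(Z_full), ols.predict(X)))   # True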

34
New cards

Cluster Analysis:
C

Cluster containing indices

35
New cards

Cluster Analysis:
W(C)

Within-cluster variation of cluster

36
New cards

Cluster Analysis:
|C|

Number of observations in cluster

37
New cards

Cluster Analysis:
Euclidean Distance =

= sqrt[ ∑ (j=1 to p) {𝑥_(i,j) − 𝑥_(m,j)}^(2) ]
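
A quick numeric check of the Euclidean distance in card 37, with made-up values; NumPy is assumed.

import numpy as np

x_i = np.array([1.0, 4.0, 2.0])
x_m = np.array([3.0, 1.0, 2.0])
dist = np.sqrt(((x_i - x_m) ** 2).sum())   # sqrt of sum over j of (x_(i,j) - x_(m,j))^2
print(dist)                                # sqrt(4 + 9 + 0) ≈ 3.606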

38
New cards

Clustering

Discovering subgroups within the data

39
New cards

k-Means Clustering

Partitions the observations in a dataset into a pre-specified number (k) of clusters

40
New cards

𝑘-Means Clustering Steps:
1. ______________ assign a cluster to each observation. This serves as the initial cluster assignments.
2. Calculate the ______________ of each cluster.
3. For each observation, identify the ___________ centroid and reassign to that cluster.
4. Repeat steps 2 and 3 until the cluster assignments stop __________________.

Randomly
centroid
closest
changing
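
A minimal sketch of the four steps in card 40 using only NumPy; the data, k, and variable names are illustrative, and a production implementation would also guard against empty clusters.

import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(150, 2))
k = 3

labels = rng.integers(k, size=len(X))      # step 1: randomly assign a cluster to each observation
while True:
    # Step 2: calculate the centroid of each cluster.
    centroids = np.array([X[labels == u].mean(axis=0) for u in range(k)])
    # Step 3: reassign each observation to its closest centroid.
    dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    new_labels = dists.argmin(axis=1)
    # Step 4: stop once the cluster assignments stop changing.
    if np.array_equal(new_labels, labels):
        break
    labels = new_labels
print(np.bincount(labels, minlength=k))    # cluster sizes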

41
New cards

𝑘-Means Clustering:
𝑊(𝐶_u) =
=

= (1 / |𝐶_u|) ∑ (i, m in C_u) ∑ (j=1 to p) {𝑥_(i,j) − 𝑥_(m,j)}^(2)
= 2 ∑ (i in C_u) ∑ (j=1 to p) {𝑥_(i,j) − 𝑥̅_(u,j)}^(2)
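
The two expressions for W(C_u) in card 41 (average of all pairwise squared distances, and twice the squared distances to the centroid) can be checked numerically; a sketch with illustrative data, assuming NumPy.

import numpy as np

rng = np.random.default_rng(4)
C = rng.normal(size=(10, 3))               # the observations in one cluster C_u

# Form 1: (1 / |C_u|) times the sum over all ordered pairs (i, m) of squared Euclidean distances.
diffs = C[:, None, :] - C[None, :, :]
w_pairwise = (diffs ** 2).sum() / len(C)

# Form 2: 2 times the sum of squared distances from each observation to the cluster centroid.
centroid = C.mean(axis=0)
w_centroid = 2 * ((C - centroid) ** 2).sum()

print(np.isclose(w_pairwise, w_centroid))  # True: the two forms agree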

42
New cards

Each iteration of the k-means clustering algorithm will reduce the ___________________, but only until the cluster assignments stop changing.

total within-cluster variation, ∑ (u=1 to k) 𝑊(𝐶_u)

43
New cards

k-Means Clustering Drawbacks:
1. Final cluster assignments depend on ___________________
2. The choice of k can be ________________

initial assignments
arbitrary

44
New cards

Hierarchical Clustering Steps:
1. Select the ____________ measure and ______________ to be used. Treat each observation as its own _______________.
2. For 𝑘 = 𝑛, 𝑛 − 1, ... , 2:
• Compute the _______________ dissimilarity between all 𝑘 clusters.
• Examine all ______________ pairwise dissimilarities. The two clusters with the ______________ inter-cluster dissimilarity are fused. The dissimilarity indicates the ________________ in the dendrogram at which these two clusters join.

dissimilarity
linkage
cluster
inter-cluster
(k choose 2)
lowest
height
once
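
A sketch of the agglomerative procedure in card 44 using SciPy; the data, the complete linkage, and the choice of 3 clusters are all illustrative.

import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import pdist

rng = np.random.default_rng(5)
X = rng.normal(size=(30, 2))

d = pdist(X, metric='euclidean')           # step 1: choose the dissimilarity measure
merges = linkage(d, method='complete')     # step 2: repeatedly fuse the two least dissimilar clusters
# merges[:, 2] holds the height at which each fusion occurs in the dendrogram.
labels = fcluster(merges, t=3, criterion='maxclust')   # cut the dendrogram into 3 clusters
print(labels)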

45
New cards

Hierarchical Clustering:
The _________________ depends on the height at which the dendrogram is cut, which is a subjective choice.

number of clusters

46
New cards

Dissimilarity Measures

Euclidean Distance
Correlation-based distance

47
New cards

Dissimilarity Measures:
Euclidean Distance

Will be small for observations that are physically close together

48
New cards

Dissimilarity Measures:
Correlation-Based Distance

Will be small for observations with similar-shaped profiles
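
A sketch contrasting the two dissimilarity measures in cards 46-48: two observations with similar-shaped profiles but very different magnitudes are far apart in Euclidean distance yet close in correlation-based distance. Values are made up; NumPy is assumed.

import numpy as np

a = np.array([1.0, 2.0, 3.0, 4.0])
b = a * 10                                   # same shape of profile, much larger scale

euclid = np.sqrt(((a - b) ** 2).sum())       # large: the points are physically far apart
corr_dist = 1 - np.corrcoef(a, b)[0, 1]      # near 0: the profiles are perfectly correlated
print(euclid, corr_dist)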

49
New cards

Complete and average linkages are favored because they create __________________ dendrograms.

balanced

50
New cards

A single linkage can produce a _____________ dendrogram.

skewed

51
New cards

A centroid linkage can cause _____________ in a dendrogram.

inversions

52
New cards

Both ____________ and _______________ are undesirable linkages.

single
centroid

53
New cards

Dendrogram:
Inversion

Two clusters are fused at a height below either of the individual clusters in the dendrogram.

54
New cards

Hierarchical Clustering:
Linkage/Inter-cluster dissimilarity
Complete/____________
Single/
Average/
Centroid/

The largest dissimilarity

55
New cards

Hierarchical Clustering:
Linkage/Inter-cluster dissimilarity
Complete/
Single/____________
Average/
Centroid/

The smallest dissimilarity

56
New cards

Hierarchical Clustering:
Linkage/Inter-cluster dissimilarity
Complete/
Single/
Average/______________
Centroid/

The arithmetic mean

57
New cards

Hierarchical Clustering:
Linkage/Inter-cluster dissimilarity
Complete/
Single/
Average/
Centroid/_____________

The dissimilarity between the cluster centroids

58
New cards

𝑘-Means Clustering:
For 𝑘-means clustering, the algorithm needs to be repeated for each ___________.

𝑘

59
New cards

Hierarchical Clustering:
For hierarchical clustering, the algorithm only needs to be performed _____________ for any number of clusters.

once

60
New cards

Clustering:
The result of clustering depends on many parameters, such as:
• Choice of __________ in 𝑘-means clustering
• Choice of __________ of clusters, _____________, and _____________________ in hierarchical clustering
• Choice to _____________ variables

𝑘
number
linkage
dissimilarity measure
standardize

61
New cards

Consider these 3 items when we're using clustering methods:
1. _________________ the variables prior to clustering if the variables are not of the same scale.
2. _______________ may skew clustering results, so find a way to identify if the clusters are the true subgroups.
3. Clustering algorithms are not ________________. Clustering part of the dataset may produce wildly different results than clustering the entire dataset.

Standardize
Outliers
robust
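
A sketch of consideration 1 in card 61: putting variables on a common scale before clustering so that a variable measured in large units does not dominate the Euclidean distances. The variables and values are illustrative; NumPy is assumed.

import numpy as np

rng = np.random.default_rng(6)
income = rng.normal(50_000, 10_000, size=100)   # measured in dollars
age = rng.normal(40, 10, size=100)              # measured in years
X = np.column_stack([income, age])

# Standardize: subtract each column's mean and divide by its standard deviation,
# so both variables contribute comparably to distance calculations.
X_std = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
print(X_std.std(axis=0, ddof=1))                # both columns now have unit standard deviation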