SRM Exam Chapter 6: Unsupervised Learning

61 Terms

1
New cards

Principal Components Analysis (PCA):
z, Z

Principal component (score)

2
New cards

Principal Components Analysis (PCA):
Subscript m

Index for principal components

3
New cards

Principal Components Analysis (PCA):
𝜙

Principal component loading

4
New cards

Principal Components Analysis (PCA):
x, X

Centered explanatory variable

5
New cards

Unsupervised Learning

PCA
Clustering

6
New cards

PCA

Reduces the dimensions of a dataset

7
New cards

PCs are _______________ of all predictors in the dataset.

weighted averages

8
New cards

The first PC, z_1, explains the ______________ amount of variability in the dataset.

largest

9
New cards

The weights of PCs are called _____________, 𝜙.

loadings

10
New cards

The second PC, z_2, explains the ______________ amount of variability not explained by ________________ in the dataset.

largest
z_1

11
New cards

The first principal component is the direction along which the data ________________. Then the second principal component must be _________________ to the first.

varies the most
perpendicular

12
New cards

Biplot:
The horizontal axis is for the ____________ PC.
The vertical axis is for the ______________ PC.

first
second

13
New cards

Scree plot shows the __________________________ by each PC. The proportion should ________________ from one PC to the next since each PC should explain a smaller amount of variance than the previous PC. Use this plot to decide how many PCs explain a ______________________ in the data.

proportion of variance explained
decrease
sufficient amount of variability

14
New cards

Principal Components:
𝑧_m =

= ∑ (j=1 to p) {𝜙_(j,m) • 𝑥_(j) }

15
New cards

Principal Components:
𝑧_(i,m) =

= ∑ (j=1 to p) {𝜙_(j,m) • 𝑥_(i,j) }

16
New cards

Principal Components:
∑ (j=1 to p) {𝜙_(j,m)^(2) } =

= 1

17
New cards

Principal Components:
∑ (j=1 to p) {𝜙_(j,m) • 𝜙_(j,u)} =
m _____ u

= 0
≠
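
A minimal sketch (not part of the source deck) of how the definitions in cards 14-17 can be checked numerically; NumPy is assumed to be available, and the names X, phi, and Z are illustrative.

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))        # n = 100 observations, p = 3 variables
X = X - X.mean(axis=0)               # center each column (the x in the cards)

# The loadings phi are the right singular vectors of the centered data matrix.
U, s, Vt = np.linalg.svd(X, full_matrices=False)
phi = Vt.T                           # column m holds the loadings of PC m

Z = X @ phi                          # scores: z_(i,m) = sum over j of phi_(j,m) * x_(i,j)

print(np.allclose((phi ** 2).sum(axis=0), 1.0))   # card 16: each loading vector has unit length
print(np.allclose(phi.T @ phi, np.eye(3)))        # card 17: loadings of different PCs are orthogonal (= 0 for m ≠ u)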

18
New cards

Proportion of Variance Explained (PVE):
∑ (j=1 to p) 𝑠_(x_ j)^(2) =

= ∑ (j=1 to p) [ {1 / (𝑛 − 1)} • ∑ (i=1 to n) {𝑥_(i,j)^(2)} ]

19
New cards

Proportion of Variance Explained (PVE):
𝑠_(z_m)^(2) =

= [1 / (𝑛 − 1)] ∑ (i=1 to n) 𝑧_(i,m)^(2)

20
New cards

Proportion of Variance Explained (PVE):
PVE =

= 𝑠_(z_m)^(2) / ∑ (j=1 to p) 𝑠_(x_ j)^(2)
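
A sketch (not from the deck) verifying the PVE formulas in cards 18-20 on illustrative data; it assumes NumPy and recomputes the scores so it runs on its own.

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
X = X - X.mean(axis=0)                       # centered explanatory variables
U, s, Vt = np.linalg.svd(X, full_matrices=False)
Z = X @ Vt.T                                 # principal component scores

n = X.shape[0]
total_var = (X ** 2).sum() / (n - 1)         # card 18: sum over j of s^2_(x_j)
var_z = (Z ** 2).sum(axis=0) / (n - 1)       # card 19: s^2_(z_m) for each PC
pve = var_z / total_var                      # card 20: PVE of each PC
print(pve, pve.sum())                        # the PVEs sum to 1 when all p PCs are kept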

21
New cards

Proportion of Variance Explained (PVE):
The variance explained by each subsequent principal component is always ______________ than the variance explained by the previous principal component.

less

22
New cards

Proportion of Variance Explained (PVE):
All principal components are ____________ with one another.
A dataset has ________________ distinct principal components.

uncorrelated
min(𝑛 − 1, 𝑝)

23
New cards

Proportion of Variance Explained (PVE):
The first 𝑘 principal component scores and loadings approximate the original dataset, 𝑥_(i,j) ≈ _________________.

∑ (m=1 to k) {𝑧_(i,m) • 𝜙_(j,m)}
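
A sketch of the approximation in card 23: keeping only the first k score and loading vectors approximately reconstructs the centered data, and exactly when k = p. Names and data are illustrative; NumPy is assumed.

import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(50, 4))
X = X - X.mean(axis=0)
U, s, Vt = np.linalg.svd(X, full_matrices=False)
phi, Z = Vt.T, X @ Vt.T

k = 2
X_approx = Z[:, :k] @ phi[:, :k].T      # x_(i,j) ≈ sum over m = 1..k of z_(i,m) * phi_(j,m)
print(np.abs(X - X_approx).max())       # approximation error; it is 0 (up to rounding) when k = p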

24
New cards

Principal Components Regression (PCR)

Applies the dimension reduction property of PCA in a regression setting.

25
New cards

When k=p, PCR=_____________.

OLS

26
New cards

PCR uses ______________ instead of p variables as predictors.

k principal components (PCs)

27
New cards

The number of PCs is a ______________ measure in PCR.

flexibility

28
New cards

PCR:
As the number of PCs increases, bias ___________, and variance _______________.

decreases
increases

29
New cards

Variable Selection

Some variables are omitted

30
New cards

Dimension Reduction

All variables are used to construct the PCs

31
New cards

PCR:
PCs are _______________ by definition, so using PCs instead of the original predictors is one way to address __________________.

orthogonal
multicollinearity

32
New cards

PCR:
Optimal Number of PCs = ______________

Lowest Test MSE

33
New cards

Principal Components Regression:
𝑌 =
If 𝑘 = 𝑝, then 𝛽_ j =

= 𝜃_0 + 𝜃_(1)𝑧_(1) + ⋯ + 𝜃_(k)𝑧_(k) + 𝜀
= ∑ (m=1 to k) 𝜃_(m) 𝜙_(j,m)
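
A sketch of PCR as summarized in cards 24-33: regress the response on the first k PC scores, and with k = p the fit coincides with OLS. scikit-learn is assumed to be available; the data and names are illustrative.

import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(2)
X = rng.normal(size=(200, 5))
y = X @ np.array([1.0, -2.0, 0.5, 0.0, 3.0]) + rng.normal(size=200)

k = 3                                        # number of PCs kept (the flexibility measure)
Z = PCA(n_components=k).fit_transform(X)     # PC scores of the (centered) predictors
pcr = LinearRegression().fit(Z, y)           # y = theta_0 + theta_1 z_1 + ... + theta_k z_k + eps

# With k = p, PCR reproduces the OLS fit on the original predictors (card 25).
Z_full = PCA(n_components=X.shape[1]).fit_transform(X)
pcr_full = LinearRegression().fit(Z_full, y)
ols = LinearRegression().fit(X, y)
print(np.allclose(pcr_full.predict(Z_full), ols.predict(X)))   # True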

34
New cards

Cluster Analysis:
C

Cluster containing indices

35
New cards

Cluster Analysis:
W(C)

Within-cluster variation of cluster

36
New cards

Cluster Analysis:
|C|

Number of observations in cluster

37
New cards

Cluster Analysis:
Euclidean Distance =

= sqrt[ ∑ (j=1 to p) {𝑥_(i,j) − 𝑥_(m,j)}^(2) ]
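
A quick numeric check of the Euclidean distance in card 37, with made-up values; NumPy is assumed.

import numpy as np

x_i = np.array([1.0, 4.0, 2.0])
x_m = np.array([3.0, 1.0, 2.0])
dist = np.sqrt(((x_i - x_m) ** 2).sum())   # sqrt of sum over j of (x_(i,j) - x_(m,j))^2
print(dist)                                # sqrt(4 + 9 + 0) ≈ 3.606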

38
New cards

Clustering

Discovering subgroups within the data

39
New cards

k-Means Clustering

Partitions the observations in a dataset into a pre-specified number (k) of clusters

40
New cards

𝑘-Means Clustering Steps:
1. ______________ assign a cluster to each observation. This serves as the initial cluster assignments.
2. Calculate the ______________ of each cluster.
3. For each observation, identify the ___________ centroid and reassign to that cluster.
4. Repeat steps 2 and 3 until the cluster assignments stop __________________.

Randomly
centroid
closest
changing
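
A minimal sketch of the four steps in card 40 using only NumPy; the data, k, and variable names are illustrative, and a production implementation would also guard against empty clusters.

import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(150, 2))
k = 3

labels = rng.integers(k, size=len(X))      # step 1: randomly assign a cluster to each observation
while True:
    # Step 2: calculate the centroid of each cluster.
    centroids = np.array([X[labels == u].mean(axis=0) for u in range(k)])
    # Step 3: reassign each observation to its closest centroid.
    dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    new_labels = dists.argmin(axis=1)
    # Step 4: stop once the cluster assignments stop changing.
    if np.array_equal(new_labels, labels):
        break
    labels = new_labels
print(np.bincount(labels, minlength=k))    # cluster sizes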

41
New cards

𝑘-Means Clustering:
𝑊(𝐶_u) =
=

= (1 / |𝐶_u|) ∑ (i, m in C_u) ∑ (j=1 to p) {𝑥_(i,j) − 𝑥_(m,j)}^(2)
= 2 ∑ (i in C_u) ∑ (j=1 to p) {𝑥_(i,j) − 𝑥̅_(u,j)}^(2)
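
The two expressions for W(C_u) in card 41 (average of all pairwise squared distances, and twice the squared distances to the centroid) can be checked numerically; a sketch with illustrative data, assuming NumPy.

import numpy as np

rng = np.random.default_rng(4)
C = rng.normal(size=(10, 3))               # the observations in one cluster C_u

# Form 1: (1 / |C_u|) times the sum over all ordered pairs (i, m) of squared Euclidean distances.
diffs = C[:, None, :] - C[None, :, :]
w_pairwise = (diffs ** 2).sum() / len(C)

# Form 2: 2 times the sum of squared distances from each observation to the cluster centroid.
centroid = C.mean(axis=0)
w_centroid = 2 * ((C - centroid) ** 2).sum()

print(np.isclose(w_pairwise, w_centroid))  # True: the two forms agree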

42
New cards

Each iteration of the k-means clustering algorithm will reduce the ___________________, but only until the cluster assignments stop changing.

total within-cluster variation, ∑ (u=1 to k) 𝑊(𝐶_u)

43
New cards

k-Means Clustering Drawbacks:
1. Final cluster assignments depend on ___________________
2. The choice of k can be ________________

initial assignments
arbitrary

44
New cards

Hierarchical Clustering Steps:
1. Select the ____________ measure and ______________ to be used. Treat each observation as its own _______________.
2. For 𝑘 = 𝑛, 𝑛 − 1, ... , 2:
• Compute the _______________ dissimilarity between all 𝑘 clusters.
• Examine all ______________ pairwise dissimilarities. The two clusters with the ______________ inter-cluster dissimilarity are fused. The dissimilarity indicates the ________________ in the dendrogram at which these two clusters join.

dissimilarity
linkage
cluster
inter-cluster
(k choose 2)
lowest
height
once
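
A sketch of the agglomerative procedure in card 44 using SciPy; the data, the complete linkage, and the choice of 3 clusters are all illustrative.

import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import pdist

rng = np.random.default_rng(5)
X = rng.normal(size=(30, 2))

d = pdist(X, metric='euclidean')           # step 1: choose the dissimilarity measure
merges = linkage(d, method='complete')     # step 2: repeatedly fuse the two least dissimilar clusters
# merges[:, 2] holds the height at which each fusion occurs in the dendrogram.
labels = fcluster(merges, t=3, criterion='maxclust')   # cut the dendrogram into 3 clusters
print(labels)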

45
New cards

Hierarchical Clustering:
The _________________ depends on the height at which the dendrogram is cut, which is a subjective choice.

number of clusters

46
New cards

Dissimilarity Measures

Euclidean Distance
Correlation-based distance

47
New cards

Dissimilarity Measures:
Euclidean Distance

Will be small for observations that are physically close together

48
New cards

Dissimilarity Measures:
Correlation-Based Distance

Will be small for observations with similar-shaped profiles
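
A sketch contrasting the two dissimilarity measures in cards 46-48: two observations with similar-shaped profiles but very different magnitudes are far apart in Euclidean distance yet close in correlation-based distance. Values are made up; NumPy is assumed.

import numpy as np

a = np.array([1.0, 2.0, 3.0, 4.0])
b = a * 10                                   # same shape of profile, much larger scale

euclid = np.sqrt(((a - b) ** 2).sum())       # large: the points are physically far apart
corr_dist = 1 - np.corrcoef(a, b)[0, 1]      # near 0: the profiles are perfectly correlated
print(euclid, corr_dist)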

49
New cards

Complete and average linkages are favored because they create __________________ dendrograms.

balanced

50
New cards

A single linkage can produce a _____________ dendrogram.

skewed

51
New cards

A centroid linkage can cause _____________ in a dendrogram.

inversions

52
New cards

Both ____________ and _______________ are undesirable linkages.

single
centroid

53
New cards

Dendrogram:
Inversion

Two clusters are fused at a height below either of the individual clusters in the dendrogram.

54
New cards

Hierarchical Clustering:
Linkage/Inter-cluster dissimilarity
Complete/____________
Single/
Average/
Centroid/

The largest dissimilarity

55
New cards

Hierarchical Clustering:
Linkage/Inter-cluster dissimilarity
Complete/
Single/____________
Average/
Centroid/

The smallest dissimilarity

56
New cards

Hierarchical Clustering:
Linkage/Inter-cluster dissimilarity
Complete/
Single/
Average/______________
Centroid/

The arithmetic mean

57
New cards

Hierarchical Clustering:
Linkage/Inter-cluster dissimilarity
Complete/
Single/
Average/
Centroid/_____________

The dissimilarity between the cluster centroids

58
New cards

𝑘-Means Clustering:
For 𝑘-means clustering, the algorithm needs to be repeated for each ___________.

𝑘

59
New cards

Hierarchical Clustering:
For hierarchical clustering, the algorithm only needs to be performed _____________ for any number of clusters.

once

60
New cards

Clustering:
The result of clustering depends on many parameters, such as:
• Choice of __________ in 𝑘-means clustering
• Choice of __________ of clusters, _____________, and _____________________ in hierarchical clustering
• Choice to _____________ variables

𝑘
number
linkage
dissimilarity measure
standardize

61
New cards

Consider these 3 items when we're using clustering methods:
1. _________________ the variables prior to clustering if the variables are not of the same scale.
2. _______________ may skew clustering results, so find a way to identify if the clusters are the true subgroups.
3. Clustering algorithms are not ________________. Clustering part of the dataset may produce wildly different results than clustering the entire dataset.

Standardize
Outliers
robust
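
A sketch of consideration 1 in card 61: putting variables on a common scale before clustering so that a variable measured in large units does not dominate the Euclidean distances. The variables and values are illustrative; NumPy is assumed.

import numpy as np

rng = np.random.default_rng(6)
income = rng.normal(50_000, 10_000, size=100)   # measured in dollars
age = rng.normal(40, 10, size=100)              # measured in years
X = np.column_stack([income, age])

# Standardize: subtract each column's mean and divide by its standard deviation,
# so both variables contribute comparably to distance calculations.
X_std = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
print(X_std.std(axis=0, ddof=1))                # both columns now have unit standard deviation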