SRM Exam Chapter 6: Unsupervised Learning

Last updated 8:01 AM on 12/16/24
61 Terms

1

Principal Components Analysis (PCA):
z, Z

Principal component (score)

2

Principal Components Analysis (PCA):
Subscript m

Index for principal components

3

Principal Components Analysis (PCA):
𝜙

Principal component loading

4

Principal Components Analysis (PCA):
x, X

Centered explanatory variable

5

Unsupervised Learning

PCA
Clustering

6

PCA

Reduces the dimensions of a dataset

7

PCs are _______________ of all predictors in the dataset.

weighted averages

8

The first PC, z_1, explains the ______________ amount of variability in the dataset.

largest

9

The weights of PCs are called _____________, 𝜙.

loadings

10

The second PC, z_2, explains the ______________ amount of variability not explained by ________________ in the dataset.

largest
z_1

11

The first principal component is the direction along which the data ________________. Then the second principal component must be _________________ to the first.

varies the most
perpendicular

12

Biplot:
The horizontal axis is for the ____________ PC.
The vertical axis is for the ______________ PC.

first
second

13

Scree plot shows the __________________________ by each PC. The proportion should ________________ from one PC to the next since each PC should explain a smaller amount of variance than the previous PC. Use this plot to decide how many PCs explain a ______________________ in the data.

proportion of variance explained
decrease
sufficient amount of variability

14

Principal Components:
𝑧_m =

= ∑ (j=1 to p) {𝜙_(j,m) • 𝑥_(j) }

15

Principal Components:
𝑧_(i,m) =

= ∑ (j=1 to p) {𝜙_(j,m) • 𝑥_(i,j) }
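The two score formulas above can be sketched in numpy. This is an illustrative reconstruction (the toy dataset and variable names are made up, not from the source): loadings 𝜙 are taken as the eigenvectors of the sample covariance matrix of the centered data, and scores follow 𝑧_(i,m) = ∑_j 𝜙_(j,m)·𝑥_(i,j).

```python
import numpy as np

# Toy data: n = 5 observations, p = 2 variables (illustrative values).
X = np.array([[2.5, 2.4],
              [0.5, 0.7],
              [2.2, 2.9],
              [1.9, 2.2],
              [3.1, 3.0]])

# Center each column: the x in the formulas is the centered variable.
Xc = X - X.mean(axis=0)

# Loadings phi: eigenvectors of the sample covariance matrix, ordered by
# decreasing eigenvalue. Each column has unit norm (sum_j phi_jm^2 = 1).
eigvals, eigvecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
phi = eigvecs[:, np.argsort(eigvals)[::-1]]

# Scores: z_im = sum_j phi_jm * x_ij, i.e. Z = Xc @ phi.
Z = Xc @ phi

print(Z.var(axis=0, ddof=1))  # score variances, decreasing across components
```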

16

Principal Components:
∑ (j=1 to p) {𝜙_(j,m)^(2) } =

= 1

17

Principal Components:
∑ (j=1 to p) {𝜙_(j,m) • 𝜙_(j,u)} =
m _____ u

= 0
≠

18

Proportion of Variance Explained (PVE):
∑ (j=1 to p) 𝑠_(x_ j)^(2) =

= ∑ (j=1 to p) {1 / (𝑛 − 1)} • ∑ (i=1 to n) {𝑥_(i,j)^(2)}

19

Proportion of Variance Explained (PVE):
𝑠_(z_m)^(2) =

= [1 / (𝑛 − 1)] ∑ (i=1 to n) 𝑧_(i,m)^(2)

20

Proportion of Variance Explained (PVE):
PVE =

= 𝑠_(z_m)^(2) / ∑ (j=1 to p) 𝑠_(x_ j)^(2)
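The PVE formula above can be sketched in numpy (an illustrative example on random data, not from the source). Note the two defining properties it exhibits: the PVEs decrease across components, and they sum to 1.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))          # toy data: n = 50, p = 3
Xc = X - X.mean(axis=0)

# Loadings from the covariance eigendecomposition; scores Z = Xc @ phi.
eigvals, eigvecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
phi = eigvecs[:, np.argsort(eigvals)[::-1]]
Z = Xc @ phi

total_var = Xc.var(axis=0, ddof=1).sum()   # sum_j s^2_{x_j}
pve = Z.var(axis=0, ddof=1) / total_var    # s^2_{z_m} / total variance

print(np.round(pve, 3))   # decreasing proportions that sum to 1
```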

21

Proportion of Variance Explained (PVE):
The variance explained by each subsequent principal component is always ______________ than the variance explained by the previous principal component.

less

22

Proportion of Variance Explained (PVE):
All principal components are ____________ with one another.
A dataset has ________________ distinct principal components.

uncorrelated
min(𝑛 − 1, 𝑝)

23

Proportion of Variance Explained (PVE):
The first 𝑘 principal component scores and loadings approximate the original dataset, 𝑥_(i,j) ≈ _________________.

∑ (m=1 to k) 𝑧_(i,m) 𝜙_(j,m)
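The approximation 𝑥_(i,j) ≈ ∑_m 𝑧_(i,m) 𝜙_(j,m) can be checked numerically (an illustrative sketch on random data, not from the source): with all p components the reconstruction is exact, and truncating to k < p gives a low-rank approximation.

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(20, 3)) @ rng.normal(size=(3, 3))  # correlated columns
Xc = X - X.mean(axis=0)

eigvals, eigvecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
phi = eigvecs[:, np.argsort(eigvals)[::-1]]
Z = Xc @ phi

# Rank-k approximation: x_ij ~ sum_{m=1}^{k} z_im phi_jm.
k = 2
approx = Z[:, :k] @ phi[:, :k].T
# With all p components the centered data is rebuilt exactly.
exact = Z @ phi.T

print(np.allclose(exact, Xc))   # True
```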

24

Principal Components Regression (PCR)

Apply dimension reduction property in a regression setting.

25

When k=p, PCR=_____________.

OLS

26

PCR uses ______________ instead of p variables as predictors.

principal components (PCs)

27

The number of PCs is a ______________ measure in PCR.

flexibility

28

PCR:
As the number of PCs increase, bias ___________, and variance _______________.

decreases
increases

29

Variable Selection

Some variables are omitted

30

Dimension Reduction

All variables are used to construct the PCs

31

PCR:
PCs are _______________ by definition; using PCs instead of the original predictors is one way to address __________________.

orthogonal
multicollinearity

32

PCR:
Optimal Number of PCs = ______________

Lowest Test MSE

33

Principal Components Regression:
𝑌 =
If 𝑘 = 𝑝, then 𝛽_ j =

= 𝜃_0 + 𝜃_(1)𝑧_(1) + ⋯ + 𝜃_(k)𝑧_(k) + 𝜀
= ∑ (m=1 to k) 𝜃_(m) 𝜙_(j,m)
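The two PCR formulas above can be sketched in numpy (an illustrative reconstruction on simulated data, not from the source): regress y on the first k PC scores to get 𝜃, then map back via 𝛽_j = ∑_m 𝜃_m 𝜙_(j,m). With k = p this reproduces OLS, as the card states.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p, k = 100, 4, 4          # with k = p, PCR reproduces OLS
X = rng.normal(size=(n, p))
y = X @ np.array([1.0, -2.0, 0.5, 3.0]) + rng.normal(size=n)

Xc = X - X.mean(axis=0)
yc = y - y.mean()

# First k loadings and scores.
eigvals, eigvecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
phi = eigvecs[:, np.argsort(eigvals)[::-1]][:, :k]
Z = Xc @ phi

# Regress y on the scores: y = theta_0 + theta_1 z_1 + ... + theta_k z_k.
theta, *_ = np.linalg.lstsq(Z, yc, rcond=None)
beta_pcr = phi @ theta                     # beta_j = sum_m theta_m phi_jm

beta_ols, *_ = np.linalg.lstsq(Xc, yc, rcond=None)
print(np.round(beta_pcr - beta_ols, 8))    # ~0 when k = p
```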

34

Cluster Analysis:
C

Cluster containing indices

35

Cluster Analysis:
W(C)

Within-cluster variation of cluster

36

Cluster Analysis:
|C|

Number of observations in cluster

37

Cluster Analysis:
Euclidean Distance =

= sqrt[ ∑ (j=1 to p) {𝑥_(i,j) − 𝑥_(m,j)}^(2) ]

38

Clustering

Discovering subgroups within the data

39

k-Means Clustering

Partitions the observations in a dataset into a pre-specified number (k) of clusters

40

𝑘-Means Clustering Steps:
1. ______________ assign a cluster to each observation. This serves as the initial cluster assignments.
2. Calculate the ______________ of each cluster.
3. For each observation, identify the ___________ centroid and reassign to that cluster.
4. Repeat steps 2 and 3 until the cluster assignments stop __________________.

Randomly
centroid
closest
changing
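The four steps above can be sketched directly in numpy. This is an illustrative implementation, not from the source; as a simplification of step 1, it uses a balanced random assignment so no initial cluster is empty, and it reseeds any cluster that empties out during iteration.

```python
import numpy as np

def k_means(X, k, seed=0):
    """Naive k-means following the four steps above."""
    rng = np.random.default_rng(seed)
    n = len(X)
    # Step 1: randomly assign a cluster to each observation
    # (balanced random start keeps every initial cluster non-empty).
    labels = rng.permutation(n) % k
    while True:
        # Step 2: calculate the centroid of each cluster.
        centroids = np.empty((k, X.shape[1]))
        for u in range(k):
            members = X[labels == u]
            # Guard: reseed an emptied cluster at a random observation.
            centroids[u] = members.mean(axis=0) if len(members) else X[rng.integers(n)]
        # Step 3: reassign each observation to its closest centroid.
        dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        new_labels = dists.argmin(axis=1)
        # Step 4: repeat until the cluster assignments stop changing.
        if np.array_equal(new_labels, labels):
            return labels, centroids
        labels = new_labels

X = np.array([[0.0, 0.0], [0.2, 0.1], [5.0, 5.0], [5.1, 4.9]])
labels, centroids = k_means(X, k=2)
print(labels)   # the two tight pairs land in separate clusters
```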

41

𝑘-Means Clustering:
𝑊(𝐶_u) =
=

= (1 / |𝐶_u|) ∑ (i, m in C_u) ∑ (j=1 to p) {𝑥_(i,j) − 𝑥_(m,j)}^(2)
= 2 ∑ (i in C_u) ∑ (j=1 to p) {𝑥_(i,j) − 𝑥̅_(u,j)}^(2)
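The equality of the two expressions for 𝑊(𝐶_u) above can be verified numerically (an illustrative check on random data, not from the source): the average of all pairwise squared Euclidean distances within a cluster equals twice the total squared distance to the cluster centroid.

```python
import numpy as np

rng = np.random.default_rng(2)
C = rng.normal(size=(6, 3))     # one cluster: |C_u| = 6 observations, p = 3
n = len(C)

# Left side: (1/|C_u|) * sum over all ordered pairs (i, m) of squared distance.
pairwise = sum(((C[i] - C[m]) ** 2).sum() for i in range(n) for m in range(n))
lhs = pairwise / n

# Right side: 2 * sum of squared distances to the centroid x̄_u.
rhs = 2 * ((C - C.mean(axis=0)) ** 2).sum()

print(np.isclose(lhs, rhs))     # True
```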

42

Each iteration of the k-means clustering algorithm will reduce the ___________________, but only until the cluster assignments stop changing.

total within-cluster variation, ∑ (u=1 to k) 𝑊(𝐶_u)

43

k-Means Clustering Drawbacks:
1. Final cluster assignments depend on ___________________
2. The selection of k can be ________________

initial assignments
arbitrary

44

Hierarchical Clustering Steps:
1. Select the ____________ measure and ______________ to be used. Treat each observation as its own _______________.
2. For 𝑘 = 𝑛, 𝑛 − 1, ... , 2:
• Compute the _______________ dissimilarity between all 𝑘 clusters.
• Examine all ______________ pairwise dissimilarities. The two clusters with the ______________ inter-cluster dissimilarity are fused. The dissimilarity indicates the ________________ in the dendrogram at which these two clusters join.

dissimilarity
linkage
cluster
inter-cluster
(k choose 2)
lowest
height
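The steps above can be sketched as a naive agglomerative algorithm (an illustrative implementation, not from the source; complete linkage with Euclidean dissimilarity is assumed). Each fusion's dissimilarity is recorded as its dendrogram height.

```python
import numpy as np
from itertools import combinations

def hclust_complete(X):
    """Naive agglomerative clustering with complete linkage.

    Each observation starts as its own cluster; the pair with the lowest
    inter-cluster dissimilarity is fused, and its height recorded."""
    clusters = [[i] for i in range(len(X))]
    heights = []
    while len(clusters) > 1:
        best = None
        # Examine all (k choose 2) pairwise dissimilarities.
        for a, b in combinations(range(len(clusters)), 2):
            # Complete linkage: the largest pairwise Euclidean distance.
            d = max(np.linalg.norm(X[i] - X[m])
                    for i in clusters[a] for m in clusters[b])
            if best is None or d < best[0]:
                best = (d, a, b)
        d, a, b = best
        heights.append(d)                 # dendrogram height of this fusion
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return heights

X = np.array([[0.0], [0.1], [1.0], [5.0]])
h = hclust_complete(X)
print(np.round(h, 2))   # fusion heights increase: complete linkage has no inversions
```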

45

Hierarchical Clustering:
The _________________ depends on the height at which the dendrogram is cut.

number of clusters

46

Dissimilarity Measures

Euclidean Distance
Correlation-based distance

47

Dissimilarity Measures:
Euclidean Distance

Will be small for observations that are physically close together

48

Dissimilarity Measures:
Correlation-Based Distance

Will be small for observations with similar-shaped profiles

49

Complete and average linkages are favored because they create __________________ dendrograms.

balanced

50

A single linkage can produce a _____________ dendrogram.

skewed

51

A centroid linkage can cause _____________ in a dendrogram.

inversions

52

Both ____________ and _______________ are undesirable linkages.

single
centroid

53

Dendrogram:
Inversion

Two clusters fuse at a lower height than the heights at which the individual clusters themselves joined.

54

Hierarchical Clustering:
Linkage/Inter-cluster dissimilarity
Complete/____________
Single/
Average/
Centroid/

The largest dissimilarity

55

Hierarchical Clustering:
Linkage/Inter-cluster dissimilarity
Complete/
Single/____________
Average/
Centroid/

The smallest dissimilarity

56

Hierarchical Clustering:
Linkage/Inter-cluster dissimilarity
Complete/
Single/
Average/______________
Centroid/

The arithmetic mean

57

Hierarchical Clustering:
Linkage/Inter-cluster dissimilarity
Complete/
Single/
Average/
Centroid/_____________

The dissimilarity between the cluster centroids

58

𝑘-Means vs. Hierarchical Clustering:
For 𝑘-means clustering, the algorithm needs to be repeated for each ___________.

𝑘

59

Hierarchical Clustering:
For hierarchical clustering, the algorithm only needs to be performed _____________ for any number of clusters.

once

60

Clustering:
The result of clustering depends on many parameters, such as:
• Choice of __________ in 𝑘-means clustering
• Choice of __________ of clusters, _____________, and _____________________ in hierarchical clustering
• Choice to _____________ variables

𝑘
number
linkage
dissimilarity measure
standardize

61

Consider these 3 items when we're using clustering methods:
1. _________________ the variables prior to clustering if the variables are not of the same scale.
2. _______________ may skew clustering results, so find a way to identify if the clusters are the true subgroups.
3. Clustering algorithms are not ________________. Clustering part of the dataset may produce wildly different results than clustering the entire dataset.

Standardize
Outliers
robust
