Principal Components Analysis (PCA):
z, Z
Principal component (score)
Principal Components Analysis (PCA):
Subscript m
Index for principal components
Principal Components Analysis (PCA):
𝜙
Principal component loading
Principal Components Analysis (PCA):
x, X
Centered explanatory variable
Unsupervised Learning
PCA
Clustering
PCA
Reduces the dimensions of a dataset
PCs are _______________ of all predictors in the dataset.
weighted averages
The first PC, z_1, explains the ______________ amount of variability in the dataset.
largest
The weights of PCs are called _____________, 𝜙.
loadings
The second PC, z_2, explains the ______________ amount of variability not explained by ________________ in the dataset.
largest
z_1
The first principal component is the direction along which the data ________________. Then the second principal component must be _________________ to the first.
varies the most
perpendicular
Biplot:
The horizontal axis is for the ____________ PC.
The vertical axis is for the ______________ PC.
first
second
Scree plot shows the __________________________ by each PC. The proportion should ________________ from one PC to the next since each PC should explain a smaller amount of variance than the previous PC. Use this plot to decide how many PCs explain a ______________________ in the data.
proportion of variance explained
decrease
sufficient amount of variability
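A minimal matplotlib sketch of a scree plot, assuming the PVE values have already been computed (the numbers below are made up purely for illustration):

import matplotlib.pyplot as plt

# Hypothetical PVE values for four PCs (illustrative only)
pve = [0.55, 0.25, 0.12, 0.08]

plt.plot(range(1, len(pve) + 1), pve, marker="o")   # PVE should decrease from one PC to the next
plt.xlabel("Principal component")
plt.ylabel("Proportion of variance explained")
plt.title("Scree plot")
plt.show()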
Principal Components:
𝑧_m =
= ∑ (j=1 to p) {𝜙_(j,m) • 𝑥_(j) }
Principal Components:
𝑧_(i,m) =
= ∑ (j=1 to p) {𝜙_(j,m) • 𝑥_(i,j) }
Principal Components:
∑ (j=1 to p) {𝜙_(j,m)^(2) } =
= 1
Principal Components:
∑ (j=1 to p) {𝜙_(j,m) • 𝜙_(j,u)} =
m _____ u
= 0
≠
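A minimal numpy sketch of these definitions: each score z_(i,m) is a weighted sum of the centered variables, the squared loadings of a PC sum to 1, and loadings of different PCs are orthogonal. Obtaining the loadings from an SVD of the centered data is an assumption here (one standard way, not the only one):

import numpy as np

X = np.random.default_rng(0).normal(size=(10, 3))   # toy data: n = 10, p = 3
Xc = X - X.mean(axis=0)                              # center each explanatory variable

U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
phi = Vt.T                                           # column m holds the loadings phi_(j,m) of PC m

Z = Xc @ phi                                         # z_(i,m) = sum over j of phi_(j,m) * x_(i,j)

print(np.sum(phi[:, 0] ** 2))                        # 1.0: squared loadings sum to one
print(phi[:, 0] @ phi[:, 1])                         # ~0.0: loadings of different PCs are orthogonal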
Proportion of Variance Explained (PVE):
∑ (j=1 to p) 𝑠_(x_j)^(2) =
= ∑ (j=1 to p) {1 / (𝑛 − 1)} • ∑ (i=1 to n) {𝑥_(i,j)^(2)}
Proportion of Variance Explained (PVE):
𝑠_(z_m)^(2) =
= [1 / (𝑛 − 1)] ∑ (i=1 to n) 𝑧_(i,m)^(2)
Proportion of Variance Explained (PVE):
PVE =
= 𝑠_(z_m)^(2) / ∑ (j=1 to p) 𝑠_(x_j)^(2)
Proportion of Variance Explained (PVE):
The variance explained by each subsequent principal component is always ______________ than the variance explained by the previous principal component.
less
Proportion of Variance Explained (PVE):
All principal components are ____________ with one another.
A dataset has ________________ distinct principal components.
uncorrelated
min(𝑛 − 1, 𝑝)
Proportion of Variance Explained (PVE):
The first 𝑘 principal component scores and loadings approximate the original dataset, 𝑥_(i,j) ≈ _________________.
∑ (m=1 to k) 𝑧_(i,m) 𝜙_(j,m)
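A minimal numpy sketch of the PVE formulas above, using the sample-variance convention with the 1/(n − 1) factor (loadings again taken from an SVD, an assumption for illustration):

import numpy as np

X = np.random.default_rng(0).normal(size=(10, 3))       # toy data
Xc = X - X.mean(axis=0)                                  # centered explanatory variables
Z = Xc @ np.linalg.svd(Xc, full_matrices=False)[2].T     # principal component scores

total_var = Xc.var(axis=0, ddof=1).sum()                 # sum over j of s_(x_j)^2
score_var = Z.var(axis=0, ddof=1)                        # s_(z_m)^2 for each PC

pve = score_var / total_var                              # PVE of each PC
print(pve)                                               # decreasing from one PC to the next
print(pve.sum())                                         # 1.0 when all min(n - 1, p) PCs are kept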
Principal Components Regression (PCR)
Apply dimension reduction property in a regression setting.
When k=p, PCR=_____________.
OLS
PCR uses ______________ instead of p variables as predictors.
principal components (pcs)
The number of PCs is a ______________ measure in PCR.
flexibility
PCR:
As the number of PCs increases, bias ___________ and variance _______________.
decreases
increases
Variable Selection
Some variables are omitted
Dimension Reduction
All variables are used to construct the PCs
PCR:
PCs are _______________ by definition, so using PCs instead of the original predictors is one way to address __________________.
orthogonal
multicollinearity
PCR:
Optimal Number of PCs = ______________
Lowest Test MSE
Principal Components Regression:
𝑌 =
If 𝑘 = 𝑝, then 𝛽_j =
= 𝜃_0 + 𝜃_(1)𝑧_(1) + ⋯ + 𝜃_(k)𝑧_(k) + 𝜀
= ∑ (m=1 to k) 𝜃_(m) 𝜙_(j,m)
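A minimal scikit-learn sketch of PCR, assuming scikit-learn is available and that k, the number of PCs kept, has already been chosen (in practice it would be picked by the lowest test MSE, e.g. via cross-validation):

import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 5))                                        # toy predictors
y = X @ np.array([1.0, 0.5, 0.0, -0.5, 2.0]) + rng.normal(size=100)  # toy response

k = 3                                                                # number of PCs (the flexibility measure)
pcr = make_pipeline(StandardScaler(), PCA(n_components=k), LinearRegression())
pcr.fit(X, y)                                                        # regress y on z_1, ..., z_k
print(pcr.predict(X[:5]))                                            # fitted PCR predictions (with k = p, PCR would match OLS)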
Cluster Analysis:
C
Cluster containing indices
Cluster Analysis:
W(C)
Within-cluster variation of cluster
Cluster Analysis:
|C|
Number of observations in cluster
Cluster Analysis:
Euclidean Distance =
= sqrt[ ∑ (j=1 to p) {𝑥_(i,j) − 𝑥_(m,j)}^(2) ]
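A tiny numpy sketch of this Euclidean distance between two observations x_i and x_m:

import numpy as np

x_i = np.array([1.0, 2.0, 3.0])
x_m = np.array([2.0, 0.0, 3.0])

dist = np.sqrt(np.sum((x_i - x_m) ** 2))    # square root of summed squared differences over the p features
print(dist)                                 # same result as np.linalg.norm(x_i - x_m)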
Clustering
Discovering subgroups within the data
k-Means Clustering
Partitions the observations in a dataset into a pre-specified number (k) of clusters
𝑘-Means Clustering Steps:
1. ______________ assign a cluster to each observation. This serves as the initial cluster assignments.
2. Calculate the ______________ of each cluster.
3. For each observation, identify the ___________ centroid and reassign to that cluster.
4. Repeat steps 2 and 3 until the cluster assignments stop __________________.
Randomly
centroid
closest
changing
𝑘-Means Clustering:
𝑊(𝐶_u) =
= (1 / |𝐶_u|) ∑ (i, m in C_u) ∑ (j=1 to p) {𝑥_(i,j) − 𝑥_(m,j)}^(2)
= 2 ∑ (i in C_u) ∑ (j=1 to p) {𝑥_(i,j) − 𝑥̅_(u,j)}^(2)
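A minimal numpy sketch of the four k-means steps above (random initial assignments, centroid calculation, reassignment to the closest centroid, repeat until assignments stop changing); the re-seeding of empty clusters is an assumption added only to keep the toy example running:

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))                         # toy data
k = 3
labels = rng.integers(k, size=len(X))                # step 1: randomly assign a cluster to each observation

for _ in range(100):                                 # cap iterations as a safeguard
    centroids = np.array([
        X[labels == u].mean(axis=0) if np.any(labels == u) else X[rng.integers(len(X))]
        for u in range(k)
    ])                                               # step 2: centroid of each cluster
    dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    new_labels = dists.argmin(axis=1)                # step 3: reassign each observation to the closest centroid
    if np.array_equal(new_labels, labels):           # step 4: stop when the assignments stop changing
        break
    labels = new_labels

print(np.bincount(labels, minlength=k))              # cluster sizes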
Each iteration of the k-means clustering algorithm will reduce the ___________________, but only until the cluster assignments stop changing.
total within-cluster variation { 𝑊(𝐶_u) }
k-Means Clustering Drawbacks:
1. Final cluster assignments depend on ___________________
2. k's selection can be ________________
initial assignments
arbitrary
Hierarchical Clustering Steps:
1. Select the ____________ measure and ______________ to be used. Treat each observation as its own _______________.
2. For 𝑘 = 𝑛, 𝑛 − 1, ... , 2:
• Compute the _______________ dissimilarity between all 𝑘 clusters.
• Examine all ______________ pairwise dissimilarities. The two clusters with the ______________ inter-cluster dissimilarity are fused. The dissimilarity indicates the ________________ in the dendrogram at which these two clusters join.
dissimilarity
linkage
cluster
inter-cluster
(k choose 2)
lowest
height
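A minimal scipy sketch of these steps, assuming Euclidean distance as the dissimilarity measure and complete linkage (both are illustrative choices); the dendrogram is then cut at a chosen height to obtain clusters:

import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))                              # toy data: 20 observations, 3 features

Z = linkage(X, method="complete", metric="euclidean")     # fuse the two least dissimilar clusters at each step
labels = fcluster(Z, t=4.0, criterion="distance")         # cut the dendrogram at height 4.0
print(labels)                                             # cluster assignment for each observation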
Hierarchical Clustering:
The _________________ depends on the height at which the dendrogram is cut.
number of clusters
Dissimilarity Measures
Euclidean Distance
Correlation-based distance
Dissimilarity Measures:
Euclidean Distance
Will be small for observations that are physically close together
Dissimilarity Measures:
Correlation-Based Distance
Will be small for observations with similar-shaped profiles
Complete and average linkages are favored because they create __________________ dendrograms.
balanced
A single linkage can produce a _____________ dendrogram.
skewed
A centroid linkage can cause _____________ in a dendrogram.
inversions
Both ____________ and _______________ are undesirable linkages.
single
centroid
Dendrogram:
Inversion
Two clusters are fused at a lower height than the height at which either of the individual clusters was joined.
Hierarchical Clustering:
Linkage/Inter-cluster dissimilarity
Complete/____________
Single/
Average/
Centroid/
The largest dissimilarity
Hierarchical Clustering:
Linkage/Inter-cluster dissimilarity
Complete/
Single/____________
Average/
Centroid/
The smallest dissimilarity
Hierarchical Clustering:
Linkage/Inter-cluster dissimilarity
Complete/
Single/
Average/______________
Centroid/
The arithmetic mean
Hierarchical Clustering:
Linkage/Inter-cluster dissimilarity
Complete/
Single/
Average/
Centroid/_____________
The dissimilarity between the cluster centroids
Hierarchical Clustering:
For 𝑘-means clustering, the algorithm needs to be repeated for each ___________.
𝑘
Hierarchical Clustering:
For hierarchical clustering, the algorithm only needs to be performed _____________ for any number of clusters.
once
Hierarchical Clustering:
The result of clustering depends on many parameters, such as:
• Choice of __________ in 𝑘-means clustering
• Choice of __________ of clusters, _____________, and _____________________ in hierarchical clustering
• Choice to _____________ variables
𝑘
number
linkage
dissimilarity measure
standardize
Consider these 3 items when we're using clustering methods:
1. _________________ the variables prior to clustering if the variables are not of the same scale.
2. _______________ may skew clustering results, so check whether the reported clusters represent true subgroups.
3. Clustering algorithms are not ________________. Clustering part of the dataset may produce wildly different results than clustering the entire dataset.
Standardize
Outliers
robust
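A minimal scikit-learn sketch of point 1, standardizing variables of very different scales before clustering (k = 3 is an arbitrary illustrative choice):

import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = np.column_stack([rng.normal(0, 1, 100), rng.normal(0, 1000, 100)])   # variables on very different scales

X_std = StandardScaler().fit_transform(X)                  # standardize before clustering
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X_std)
print(np.bincount(labels))                                 # cluster sizes on the standardized data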