1/14
Vocabulary flashcards covering key PCA concepts from the lecture notes, including dimensionality reduction, PCA components, eigenvalues/eigenvectors, standardization, covariance, scores, and practical examples.
Name | Mastery | Learn | Test | Matching | Spaced |
---|
No study sessions yet.
Principal Component Analysis (PCA)
A dimensionality reduction technique that transforms a set of possibly correlated variables into a smaller set of uncorrelated variables called principal components, capturing most of the data's variance.
Dimensionality reduction
The process of reducing the number of random variables under consideration, typically by obtaining a smaller set of principal variables that retain most of the information.
Unsupervised data mining technique
A data mining method that does not use labeled outcomes; PCA is unsupervised and focuses on capturing structure/variance in the data.
Principal Component (PC)
A linear combination of original variables that explains a portion of the total variance; PCs are ordered by explained variance (PC1, PC2, …), and are uncorrelated.
Eigenvalue
A scalar indicating how much variance is captured by its corresponding eigenvector in PCA; used to rank principal components.
Eigenvector
A weight vector that defines the direction of maximum variance for a principal component; columns form the eigenvectors matrix.
Loadings
The contributions of the original variables to a principal component; elements of an eigenvector showing how much each variable contributes.
Variance explained (explainedvarianceratio_)
The proportion of total variance explained by a given principal component (e.g., PC1 explains 46.62%).
Cumulative variance
The running total of explained variance across principal components; indicates how many components are needed to reach a desired information threshold.
Uncorrelated (orthogonal) PCs
Principal components are constructed to be uncorrelated with each other, meaning their pairwise covariances are zero.
Standardization (Z-score) before PCA
Scaling variables to zero mean and unit variance because PCA is sensitive to the scale of variables.
Covariance (co-variation) matrix
Matrix of covariances between pairs of variables; its eigenvalues/eigenvectors are used to compute principal components.
PC scores
The coordinates of observations in the PC space; computed as a weighted sum of standardized variables using PC weights.
World Bank health data PCA example
An illustrative application where PCA is applied to health indicators across countries to reduce variables and identify top principal components.
How to decide number of PCs to keep
Use explained variance and cumulative variance to choose how many PCs explain a desired portion of information (e.g., 80–95%).