Comprehensive Notes on Fair Principal Component Analysis (Fair PCA)
Overview of Classical Principal Component Analysis (PCA)
Standard PCA Formulation: Principal Component Analysis is formulated as a reconstruction error minimization problem: * Objective: * Constraint:
Interpretation and Process: * Data is projected onto a lower-dimensional subspace using the projection matrix . * The data is then reconstructed as . * The goal is to minimize the Frobenius norm () of the difference between original data and its reconstruction.
Equivalent Mathematical Interpretation: * The problem is equivalent to maximizing the variance captured in the subspace: . * The optimal directions correspond to the eigenvectors of the covariance matrix of . * PCA identifies the directions (eigenvectors) that represent the maximum variance in the dataset.
The Emergence of Fairness Issues in PCA
Standard Treatment vs. Reality: Classical PCA treats the entire dataset as a single, homogenous population. However, real-world data is often composed of distinct demographic groups.
Common Groups and Labels: * Gender: Male and Female. * Race: Group A and Group B.
Data Splitting Notation: The dataset is split based on group membership: * * : Represents the privileged group (Group A). * : Represents the unprivileged or harmed group (Group B).
Defining Group-wise Reconstruction Error
Average Reconstruction Error for Group A (): *
Average Reconstruction Error for Group B (): *
Definition of Unfairness: PCA is considered unfair if there is a significant discrepancy in how well groups are represented: * RE_{X_A}(U) < RE_{X_B}(U) * In this scenario, Group A is reconstructed better than Group B. Group B loses more information during the dimensionality reduction process, which constitutes the core fairness issue.
Disparity Measure and Fairness Goals
Disparity Calculation (): *
Interpreting Disparity Values: * D > 0: Group B is worse off than Group A (Unfair). * : Perfectly fair representation; both groups suffer equal information loss. * D < 0: Roles are reversed; Group A is worse off than Group B.
Fairness Goal: The ultimate objective is to achieve zero disparity () and equal reconstruction error (), ensuring a fair subspace projection where no group loses significantly more information than another.
Geometric Intuition of PCA Bias
Alignment Bias: PCA finds the direction maximizing overall variance across the combined dataset. If the distribution of Group A is highly spread out in a specific direction while Group B is spread in a different direction, the PCA direction will naturally align with the group contributing the most to the global variance (typically the privileged or larger group).
Resulting Disparity: Group A points remain "well spread" and maintain low error, while Group B points are "compressed" into the subspace, resulting in high reconstruction error.
Case Study: 2D Synthetic Dataset Example
Dataset Definition: * Group A (Horizontal Spread): Points are , , and . Resulting matrix . * Group B (Vertical Spread): Points are , , and . Resulting matrix .
Combined Dataset Analysis: * Variance in x-direction consists of values . * Variance in y-direction consists of values . * Because the variance is larger in the horizontal direction, PCA chooses the first principal component as .
Reconstruction and Error Calculation: * Projection: Projecting onto the x-axis means all y-components are discarded. * Group A Reconstruction: Since all points are already on the x-axis, the projection is lossless. . * Group B Reconstruction: Points collapse to the origin . The squared errors are . * Average Error for B: .
Disparity Result: * . This indicates strong unfairness as Group B suffers significantly whereas Group A suffers zero error.
Theorem 1: Fair PCA via Eigendecomposition
The Optimization Problem: For a fixed weighting factor , the Fair PCA objective is: * * Subject to .
The Main Result: The optimal projection is formed by the top eigenvectors of the fairness-adjusted matrix , defined as: *
Components of : * Standard PCA term (): Preserves overall variance across the full dataset. * Fairness correction term (): Favors directions that reduce the reconstruction gap between the groups.
Five-Step Derivation of Theorem 1
Step 1: Expand the Objective: Use the PCA reconstruction identity for any data matrix : * * Applying this yields errors for the full dataset (), Group A (), and Group B ().
Step 2: Substitute into : Substitute the expansions into the combined objective and distribute the signs: *
Step 3: Separate Variables: Collect all terms that do not involve into a single scalar constant : * * Objective becomes:
Step 4: Combine Trace Terms: By the linearity of the trace, group the -dependent variables together: * * Minimizing is equivalent to maximizing .
Step 5: Solve for Eigenvectors: This problem is now mathematically identical to classical PCA. Applying Lagrange multipliers: * Maximize subject to . * This leads to the eigenvalue problem: . * To find , we stack the top eigenvectors of .
Key Takeaways of Fair PCA
Computational Efficiency: It remains an eigenvalue problem, meaning no iterative solver is required. A single eigendecomposition of is sufficient.
Tunability: The parameter allows for a smooth interpolation between standard PCA (when ) and pure disparity minimization (when ).
Interpretability: The fairness-adjusted matrix has a clear dual structure of global variance preservation combined with a group disparity correction mechanism.