Comprehensive Notes on Fair Principal Component Analysis (Fair PCA)

Overview of Classical Principal Component Analysis (PCA)

  • Standard PCA Formulation: Principal Component Analysis is formulated as a reconstruction error minimization problem:     * Objective: minUXXUUTF2\min_{U} |X - XUU^T|_F^2     * Constraint: UTU=IU^T U = I

  • Interpretation and Process:     * Data is projected onto a lower-dimensional subspace using the projection matrix UU.     * The data is then reconstructed as XUUTXUU^T.     * The goal is to minimize the Frobenius norm (FF) of the difference between original data and its reconstruction.

  • Equivalent Mathematical Interpretation:     * The problem is equivalent to maximizing the variance captured in the subspace: maxUtr(UTXTXU)\max_{U} tr(U^T X^TX U).     * The optimal directions correspond to the eigenvectors of the covariance matrix of XX.     * PCA identifies the directions (eigenvectors) that represent the maximum variance in the dataset.

The Emergence of Fairness Issues in PCA

  • Standard Treatment vs. Reality: Classical PCA treats the entire dataset as a single, homogenous population. However, real-world data is often composed of distinct demographic groups.

  • Common Groups and Labels:     * Gender: Male and Female.     * Race: Group A and Group B.

  • Data Splitting Notation: The dataset XX is split based on group membership:     * X=[XA,XB]X = [X_A, X_B]     * XAX_A: Represents the privileged group (Group A).     * XBX_B: Represents the unprivileged or harmed group (Group B).

Defining Group-wise Reconstruction Error

  • Average Reconstruction Error for Group A (REXA(U)RE_{X_A}(U)):     * 1nAXAXAUUTF2\frac{1}{n_A} |X_A - X_A UU^T|_F^2

  • Average Reconstruction Error for Group B (REXB(U)RE_{X_B}(U)):     * 1nBXBXBUUTF2\frac{1}{n_B} |X_B - X_B UU^T|_F^2

  • Definition of Unfairness: PCA is considered unfair if there is a significant discrepancy in how well groups are represented:     * RE_{X_A}(U) < RE_{X_B}(U)     * In this scenario, Group A is reconstructed better than Group B. Group B loses more information during the dimensionality reduction process, which constitutes the core fairness issue.

Disparity Measure and Fairness Goals

  • Disparity Calculation (DXB,XA(U)D_{X_B, X_A}(U)):     * D=1nBXBXBUUTF21nAXAXAUUTF2D = \frac{1}{n_B} |X_B - X_B UU^T|_F^2 - \frac{1}{n_A} |X_A - X_A UU^T|_F^2

  • Interpreting Disparity Values:     * D > 0: Group B is worse off than Group A (Unfair).     * D=0D = 0: Perfectly fair representation; both groups suffer equal information loss.     * D < 0: Roles are reversed; Group A is worse off than Group B.

  • Fairness Goal: The ultimate objective is to achieve zero disparity (D=0D = 0) and equal reconstruction error (REXAREXBRE_{X_A} \approx RE_{X_B}), ensuring a fair subspace projection where no group loses significantly more information than another.

Geometric Intuition of PCA Bias

  • Alignment Bias: PCA finds the direction maximizing overall variance across the combined dataset. If the distribution of Group A is highly spread out in a specific direction while Group B is spread in a different direction, the PCA direction will naturally align with the group contributing the most to the global variance (typically the privileged or larger group).

  • Resulting Disparity: Group A points remain "well spread" and maintain low error, while Group B points are "compressed" into the subspace, resulting in high reconstruction error.

Case Study: 2D Synthetic Dataset Example

  • Dataset Definition:     * Group A (Horizontal Spread): Points are (2,0)(2, 0), (3,0)(3, 0), and (4,0)(4, 0). Resulting matrix XA=(2amp;3amp;4 0amp;0amp;0)TX_A = \begin{pmatrix} 2 &amp; 3 &amp; 4 \ 0 &amp; 0 &amp; 0 \end{pmatrix}^T.     * Group B (Vertical Spread): Points are (0,1)(0, 1), (0,2)(0, 2), and (0,3)(0, 3). Resulting matrix XB=(0amp;0amp;0 1amp;2amp;3)TX_B = \begin{pmatrix} 0 &amp; 0 &amp; 0 \ 1 &amp; 2 &amp; 3 \end{pmatrix}^T.

  • Combined Dataset Analysis:     * Variance in x-direction consists of values 2,3,42, 3, 4.     * Variance in y-direction consists of values 1,2,31, 2, 3.     * Because the variance is larger in the horizontal direction, PCA chooses the first principal component as u1=[1,0]Tu_1 = [1, 0]^T.

  • Reconstruction and Error Calculation:     * Projection: Projecting onto the x-axis means all y-components are discarded.     * Group A Reconstruction: Since all points are already on the x-axis, the projection is lossless. REXA(U)=0RE_{X_A}(U) = 0.     * Group B Reconstruction: Points (0,1),(0,2),(0,3)(0, 1), (0, 2), (0, 3) collapse to the origin (0,0)(0, 0). The squared errors are (12+22+32)=1+4+9=14(1^2 + 2^2 + 3^2) = 1 + 4 + 9 = 14.     * Average Error for B: REXB(U)=1434.67RE_{X_B}(U) = \frac{14}{3} \approx 4.67.

  • Disparity Result:     * D=4.670=4.67D = 4.67 - 0 = 4.67. This indicates strong unfairness as Group B suffers significantly whereas Group A suffers zero error.

Theorem 1: Fair PCA via Eigendecomposition

  • The Optimization Problem: For a fixed weighting factor λ[0,1]\lambda \in [0, 1], the Fair PCA objective is:     * minUJ(U)=λREX(U)+(1λ)DXB,XA(U)\min_{U} J(U) = \lambda RE_X(U) + (1 - \lambda) D_{X_B, X_A}(U)     * Subject to UTU=IU^T U = I.

  • The Main Result: The optimal projection UU is formed by the top rr eigenvectors of the fairness-adjusted matrix C^\hat{C}, defined as:     * C^=λ1nXTX+(1λ)(1nBXBTXB1nAXATXA)\hat{C} = \lambda \frac{1}{n} X^T X + (1 - \lambda) \left( \frac{1}{n_B} X_B^T X_B - \frac{1}{n_A} X_A^T X_A \right)

  • Components of C^\hat{C}:     * Standard PCA term (λXTXn\lambda \cdot \frac{X^T X}{n}): Preserves overall variance across the full dataset.     * Fairness correction term ((1λ)(XBTXBnBXATXAnA)(1 - \lambda) ( \frac{X_B^T X_B}{n_B} - \frac{X_A^T X_A}{n_A} )): Favors directions that reduce the reconstruction gap between the groups.

Five-Step Derivation of Theorem 1

  • Step 1: Expand the Objective: Use the PCA reconstruction identity for any data matrix MM:     * MMUUTF2=tr(MMT)tr(UTMTMU)|M - MUU^T|_F^2 = tr(MM^T) - tr(U^T M^T M U)     * Applying this yields errors for the full dataset (REXRE_X), Group A (REARE_A), and Group B (REBRE_B).

  • Step 2: Substitute into J(U)J(U): Substitute the expansions into the combined objective and distribute the signs:     * J(U)=λ(1ntr(XXT)1ntr(UTXTXU))+(1λ)(REBREA)J(U) = \lambda ( \frac{1}{n} tr(XX^T) - \frac{1}{n} tr(U^T X^T X U) ) + (1 - \lambda) ( RE_B - RE_A )

  • Step 3: Separate Variables: Collect all terms that do not involve UU into a single scalar constant μ\mu:     * μ=λ1ntr(XXT)+(1λ)(1nBtr(XBXBT)1nAtr(XAXAT))\mu = \lambda \frac{1}{n} tr(XX^T) + (1 - \lambda) \left( \frac{1}{n_B} tr(X_B X_B^T) - \frac{1}{n_A} tr(X_A X_A^T) \right)     * Objective becomes: J(U)=μλ1ntr(UTXTXU)(1λ)(1nBtr(UTXBTXBU)1nAtr(UTXATXAU))J(U) = \mu - \lambda \frac{1}{n} tr(U^T X^T X U) - (1 - \lambda) \left( \frac{1}{n_B} tr(U^T X_B^T X_B U) - \frac{1}{n_A} tr(U^T X_A^T X_A U) \right)

  • Step 4: Combine Trace Terms: By the linearity of the trace, group the UU-dependent variables together:     * J(U)=μtr(UTC^U)J(U) = \mu - tr(U^T \hat{C} U)     * Minimizing J(U)J(U) is equivalent to maximizing tr(UTC^U)tr(U^T \hat{C} U).

  • Step 5: Solve for Eigenvectors: This problem is now mathematically identical to classical PCA. Applying Lagrange multipliers:     * Maximize u1TC^u1u_1^T \hat{C} u_1 subject to u1Tu1=1u_1^T u_1 = 1.     * This leads to the eigenvalue problem: C^u1=μ1u1\hat{C} u_1 = \mu_1 u_1.     * To find UU, we stack the top rr eigenvectors of C^\hat{C}.

Key Takeaways of Fair PCA

  • Computational Efficiency: It remains an eigenvalue problem, meaning no iterative solver is required. A single eigendecomposition of C^\hat{C} is sufficient.

  • Tunability: The parameter λ\lambda allows for a smooth interpolation between standard PCA (when λ=1\lambda = 1) and pure disparity minimization (when λ=0\lambda = 0).

  • Interpretability: The fairness-adjusted matrix C^\hat{C} has a clear dual structure of global variance preservation combined with a group disparity correction mechanism.