Transparency, Interpretability & Explainability in AI
Course & Module Context
- Overall course organized around the five responsible-AI criteria:
• F – Fairness (already completed pre-mid-sem)
• A – Accountability
• T – Transparency ← current focus
• P – Privacy
• R – Robustness
- Module 7 launches the post-mid-sem portion, starting with Transparency before moving on to Accountability, Privacy, Robustness.
Three Key Terms & Their Nuances
- Transparency – Ability to “see through” a model; understand how it works internally.
- Interpretability – Human can identify the cause of an individual prediction; “Why was I accepted / rejected?”
- Explainability – Model (or an auxiliary tool) produces a human-understandable explanation of its behaviour.
• The terms are often used interchangeably but differ subtly; the boundaries between them are thin.
Why Interpretability/Explainability Matter
- Trust: end-users, auditors, regulators need confidence.
- Compliance with ethical guidelines & legal standards.
- Debugging & model validation.
- Feature-level insight for domain experts.
- Application-dependent: e.g. Fake-news flagger might not need rationale; loan approval system must supply reasons.
Intrinsically Interpretable Classical ML Models
Linear & Logistic Regression
• Core equation: y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_n x_n
• Coefficients \beta_i directly show direction & magnitude of feature influence.
• Logistic wraps with sigmoid: \hat{p}=\sigma(z)=\frac{1}{1+e^{-z}} producing clear probability threshold.
• Example (house-price):
– Size weight 200 ⇒ every extra sqft adds 200 to the price.
– Bedrooms weight 15{,}000 ⇒ going from 3 to 4 BHK increases the price by 15{,}000.
– Age weight negative ⇒ older house cheaper.
Decision Tree
• Gives explicit if–then rules; each path traceable.
• Root → internal nodes → leaves (predictions).
• Inversion (root at top) provides human-readable flowchart.
• Random Forest / XGBoost: still tree-based but multiple trees & voting reduce interpretability (trade-off for performance).
Naïve Bayes / Bayesian Models
• Outputs P(C\mid X); probabilities are themselves explanations.
• Formula: P(C|X)=\frac{P(X|C)P(C)}{P(X)}.
• Spam filtering example: words “free”, “urgent”, “meeting” each contribute likelihood; easy to list top tokens & their weights.
k-Nearest Neighbours (k-NN)
• Analogy-based: new instance classified by proximity in feature space.
• Visual & table explanations of nearest neighbours; majority vote rationale.
• Example: patient-heart-disease prediction using age/BP/cholesterol similarity.
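The neighbour-based rationale above can be sketched in a few lines; the patient records and labels here are made up for illustration, and in practice the features would be scaled to comparable ranges first:

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_new, k=3):
    """Classify x_new by majority vote among its k nearest neighbours."""
    dists = np.linalg.norm(X_train - x_new, axis=1)  # Euclidean distance to every training point
    nearest = np.argsort(dists)[:k]                  # indices of the k closest points
    votes = Counter(y_train[i] for i in nearest)
    # The neighbour list itself is the explanation: "you resemble these known cases"
    return votes.most_common(1)[0][0], nearest

# Hypothetical patient data: [age, systolic BP, cholesterol]
X = np.array([[63, 145, 233], [37, 130, 250], [56, 120, 236], [57, 140, 192]])
y = np.array([1, 0, 1, 0])  # 1 = heart disease, 0 = healthy (illustrative labels)

label, neighbours = knn_predict(X, y, np.array([60, 142, 230]), k=3)
```

Returning the neighbour indices alongside the label is what makes the prediction explainable: the model can point at the concrete similar cases behind its vote.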
Deep Learning & The Transparency Challenge
- Vanilla deep feed-forward nets, CNNs, RNNs often labelled black boxes.
- Whether interpretability is required depends on problem statement (e.g. anomaly detection may need root-cause tracing).
Intrinsically Interpretable DL Architectures
1. Attention & Transformers
- Transformer block = Encoder + Decoder, each built from:
• Multi-Head Attention (MHA)
• Feed-Forward Network (FFN)
• Add & Norm (skip / residual connections)
- Skip connection: add (or concatenate) the original vector to the processed output to combat vanishing gradients & preserve information.
- Attention intuition (human analogy):
• Distinguishing cat vs dog by focusing on ears, eyes, fur.
• Tiger vs cheetah: stripes vs spots.
- Single-Head vs Multi-Head: one vs multiple feature sub-spaces examined in parallel.
- Self-attention (within same sequence), Cross-attention (encoder ↔ decoder).
- Q, K, V matrices learned; attention weights visualised as heat-maps (e.g., English→French word alignment).
- Result: weight matrices supply fine-grained, global explanations of feature importance.
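The scaled dot-product formula behind these heat-maps can be sketched directly; toy random vectors stand in for the learned Q, K, V projections of a real model:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                  # similarity of each query to each key
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # row-wise softmax
    return weights @ V, weights                      # `weights` are the heat-map values

# Toy 3-token sequence with d_k = 4; self-attention uses the same sequence for Q, K, V
rng = np.random.default_rng(0)
Q = K = V = rng.normal(size=(3, 4))
out, attn = scaled_dot_product_attention(Q, K, V)
```

Each row of `attn` sums to 1 and says how much each token attends to every other token, which is exactly what attention heat-maps visualise.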
2. Prototype Networks
- Human cognition uses representative exemplars (e.g., red Maruti 800 for “car”).
- Network learns prototype vectors for each class; new sample compared via distance metric (similar to twin/Siamese nets).
- High similarity ⇒ classification + interpretable “closest prototype” explanation.
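A minimal sketch of prototype-based classification, assuming hypothetical prototype vectors in a small embedding space (a trained network would learn both the embedding and the prototypes):

```python
import numpy as np

def prototype_classify(x, prototypes):
    """Assign x to the class of the nearest prototype; the prototype itself is the explanation."""
    dists = {label: np.linalg.norm(x - p) for label, p in prototypes.items()}
    best = min(dists, key=dists.get)
    return best, dists[best]

# Hypothetical learned prototypes in a 3-d embedding space
prototypes = {"cat": np.array([1.0, 0.2, 0.1]),
              "dog": np.array([0.1, 1.0, 0.3])}
label, dist = prototype_classify(np.array([0.9, 0.3, 0.2]), prototypes)
```

The returned prototype doubles as the "closest exemplar" explanation: the model can show the user which representative case the input most resembles.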
3. Concept Bottleneck Models (CBM)
- Insert explicit concept layer C between input X and output Y.
- Learn two mappings:
• f_1: X \rightarrow C (detect concepts).
• f_2: C \rightarrow Y (make decision).
- Concepts annotated by humans (e.g., “curved beak”, “red breast” → robin).
- Bottleneck forces network to ground decisions in human concepts.
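The two-stage mapping can be sketched as follows; the concept names, weights, and thresholding rule are invented for illustration and stand in for trained networks f_1 and f_2:

```python
import numpy as np

def f1_concepts(x):
    """X -> C: predict human-named concept activations (a thresholded linear map stands in for a trained net)."""
    # Hypothetical concept scores for a bird image: [curved_beak, red_breast]
    W = np.array([[0.9, 0.1],
                  [0.1, 0.8]])
    return (x @ W > 0.5).astype(float)

def f2_decision(c):
    """C -> Y: the final label must be computable from the concepts alone (the bottleneck)."""
    return "robin" if c[0] == 1 and c[1] == 1 else "other"

x = np.array([0.8, 0.9])   # toy input features
concepts = f1_concepts(x)  # inspectable intermediate: which concepts fired?
label = f2_decision(concepts)
```

Because the label depends only on `concepts`, a user (or auditor) can inspect and even override the concept layer, which is the interpretability payoff of the bottleneck.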
Post-Hoc Explainability Methods (Model-Agnostic)
- Apply after training any black-box model.
- LIME (Local Interpretable Model-agnostic Explanations)
• Perturb input around instance, fit sparse linear surrogate, return local feature weights.
- SHAP (SHapley Additive exPlanations)
• Game-theoretic contribution scores; consistent & additive.
- Grad-CAM (Gradient-weighted Class Activation Mapping)
• Uses gradients w.r.t. convolutional feature maps to produce heat-map overlay on image.
- Counterfactuals
• “What minimal feature changes would flip the prediction?” Helpful for recourse.
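A LIME-style perturb-and-fit loop can be sketched in plain numpy; this is a simplified illustration of the idea, not the actual LIME library, and the sampling scale and proximity kernel are arbitrary choices:

```python
import numpy as np

def lime_local_weights(model, x, n_samples=500, scale=0.1, seed=0):
    """Perturb x, query the black box, fit a proximity-weighted linear surrogate.
    The surrogate's coefficients are the local feature importances."""
    rng = np.random.default_rng(seed)
    Z = x + rng.normal(0.0, scale, size=(n_samples, x.size))  # samples around x
    y = np.array([model(z) for z in Z])                       # black-box predictions
    w = np.exp(-np.linalg.norm(Z - x, axis=1) ** 2 / scale)   # closer samples count more
    A = np.hstack([Z, np.ones((n_samples, 1))])               # add intercept column
    sw = np.sqrt(w)[:, None]
    coef, *_ = np.linalg.lstsq(A * sw, y * sw[:, 0], rcond=None)
    return coef[:-1]                                          # drop the intercept

# Toy black box: only feature 0 matters near this point
black_box = lambda z: 3.0 * z[0] + 0.0 * z[1]
weights = lime_local_weights(black_box, np.array([1.0, 2.0]))
```

Even though the surrogate is linear, it only claims to describe the model *locally*, around the single instance being explained.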
Local vs Global, Model-Specific vs Model-Agnostic
- Intrinsic methods usually model-specific (tree rules, linear coefficients, attention weights).
- Post-hoc tools largely model-agnostic (LIME, SHAP).
- Local scope: explain a single prediction (LIME, SHAP, counterfactuals).
- Global scope: summarise model behaviour overall (feature importance, attention matrices, tree paths).
Connections to Earlier & Future Modules
- Builds on Fairness (pre-mid-sem); interpretability tools also aid bias detection.
- Accountability & auditability (upcoming) depend on transparent explanations.
- Privacy vs interpretability trade-offs (to be discussed later).
- Robustness: explanations reveal spurious correlations → defensive retraining.
Practical / Ethical Implications
- Regulatory compliance (GDPR “right to explanation”).
- Loan, hiring, medical diagnostics require cause-based answers.
- Choice of interpretable vs opaque model must align with stakeholder needs & risk tolerance.
Key Equations & Numerical References
- Linear/Logistic: y = \beta_0 + \sum_{i=1}^{n} \beta_i x_i ; \sigma(z)=\frac{1}{1+e^{-z}}.
- Bayes Rule: P(C|X)=\frac{P(X|C)P(C)}{P(X)}.
- k-NN majority vote: \hat{y}=\text{mode}\bigl( y_{(1)},\dots,y_{(k)} \bigr).
- Attention score (scaled dot-product): \text{Attention}(Q,K,V)=\text{softmax}\left(\frac{QK^T}{\sqrt{d_k}}\right)V.
- Transformer patch example: image split into 16\times16 pixel patches → sequence.
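The linear and logistic formulas above can be checked numerically; the size and bedroom weights follow the house-price example earlier, while the intercept and age weight are assumed values for illustration:

```python
import math

# Linear model with the house-price weights from the notes
# (intercept 50_000 and age weight -1_000 are assumed, not from the notes)
beta = {"intercept": 50_000, "size": 200, "bedrooms": 15_000, "age": -1_000}
price = (beta["intercept"]
         + beta["size"] * 1_000      # 1000 sqft
         + beta["bedrooms"] * 3      # 3 BHK
         + beta["age"] * 10)         # 10 years old
# 50_000 + 200_000 + 45_000 - 10_000 = 285_000

# Logistic regression squashes the linear score through the sigmoid
sigmoid = lambda z: 1.0 / (1.0 + math.exp(-z))
```

Reading the price off term by term is exactly the coefficient-level interpretability the notes describe: each weight's contribution is visible in the sum.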
Study Tips & Further Reading
- Practice tracing a decision-tree path for sample inputs.
- Manually compute SHAP values for a 3-feature toy model to internalise concept.
- Visualise attention maps (e.g., via HuggingFace tools) to connect theory to practice.
- Explore prototype networks with small image datasets (CIFAR-10) for intuition.