Privacy-Preserving Machine Learning: Methods, Challenges and Directions

Background: Machine Learning (ML) is widely used across applications and depends on vast amounts of data and computation, which raises privacy concerns.
Privacy Concerns: Collecting large volumes of data increases the risk of leaking sensitive information, and regulations such as the GDPR add pressure to address that risk.
Threats to ML: Privacy attacks such as membership inference can reveal whether a given record was part of a model's training data, which motivates Privacy-Preserving Machine Learning (PPML) solutions.
Research Directions: Academia and industry are both working to integrate privacy-preserving techniques into the ML pipeline, which has produced a variety of architectures and methods.
PGU Model: The paper proposes a Phase, Guarantee, and Utility (PGU) model to evaluate PPML solutions systematically.
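
To make the membership inference threat mentioned above concrete, here is a minimal sketch of a loss-threshold attack, a simple baseline from the literature and not a method taken from the paper. It assumes the attacker can query the model for class probabilities. The function names, the toy probabilities, and the threshold value are all hypothetical and chosen only for illustration.

```python
import numpy as np

def per_example_loss(probs, labels):
    """Cross-entropy loss of each example given predicted class probabilities."""
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels, dtype=int)
    true_class_prob = probs[np.arange(len(labels)), labels]
    return -np.log(np.clip(true_class_prob, 1e-12, 1.0))

def loss_threshold_attack(losses, threshold):
    """Guess 'member' (True) whenever the loss falls below the threshold."""
    return np.asarray(losses) < threshold

# Hypothetical model outputs: confident on training points, less so on unseen ones.
train_probs = [[0.97, 0.03], [0.02, 0.98], [0.95, 0.05]]
train_labels = [0, 1, 0]
unseen_probs = [[0.60, 0.40], [0.45, 0.55], [0.30, 0.70]]
unseen_labels = [0, 1, 0]

train_losses = per_example_loss(train_probs, train_labels)
unseen_losses = per_example_loss(unseen_probs, unseen_labels)

threshold = 0.2  # assumed value; a real attacker would have to calibrate it
guesses_train = loss_threshold_attack(train_losses, threshold)
guesses_unseen = loss_threshold_attack(unseen_losses, threshold)
```

The attack works only to the extent that the model behaves differently on training data than on unseen data, which is why overfitted models tend to be more exposed.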

Emerging Techniques: Deep learning has notably improved the accuracy of ML models but also presents new challenges including resource constraints and privacy issues.
Federated Learning (FL): A method allowing ML models to be trained on decentralized data across multiple devices, showing promise in sensitive domains like healthcare.
Challenges: Model performance relies on huge data volumes, which raises serious privacy concerns, while existing data safeguards often reduce model utility.
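
The federated learning idea above can be sketched with federated averaging (FedAvg), one common FL algorithm. The summary does not name a specific algorithm, so this choice is an assumption. Each client trains locally on its own data and only model weights are sent to the server, which averages them weighted by client dataset size. The function names, the linear model, and the synthetic noiseless data are hypothetical.

```python
import numpy as np

def local_update(w, X, y, lr=0.5, epochs=5):
    """Run a few gradient-descent steps on one client's private data."""
    w = w.copy()
    for _ in range(epochs):
        grad = X.T @ (X @ w - y) / len(y)  # gradient of 0.5 * mean squared error
        w -= lr * grad
    return w

def fedavg_round(w_global, clients):
    """One round: every client trains locally, the server averages the weights."""
    total = sum(len(y) for _, y in clients)
    new_w = np.zeros_like(w_global)
    for X, y in clients:
        # Weight each client's model by its share of the total data.
        new_w += (len(y) / total) * local_update(w_global, X, y)
    return new_w

# Synthetic setup: three clients whose data never leaves the client tuple.
rng = np.random.default_rng(0)
w_true = np.array([2.0, -1.0])
clients = []
for n in (20, 30, 50):
    X = rng.normal(size=(n, 2))
    clients.append((X, X @ w_true))  # noiseless labels, for illustration only

w = np.zeros(2)
for _ in range(20):
    w = fedavg_round(w, clients)
```

Raw data stays on the clients in this sketch, but the shared weights can still leak information about it, so keeping data local is not by itself a formal privacy guarantee.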

Phases of ML Operation:
- Data Preparation: Cleaning and structuring data for use.
- Model Training: Using algorithms to learn from data.
- Model Evaluation: Testing the model for accuracy and performance.
- Model Deployment: Making the model available for use in applications.
- Model Inference: Applying the model to new data to make predictions.

Phases Identified:
1. Privacy-Preserving Model Generation:
  - Data Preparation (anonymization, surrogate datasets).
  - Model Training (using differential privacy).
2. Privacy-Preserving Model Serving:
  - Deployment and making inferences while ensuring data security.
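
As an illustration of model training with differential privacy, the sketch below shows one update step in the style of DP-SGD, a widely used approach: per-example gradients are clipped to a fixed norm, summed, and perturbed with Gaussian noise before the averaged update is applied. The summary does not say which training method the paper has in mind, so treat this as an assumed example. The function names and parameter values are hypothetical, and the privacy accounting that turns the noise level into an (epsilon, delta) guarantee is omitted.

```python
import numpy as np

def clip_gradients(per_example_grads, clip_norm):
    """Scale each example's gradient so its L2 norm is at most clip_norm."""
    norms = np.linalg.norm(per_example_grads, axis=1, keepdims=True)
    scale = np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))
    return per_example_grads * scale

def dp_sgd_step(w, per_example_grads, lr, clip_norm, noise_multiplier, rng):
    """Clip, sum, add Gaussian noise, average, then take a gradient step."""
    clipped = clip_gradients(per_example_grads, clip_norm)
    # Noise standard deviation is proportional to the clipping norm.
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=w.shape)
    noisy_mean = (clipped.sum(axis=0) + noise) / len(per_example_grads)
    return w - lr * noisy_mean

rng = np.random.default_rng(0)
grads = np.array([[3.0, 4.0], [0.3, 0.4]])  # L2 norms 5.0 and 0.5
w = np.zeros(2)
w_next = dp_sgd_step(w, grads, lr=0.1, clip_norm=1.0,
                     noise_multiplier=1.1, rng=rng)
```

Clipping bounds how much any single record can move the model, and the noise masks what remains of that influence. Both also distort the update, which is the source of the utility loss.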

Types of Guarantees:
- Object-Oriented: Focuses on protecting specific objects like model weights and training data.
- Pipeline-Oriented: Evaluates the privacy assurances across the ML process.

Technical Utility Types:
- Data Publishing Approaches: Removing identifiers from data, or applying perturbation-based techniques such as differential privacy.
- Data Processing Approaches: Secure computation during training using methods like garbled circuits and homomorphic encryption.
- Architectural Approaches: Building privacy-preserving architectures (e.g., federated learning).
- Hybrid Approaches: Combining different techniques to balance privacy and efficacy.
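
To illustrate the data processing approaches listed above, here is a toy sketch of additively homomorphic encryption using the Paillier scheme. Paillier is chosen here as an example and is not named in the summary. Multiplying two ciphertexts yields an encryption of the sum of the plaintexts, so a server could aggregate values it cannot read. The primes are tiny and completely insecure, the function names are hypothetical, and the code assumes Python 3.8 or later for the modular inverse via pow.

```python
import math
import secrets

def keygen(p, q):
    """Build a toy Paillier key pair from two primes (generator g = n + 1)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p - 1, q - 1)
    mu = pow(lam, -1, n)  # modular inverse of lam modulo n
    return n, (lam, mu)

def encrypt(n, m):
    """Encrypt an integer m with 0 <= m < n using fresh randomness r."""
    n2 = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1  # random value in 1..n-1
        if math.gcd(r, n) == 1:
            break
    return ((1 + m * n) * pow(r, n, n2)) % n2

def decrypt(n, priv, c):
    """Recover the plaintext from ciphertext c with the private key."""
    lam, mu = priv
    x = pow(c, lam, n * n)
    return ((x - 1) // n * mu) % n

def add_encrypted(n, c1, c2):
    """Multiplying ciphertexts adds the underlying plaintexts modulo n."""
    return (c1 * c2) % (n * n)

# Toy primes for illustration only; real deployments use very large primes.
n, priv = keygen(293, 433)
c1 = encrypt(n, 1200)
c2 = encrypt(n, 345)
total = decrypt(n, priv, add_encrypted(n, c1, c2))
```

The party calling add_encrypted needs only the public value n, so it can compute on the data without being able to decrypt it. The cost is that every operation runs on large integers, which reflects the computational overhead of this class of approach.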

Evaluation of Privacy: There is no uniform standard for measuring privacy across different settings.
Utility Costs: Privacy guarantees must be balanced against model accuracy and communication overhead.
Future Research Directions:
- Defining clear frameworks for evaluating privacy in ML systems.
- Exploring techniques that improve privacy and model performance together rather than trading one for the other.