REINFORCEMENT UNLEARNING

Abstract

Machine unlearning refers to the process of mitigating the influence of specific training data on machine learning models based on removal requests from data owners. A significant area that has been overlooked in the research of unlearning involves reinforcement learning (RL). In RL, an agent learns to make optimal decisions within an environment, maximizing its cumulative rewards. During training, agents can memorize environmental features, which raises privacy concerns. Data protection regulations empower data owners to revoke access to the agent's training data, leading to the development of reinforcement unlearning. This unique field focuses on revoking entire environments rather than individual data samples, posing three notable challenges:

  1. How to propose unlearning schemes for environments.

  2. How to prevent degradation of the agent’s performance in remaining environments.

  3. How to evaluate the effectiveness of unlearning.

To address these issues, two reinforcement unlearning methods are proposed: decremental reinforcement learning, which gradually erases an agent's knowledge, and environment poisoning attacks, which lead the agent to learn inaccurate knowledge about the unlearning environment. The concept of "environment inference" is introduced to evaluate the outcomes of unlearning.

I. Introduction

Machine learning is contingent on the acquisition of extensive datasets. To protect the privacy of individual users, data protection regulations like the General Data Protection Regulation (GDPR) empower users to request data removal. It is crucial for model owners to comply with these requests, necessitating the removal of revoked data and the elimination of their influence on models, leading to the concept of machine unlearning.

Despite advancements in machine unlearning, reinforcement learning remains underexplored in this context. In reinforcement learning, the main objective is to train agents that can interact with environments through policies guiding actions based on states. Agents learn from every action through rewards, forming experience samples to update their policies. However, memorizing features can lead to security concerns, especially when sensitive information is encountered. Reinforcement unlearning thus becomes a priority when users request the deletion of their data.

Unlike conventional machine unlearning that focuses on removing specific dataset samples, reinforcement learning involves dynamic decision-making where agents interact continuously with environments, making reinforcement unlearning more complex.

Distinction from Privacy-Preserving RL

While related to privacy concerns, reinforcement unlearning differs from privacy-preserving RL. The former selectively erases learned knowledge to protect the privacy of the environment owner, whereas the latter seeks to protect the personal data involved in the agent's training.

Key Challenges

  1. Unlearning an Environment from Agent Policy: Unlike machine unlearning, an environment owner can't access experience samples managed by the agent.

  2. Preventing Performance Degradation: Unlearning directly degrades an agent's performance, and this impact may be more pronounced in reinforcement learning than in conventional learning settings.

  3. Effectiveness Evaluation of Unlearning: Membership inference cannot be used to evaluate reinforcement unlearning because it is incompatible with how experience samples are generated and managed in RL.

II. Preliminaries

Notations

  • M: Learning environment/task

  • S: State set, S = {s1, ..., sn}

  • A: Action set, A = {a1, ..., am}

  • T: Transition function

  • r: Reward function

  • γ: Discount factor

  • π: Policy learned by the agent

  • Q(s, a): Value of Q-function for action a in state s

  • e: Experience sample

  • B: Batch of experience samples for agent learning

  • τ: A trajectory consisting of a series of state-action pairs

  • ‖x‖∞: Infinity norm of vector x, i.e., its maximum absolute component

Reinforcement Learning Overview

In RL, each environment is defined by the tuple M = ⟨S, A, T, r⟩. An agent selects actions from the action set according to its policy. Each action triggers a state transition governed by T and yields a reward to the agent. The resulting experience samples are collected to update the policy toward cumulative reward maximization.
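As a concrete illustration, this interaction loop can be sketched with tabular Q-learning. The three-state chain environment, transition rule, and hyperparameters below are illustrative assumptions, not taken from the paper:

```python
import random

random.seed(0)

# Hypothetical 3-state chain M = <S, A, T, r>; state 2 is terminal and rewarding.
S = [0, 1, 2]
A = [0, 1]                        # 0 = move left, 1 = move right
gamma, alpha, eps = 0.9, 0.5, 0.1

def step(s, a):
    """Transition function T and reward r: reaching state 2 pays 1."""
    s_next = min(s + 1, 2) if a == 1 else max(s - 1, 0)
    return s_next, (1.0 if s_next == 2 else 0.0)

# Q(s, a) table, initialized to zero.
Q = {(s, a): 0.0 for s in S for a in A}

for _ in range(200):              # episodes
    s = 0
    while s != 2:
        # epsilon-greedy policy pi derived from Q
        a = random.choice(A) if random.random() < eps else max(A, key=lambda x: Q[(s, x)])
        s_next, r = step(s, a)
        # standard Q-learning update toward the Bellman target
        Q[(s, a)] += alpha * (r + gamma * max(Q[(s_next, x)] for x in A) - Q[(s, a)])
        s = s_next
```

After training, Q[(1, 1)] exceeds Q[(1, 0)], so the greedy policy moves toward the rewarding state; the experience samples e consumed by the update are exactly the (s, a, r, s_next) tuples produced by step.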

Machine Unlearning

Machine unlearning aims to erase the effect of specific data on model behavior. A naive method removes the revoked data and retrains the model from scratch, yielding a model whose behavior matches one that was never trained on that data.

III. Reinforcement Unlearning

A. Problem Statement

The concept of forgetting in reinforcement unlearning varies by application context. In privacy applications, this entails ensuring models have no exposure to the unlearned data. In contrast, for bias-removal applications, the unlearned model should not predict specific labels previously assigned to the forgotten data.

By definition, an unlearned model's performance should decline in the unlearned environment while remaining intact elsewhere. Because agents learn through interaction rather than from a static dataset, reinforcement unlearning operates in a learning paradigm distinct from conventional machine unlearning.

B. Threat Model

Reinforcement unlearning primarily counters the risks tied to inference attacks, where adversaries attempt to discern a learning environment by observing agent actions.

C. Methods Overview

Both decremental reinforcement learning and the poisoning-based methods intentionally degrade performance in unlearning environments while preserving capabilities elsewhere. The decremental method minimizes rewards specifically within the targeted environment, while the poisoning approach alters the environment to mislead the agent.

D. Decremental RL-based Method

The method consists of two steps:

  1. Exploration of Unlearning Environment: The agent explores and accumulates experience samples in the unlearning environment.

  2. Fine-Tuning Using Accumulated Samples: The policy is refined via a custom loss function, targeting unlearning while maintaining performance elsewhere.
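The two steps above can be sketched on a tabular Q-function. The loss design here is an illustrative assumption, not the paper's exact loss: experience from the unlearning environment pulls the visited Q-values toward zero, while experience from retained environments keeps the standard TD target.

```python
# Hypothetical sketch of the decremental method's fine-tuning step (step 2).
def decremental_update(Q, unlearn_batch, retain_batch, actions=(0, 1), alpha=0.5, gamma=0.9):
    # Batch gathered in step 1 by exploring the unlearning environment:
    # pull the visited Q-values toward zero to erase the learned knowledge.
    for (s, a, r, s_next) in unlearn_batch:
        Q[(s, a)] += alpha * (0.0 - Q[(s, a)])
    # Interleave samples from retained environments with the usual TD target
    # so performance elsewhere is preserved.
    for (s, a, r, s_next) in retain_batch:
        best_next = max(Q[(s_next, x)] for x in actions)
        Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
    return Q

Q = {(s, a): 1.0 for s in range(3) for a in (0, 1)}
Q = decremental_update(
    Q,
    unlearn_batch=[(0, 1, 1.0, 1)],   # experience from the unlearning environment
    retain_batch=[(1, 1, 1.0, 2)],    # experience from a retained environment
)
```

One update suppresses Q[(0, 1)] (the unlearned transition) while Q[(1, 1)] (the retained one) is unaffected or improved, which mirrors the method's goal of targeted forgetting with preserved performance.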

E. Environment Poisoning-based Method

This method modifies the transition dynamics of the unlearning environment and lets the agent learn a new policy in response. The poisoning actions are chosen strategically so that the agent's learned behavior becomes ineffective in the unlearning environment.
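A minimal sketch of such a poisoning strategy, assuming a chain-style environment whose goal is state 2; the redirect-to-start rule is an illustrative choice, not the paper's exact poisoning action:

```python
# Wrap the unlearning environment's transition function so that transitions into
# the goal state are redirected back to the start with no reward. An agent
# retrained under these poisoned dynamics can no longer behave effectively there.
def poison_transitions(step_fn, goal_state, start_state):
    def poisoned_step(s, a):
        s_next, r = step_fn(s, a)
        if s_next == goal_state:
            return start_state, 0.0
        return s_next, r
    return poisoned_step

def step(s, a):
    """Original dynamics of the (hypothetical) unlearning environment."""
    s_next = min(s + 1, 2) if a == 1 else max(s - 1, 0)
    return s_next, (1.0 if s_next == 2 else 0.0)

poisoned_step = poison_transitions(step, goal_state=2, start_state=0)
```

Substituting poisoned_step for step during retraining drives the reward obtainable in the unlearning environment to zero, while retained environments are left untouched.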

F. Challenges and Comparisons

Both methods face challenges such as over-unlearning and residual knowledge retained across environments. Comparisons show that the poisoning-based method offers greater stability and predictability than the decremental method, particularly in complex environments.

IV. Experimental Evaluation

A. Evaluation Setup

New metrics specific to reinforcement unlearning are defined, including:

  • Cumulative Reward: Total rewards accrued by an agent using acquired policies.

  • Number of Steps: Total actions taken by the agent to achieve goals.

  • Environment Similarity: Resemblance between the inferred environment and the original.
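The first two metrics can be computed directly from a recorded trajectory τ of (state, action, reward) triples; the trajectory below is a made-up example, and environment similarity is not sketched since it additionally requires comparing an inferred environment against the original.

```python
# Cumulative reward and step count of a trajectory tau = [(s, a, r), ...].
def cumulative_reward(tau):
    return sum(r for (_s, _a, r) in tau)

def num_steps(tau):
    return len(tau)

tau = [(0, 1, 0.0), (1, 1, 0.0), (2, 1, 1.0)]  # hypothetical 3-step trajectory
```

A successful unlearning run shows cumulative reward dropping (and step count rising) in the unlearned environment while both stay stable in retained environments.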

Experiments encompass multiple tasks, including grid-world and maze-explorer applications. Environments were generated with varied obstacle layouts and transition dynamics to simulate distinct user-owned environments.

B. Performance and Robustness Observations

Experimental results indicate that, even after unlearning, agents retain their efficacy in the remaining environments while exhibiting a substantial degree of unlearning as measured by cumulative reward and number of steps.

V. Privacy and Safety Studies

A. Recommendation Systems and Safety Risks

In privacy studies, the effectiveness is indicated by drops in recommendation accuracy following data deletion. In safety-critical tests, reinforcement unlearning shows pronounced risks of increased collisions while performing tasks post-unlearning, suggesting a critical need to balance safety and privacy in such applications.

VI. Conclusion

This paper pioneers the field of reinforcement unlearning, enabling agents to forget environments in compliance with strict privacy requirements. Two effective methods are developed alongside a novel evaluation methodology, laying a foundation for future work in this area.