2022
DOI: 10.1609/aaai.v36i7.20674
Same State, Different Task: Continual Reinforcement Learning without Interference

Abstract: Continual Learning (CL) considers the problem of training an agent sequentially on a set of tasks while seeking to retain performance on all previous tasks. A key challenge in CL is catastrophic forgetting, which arises when performance on a previously mastered task is reduced when learning a new task. While a variety of methods exist to combat forgetting, in some cases tasks are fundamentally incompatible with each other and thus cannot be learnt by a single policy. This can occur, in reinforcement learning (…
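To make the abstract's notion of incompatible tasks concrete, here is a minimal sketch of the interference it describes (all names are illustrative, not from the paper's code): two tasks present the same state but reward opposite actions, so no single deterministic policy can be optimal on both.

```python
# Illustrative sketch of "interference": two tasks share the same
# observation but reward opposite actions, so a single stationary
# policy cannot master both. Function names are hypothetical.

def reward_task_a(state: int, action: int) -> float:
    # Task A rewards action 0 from every state.
    return 1.0 if action == 0 else 0.0

def reward_task_b(state: int, action: int) -> float:
    # Task B rewards action 1 from the very same states.
    return 1.0 if action == 1 else 0.0

# Whatever action a deterministic policy picks in state 0, it earns
# reward on exactly one of the two tasks, never both.
for pi_action in (0, 1):
    total = reward_task_a(0, pi_action) + reward_task_b(0, pi_action)
    assert total == 1.0  # a single policy caps out at half the reward
```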

Cited by 12 publications (12 citation statements) | References 31 publications
“…Changepoint Detection: As discussed previously, one core issue in addressing settings with evolving task states is being able to detect change points or boundaries between significant switches without an oracle, as in (Padakandla et al., 2019; Da Silva et al., 2006; Rosman and Ramamoorthy, 2012; Hadoux, Beynier, and Weng, 2014b; Li, Gu, Zhu, and Zhang, 2019; Kessler, Parker-Holder, Ball, Zohren, and Roberts, 2022; Luo, Jiang, Yu, Zhang, and Zhang, 2022). However, these approaches generally tend to be reactive to a changing distribution rather than proactive about anticipated changes in the future.…”
Section: Context Detection
confidence: 99%
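The reactive detectors this statement refers to typically monitor a statistic of the incoming data and flag a boundary once it drifts. As a hedged illustration (not the method of any cited paper), a sliding-window test on the reward stream might look like this:

```python
import numpy as np

def detect_changepoint(rewards, window=50, threshold=4.0):
    """Flag a task switch when the mean of the most recent `window`
    rewards deviates from the earlier baseline by more than `threshold`
    standard errors. A simple illustrative z-test, reactive by design:
    it only fires after the reward distribution has already shifted."""
    rewards = np.asarray(rewards, dtype=float)
    for t in range(2 * window, len(rewards) + 1):
        past, recent = rewards[: t - window], rewards[t - window : t]
        stderr = past.std(ddof=1) / np.sqrt(window) + 1e-8
        if abs(recent.mean() - past.mean()) / stderr > threshold:
            return t  # first timestep at which a switch is declared
    return None  # no boundary found

# Example: reward mean shifts from 0 to 3 at step 200.
rng = np.random.default_rng(0)
stream = np.concatenate([rng.normal(0, 1, 200), rng.normal(3, 1, 200)])
print(detect_changepoint(stream))  # fires shortly after step 200
```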
“…We followed Kessler et al. (2022) in evaluating the efficacy of a continual learning method in forgetting, measuring stability and backward transfer, and forward transfer, also acting as a measure for plasticity. These measurements were recorded for each environment in the experiment suite T = (τ₁, τ₂, .…”
Section: Continual Learning Assessment
confidence: 99%
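These quantities are commonly computed from a task-by-task performance matrix. Below is a sketch assuming the standard GEM-style definitions; the exact formulas used by Kessler et al. (2022) and the citing paper may differ in detail.

```python
import numpy as np

def cl_metrics(R, baseline=None):
    """Continual-learning metrics from a performance matrix R, where
    R[i, j] is performance on task j after training finishes on task i.
    Negative backward transfer indicates forgetting (lost stability);
    forward transfer above the baseline indicates useful plasticity."""
    R = np.asarray(R, dtype=float)
    T = R.shape[0]
    # Backward transfer: final performance vs. performance right after
    # each task was learned (negative values = catastrophic forgetting).
    bwt = np.mean([R[T - 1, j] - R[j, j] for j in range(T - 1)])
    # Forward transfer: performance on task j just before training on
    # it, relative to an untrained baseline b[j] (zeros if none given).
    b = np.zeros(T) if baseline is None else np.asarray(baseline, dtype=float)
    fwt = np.mean([R[j - 1, j] - b[j] for j in range(1, T)])
    return {"backward_transfer": bwt, "forward_transfer": fwt}

# Example: three tasks; performance on task 0 drops from 0.9 to 0.5
# by the end of training, showing up as negative backward transfer.
R = [[0.9, 0.1, 0.0],
     [0.7, 0.9, 0.2],
     [0.5, 0.6, 0.9]]
print(cl_metrics(R))  # {'backward_transfer': -0.35, 'forward_transfer': 0.15}
```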
“…In the sequential task setting (Moskovitz et al., 2022a; Pacchiano et al., 2022), tasks (MDPs) are sampled one at a time from P_M, with the agent training on each until convergence. In contrast to continual learning (Kessler et al., 2021), the agent's goal is simply to learn a new policy for each task more quickly as more are sampled, rather than learning a single policy which maintains its performance across tasks. Another important setting is meta-RL, which we do not consider here.…”
Section: Multiple Tasks
confidence: 99%
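For contrast with the continual-learning protocol studied in the cited paper, here is a sketch of the sequential-task protocol this statement describes; the helpers `sample_task`, `make_policy`, and `train_to_convergence` are hypothetical placeholders.

```python
# Sequential-task setting: tasks (MDPs) are drawn one at a time from a
# task distribution P_M and each is trained to convergence. Success is
# measured by how quickly each *new* task is solved, not by retained
# performance on earlier ones, so a fresh policy per task is allowed.

def sequential_task_protocol(sample_task, make_policy,
                             train_to_convergence, num_tasks=10):
    steps_per_task = []
    for _ in range(num_tasks):
        task = sample_task()                # draw an MDP from P_M
        policy = make_policy(task)          # new policy for each task
        steps_per_task.append(train_to_convergence(policy, task))
    # The hope in this setting is that steps_per_task shrinks over time
    # as accumulated experience accelerates learning on new tasks.
    return steps_per_task
```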