2018
DOI: 10.1609/aaai.v32i1.11595

Selective Experience Replay for Lifelong Learning

Abstract: Deep reinforcement learning has emerged as a powerful tool for a variety of learning tasks; however, deep nets typically exhibit forgetting when learning multiple tasks in sequence. To mitigate forgetting, we propose an experience replay process that augments the standard FIFO buffer and selectively stores experiences in a long-term memory. We explore four strategies for selecting which experiences will be stored: favoring surprise, favoring reward, matching the global training distribution, and maximizing coverage of the state space.
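
The replay process the abstract describes can be made concrete with a short sketch. The Python below is a minimal illustration under assumptions, not the authors' implementation: a standard FIFO buffer is paired with a long-term memory, and a scoring rule decides which experiences the long-term memory retains. The surprise and reward rules assume each experience carries a TD error and a reward; the "global" rule assigns each experience a random key, so keeping the top-scored experiences amounts to reservoir sampling and matches the global training distribution. The coverage strategy, which needs a distance measure over states, is omitted for brevity.

```python
import random
from collections import deque

class SelectiveReplay:
    """Minimal sketch: a FIFO buffer plus a selectively filled long-term memory."""

    def __init__(self, fifo_size, ltm_size, rank="reward"):
        self.fifo = deque(maxlen=fifo_size)  # short-term FIFO buffer
        self.ltm = []                        # list of (score, experience) pairs
        self.ltm_size = ltm_size
        self.rank = rank

    def _score(self, exp):
        # exp = (state, action, reward, next_state, td_error); layout assumed
        if self.rank == "surprise":
            return abs(exp[4])      # favor surprising (high TD error) experiences
        if self.rank == "reward":
            return abs(exp[2])      # favor high-magnitude rewards
        return random.random()      # "global": random key -> reservoir sample

    def add(self, exp):
        self.fifo.append(exp)
        score = self._score(exp)    # scored once, so the ranking stays stable
        if len(self.ltm) < self.ltm_size:
            self.ltm.append((score, exp))
            return
        weakest = min(range(self.ltm_size), key=lambda i: self.ltm[i][0])
        if score > self.ltm[weakest][0]:
            self.ltm[weakest] = (score, exp)  # evict the lowest-ranked experience

    def sample(self, batch_size):
        # draw from both memories so earlier tasks keep being replayed
        half = batch_size // 2
        recent = random.sample(list(self.fifo), min(half, len(self.fifo)))
        old = random.sample(self.ltm, min(batch_size - half, len(self.ltm)))
        return recent + [exp for _, exp in old]
```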

Cited by 200 publications (67 citation statements) | References 28 publications
“…However, interference is ignored, as multi-task performance across all environments is similar to the sum of the performances on the individual environments. Different selective experience replay strategies can be used to preserve performance on past tasks (Isele and Cosgun 2018). Alternatively, Mendez, Wang, and Eaton (2020) learn a policy gradient model that factorizes into task-specific parameters and shared parameters.…”
Section: Related Work
confidence: 99%
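
As a rough sketch of the factorization idea attributed above to Mendez, Wang, and Eaton (2020): each task's policy parameters are composed from a shared basis and a small vector of task-specific coefficients, so the basis captures knowledge transferable across tasks. The shapes and the linear composition below are illustrative assumptions, not the authors' code.

```python
import numpy as np

d, k, num_tasks = 128, 4, 5               # parameter dim, basis size, task count (assumed)
L = 0.01 * np.random.randn(d, k)          # shared basis, common to all tasks
S = 0.01 * np.random.randn(k, num_tasks)  # one coefficient vector per task

def policy_params(task_id):
    # one task's policy parameters: a task-specific mixture of shared components
    return L @ S[:, task_id]
```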
“…However, such methods may lead to a cumbersome and complex network if new tasks continually arrive. Replay-based methods leverage an episodic memory to store representative historical data, or generate virtual data via a generative model, and replay these samples alongside current data [2,7,8,12,26,37,39,40,45,50,51]. However, replay-based methods bring problems of their own: storing old data results in data imbalance, and a generative model becomes large and expensive if it is to synthesize the historical data faithfully.…”
Section: Related Work
confidence: 99%
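
As a concrete illustration of the replay-based recipe described above, here is a minimal sketch. The function name, its arguments, and the (x, y) memory layout are assumptions for illustration; in a generative variant, the episodic memory would be replaced by samples drawn from a learned generator.

```python
import random

def interleaved_batch(current_batch, episodic_memory, replay_fraction=0.5):
    """Mix stored examples from past tasks into the current training batch."""
    n_replay = int(len(current_batch) * replay_fraction)
    replay = random.sample(episodic_memory, min(n_replay, len(episodic_memory)))
    return current_batch + replay  # the model then trains on the mixed batch
```

Note that the replay fraction directly controls the data imbalance the quoted passage warns about.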
“…To relieve this dilemma, a growing body of continual learning methods has been introduced. These methods can be roughly divided into four categories: architecture-based methods expand the network or allocate new neurons for new tasks [13,28,42,60]; replay-based methods interleave old data with current data by storing historical data in a buffer or generating virtual old data [4,8,12,39,40]; regularization-based methods penalize updates to parameters important for previous tasks [1,17,21,58]; and algorithm-based methods modify the parameter update rule to prevent interference across tasks [7,26,43,47,49].…”
Section: Introduction
confidence: 99%
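
Of the four categories, the regularization-based one is the most compact to sketch. The following is a minimal illustration in the spirit of methods such as EWC, not any specific cited implementation; the per-parameter importance weights are assumed to be given (e.g., from a Fisher information estimate).

```python
def regularization_penalty(params, old_params, importance, lam=1.0):
    """Quadratically penalize moving parameters that mattered for past tasks.

    params, old_params, importance: matching sequences of array-like tensors
    (NumPy arrays or PyTorch tensors both work); lam scales the penalty.
    """
    return 0.5 * lam * sum(
        (imp * (p - p_old) ** 2).sum()
        for p, p_old, imp in zip(params, old_params, importance)
    )
```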
“…Studies of memory-based methods have focused on understanding different aspects of memory usage, such as example selection for the mini-batch [30,31], the size of the external memory [15], and different ways of encoding information into the memory [19], among others [32,33]. Other works have measured the impact of hyperparameters on certain methods [34] or studied the effect that rehearsal methods have on the loss functions [35].…”
Section: Memory-based Continual Learning
confidence: 99%
“…Other works have measured the impact of hyperparameters on certain methods [34] or studied the effect that rehearsal methods have on the loss functions [35]. A different line of work has focused on how to select elements from the memory, either by how much an element's loss is affected [30] or by ranking elements by how important they are for preserving prior knowledge [32].…”
Section: Memory-based Continual Learning
confidence: 99%
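
One simple reading of the loss-based selection the passage describes: rank the stored examples by the current model's loss on them and replay the worst ones. The sketch below reflects that reading only; `model`, `loss_fn`, and the (x, y) memory layout are hypothetical.

```python
def select_rehearsal_batch(memory, model, loss_fn, k):
    """Pick the k stored examples the current model performs worst on."""
    ranked = sorted(memory, key=lambda xy: loss_fn(model(xy[0]), xy[1]), reverse=True)
    return ranked[:k]
```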