2018
DOI: 10.1609/aaai.v32i1.11595

Selective Experience Replay for Lifelong Learning

Abstract: Deep reinforcement learning has emerged as a powerful tool for a variety of learning tasks; however, deep nets typically exhibit forgetting when learning multiple tasks in sequence. To mitigate forgetting, we propose an experience replay process that augments the standard FIFO buffer and selectively stores experiences in a long-term memory. We explore four strategies for selecting which experiences will be stored: favoring surprise, favoring reward, matching the global training distribution, and maximizing coverage of the state space.
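
The replay process the abstract describes can be made concrete with a short sketch. The Python below is a minimal illustration under assumptions, not the authors' implementation: a standard FIFO buffer is paired with a long-term memory, and a scoring rule decides which experiences the long-term memory retains. The surprise and reward rules assume each experience carries a TD error and a reward; the "global" rule assigns each experience a random key, so keeping the top-scored experiences amounts to reservoir sampling and matches the global training distribution. The coverage strategy, which needs a distance measure over states, is omitted for brevity.

```python
import random
from collections import deque

class SelectiveReplay:
    """Minimal sketch: a FIFO buffer plus a selectively filled long-term memory."""

    def __init__(self, fifo_size, ltm_size, rank="reward"):
        self.fifo = deque(maxlen=fifo_size)  # short-term FIFO buffer
        self.ltm = []                        # list of (score, experience) pairs
        self.ltm_size = ltm_size
        self.rank = rank

    def _score(self, exp):
        # exp = (state, action, reward, next_state, td_error); layout assumed
        if self.rank == "surprise":
            return abs(exp[4])      # favor surprising (high TD error) experiences
        if self.rank == "reward":
            return abs(exp[2])      # favor high-magnitude rewards
        return random.random()      # "global": random key -> reservoir sample

    def add(self, exp):
        self.fifo.append(exp)
        score = self._score(exp)    # scored once, so the ranking stays stable
        if len(self.ltm) < self.ltm_size:
            self.ltm.append((score, exp))
            return
        weakest = min(range(self.ltm_size), key=lambda i: self.ltm[i][0])
        if score > self.ltm[weakest][0]:
            self.ltm[weakest] = (score, exp)  # evict the lowest-ranked experience

    def sample(self, batch_size):
        # draw from both memories so earlier tasks keep being replayed
        half = batch_size // 2
        recent = random.sample(list(self.fifo), min(half, len(self.fifo)))
        old = random.sample(self.ltm, min(batch_size - half, len(self.ltm)))
        return recent + [exp for _, exp in old]
```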

Cited by 200 publications (67 citation statements) | References 28 publications
“…However, interference is ignored, as multi-task performance across all environments is similar to the sum of the performances on the individual environments. Different selective experience replay strategies can be used to preserve performance on past tasks (Isele and Cosgun 2018). Alternatively, Mendez, Wang, and Eaton (2020) learn a policy gradient model that factorizes into task-specific parameters and shared parameters.…”
Section: Related Work
confidence: 99%
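
As a rough sketch of the factorization idea attributed above to Mendez, Wang, and Eaton (2020): each task's policy parameters are composed from a shared basis and a small vector of task-specific coefficients, so the basis captures knowledge transferable across tasks. The shapes and the linear composition below are illustrative assumptions, not the authors' code.

```python
import numpy as np

d, k, num_tasks = 128, 4, 5               # parameter dim, basis size, task count (assumed)
L = 0.01 * np.random.randn(d, k)          # shared basis, common to all tasks
S = 0.01 * np.random.randn(k, num_tasks)  # one coefficient vector per task

def policy_params(task_id):
    # one task's policy parameters: a task-specific mixture of shared components
    return L @ S[:, task_id]
```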
“…However, such methods may lead to a cumbersome and complex network if new tasks continually arrive. Replay-based methods leverage an episodic memory to store representative historical data, or generate virtual data via a generative model, and replay these samples alongside current data [2,7,8,12,26,37,39,40,45,50,51]. However, replay-based methods bring problems of their own: storing old data results in data imbalance, and a generative model becomes large and expensive if it is to synthesize the historical data faithfully.…”
Section: Related Work
confidence: 99%
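
As a concrete illustration of the replay-based recipe described above, here is a minimal sketch. The function name, its arguments, and the (x, y) memory layout are assumptions for illustration; in a generative variant, the episodic memory would be replaced by samples drawn from a learned generator.

```python
import random

def interleaved_batch(current_batch, episodic_memory, replay_fraction=0.5):
    """Mix stored examples from past tasks into the current training batch."""
    n_replay = int(len(current_batch) * replay_fraction)
    replay = random.sample(episodic_memory, min(n_replay, len(episodic_memory)))
    return current_batch + replay  # the model then trains on the mixed batch
```

Note that the replay fraction directly controls the data imbalance the quoted passage warns about.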
“…To relieve this dilemma, a growing body of continual learning methods has been introduced. These methods can be roughly divided into four categories: architecture-based methods expand the network or allocate new neurons for new tasks [13,28,42,60]; replay-based methods interleave old data with current data by storing historical data in a buffer or generating virtual old data [4,8,12,39,40]; regularization-based methods penalize updates to parameters important for previous tasks [1,17,21,58]; and algorithm-based methods modify the parameter update rule to prevent interference across tasks [7,26,43,47,49].…”
Section: Introduction
confidence: 99%
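
Of the four categories, the regularization-based one is the most compact to sketch. The following is a minimal illustration in the spirit of methods such as EWC, not any specific cited implementation; the per-parameter importance weights are assumed to be given (e.g., from a Fisher information estimate).

```python
def regularization_penalty(params, old_params, importance, lam=1.0):
    """Quadratically penalize moving parameters that mattered for past tasks.

    params, old_params, importance: matching sequences of array-like tensors
    (NumPy arrays or PyTorch tensors both work); lam scales the penalty.
    """
    return 0.5 * lam * sum(
        (imp * (p - p_old) ** 2).sum()
        for p, p_old, imp in zip(params, old_params, importance)
    )
```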
“…Studies of memory-based methods have focused on understanding different aspects of memory usage, such as example selection for the mini-batch [30,31], the size of the external memory [15], and different ways of encoding information into the memory [19], among others [32,33]. Other works have measured the impact of hyperparameters on certain methods [34] or studied the effect that rehearsal methods have on the loss functions [35].…”
Section: Memory-based Continual Learning
confidence: 99%
“…Other works have measured the impact of hyperparameters on certain methods [34] or studied the effect that rehearsal methods have on the loss functions [35]. A different line of work has focused on how to select elements from the memory, either by how much an element's loss is affected [30] or by ranking elements by how important they are for preserving prior knowledge [32].…”
Section: Memory-based Continual Learning
confidence: 99%
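
One simple reading of the loss-based selection the passage describes: rank the stored examples by the current model's loss on them and replay the worst ones. The sketch below reflects that reading only; `model`, `loss_fn`, and the (x, y) memory layout are hypothetical.

```python
def select_rehearsal_batch(memory, model, loss_fn, k):
    """Pick the k stored examples the current model performs worst on."""
    ranked = sorted(memory, key=lambda xy: loss_fn(model(xy[0]), xy[1]), reverse=True)
    return ranked[:k]
```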