2021 6th International Conference on Computational Intelligence and Applications (ICCIA)
DOI: 10.1109/iccia52886.2021.00011
Prioritized Experience Replay for Continual Learning

Cited by 33 publications (61 citation statements). References 15 publications.
“…It is worth mentioning that many other algorithms are also introducing a variety of experience replay schemes. Some of them [43,55] depend on new components, and others [47] have different algorithm architectures. Since the backbone MARL algorithm of our choice in this experiment is QMIX, we do not expect a significant change over the algorithm architecture (e.g., actor-network) or major components (e.g., loss structure) as presented in other approaches to realize a relatively fair comparison.…”
Section: Comparison With Existing Experience Replay Methods
confidence: 99%
“…[33] uses the regret minimization method to design the prioritized experience replay scheme for the only agent in the environment. MaPER [43] employs model learning to improve experience replay by using a model-augmented critic network and modifying the rule of priority. Also, new loss function designs can help develop prioritization schemes [55].…”
Section: Single-Agent Experience Replay
confidence: 99%
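The prioritization idea running through these citing works can be sketched in a few lines. The following is a minimal proportional replay buffer keyed on TD error, a simplified stand-in rather than the specific regret-based or MaPER schemes cited above; the class name and the `alpha`/`eps` parameters are illustrative assumptions.

```python
import numpy as np

class PrioritizedReplayBuffer:
    """Minimal proportional prioritized replay sketch (hypothetical names)."""

    def __init__(self, capacity, alpha=0.6, eps=1e-6):
        self.capacity = capacity   # maximum number of stored transitions
        self.alpha = alpha         # how strongly priorities shape sampling
        self.eps = eps             # keeps every priority strictly positive
        self.data, self.priorities = [], []
        self.pos = 0

    def add(self, transition, td_error):
        priority = (abs(td_error) + self.eps) ** self.alpha
        if len(self.data) < self.capacity:
            self.data.append(transition)
            self.priorities.append(priority)
        else:                      # overwrite the oldest slot
            self.data[self.pos] = transition
            self.priorities[self.pos] = priority
        self.pos = (self.pos + 1) % self.capacity

    def sample(self, batch_size):
        probs = np.asarray(self.priorities)
        probs = probs / probs.sum()
        idx = np.random.choice(len(self.data), batch_size, p=probs)
        return idx, [self.data[i] for i in idx]

    def update_priorities(self, idx, td_errors):
        for i, err in zip(idx, td_errors):
            self.priorities[i] = (abs(err) + self.eps) ** self.alpha
```

Transitions whose TD error is large are sampled more often, which is the common thread the cited variants modify through different priority rules.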
“…However, as the number of tasks increases, the fraction of memory allocated to each task shrinks, resulting in fewer samples per task for rehearsal. Other more sophisticated strategies focus on prioritising replay [34], storing and replaying exemplars from each task to best approximate task means [35], [36] or applying reservoir sampling to fix a budget for each seen task [37].…”
Section: A. Rehearsal
confidence: 99%
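To make the reservoir-sampling strategy mentioned above concrete, the sketch below maintains a fixed replay budget whose contents remain a uniform sample of everything observed so far; the function and argument names are hypothetical and not taken from the cited methods.

```python
import random

def reservoir_update(memory, budget, example, seen_count):
    """Keep `memory` a uniform sample of the `seen_count` examples seen so far.

    memory     -- list holding at most `budget` stored examples
    budget     -- fixed replay-memory size shared across tasks
    example    -- the newly observed (input, label) pair
    seen_count -- number of examples observed so far, including this one
    """
    if len(memory) < budget:
        memory.append(example)
    else:
        # Replace a random slot with probability budget / seen_count
        j = random.randint(0, seen_count - 1)
        if j < budget:
            memory[j] = example
    return memory
```

Because the replacement probability shrinks as more data arrives, early tasks are not crowded out, which is the appeal of reservoir sampling under a fixed budget.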
“…Their simplicity and ease of implementation distinguish these methods, so they are, in general, the most popular approach. RLED value-based methods extend the most popular methods, such as Q-Learning [97], SARSA [72], Deep Q-Networks (DQN) [61], Double DQN (DDQN) [90], Prioritized Dueling Double Deep Q-Networks (PDD DQN) [75], and Dueling Network Architectures for Deep Reinforcement Learning [94]. Policy-based methods directly estimate the control policy.…”
Section: Reinforcement Learning From Expert Demonstrations
confidence: 99%
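For reference, the tabular Q-Learning update that these value-based extensions build on can be written directly from the Bellman target; the table layout and hyperparameter values below are only illustrative.

```python
import numpy as np

def q_learning_update(Q, s, a, r, s_next, done, alpha=0.1, gamma=0.99):
    """One tabular Q-Learning step: Q(s, a) <- Q(s, a) + alpha * TD-error."""
    target = r if done else r + gamma * np.max(Q[s_next])
    td_error = target - Q[s, a]
    Q[s, a] += alpha * td_error
    return td_error  # the same quantity prioritized replay uses as a priority signal
```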