2018
DOI: 10.48550/arxiv.1803.00933
Preprint

Distributed Prioritized Experience Replay

Abstract: We propose a distributed architecture for deep reinforcement learning at scale, that enables agents to learn effectively from orders of magnitude more data than previously possible. The algorithm decouples acting from learning: the actors interact with their own instances of the environment by selecting actions according to a shared neural network, and accumulate the resulting experience in a shared experience replay memory; the learner replays samples of experience and updates the neural network. The architecture…
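The abstract describes the core decoupling: many actors act with a shared network and write their experience into a shared prioritized replay memory, while a learner samples from that memory and updates the network. The sketch below illustrates this pattern in a single process with a toy tabular Q-function; the class and function names, the environment stub, and the hyperparameters are illustrative assumptions, not the paper's implementation (in Ape-X the actors and the learner run as separate distributed processes sharing parameters and a replay server).

import numpy as np

class PrioritizedReplay:
    """Proportional prioritized replay: sample transition i with probability ~ priority_i^alpha."""
    def __init__(self, capacity=10_000, alpha=0.6):
        self.capacity, self.alpha = capacity, alpha
        self.data, self.priorities = [], []

    def add(self, transition, priority):
        if len(self.data) >= self.capacity:          # drop the oldest entry when full
            self.data.pop(0); self.priorities.pop(0)
        self.data.append(transition)
        self.priorities.append(priority ** self.alpha)

    def sample(self, batch_size):
        p = np.asarray(self.priorities); p = p / p.sum()
        idx = np.random.choice(len(self.data), size=batch_size, p=p)
        return idx, [self.data[i] for i in idx]

    def update_priorities(self, idx, new_priorities):
        for i, pr in zip(idx, new_priorities):
            self.priorities[i] = pr ** self.alpha

def actor_step(theta, replay, rng, n_states=5, n_actions=2, gamma=0.99):
    """Actor: select actions with the shared parameters and attach an initial |TD error| priority."""
    s = int(rng.integers(n_states))
    q = theta[s]                                     # toy tabular "network"
    a = int(q.argmax()) if rng.random() > 0.1 else int(rng.integers(n_actions))
    r, s2 = float(rng.normal()), int(rng.integers(n_states))
    td = abs(r + gamma * theta[s2].max() - q[a])
    replay.add((s, a, r, s2), td + 1e-3)

def learner_step(theta, replay, idx, batch, lr=0.1, gamma=0.99):
    """Learner: replay sampled experience, update the parameters, refresh priorities."""
    new_pr = []
    for (s, a, r, s2) in batch:
        td = r + gamma * theta[s2].max() - theta[s, a]
        theta[s, a] += lr * td
        new_pr.append(abs(td) + 1e-3)
    replay.update_priorities(idx, new_pr)

rng = np.random.default_rng(0)
theta = np.zeros((5, 2))                             # shared parameters
replay = PrioritizedReplay()
for step in range(1000):
    actor_step(theta, replay, rng)                   # acting...
    if len(replay.data) >= 32:
        idx, batch = replay.sample(32)
        learner_step(theta, replay, idx, batch)      # ...decoupled from learning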

Cited by 119 publications (138 citation statements)
References 19 publications (18 reference statements)
“…Other variants of the value-based algorithms are developed to enhance the performance of vanilla DQN algorithm in terms of stability, convergence speed, implementation complexity, sample/learning efficiency, etc. Such variants include prioritized experience replay DQN [75], distributed prioritized experience replay DQN [76], distributional DQN [77], Rainbow DQN [78], and recurrent DQN [79].…”
Section: Other DRL Algorithms (mentioning)
confidence: 99%
“…For this motivation, we have also implemented dqn.py, dqn_atari.py (Mnih et al, 2013), c51.py, c51_atari.py (Bellemare et al, 2017), apex_atari.py (Horgan et al, 2018), ddpg_continuous_action.py (Lillicrap et al, 2015), td3_continuous_action.py (Fujimoto et al, 2018), and sac_continuous_action.py (Haarnoja et al, 2018).…”
Section: Single-file Implementations (mentioning)
confidence: 99%
“…There are some similar properties between prediction error and TD error: 1) they both converge when the policy converges; 2) they are common metrics that show promising results for exploration [12,11,42,16] and exploitation [43][44][45], respectively. These similarities motivate us to study the effects of EMC's each module by comparing with the ablation study using TD error.…”
Section: Similarities (mentioning)
confidence: 99%
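To make the comparison in this excerpt concrete, here is a minimal sketch of the two signals being contrasted: the TD error (the exploitation-side metric, also the priority used in prioritized replay) and a forward-model prediction error (a common exploration signal). The tabular Q-table, the linear one-hot forward model, and all numbers are illustrative assumptions, not the actual modules of the cited work.

import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions, gamma = 4, 2, 0.99

q = rng.normal(size=(n_states, n_actions))              # toy value estimates
W = rng.normal(size=(n_states, n_states * n_actions))   # toy linear forward model

def td_error(s, a, r, s_next):
    """Exploitation signal: bootstrapped target minus current value estimate."""
    return r + gamma * q[s_next].max() - q[s, a]

def prediction_error(s, a, s_next):
    """Exploration signal: forward-model error on the observed transition."""
    x = np.zeros(n_states * n_actions)
    x[s * n_actions + a] = 1.0                           # one-hot (s, a) input
    pred = W @ x                                         # predicted next-state features
    target = np.eye(n_states)[s_next]                    # one-hot observed next state
    return np.linalg.norm(pred - target)

s, a, r, s_next = 0, 1, 0.5, 3
print("TD error:", td_error(s, a, r, s_next))
print("prediction error:", prediction_error(s, a, s_next))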