2018
DOI: 10.48550/arxiv.1803.00933
Preprint

Distributed Prioritized Experience Replay

Abstract: We propose a distributed architecture for deep reinforcement learning at scale, that enables agents to learn effectively from orders of magnitude more data than previously possible. The algorithm decouples acting from learning: the actors interact with their own instances of the environment by selecting actions according to a shared neural network, and accumulate the resulting experience in a shared experience replay memory; the learner replays samples of experience and updates the neural network. The architecture…
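The abstract describes the core decoupling: many actors act with a shared network and write their experience into a shared prioritized replay memory, while a learner samples from that memory and updates the network. The sketch below illustrates this pattern in a single process with a toy tabular Q-function; the class and function names, the environment stub, and the hyperparameters are illustrative assumptions, not the paper's implementation (in Ape-X the actors and the learner run as separate distributed processes sharing parameters and a replay server).

import numpy as np

class PrioritizedReplay:
    """Proportional prioritized replay: sample transition i with probability ~ priority_i^alpha."""
    def __init__(self, capacity=10_000, alpha=0.6):
        self.capacity, self.alpha = capacity, alpha
        self.data, self.priorities = [], []

    def add(self, transition, priority):
        if len(self.data) >= self.capacity:          # drop the oldest entry when full
            self.data.pop(0); self.priorities.pop(0)
        self.data.append(transition)
        self.priorities.append(priority ** self.alpha)

    def sample(self, batch_size):
        p = np.asarray(self.priorities); p = p / p.sum()
        idx = np.random.choice(len(self.data), size=batch_size, p=p)
        return idx, [self.data[i] for i in idx]

    def update_priorities(self, idx, new_priorities):
        for i, pr in zip(idx, new_priorities):
            self.priorities[i] = pr ** self.alpha

def actor_step(theta, replay, rng, n_states=5, n_actions=2, gamma=0.99):
    """Actor: select actions with the shared parameters and attach an initial |TD error| priority."""
    s = int(rng.integers(n_states))
    q = theta[s]                                     # toy tabular "network"
    a = int(q.argmax()) if rng.random() > 0.1 else int(rng.integers(n_actions))
    r, s2 = float(rng.normal()), int(rng.integers(n_states))
    td = abs(r + gamma * theta[s2].max() - q[a])
    replay.add((s, a, r, s2), td + 1e-3)

def learner_step(theta, replay, idx, batch, lr=0.1, gamma=0.99):
    """Learner: replay sampled experience, update the parameters, refresh priorities."""
    new_pr = []
    for (s, a, r, s2) in batch:
        td = r + gamma * theta[s2].max() - theta[s, a]
        theta[s, a] += lr * td
        new_pr.append(abs(td) + 1e-3)
    replay.update_priorities(idx, new_pr)

rng = np.random.default_rng(0)
theta = np.zeros((5, 2))                             # shared parameters
replay = PrioritizedReplay()
for step in range(1000):
    actor_step(theta, replay, rng)                   # acting...
    if len(replay.data) >= 32:
        idx, batch = replay.sample(32)
        learner_step(theta, replay, idx, batch)      # ...decoupled from learning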

Cited by 119 publications (138 citation statements)
References 19 publications (18 reference statements)
“…Other variants of the value-based algorithms are developed to enhance the performance of vanilla DQN algorithm in terms of stability, convergence speed, implementation complexity, sample/learning efficiency, etc. Such variants include prioritized experience replay DQN [75], distributed prioritized experience replay DQN [76], distributional DQN [77], Rainbow DQN [78], and recurrent DQN [79].…”
Section: Other DRL Algorithms (mentioning)
confidence: 99%
“…For this motivation, we have also implemented dqn.py, dqn_atari.py (Mnih et al, 2013), c51.py, c51_atari.py (Bellemare et al, 2017), apex_atari.py (Horgan et al, 2018), ddpg_continuous_action.py (Lillicrap et al, 2015), td3_continuous_action.py (Fujimoto et al, 2018), and sac_continuous_action.py (Haarnoja et al, 2018).…”
Section: Single-file Implementations (mentioning)
confidence: 99%
“…There are some similar properties between prediction error and TD error: 1) they both converge when the policy converges; 2) they are common metrics that show promising results for exploration [12,11,42,16] and exploitation [43][44][45], respectively. These similarities motivate us to study the effects of EMC's each module by comparing with the ablation study using TD error.…”
Section: Similarities (mentioning)
confidence: 99%
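To make the comparison in this excerpt concrete, here is a minimal sketch of the two signals being contrasted: the TD error (the exploitation-side metric, also the priority used in prioritized replay) and a forward-model prediction error (a common exploration signal). The tabular Q-table, the linear one-hot forward model, and all numbers are illustrative assumptions, not the actual modules of the cited work.

import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions, gamma = 4, 2, 0.99

q = rng.normal(size=(n_states, n_actions))              # toy value estimates
W = rng.normal(size=(n_states, n_states * n_actions))   # toy linear forward model

def td_error(s, a, r, s_next):
    """Exploitation signal: bootstrapped target minus current value estimate."""
    return r + gamma * q[s_next].max() - q[s, a]

def prediction_error(s, a, s_next):
    """Exploration signal: forward-model error on the observed transition."""
    x = np.zeros(n_states * n_actions)
    x[s * n_actions + a] = 1.0                           # one-hot (s, a) input
    pred = W @ x                                         # predicted next-state features
    target = np.eye(n_states)[s_next]                    # one-hot observed next state
    return np.linalg.norm(pred - target)

s, a, r, s_next = 0, 1, 0.5, 3
print("TD error:", td_error(s, a, r, s_next))
print("prediction error:", prediction_error(s, a, s_next))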