2018
DOI: 10.48550/arxiv.1802.01561
Preprint

IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures

Abstract: In this work we aim to solve a large collection of tasks using a single reinforcement learning agent with a single set of parameters. A key challenge is to handle the increased amount of data and extended training time. We have developed a new distributed agent IMPALA (Importance Weighted Actor-Learner Architecture) that not only uses resources more efficiently in single-machine training but also scales to thousands of machines without sacrificing data efficiency or resource utilisation. We achieve stable learn…
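The "importance weighted" part of the architecture refers to IMPALA's off-policy correction, where a learner trains on trajectories generated by slightly stale actor policies and reweights them by clipped importance ratios. Below is a minimal, illustrative sketch of a V-trace-style target computation; the function name and array layout are assumptions for this example, not the paper's reference implementation.

```python
import numpy as np

def vtrace_targets(behaviour_logp, target_logp, rewards, values,
                   bootstrap_value, gamma=0.99, rho_bar=1.0, c_bar=1.0):
    """Compute V-trace-style value targets for one trajectory (sketch).

    behaviour_logp / target_logp: log mu(a_t|x_t) and log pi(a_t|x_t), shape [T]
    rewards, values: shape [T]; bootstrap_value: V(x_T) at the cut point.
    """
    rhos = np.exp(target_logp - behaviour_logp)          # importance ratios pi/mu
    clipped_rhos = np.minimum(rho_bar, rhos)             # rho_t = min(rho_bar, ratio)
    cs = np.minimum(c_bar, rhos)                         # c_t   = min(c_bar, ratio)
    values_tp1 = np.append(values[1:], bootstrap_value)  # V(x_{t+1})
    deltas = clipped_rhos * (rewards + gamma * values_tp1 - values)
    # Backward recursion: vs_t - V(x_t) = delta_t + gamma * c_t * (vs_{t+1} - V(x_{t+1}))
    acc = 0.0
    vs_minus_v = np.zeros_like(values)
    for t in reversed(range(len(rewards))):
        acc = deltas[t] + gamma * cs[t] * acc
        vs_minus_v[t] = acc
    return values + vs_minus_v
```

When the behaviour and target policies coincide (all ratios equal 1) the recursion telescopes to the ordinary n-step return, which is a quick sanity check on the clipping logic.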

Cited by 144 publications (302 citation statements) | References 15 publications
“…Despite our demonstration of the value of building in strong inductive biases, we do not mean to suggest that AI approaches with less built-in structure could not be developed to achieve similar performance. On the contrary, we hope that our work will inspire other AI researchers to set this degree of rapid learning and generalization as their target, and to explore how to incorporate, whether through deep model-based learning (42-45), meta-learning (19, 72-74), simulated evolution (75), or hybrid neuro-symbolic architectures (76-80), inductive biases like those we have built into our model. We suspect that any system that eventually matches human-level learning in games or any space of complex novel tasks will exhibit, or at least greatly benefit from, a decomposition of the problem into learning and planning, and from inductive bi-…”
Section: Towards More Human-like Learning In AI
confidence: 99%
“…The first part is parallel actors, which interact with the environment and generate data; the second is parallel learners, which consume that data for policy training; the third and fourth are a distributed neural network and an experience store that connect the actors and learners. Based on this framework, a number of advanced distributed reinforcement learning frameworks have been developed, and data throughput has been greatly improved [36], [37], [38]. In Suphx and DouZero, distributed learning is adopted to accelerate RL training, where multiple rollouts are performed in parallel to collect data.…”
Section: Basic Techniques For Suphx and DouZero
confidence: 99%
“…Distributed RL architectures typically comprise a large number of roll-out and trainer workers operating in tandem. The roll-out workers repeatedly step through the environment to generate roll-outs in parallel, using the actions sampled from the policy models on the roll-out workers (8)(9)(10)(11) or provided by the trainer worker (12). Roll-out workers typically use CPU machines, and occasionally, GPU machines for richer environments.…”
Section: Distributed RL Systems
confidence: 99%
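The two citation statements above describe the same decoupling IMPALA popularised: many roll-out workers produce trajectories concurrently while a learner consumes them for gradient updates. A minimal sketch of that producer-consumer structure, using threads and a shared queue as stand-ins for distributed workers (the function names `run_actor` and `run_learner` are made up for this illustration):

```python
import queue
import threading

def run_actor(actor_id, traj_queue, num_trajs=5):
    # Each actor repeatedly steps its environment copy and pushes trajectories.
    for step in range(num_trajs):
        trajectory = [(actor_id, step)]  # stand-in for (obs, action, reward) tuples
        traj_queue.put(trajectory)

def run_learner(traj_queue, total_trajs):
    # The learner blocks on the queue and consumes trajectories as they arrive;
    # a real learner would batch them and apply a gradient step here.
    consumed = 0
    while consumed < total_trajs:
        traj_queue.get()
        consumed += 1
    return consumed

traj_queue = queue.Queue()
actors = [threading.Thread(target=run_actor, args=(i, traj_queue)) for i in range(4)]
for a in actors:
    a.start()
n = run_learner(traj_queue, total_trajs=20)  # 4 actors x 5 trajectories each
for a in actors:
    a.join()
```

The key design point from the quotes is that acting and learning proceed at independent rates: actors never wait for a gradient step, which is what lets throughput scale with the number of roll-out workers.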