2018
DOI: 10.48550/arxiv.1802.01561
Preprint

IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures

Abstract: In this work we aim to solve a large collection of tasks using a single reinforcement learning agent with a single set of parameters. A key challenge is to handle the increased amount of data and extended training time. We have developed a new distributed agent IMPALA (Importance Weighted Actor-Learner Architecture) that not only uses resources more efficiently in single-machine training but also scales to thousands of machines without sacrificing data efficiency or resource utilisation. We achieve stable learn…
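The "importance weighted" part of the architecture refers to IMPALA's off-policy correction, where a learner trains on trajectories generated by slightly stale actor policies and reweights them by clipped importance ratios. Below is a minimal, illustrative sketch of a V-trace-style target computation; the function name and array layout are assumptions for this example, not the paper's reference implementation.

```python
import numpy as np

def vtrace_targets(behaviour_logp, target_logp, rewards, values,
                   bootstrap_value, gamma=0.99, rho_bar=1.0, c_bar=1.0):
    """Compute V-trace-style value targets for one trajectory (sketch).

    behaviour_logp / target_logp: log mu(a_t|x_t) and log pi(a_t|x_t), shape [T]
    rewards, values: shape [T]; bootstrap_value: V(x_T) at the cut point.
    """
    rhos = np.exp(target_logp - behaviour_logp)          # importance ratios pi/mu
    clipped_rhos = np.minimum(rho_bar, rhos)             # rho_t = min(rho_bar, ratio)
    cs = np.minimum(c_bar, rhos)                         # c_t   = min(c_bar, ratio)
    values_tp1 = np.append(values[1:], bootstrap_value)  # V(x_{t+1})
    deltas = clipped_rhos * (rewards + gamma * values_tp1 - values)
    # Backward recursion: vs_t - V(x_t) = delta_t + gamma * c_t * (vs_{t+1} - V(x_{t+1}))
    acc = 0.0
    vs_minus_v = np.zeros_like(values)
    for t in reversed(range(len(rewards))):
        acc = deltas[t] + gamma * cs[t] * acc
        vs_minus_v[t] = acc
    return values + vs_minus_v
```

When the behaviour and target policies coincide (all ratios equal 1) the recursion telescopes to the ordinary n-step return, which is a quick sanity check on the clipping logic.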

Cited by 144 publications (302 citation statements) | References 15 publications
“…Despite our demonstration of the value of building in strong inductive biases, we do not mean to suggest that AI approaches with less built-in structure could not be developed to achieve similar performance. On the contrary, we hope that our work will inspire other AI researchers to set this degree of rapid learning and generalization as their target, and to explore how to incorporate, whether through deep model-based learning (42-45), meta-learning (19, 72-74), simulated evolution (75), or hybrid neuro-symbolic architectures (76-80), inductive biases like those we have built into our model. We suspect that any system that eventually matches human-level learning in games or any space of complex novel tasks will exhibit, or at least greatly benefit from, a decomposition of the problem into learning and planning, and from inductive bi-…”
Section: Towards More Human-like Learning In AI
confidence: 99%
“…The first part is parallel actors, which interact with the environment and generate data; the second is parallel learners, which consume that data for policy training; the third and fourth are a distributed neural network and an experience store that connect the actors and learners. Based on this framework, a number of advanced distributed reinforcement learning frameworks have been developed, and data throughput has been greatly improved [36], [37], [38]. In Suphx and DouZero, distributed learning is adopted to accelerate RL training, where multiple rollouts are performed in parallel to collect data.…”
Section: Basic Techniques For Suphx and DouZero
confidence: 99%
“…Distributed RL architectures typically comprise a large number of roll-out and trainer workers operating in tandem. The roll-out workers repeatedly step through the environment to generate roll-outs in parallel, using the actions sampled from the policy models on the roll-out workers (8)(9)(10)(11) or provided by the trainer worker (12). Roll-out workers typically use CPU machines, and occasionally, GPU machines for richer environments.…”
Section: Distributed RL Systems
confidence: 99%
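The two citation statements above describe the same decoupling IMPALA popularised: many roll-out workers produce trajectories concurrently while a learner consumes them for gradient updates. A minimal sketch of that producer-consumer structure, using threads and a shared queue as stand-ins for distributed workers (the function names `run_actor` and `run_learner` are made up for this illustration):

```python
import queue
import threading

def run_actor(actor_id, traj_queue, num_trajs=5):
    # Each actor repeatedly steps its environment copy and pushes trajectories.
    for step in range(num_trajs):
        trajectory = [(actor_id, step)]  # stand-in for (obs, action, reward) tuples
        traj_queue.put(trajectory)

def run_learner(traj_queue, total_trajs):
    # The learner blocks on the queue and consumes trajectories as they arrive;
    # a real learner would batch them and apply a gradient step here.
    consumed = 0
    while consumed < total_trajs:
        traj_queue.get()
        consumed += 1
    return consumed

traj_queue = queue.Queue()
actors = [threading.Thread(target=run_actor, args=(i, traj_queue)) for i in range(4)]
for a in actors:
    a.start()
n = run_learner(traj_queue, total_trajs=20)  # 4 actors x 5 trajectories each
for a in actors:
    a.join()
```

The key design point from the quotes is that acting and learning proceed at independent rates: actors never wait for a gradient step, which is what lets throughput scale with the number of roll-out workers.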