2020
DOI: 10.48550/arxiv.2006.00979
Preprint

Acme: A Research Framework for Distributed Reinforcement Learning

Abstract: Deep reinforcement learning has led to many recent and groundbreaking advancements. However, these advances have often come at the cost of both the scale and complexity of the underlying RL algorithms. Increases in complexity have in turn made it more difficult for researchers to reproduce published RL algorithms or rapidly prototype ideas. To address this, we introduce Acme, a tool to simplify the development of novel RL algorithms that is specifically designed to enable simple agent implementations that can …

Cited by 47 publications (72 citation statements)
References 25 publications

“…The first part is the parallel actors, which interact with the environment and generate data; the second is the parallel learners, which consume the data for policy training; the third and fourth parts are the distributed neural network and the experience store that connect the actors and learners. Based on this framework, a number of advanced distributed reinforcement learning frameworks have been developed, largely improving data throughput [36], [37], [38]. In Suphx and DouZero, distributed learning is adopted to accelerate RL training, where multiple rollouts are performed in parallel to collect data.…”
Section: Basic Techniques for Suphx and DouZero
mentioning
confidence: 99%
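To make the actor/learner decomposition described in this excerpt concrete, the following is a minimal single-process Python sketch of the four components (actors, learners, a shared policy network, and an experience store). All class and method names are illustrative assumptions, not the API of Acme, Suphx, DouZero, or any other cited framework.

```python
import collections
import random

# Experience store ("replay buffer") connecting actors and learners.
Transition = collections.namedtuple("Transition", "obs action reward next_obs done")

class ReplayBuffer:
    def __init__(self, capacity=100_000):
        self._data = collections.deque(maxlen=capacity)

    def add(self, transition):
        self._data.append(transition)

    def sample(self, batch_size):
        return random.sample(self._data, batch_size)

class Actor:
    """Interacts with the environment and generates data."""
    def __init__(self, env, policy, replay):
        self.env, self.policy, self.replay = env, policy, replay

    def rollout(self, num_steps):
        obs = self.env.reset()
        for _ in range(num_steps):
            action = self.policy.select_action(obs)
            next_obs, reward, done, _ = self.env.step(action)
            self.replay.add(Transition(obs, action, reward, next_obs, done))
            obs = self.env.reset() if done else next_obs

class Learner:
    """Consumes data from the experience store to train the policy."""
    def __init__(self, policy, replay, batch_size=256):
        self.policy, self.replay, self.batch_size = policy, replay, batch_size

    def step(self):
        batch = self.replay.sample(self.batch_size)
        self.policy.update(batch)  # one gradient step on the sampled batch
```

In a distributed setting, many Actor instances would run as separate processes, periodically pulling fresh parameters from the Learner (the "distributed neural network" above), while the experience store becomes a shared replay service.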
“…We train a DQN agent on top of the discretization learned by the AQuaDem framework. The architecture of the Q-network we use is the default LayerNorm architecture from the Q-network of the Acme library [Hoffman et al., 2020], which consists of a hidden layer of size 512 with layer normalization and tanh activation, followed by two hidden layers of sizes 512 and 256 with ELU activation. We explored multiple Q-value losses, for which we used the Adam optimizer: regular DQN [Mnih et al., 2015], double DQN with experience replay [Van Hasselt et al., 2016, Schaul et al., 2016], and Munchausen DQN [Vieillard et al., 2020]; the latter led to the best performance.…”
Section: D.4.1 AQuaDQN
mentioning
confidence: 99%
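Read literally, the Q-network described in this excerpt can be sketched as follows in Haiku/JAX. The sketch follows only the textual description (a 512-unit layer with layer normalization and tanh, then 512- and 256-unit layers with ELU, then one value per discretized action); it is not copied from the Acme source, and the number of discretized actions is a placeholder assumption.

```python
import haiku as hk
import jax
import jax.numpy as jnp

def q_network(obs, num_actions):
    # Hidden layer of size 512 with layer normalization and tanh activation,
    # followed by two hidden layers of sizes 512 and 256 with ELU activation,
    # then one Q-value per discretized (AQuaDem) action.
    net = hk.Sequential([
        hk.Linear(512),
        hk.LayerNorm(axis=-1, create_scale=True, create_offset=True),
        jnp.tanh,
        hk.Linear(512), jax.nn.elu,
        hk.Linear(256), jax.nn.elu,
        hk.Linear(num_actions),
    ])
    return net(obs)

# Standard Haiku transform; num_actions=10 is a placeholder, not the value
# used in the cited experiments.
q_fn = hk.transform(lambda obs: q_network(obs, num_actions=10))
```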
“…To train agents via behavioral cloning [57], we use the open-source Acme [29] to learn a policy from human gameplay data. Specifically, we collected 5 human-human trajectories of length 1200 time steps for each of the 5 layouts, resulting in 60k total environment steps.…”
Section: Implementation Details
mentioning
confidence: 99%
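As a rough illustration of the behavioral-cloning setup described in this excerpt, here is a generic supervised-learning loss over (observation, action) pairs from demonstration data, written with JAX and Optax. It assumes discrete actions and a policy network that returns action logits; it is a sketch under those assumptions, not the configuration of Acme's BC agent used in the cited work.

```python
import jax
import jax.numpy as jnp
import optax

def bc_loss(params, apply_fn, observations, actions):
    """Negative log-likelihood of the demonstrated actions under the policy."""
    logits = apply_fn(params, observations)               # [batch, num_actions]
    log_probs = jax.nn.log_softmax(logits)
    chosen = jnp.take_along_axis(log_probs, actions[:, None], axis=-1)
    return -jnp.mean(chosen)

def make_update_fn(apply_fn, optimizer):
    @jax.jit
    def update(params, opt_state, observations, actions):
        loss, grads = jax.value_and_grad(bc_loss)(params, apply_fn,
                                                  observations, actions)
        updates, opt_state = optimizer.update(grads, opt_state, params)
        params = optax.apply_updates(params, updates)
        return params, opt_state, loss
    return update

# Example wiring: apply_fn and params would come from the policy network
# (e.g. a Haiku-transformed module); both are assumptions here.
optimizer = optax.adam(1e-4)
```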