2018
DOI: 10.48550/arxiv.1802.03006
Preprint

Learning and Querying Fast Generative Models for Reinforcement Learning

Cited by 38 publications (58 citation statements)
References 0 publications
“…Vector-action is the same as VPT-TOB but replaces our action representation with a vector embedding as in [47]. This embedding is concatenated with the intermediate features to produce the final predicted image.…”
Section: Baseline Methods (mentioning, confidence: 99%)
“…The architecture of latent models, or world models, is elaborate. The dynamics model typically includes an observation model, a representation model, a transition model, and a value or reward model (Karl et al., 2016; Buesing et al., 2018; Doerr et al., 2018). The task of the observation model is to reduce the high-dimensional world into a lower-dimensional world, to allow more efficient planning.…”
Section: Latent Models (mentioning, confidence: 99%)
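The four-component decomposition named in this quote might look like the following minimal sketch, with one network per role. The layer types and dimensions are placeholder assumptions; real world models use much richer networks.

```python
# A minimal sketch of the world-model decomposition described above.
import torch
import torch.nn as nn

class WorldModel(nn.Module):
    def __init__(self, obs_dim: int = 1024, latent_dim: int = 30, action_dim: int = 4):
        super().__init__()
        # Representation model: compress a high-dimensional observation
        # into a low-dimensional latent state for efficient planning.
        self.representation = nn.Linear(obs_dim, latent_dim)
        # Transition model: predict the next latent state from the
        # current latent state and the action taken.
        self.transition = nn.Linear(latent_dim + action_dim, latent_dim)
        # Observation model: decode a latent state back to observation space.
        self.observation = nn.Linear(latent_dim, obs_dim)
        # Reward model: predict the scalar reward from a latent state.
        self.reward = nn.Linear(latent_dim, 1)

    def step(self, z: torch.Tensor, a: torch.Tensor):
        z_next = self.transition(torch.cat([z, a], dim=-1))
        return z_next, self.observation(z_next), self.reward(z_next)
```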
“…Here the work by Hafner et al. (2018, 2019) on the PlaNet and Dreamer systems is noteworthy, including the application of their work back to Atari (Hafner et al., 2020), which achieved human-level performance. PlaNet uses a recurrent state space model (RSSM) that consists of a transition model, an observation model, a variational encoder and a reward model (Karl et al., 2016; Buesing et al., 2018; Doerr et al., 2018). Based on these models, a model-predictive control agent is used to adapt its plan, replanning each step (Richards, 2005).…”
Section: Latent Models (mentioning, confidence: 99%)
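A single RSSM transition, as the quote outlines it, could be sketched like this. The deterministic/stochastic split and the Gaussian prior follow the general PlaNet recipe, but the names and sizes below are illustrative assumptions, not the authors' code. In PlaNet, steps like this are rolled forward inside a model-predictive control loop that re-plans at every environment step.

```python
# A rough sketch of one RSSM step: a deterministic recurrent state
# carries history, and a stochastic latent is sampled from a learned
# Gaussian prior conditioned on it.
import torch
import torch.nn as nn

class RSSMStep(nn.Module):
    def __init__(self, stoch: int = 30, deter: int = 200, action_dim: int = 4):
        super().__init__()
        self.cell = nn.GRUCell(stoch + action_dim, deter)  # deterministic path
        self.prior_net = nn.Linear(deter, 2 * stoch)       # Gaussian prior parameters

    def forward(self, z: torch.Tensor, h: torch.Tensor, a: torch.Tensor):
        h = self.cell(torch.cat([z, a], dim=-1), h)        # update deterministic state
        mean, log_std = self.prior_net(h).chunk(2, dim=-1)
        z = mean + log_std.exp() * torch.randn_like(mean)  # reparameterized sample
        return z, h
```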
“…This has motivated the combination of probabilistic modeling with deep reinforcement learning. A particularly active research direction in MBRL has been to use variational inference methods for training state space models (SSM) (Buesing et al. 2018; Hafner et al. 2019b,a; Okada, Kosaka, and Taniguchi 2020; Lee et al. 2020). The SSMs (Kalman 1960) are a class of sequential latent variable models that consider a hidden (i.e.…”
Section: State Space Models in Model-Based Reinforcement Learning (mentioning, confidence: 99%)
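The variational training objective alluded to here is typically a per-step ELBO: reconstruct the observation from a sampled latent while keeping the approximate posterior close to the transition prior. A minimal sketch, assuming Gaussian distributions and a hypothetical decoder network:

```python
# One ELBO term for a variational state space model: reconstruction
# log-likelihood minus a KL penalty against the transition prior.
import torch
import torch.distributions as D

def elbo_step(x_t, posterior: D.Normal, prior: D.Normal, decoder):
    z_t = posterior.rsample()                               # reparameterized latent sample
    recon = decoder(z_t)                                    # predicted observation mean
    log_lik = D.Normal(recon, 1.0).log_prob(x_t).sum(-1)    # reconstruction term
    kl = D.kl_divergence(posterior, prior).sum(-1)          # stay close to the prior
    return (log_lik - kl).mean()                            # maximize this ELBO
```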