2018
DOI: 10.48550/arxiv.1802.03006
Preprint

Learning and Querying Fast Generative Models for Reinforcement Learning

Cited by 38 publications (58 citation statements)
References 0 publications
“…Vector-action is the same as VPT-TOB but replaces our action representation with a vector embedding as in [47]. This embedding is concatenated with the intermediate features to produce the final predicted image.…”
Section: Baseline Methods (mentioning, confidence: 99%)
“…The architecture of latent models, or world models, is elaborate. The dynamics model typically includes an observation model, a representation model, a transition model, and a value or reward model (Karl et al., 2016; Buesing et al., 2018; Doerr et al., 2018). The task of the observation model is to reduce the high-dimensional world into a lower-dimensional world, to allow more efficient planning.…”
Section: Latent Models (mentioning, confidence: 99%)
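The four-component decomposition named in this quote might look like the following minimal sketch, with one network per role. The layer types and dimensions are placeholder assumptions; real world models use much richer networks.

```python
# A minimal sketch of the world-model decomposition described above.
import torch
import torch.nn as nn

class WorldModel(nn.Module):
    def __init__(self, obs_dim: int = 1024, latent_dim: int = 30, action_dim: int = 4):
        super().__init__()
        # Representation model: compress a high-dimensional observation
        # into a low-dimensional latent state for efficient planning.
        self.representation = nn.Linear(obs_dim, latent_dim)
        # Transition model: predict the next latent state from the
        # current latent state and the action taken.
        self.transition = nn.Linear(latent_dim + action_dim, latent_dim)
        # Observation model: decode a latent state back to observation space.
        self.observation = nn.Linear(latent_dim, obs_dim)
        # Reward model: predict the scalar reward from a latent state.
        self.reward = nn.Linear(latent_dim, 1)

    def step(self, z: torch.Tensor, a: torch.Tensor):
        z_next = self.transition(torch.cat([z, a], dim=-1))
        return z_next, self.observation(z_next), self.reward(z_next)
```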
“…Here the work by Hafner et al. (2018, 2019) on the PlaNet and Dreamer systems is noteworthy, including the application of their work back to Atari (Hafner et al., 2020), which achieved human-level performance. PlaNet uses a recurrent state space model (RSSM) that consists of a transition model, an observation model, a variational encoder and a reward model (Karl et al., 2016; Buesing et al., 2018; Doerr et al., 2018). Based on these models, a model-predictive control agent is used to adapt its plan, replanning each step (Richards, 2005).…”
Section: Latent Models (mentioning, confidence: 99%)
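A single RSSM transition, as the quote outlines it, could be sketched like this. The deterministic/stochastic split and the Gaussian prior follow the general PlaNet recipe, but the names and sizes below are illustrative assumptions, not the authors' code. In PlaNet, steps like this are rolled forward inside a model-predictive control loop that re-plans at every environment step.

```python
# A rough sketch of one RSSM step: a deterministic recurrent state
# carries history, and a stochastic latent is sampled from a learned
# Gaussian prior conditioned on it.
import torch
import torch.nn as nn

class RSSMStep(nn.Module):
    def __init__(self, stoch: int = 30, deter: int = 200, action_dim: int = 4):
        super().__init__()
        self.cell = nn.GRUCell(stoch + action_dim, deter)  # deterministic path
        self.prior_net = nn.Linear(deter, 2 * stoch)       # Gaussian prior parameters

    def forward(self, z: torch.Tensor, h: torch.Tensor, a: torch.Tensor):
        h = self.cell(torch.cat([z, a], dim=-1), h)        # update deterministic state
        mean, log_std = self.prior_net(h).chunk(2, dim=-1)
        z = mean + log_std.exp() * torch.randn_like(mean)  # reparameterized sample
        return z, h
```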
“…This has motivated the combination of probabilistic modeling with deep reinforcement learning. A particularly active research direction in MBRL has been to use variational inference methods for training state space models (SSM) (Buesing et al. 2018; Hafner et al. 2019b,a; Okada, Kosaka, and Taniguchi 2020; Lee et al. 2020). The SSMs (Kalman 1960) are a class of sequential latent variable models that consider a hidden (i.e.…”
Section: State Space Models in Model-Based Reinforcement Learning (mentioning, confidence: 99%)
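The variational training objective alluded to here is typically a per-step ELBO: reconstruct the observation from a sampled latent while keeping the approximate posterior close to the transition prior. A minimal sketch, assuming Gaussian distributions and a hypothetical decoder network:

```python
# One ELBO term for a variational state space model: reconstruction
# log-likelihood minus a KL penalty against the transition prior.
import torch
import torch.distributions as D

def elbo_step(x_t, posterior: D.Normal, prior: D.Normal, decoder):
    z_t = posterior.rsample()                               # reparameterized latent sample
    recon = decoder(z_t)                                    # predicted observation mean
    log_lik = D.Normal(recon, 1.0).log_prob(x_t).sum(-1)    # reconstruction term
    kl = D.kl_divergence(posterior, prior).sum(-1)          # stay close to the prior
    return (log_lik - kl).mean()                            # maximize this ELBO
```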