2020
DOI: 10.48550/arxiv.2010.02193
Preprint
Mastering Atari with Discrete World Models

Abstract: Intelligent agents need to generalize from past experience to achieve goals in complex environments. World models facilitate such generalization and allow learning behaviors from imagined outcomes to increase sample-efficiency. While learning world models from image inputs has recently become feasible for some tasks, modeling Atari games accurately enough to derive successful behaviors has remained an open challenge for many years. We introduce DreamerV2, a reinforcement learning agent that learns behaviors pu…

Cited by 81 publications (141 citation statements)
References 57 publications (77 reference statements)
“…10, current SOTA algorithms like Agent57 may require more than 52.7 years of game-play to achieve SOTA performance, which reveals their low learning efficiency. As recommended in (Hafner et al 2020), we also argue for learning-efficient algorithms, and we advocate that 200M training frames (equal to 38 days of game-play) are enough for achieving a superhuman agent.…”
Section: Current Challenges
Confidence: 88%
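The frame-budget arithmetic in the quoted claim is easy to verify: the Arcade Learning Environment runs Atari games at 60 frames per second, so 200M frames correspond to roughly 38.6 days of real-time play. A minimal sketch (the helper name is ours; the 60 fps rate is the standard ALE setting):

```python
# Convert an Atari training budget in frames to days of real-time game-play.
# Assumes the standard Arcade Learning Environment rate of 60 frames/second.

def frames_to_days(frames: int, fps: int = 60) -> float:
    """Return the wall-clock days of game-play represented by `frames`."""
    seconds = frames / fps
    return seconds / 86_400  # seconds per day

# 200M frames, the budget advocated in the quoted statement:
days = frames_to_days(200_000_000)
print(f"{days:.1f} days")  # about 38.6 days, matching the quoted "38 days"
```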
“…In practice, we often use mean HNS or median HNS to show the final performance or generality of an algorithm. The dispute over whether the mean or the median is more representative of an algorithm's generality and performance has lasted for several years (Mnih et al 2015; Hessel et al 2017; Hafner et al 2020; Hessel et al 2021; Bellemare et al 2013; Machado et al 2018). To avoid the issues that aggregated metrics may have, we advocate reporting both in the final results, because they serve different purposes and neither alone suffices to evaluate an algorithm.…”
Section: Normalized Scores
Confidence: 99%
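The human-normalized score (HNS) referenced above is conventionally computed per game as (agent − random) / (human − random), and the mean and median are then taken across games. The sketch below uses made-up illustrative scores, not results from any paper, to show why the two aggregates can diverge:

```python
import statistics

def hns(agent: float, random_score: float, human: float) -> float:
    """Human-normalized score: 0.0 = random play, 1.0 = human-level."""
    return (agent - random_score) / (human - random_score)

# Illustrative per-game (agent, random, human) raw scores.
games = {
    "GameA": (500.0, 100.0, 400.0),   # HNS > 1: superhuman
    "GameB": (150.0, 100.0, 400.0),   # HNS < 1: below human
    "GameC": (2500.0, 100.0, 400.0),  # outlier that inflates the mean
}

scores = [hns(a, r, h) for a, r, h in games.values()]
print(f"mean HNS:   {statistics.mean(scores):.2f}")   # pulled up by GameC
print(f"median HNS: {statistics.median(scores):.2f}")  # robust to GameC
```

The single outlier drags the mean well above the median, which is exactly why the quoted statement advocates reporting both aggregates rather than either one alone.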