2017
DOI: 10.48550/arxiv.1710.02298
Preprint
Rainbow: Combining Improvements in Deep Reinforcement Learning

Cited by 81 publications (151 citation statements); references 0 publications.
“…Off-policy algorithms select actions according to a behavior policy µ that differs from the learning policy π. On-policy algorithms evaluate and improve the learning policy using data sampled from that same policy. RL algorithms can also be divided into value-based methods (Mnih et al. 2015; Hessel et al. 2017; Horgan et al. 2018) and policy-based methods (Espeholt et al. 2018; Schmitt, Hessel, and Simonyan 2020). In value-based methods, agents learn the policy indirectly: the policy is defined by consulting the learned value function, e.g. via ε-greedy action selection, and a typical GPI procedure learns the value function.…”
Section: Background: Reinforcement Learning
confidence: 99%
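The ε-greedy rule mentioned in the quoted passage is the standard way a value-based agent turns learned action-value estimates into behavior. Below is a minimal illustrative sketch of that rule; the function and variable names are assumptions for illustration, not taken from the cited work.

import numpy as np

def epsilon_greedy(q_values, epsilon, rng):
    """Pick an action from learned Q-value estimates.

    With probability epsilon a uniformly random action is chosen (exploration);
    otherwise the action with the highest estimated value is chosen (exploitation).
    """
    if rng.random() < epsilon:
        return int(rng.integers(len(q_values)))
    return int(np.argmax(q_values))

# Example: mostly greedy selection over 4 estimated action values.
rng = np.random.default_rng(0)
action = epsilon_greedy(np.array([0.1, 0.5, 0.2, 0.4]), epsilon=0.05, rng=rng)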
“…Human Average Score Baseline. As we mentioned above, recent reinforcement learning advances (Badia et al. 2020a,b; Kapturowski et al. 2018; Ecoffet et al. 2019; Schrittwieser et al. 2020; Hessel et al. 2021, 2017) seek agents that can achieve superhuman performance. Thus, we need a metric that intuitively reflects the level of the algorithms compared to human performance.…”
Section: Normalized Scores
confidence: 99%
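In the Atari benchmarking literature this metric is typically the human-normalized score, which rescales a game score so that 0 corresponds to a random agent and 1 to the human baseline. A small sketch under that assumption (the names are illustrative):

def human_normalized_score(agent_score, random_score, human_score):
    """Human-normalized score: 0.0 = random-agent level, 1.0 = human baseline level."""
    return (agent_score - random_score) / (human_score - random_score)

# Example: an agent scoring 8000 on a game where random play scores 200 and the human
# baseline is 7000 gets (8000 - 200) / (7000 - 200) ≈ 1.147, i.e. above human level.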