2017
DOI: 10.48550/arxiv.1710.02298
Preprint
Rainbow: Combining Improvements in Deep Reinforcement Learning

Cited by 81 publications (151 citation statements); references 0 publications.
“…Off-policy algorithms select actions according to a behavior policy µ that differs from the learning policy π. On-policy algorithms evaluate and improve the learning policy using data sampled from that same policy. RL algorithms can also be divided into value-based methods (Mnih et al. 2015; Hessel et al. 2017; Horgan et al. 2018) and policy-based methods (Espeholt et al. 2018; Schmitt, Hessel, and Simonyan 2020). In value-based methods, agents learn the policy indirectly: the policy is defined by consulting the learned value function, e.g. via ε-greedy action selection, and a typical GPI procedure learns the value function.…”
Section: Background: Reinforcement Learning
confidence: 99%
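The ε-greedy rule mentioned in the quoted passage is the standard way a value-based agent turns learned action-value estimates into behavior. Below is a minimal illustrative sketch of that rule; the function and variable names are assumptions for illustration, not taken from the cited work.

import numpy as np

def epsilon_greedy(q_values, epsilon, rng):
    """Pick an action from learned Q-value estimates.

    With probability epsilon a uniformly random action is chosen (exploration);
    otherwise the action with the highest estimated value is chosen (exploitation).
    """
    if rng.random() < epsilon:
        return int(rng.integers(len(q_values)))
    return int(np.argmax(q_values))

# Example: mostly greedy selection over 4 estimated action values.
rng = np.random.default_rng(0)
action = epsilon_greedy(np.array([0.1, 0.5, 0.2, 0.4]), epsilon=0.05, rng=rng)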
“…Human Average Score Baseline. As we mentioned above, recent reinforcement learning advances (Badia et al. 2020a,b; Kapturowski et al. 2018; Ecoffet et al. 2019; Schrittwieser et al. 2020; Hessel et al. 2021, 2017) seek agents that can achieve superhuman performance. Thus, we need a metric that intuitively reflects the level of the algorithms compared to human performance.…”
Section: Normalized Scores
confidence: 99%
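In the Atari benchmarking literature this metric is typically the human-normalized score, which rescales a game score so that 0 corresponds to a random agent and 1 to the human baseline. A small sketch under that assumption (the names are illustrative):

def human_normalized_score(agent_score, random_score, human_score):
    """Human-normalized score: 0.0 = random-agent level, 1.0 = human baseline level."""
    return (agent_score - random_score) / (human_score - random_score)

# Example: an agent scoring 8000 on a game where random play scores 200 and the human
# baseline is 7000 gets (8000 - 200) / (7000 - 200) ≈ 1.147, i.e. above human level.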