2018
DOI: 10.1609/aaai.v32i1.11791
Distributional Reinforcement Learning With Quantile Regression

Abstract: In reinforcement learning (RL), an agent interacts with the environment by taking actions and observing the next state and reward. When sampled probabilistically, these state transitions, rewards, and actions can all induce randomness in the observed long-term return. Traditionally, reinforcement learning algorithms average over this randomness to estimate the value function. In this paper, we build on recent work advocating a distributional approach to reinforcement learning in which the distribution over returns […]
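The distributional approach named in the abstract is realized in the paper through quantile regression: the return distribution is represented by N quantile estimates trained with a quantile Huber loss. Below is a minimal NumPy sketch of that loss, assuming a sample-based Bellman target; the function names, the M-sample target, and the default κ are illustrative choices, not the authors' released code.

```python
import numpy as np

def huber(u, kappa=1.0):
    # Quadratic for |u| <= kappa, linear beyond; keeps gradients bounded.
    return np.where(np.abs(u) <= kappa,
                    0.5 * u ** 2,
                    kappa * (np.abs(u) - 0.5 * kappa))

def quantile_huber_loss(pred_quantiles, target_samples, kappa=1.0):
    """Quantile Huber loss for N predicted quantiles vs. M target samples."""
    n = pred_quantiles.shape[0]
    # Fixed quantile midpoints tau_hat_i = (2i + 1) / (2N), i = 0..N-1.
    tau_hat = (2.0 * np.arange(n) + 1.0) / (2.0 * n)
    # Pairwise TD errors u[i, j] = target_j - prediction_i.
    u = target_samples[None, :] - pred_quantiles[:, None]
    # Asymmetric weight |tau - 1{u < 0}| turns Huber regression into
    # quantile regression: over- and under-shooting are penalized unequally.
    weight = np.abs(tau_hat[:, None] - (u < 0.0).astype(u.dtype))
    # Mean over target samples, sum over quantile channels.
    return (weight * huber(u, kappa) / kappa).mean(axis=1).sum()

# Toy usage: predicted quantiles against a Bellman target r + gamma * theta'.
theta = np.array([0.0, 0.5, 1.0, 1.5])
target = 0.3 + 0.99 * np.array([0.1, 0.6, 1.1, 1.4])
print(quantile_huber_loss(theta, target))
```

The asymmetric weight is what drives each output toward a distinct quantile of the return distribution rather than toward its mean.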

Cited by 280 publications (197 citation statements) · References 16 publications
“…Asymmetric NRL prediction errors also provide a computational mechanism for recent theories of distributional reinforcement learning. In contrast to standard RL approaches, where models learn a single scalar quantity representing the mean reward, distributional RL models learn about the full distribution of possible rewards [33,34]. The critical difference in distributional approaches is a diversity of RPE channels, with differing degrees of optimism about outcomes, which learn varying predictions about future reward.…”
Section: A Computational Mechanism for Distributional RL
confidence: 99%
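The "diversity of RPE channels" described in this citation statement can be illustrated with a toy simulation of the asymmetric update rule: each channel scales positive prediction errors by its optimism level τ and negative errors by 1 − τ, so channel i converges toward the τ_i-th quantile of the reward distribution. Everything below (the three-point reward distribution, the channel set, the learning rate) is invented for illustration and is not code from the cited papers.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical reward distribution the channels will learn about.
def sample_reward():
    return rng.choice([0.0, 1.0, 10.0])

# Per-channel optimism tau_i; 0.5 recovers the median, extremes the tails.
taus = np.array([0.1, 0.25, 0.5, 0.75, 0.9])
values = np.zeros_like(taus)
lr = 0.01

for _ in range(50_000):
    r = sample_reward()
    delta = r - values  # per-channel prediction error
    # Asymmetric update: optimistic channels scale up positive errors,
    # pessimistic channels scale up negative ones (quantile-style update).
    values += lr * np.where(delta > 0, taus, 1.0 - taus) * np.sign(delta)

# Each channel settles near the tau-th quantile of the reward distribution.
print(dict(zip(taus, values.round(2))))
```

Running this, the pessimistic channels settle near the low rewards and the optimistic ones near the high reward, reproducing the "varying predictions about future reward" that the quoted passage describes.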
“…Later, a distributional DQN [56] and a quantile regression DQN [57] were proposed using stochastic policy and distributed training, and they were combined as a 'Rainbow DQN' by David Silver [58] in 2017.…”
Section: B. Methodology of MFRL
confidence: 99%
“…The Arcade Learning Environment (ALE) is used as a standard baseline to compare and evaluate new deep reinforcement learning algorithms as they are developed (Hasselt, Guez, and Silver 2016;Mnih et al 2016;Wang et al 2016;Fedus et al 2020;Rowland et al 2019;Mnih et al 2015;Hessel et al 2018;Kapturowski et al 2019;Dabney et al 2018;Dabney, Ostrovski, and Barreto 2021;Xu et al 2020;Schmitt, Hessel, and Simonyan 2020). As a result, any systematic issues with these environments are of critical importance.…”
Section: Framework for Investigating
confidence: 99%