2019
DOI: 10.1609/aaai.v33i01.33015797

QUOTA: The Quantile Option Architecture for Reinforcement Learning

Abstract: In this paper, we propose the Quantile Option Architecture (QUOTA) for exploration based on recent advances in distributional reinforcement learning (RL). In QUOTA, decision making is based on quantiles of a value distribution, not only the mean. QUOTA provides a new dimension for exploration by making use of both the optimism and the pessimism of a value distribution. We demonstrate the performance advantage of QUOTA in both challenging video games and physical robot simulators. * Work done during an internship at H…
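As a concrete illustration of "decision making based on quantiles of a value distribution", here is a minimal sketch of quantile-window action selection in the spirit of QUOTA. It is a toy under stated assumptions, not the authors' implementation: the array shapes, the `window` grouping of quantiles into options, and all names are illustrative.

```python
import numpy as np

# Toy sketch (not the authors' code): each "option" acts greedily with
# respect to one window of learned quantile estimates theta_i(s, a).
def greedy_action_for_option(quantile_values: np.ndarray, option: int,
                             window: int = 1) -> int:
    """Return the action maximizing the mean of one quantile window.

    quantile_values: (num_actions, num_quantiles) quantile estimates.
    option 0 uses the lowest quantiles (pessimism); the last option
    uses the highest quantiles (optimism).
    """
    lo = option * window
    option_q = quantile_values[:, lo:lo + window].mean(axis=1)
    return int(np.argmax(option_q))

# Example: 3 actions, 6 quantiles grouped into 3 options of 2 quantiles each.
rng = np.random.default_rng(0)
theta = np.sort(rng.normal(size=(3, 6)), axis=1)  # quantiles monotone per action
a_pessimistic = greedy_action_for_option(theta, option=0, window=2)
a_optimistic = greedy_action_for_option(theta, option=2, window=2)
```

Acting on low quantiles yields risk-averse (pessimistic) behavior and acting on high quantiles yields optimistic behavior; switching among such options is the extra exploration dimension the abstract describes.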

Cited by 14 publications (12 citation statements)
References 22 publications
“…We evaluate our algorithms on the Cart Pole, Mountain Car, Breakout, and Qbert games. We followed the procedure in [7,22]. All the experimental settings, including parameters, are identical to the distributional RL baselines implemented by [21,5].…”
Section: Methods
Citation type: mentioning (confidence: 99%)
“…The performance of our proposed QuaDRED-SMPC framework is evaluated in RotorS [40], a UAV software simulator. Based on the benchmark [11], [22], the parameters of our proposed framework are summarized in Table I.…”
Section: Numerical Example
Citation type: mentioning (confidence: 99%)
“…In principle, they provide more complete and richer value-distribution information to enable a more stable learning process [17]. Previous distributional RL algorithms parameterize the policy value distribution in different ways, including canonical return atoms [17], the expectiles [19], the moments [20], and the quantiles [21], [22]. The quantile approach is especially suitable for autonomous UAV trajectory tracking due to its risk-sensitive policy optimization.…”
Section: Introduction: Accurate Trajectory Tracking For Autonomous Unm...
Citation type: mentioning (confidence: 99%)
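The quantile parameterization cited here ([21], [22]) is typically trained with a quantile-regression Huber loss. Below is a hedged, self-contained sketch of that loss; the tensor shapes and every name are assumptions for illustration, not code from any cited paper.

```python
import torch

# Sketch of the quantile (pinball) Huber loss used by quantile-based
# distributional RL such as QR-DQN; names and shapes are illustrative.
def quantile_huber_loss(pred: torch.Tensor, target: torch.Tensor,
                        kappa: float = 1.0) -> torch.Tensor:
    """pred: (N,) predicted quantiles; target: (M,) target return samples."""
    n = pred.shape[0]
    taus = (torch.arange(n, dtype=pred.dtype) + 0.5) / n  # quantile midpoints
    u = target.unsqueeze(0) - pred.unsqueeze(1)           # (N, M) TD errors
    huber = torch.where(u.abs() <= kappa,
                        0.5 * u.pow(2),
                        kappa * (u.abs() - 0.5 * kappa))
    # The asymmetric weight |tau - 1{u < 0}| turns the Huber loss into a
    # quantile-regression loss: low taus penalize overestimation more.
    weight = (taus.unsqueeze(1) - (u.detach() < 0).float()).abs()
    return (weight * huber / kappa).mean()

# Example: fit 5 quantile estimates to 100 samples of a return distribution.
pred = torch.zeros(5, requires_grad=True)
loss = quantile_huber_loss(pred, torch.randn(100))
loss.backward()
```

Minimizing this loss drives the N predictions toward the tau-quantiles of the target samples, which is what makes quantile-based (and hence risk-sensitive) action selection possible.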
“…Deep RL has recently achieved significant improvements in a variety of challenging tasks, including game playing [2,3,4] and robust navigation [5]. A flurry of state-of-the-art algorithms has been proposed, including Deep Q-Learning (DQN) [2] and variants such as Double-DQN [6], Dueling-DQN [7], Deep Deterministic Policy Gradient (DDPG) [8], Soft Actor-Critic [9], and distributional RL algorithms [10,11,12], all of which have successfully solved end-to-end decision-making problems such as playing Atari games. However, the slow convergence and sample inefficiency of RL algorithms still hinder the progress of RL research, particularly in high-dimensional state spaces where deep neural networks are used as function approximators, making learning in real physical worlds impractical.…”
Section: Introduction
Citation type: mentioning (confidence: 99%)
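Since this excerpt contrasts DQN [2] with its Double-DQN variant [6], the sketch below shows the standard difference between their bootstrap targets. It reflects common implementations, not the cited papers' code; `q_online` and `q_target` are hypothetical networks mapping a state batch to per-action values.

```python
import torch

def dqn_target(q_target, next_s, r, gamma, done):
    # Vanilla DQN: the target network both selects and evaluates the next
    # action, which tends to overestimate action values.
    return r + gamma * (1.0 - done) * q_target(next_s).max(dim=1).values

def double_dqn_target(q_online, q_target, next_s, r, gamma, done):
    # Double-DQN: the online network selects the action, the target network
    # evaluates it, reducing the overestimation bias.
    a_star = q_online(next_s).argmax(dim=1, keepdim=True)
    return r + gamma * (1.0 - done) * q_target(next_s).gather(1, a_star).squeeze(1)

# Toy usage: state_dim=4, num_actions=2, batch of 8 transitions.
q_online, q_target = torch.nn.Linear(4, 2), torch.nn.Linear(4, 2)
s, r, done = torch.randn(8, 4), torch.zeros(8), torch.zeros(8)
y = double_dqn_target(q_online, q_target, s, r, 0.99, done)
```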