2019
DOI: 10.1609/aaai.v33i01.33015797

QUOTA: The Quantile Option Architecture for Reinforcement Learning

Abstract: In this paper, we propose the Quantile Option Architecture (QUOTA) for exploration based on recent advances in distributional reinforcement learning (RL). In QUOTA, decision making is based on quantiles of a value distribution, not only the mean. QUOTA provides a new dimension for exploration by making use of both the optimism and the pessimism of a value distribution. We demonstrate the performance advantage of QUOTA in both challenging video games and physical robot simulators. * Work done during an internship at H…
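As a concrete illustration of "decision making based on quantiles of a value distribution", here is a minimal sketch of quantile-window action selection in the spirit of QUOTA. It is a toy under stated assumptions, not the authors' implementation: the array shapes, the `window` grouping of quantiles into options, and all names are illustrative.

```python
import numpy as np

# Toy sketch (not the authors' code): each "option" acts greedily with
# respect to one window of learned quantile estimates theta_i(s, a).
def greedy_action_for_option(quantile_values: np.ndarray, option: int,
                             window: int = 1) -> int:
    """Return the action maximizing the mean of one quantile window.

    quantile_values: (num_actions, num_quantiles) quantile estimates.
    option 0 uses the lowest quantiles (pessimism); the last option
    uses the highest quantiles (optimism).
    """
    lo = option * window
    option_q = quantile_values[:, lo:lo + window].mean(axis=1)
    return int(np.argmax(option_q))

# Example: 3 actions, 6 quantiles grouped into 3 options of 2 quantiles each.
rng = np.random.default_rng(0)
theta = np.sort(rng.normal(size=(3, 6)), axis=1)  # quantiles monotone per action
a_pessimistic = greedy_action_for_option(theta, option=0, window=2)
a_optimistic = greedy_action_for_option(theta, option=2, window=2)
```

Acting on low quantiles yields risk-averse (pessimistic) behavior and acting on high quantiles yields optimistic behavior; switching among such options is the extra exploration dimension the abstract describes.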

Cited by 14 publications (12 citation statements)
References 22 publications
“…We evaluate our algorithms on the Cart Pole, Mountain Car, Breakout, and Qbert games. We followed the procedure in [7,22]. All the experimental settings, including parameters, are identical to the distributional RL baselines implemented by [21,5].…”
Section: Methods
Citation type: mentioning (confidence: 99%)
“…The performance of our proposed QuaDRED-SMPC framework is evaluated in RotorS [40], a UAV software simulator. Based on the benchmark [11], [22], the parameters of our proposed framework are summarized in Table I.…”
Section: Numerical Example
Citation type: mentioning (confidence: 99%)
“…In principle, they provide more complete and richer value-distribution information to enable a more stable learning process [17]. Previous distributional RL algorithms parameterize the policy value distribution in different ways, including canonical return atoms [17], the expectiles [19], the moments [20], and the quantiles [21], [22]. The quantile approach is especially suitable for autonomous UAV trajectory tracking due to its risk-sensitive policy optimization.…”
Section: Introduction: Accurate Trajectory Tracking For Autonomous Unm...
Citation type: mentioning (confidence: 99%)
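The quantile parameterization cited here ([21], [22]) is typically trained with a quantile-regression Huber loss. Below is a hedged, self-contained sketch of that loss; the tensor shapes and every name are assumptions for illustration, not code from any cited paper.

```python
import torch

# Sketch of the quantile (pinball) Huber loss used by quantile-based
# distributional RL such as QR-DQN; names and shapes are illustrative.
def quantile_huber_loss(pred: torch.Tensor, target: torch.Tensor,
                        kappa: float = 1.0) -> torch.Tensor:
    """pred: (N,) predicted quantiles; target: (M,) target return samples."""
    n = pred.shape[0]
    taus = (torch.arange(n, dtype=pred.dtype) + 0.5) / n  # quantile midpoints
    u = target.unsqueeze(0) - pred.unsqueeze(1)           # (N, M) TD errors
    huber = torch.where(u.abs() <= kappa,
                        0.5 * u.pow(2),
                        kappa * (u.abs() - 0.5 * kappa))
    # The asymmetric weight |tau - 1{u < 0}| turns the Huber loss into a
    # quantile-regression loss: low taus penalize overestimation more.
    weight = (taus.unsqueeze(1) - (u.detach() < 0).float()).abs()
    return (weight * huber / kappa).mean()

# Example: fit 5 quantile estimates to 100 samples of a return distribution.
pred = torch.zeros(5, requires_grad=True)
loss = quantile_huber_loss(pred, torch.randn(100))
loss.backward()
```

Minimizing this loss drives the N predictions toward the tau-quantiles of the target samples, which is what makes quantile-based (and hence risk-sensitive) action selection possible.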
“…Deep RL has recently achieved significant improvements in a variety of challenging tasks, including game playing [2,3,4] and robust navigation [5]. A flurry of state-of-the-art algorithms has been proposed, including Deep Q-Learning (DQN) [2] and variants such as Double-DQN [6], Dueling-DQN [7], Deep Deterministic Policy Gradient (DDPG) [8], Soft Actor-Critic [9], and distributional RL algorithms [10,11,12], all of which have successfully solved end-to-end decision-making problems such as playing Atari games. However, the slow convergence and sample inefficiency of RL algorithms still hinder the progress of RL research, particularly in high-dimensional state spaces where deep neural networks are used as function approximators, making learning in real physical worlds impractical.…”
Section: Introduction
Citation type: mentioning (confidence: 99%)
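Since this excerpt contrasts DQN [2] with its Double-DQN variant [6], the sketch below shows the standard difference between their bootstrap targets. It reflects common implementations, not the cited papers' code; `q_online` and `q_target` are hypothetical networks mapping a state batch to per-action values.

```python
import torch

def dqn_target(q_target, next_s, r, gamma, done):
    # Vanilla DQN: the target network both selects and evaluates the next
    # action, which tends to overestimate action values.
    return r + gamma * (1.0 - done) * q_target(next_s).max(dim=1).values

def double_dqn_target(q_online, q_target, next_s, r, gamma, done):
    # Double-DQN: the online network selects the action, the target network
    # evaluates it, reducing the overestimation bias.
    a_star = q_online(next_s).argmax(dim=1, keepdim=True)
    return r + gamma * (1.0 - done) * q_target(next_s).gather(1, a_star).squeeze(1)

# Toy usage: state_dim=4, num_actions=2, batch of 8 transitions.
q_online, q_target = torch.nn.Linear(4, 2), torch.nn.Linear(4, 2)
s, r, done = torch.randn(8, 4), torch.zeros(8), torch.zeros(8)
y = double_dqn_target(q_online, q_target, s, r, 0.99, done)
```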