2017
DOI: 10.48550/arxiv.1705.07798
Preprint

A unified view of entropy-regularized Markov decision processes

Abstract: We propose a general framework for entropy-regularized average-reward reinforcement learning in Markov decision processes (MDPs). Our approach is based on extending the linear-programming formulation of policy optimization in MDPs to accommodate convex regularization functions. Our key result is showing that using the conditional entropy of the joint state-action distributions as regularization yields a dual optimization problem closely resembling the Bellman optimality equations. This result enables us to for…
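For orientation, here is a compact sketch of the construction the abstract describes, written in standard average-reward MDP notation (the symbols μ, Ω, η, V, ρ are ours, not taken from the excerpt, and signs or constants may differ from the paper). The policy-optimization LP is taken over stationary state-action distributions μ, the convex regularizer Ω is the negative conditional entropy of actions given states, and the resulting dual replaces the max of the Bellman optimality equation with a log-sum-exp:

\max_{\mu \ge 0}\;\; \sum_{s,a} \mu(s,a)\, r(s,a) \;-\; \frac{1}{\eta}\,\Omega(\mu)
\quad\text{s.t.}\quad
\sum_{a} \mu(s',a) = \sum_{s,a} P(s' \mid s,a)\,\mu(s,a)\;\;\forall s',
\qquad \sum_{s,a} \mu(s,a) = 1,

\Omega(\mu) \;=\; \sum_{s,a} \mu(s,a)\,\log\frac{\mu(s,a)}{\sum_{a'} \mu(s,a')},

\rho + V(s) \;=\; \frac{1}{\eta}\,\log \sum_{a} \exp\!\Big(\eta\,\big(r(s,a) + \sum_{s'} P(s' \mid s,a)\,V(s')\big)\Big).

The last equation is the "Bellman-like" dual the abstract refers to: as η grows, the log-sum-exp tends to the maximum over actions, recovering the unregularized average-reward Bellman optimality equation.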

Cited by 73 publications (122 citation statements)
References 14 publications
Citing publications: 2018–2022

Citation statements (ordered by relevance):
“…This similarity in addressing the inherent geometry of the problem is noticed by a line of recent work including Neu et al (2017); Geist et al (2019); Tomar et al (2020); Lan (2021), and the analysis techniques in MD methods have been adapted to the PG setting. The connection was first built explicitly in Neu et al (2017).…”
Section: Background and Related Work (mentioning)
confidence: 86%
“…is a mirror descent process which guarantees the convergence. With this property, similar to Neu et al (2017) and Wang et al (2019), we can use the following iterative process,…”
Section: Target Policy (mentioning)
confidence: 95%
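The "iterative process" referred to in this snippet is not reproduced on this page. As a minimal, hypothetical illustration of the kind of mirror-descent policy update this line of work builds on (tabular setting; the Q-values, step size eta, and array shapes below are assumptions made for the example, not taken from any of the cited papers):

import numpy as np

def mirror_descent_policy_update(policy, Q, eta):
    # One KL-regularized (mirror descent) step:
    # pi_{k+1}(a|s) is proportional to pi_k(a|s) * exp(eta * Q(s, a)).
    logits = np.log(policy) + eta * Q
    logits -= logits.max(axis=1, keepdims=True)   # subtract row max for numerical stability
    new_policy = np.exp(logits)
    return new_policy / new_policy.sum(axis=1, keepdims=True)

# Toy usage: 2 states, 3 actions, hypothetical action values.
pi0 = np.full((2, 3), 1.0 / 3.0)                  # start from the uniform policy
Q = np.array([[1.0, 0.0, -1.0],
              [0.5, 0.5,  0.0]])
pi1 = mirror_descent_policy_update(pi0, Q, eta=1.0)
print(pi1)  # each row still sums to 1; probability mass shifts toward higher-Q actions

Iterating updates of this form with refreshed Q-estimates is, roughly, the pattern behind the convergence arguments the snippet mentions.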
“…The former is usually a deterministic policy (Sutton and Barto, 2018) which is not flexible enough for unknown situations, while the latter is a policy with non-zero probability for all actions which may be dangerous in some scenarios. Neu et al (2017) analyzed the entropy regularization method from several views. They revealed a more general form of regularization which is actually divergence regularization and showed entropy regularization is just a special case of divergence regularization.…”
Section: Related Work (mentioning)
confidence: 99%
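The snippet's observation that entropy regularization is a special case of divergence regularization can be made concrete with a one-line identity (our notation; u denotes the uniform policy over the action set):

D_{\mathrm{KL}}\big(\pi(\cdot \mid s)\,\big\|\,u\big)
= \sum_{a} \pi(a \mid s)\,\log\frac{\pi(a \mid s)}{1/|\mathcal{A}|}
= -\,\mathcal{H}\big(\pi(\cdot \mid s)\big) + \log|\mathcal{A}|.

So penalizing the KL divergence to the uniform policy and rewarding policy entropy differ only by the constant log|A|; choosing a reference policy other than uniform gives the more general divergence regularization the snippet describes.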
“…The analyses therein heavily exploit the contraction properties of the Bellman optimality condition, making their extensions to the stochastic setting, with only stochastic first-order information, unclear without additional assumptions [9]. Connections between PG methods and the classical mirror descent algorithm in optimization [2,21,22] have also been established and exploited to establish convergence of the former methods (e.g., TRPO [27,28], REPS [24,25]). Until recently, [16] proposes policy mirror descent methods and its stochastic variants for general convex regularizers, and establishes linear convergence in both deterministic and stochastic setting.…”
mentioning
confidence: 99%
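For reference, the policy mirror descent update discussed in this last snippet has, schematically, the following per-state form (our notation, not that of the cited works; sign conventions and the placement of the step size vary across papers):

\pi_{k+1}(\cdot \mid s) \;\in\; \arg\max_{p \,\in\, \Delta(\mathcal{A})}
\Big\{ \big\langle Q^{\pi_k}(s, \cdot),\, p \big\rangle \;-\; h(p) \;-\; \tfrac{1}{\eta_k}\, D\big(p,\ \pi_k(\cdot \mid s)\big) \Big\},

where h is a convex regularizer (zero in the unregularized case, negative entropy for entropy regularization), D is a Bregman divergence, and η_k is the step size. Taking D to be the KL divergence yields the exponentiated, softmax-style update underlying the mirror-descent view of TRPO and REPS mentioned in the snippet.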