2016
DOI: 10.1145/2868723

Actor-Critic Algorithms with Online Feature Adaptation

Abstract: We develop two new online actor-critic control algorithms with adaptive feature tuning for Markov Decision Processes (MDPs). One of our algorithms is proposed for the long-run average cost objective, while the other works for discounted cost MDPs. Our actor-critic architecture incorporates parameterization both in the policy and the value function. A gradient search in the policy parameters is performed to improve the performance of the actor. The computation of the aforementioned gradient, however, requires a…
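
The abstract's architecture pairs a parameterized policy (actor), a parameterized value function (critic), and online tuning of the critic's features. The snippet below is a minimal sketch of that general idea on a discounted toy problem, not the paper's algorithm: it assumes Gaussian RBF value features whose centres are adapted by a semi-gradient step on the squared TD error, and all names, step sizes, dynamics, and rewards are invented for illustration.

```python
import numpy as np

# Toy setup: 1-D state in [0, 1], two actions. Everything here (feature type,
# step sizes, dynamics, reward) is illustrative and not taken from the paper.

rng = np.random.default_rng(0)
n_features, n_actions, gamma = 8, 2, 0.95
centers = np.linspace(0.0, 1.0, n_features)    # tunable feature parameters
width = 0.1
theta_v = np.zeros(n_features)                 # critic (value-function) weights
theta_pi = np.zeros((n_actions, n_features))   # actor (policy) weights
alpha_v, alpha_pi, alpha_f = 0.1, 0.01, 0.001  # step sizes (critic fastest)

def features(s):
    # Gaussian RBF features with adaptable centres.
    return np.exp(-((s - centers) ** 2) / (2.0 * width ** 2))

def policy(phi):
    prefs = theta_pi @ phi
    p = np.exp(prefs - prefs.max())
    return p / p.sum()

def env_step(s, a):
    # Placeholder dynamics and reward; substitute the MDP of interest.
    s_next = np.clip(s + (0.1 if a == 1 else -0.1) + 0.02 * rng.standard_normal(), 0.0, 1.0)
    return s_next, -abs(s_next - 0.5)          # reward prefers staying near 0.5

s = rng.uniform()
for t in range(5000):
    phi = features(s)
    probs = policy(phi)
    a = rng.choice(n_actions, p=probs)
    s_next, r = env_step(s, a)

    # TD error computed with the current (adapted) features.
    delta = r + gamma * theta_v @ features(s_next) - theta_v @ phi

    # Critic: semi-gradient TD(0) on the linear value function.
    theta_v += alpha_v * delta * phi

    # Actor: policy-gradient step using the TD error as the advantage estimate.
    grad_log_pi = -np.outer(probs, phi)
    grad_log_pi[a] += phi
    theta_pi += alpha_pi * delta * grad_log_pi

    # Feature adaptation: semi-gradient step on 0.5 * delta**2 with respect to
    # the RBF centres (the bootstrapped target is treated as fixed).
    centers += alpha_f * delta * theta_v * (phi * (s - centers) / width ** 2)

    s = s_next
```
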

Cited by 7 publications (9 citation statements)
References 27 publications
“…Limit Theorem: As noted by [6], the fact that the deterministic gradient is a limit case of the stochastic gradient enables the standard machinery of policy gradient, such as compatible-function approximation ([13]), natural gradients ([14]), on-line feature adaptation ([15]) and actor-critic ([16]), to be used with deterministic policies. We show that it holds in our setting.…”
Section: Local Gradients of Deterministic Policies (mentioning)
confidence: 99%
“…Consider the following on-policy algorithm. The actor step is based on an expression for ∇_θᵢ J(μ_θ) in terms of ∇_aᵢ Q^θ (see Equation (15) in the Appendix). We approximate the action-value function Q^θ using a family of functions Q_ω : S × A → ℝ parameterized by ω, a column vector in ℝ^K.…”
Section: On-Policy Deterministic Actor-Critic (mentioning)
confidence: 99%
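
As a companion to the quoted description, here is a minimal sketch of an on-policy deterministic actor-critic update in this spirit: the critic Q_ω is linear in simple state-action features, and the actor moves θ along ∇_θ μ_θ(s) · ∂Q_ω/∂a evaluated at a = μ_θ(s). This is not the cited algorithm; the linear actor, the features ψ, the toy dynamics, and all step sizes are assumptions for illustration, and exploration is omitted for brevity.

```python
import numpy as np

# Sketch of a deterministic actor-critic step: linear actor μ_θ(s) = θᵀs,
# linear critic Q_ω(s, a) = ωᵀ ψ(s, a). All names and choices are illustrative.

rng = np.random.default_rng(1)
state_dim, gamma = 3, 0.95
theta = np.zeros(state_dim)        # actor parameters (scalar action)
omega = np.zeros(state_dim + 1)    # critic parameters
alpha_actor, alpha_critic = 1e-3, 1e-2

def mu(s):
    return float(theta @ s)

def psi(s, a):
    # Simple hand-picked state-action features; the cited paper's choice may differ.
    return np.append(s, a)

def q(s, a):
    return float(omega @ psi(s, a))

def env_step(s, a):
    # Placeholder dynamics and reward for the sketch.
    s_next = 0.9 * s + 0.1 * a + 0.01 * rng.standard_normal(state_dim)
    return s_next, -float(s_next @ s_next) - 0.01 * a ** 2

s = rng.standard_normal(state_dim)
for t in range(2000):
    a = mu(s)                       # on-policy: act with the deterministic policy
    s_next, r = env_step(s, a)

    # Critic: TD(0) on Q_ω along the on-policy transition.
    delta = r + gamma * q(s_next, mu(s_next)) - q(s, a)
    omega += alpha_critic * delta * psi(s, a)

    # Actor: deterministic policy gradient. For the linear actor ∇_θ μ_θ(s) = s,
    # and for the linear critic ∂Q_ω/∂a is the weight on the action feature.
    theta += alpha_actor * omega[-1] * s

    s = s_next
```
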
“…Actor-critic algorithms [11], [24] are a popular class of RL algorithms that utilise the policy gradient theorem to compute the optimal policy π_θ*. Here, the critic estimates the advantage value function parameters (x) and the actor uses them to improve the policy parameters (θ).…”
Section: On-Policy Actor-Critic Algorithms (mentioning)
confidence: 99%
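
For context, the standard policy-gradient-theorem form of the update being described (a textbook identity, not something specific to the quoted paper) is ∇_θ J(θ) = E[ ∇_θ log π_θ(a|s) · A^π_θ(s, a) ], with the expectation taken over states and actions visited under π_θ. Once the critic supplies an advantage estimate Â(s, a), the actor step is simply θ ← θ + α ∇_θ log π_θ(a|s) Â(s, a).
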
“…However, they explicitly compute/approximate a distance metric using the Fisher information matrix (curvature information), giving rise to complex algorithms. This estimation/approximation of the Fisher information matrix can be entirely avoided (see equation (11) of Section III) if the critic employs a linear function approximator with well-defined features known as compatible features [14].…”
Section: Introduction (mentioning)
confidence: 99%
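
To make the compatible-features remark concrete: in the standard compatible-function-approximation setup, the critic's features are ψ_θ(s, a) = ∇_θ log π_θ(a|s), and with a critic linear in these features the fitted weights already give the natural-gradient direction, so no Fisher matrix has to be formed or inverted. The fragment below is a generic illustration of that construction for a softmax policy, not equation (11) of the cited work; the names, step sizes, and the toy call at the end are made up.

```python
import numpy as np

# Generic compatible-feature critic for a softmax policy. With features
# ψ_θ(s, a) = ∇_θ log π_θ(a|s) and a critic linear in ψ_θ, the critic weights
# w can be used directly as the (natural-gradient) actor direction, so no
# Fisher matrix is estimated. Names and constants here are illustrative.

rng = np.random.default_rng(2)
n_features, n_actions = 4, 3
theta = np.zeros((n_actions, n_features))   # policy parameters
w = np.zeros(n_actions * n_features)        # compatible-critic weights
alpha_w, alpha_theta = 0.05, 0.01

def pi(phi):
    prefs = theta @ phi
    p = np.exp(prefs - prefs.max())
    return p / p.sum()

def compatible_features(phi, a):
    # ψ_θ(s, a) = ∇_θ log π_θ(a|s) for the softmax policy, flattened to match w.
    grad = -np.outer(pi(phi), phi)
    grad[a] += phi
    return grad.ravel()

def critic_update(phi, a, delta):
    # TD-style update of the linear compatible critic.
    global w
    w += alpha_w * delta * compatible_features(phi, a)

def actor_update():
    # Natural-gradient actor step: follow w directly, no Fisher inverse needed.
    global theta
    theta += alpha_theta * w.reshape(theta.shape)

# Tiny usage example with one arbitrary transition (delta would be a TD error).
phi_s = rng.standard_normal(n_features)
a = int(rng.integers(n_actions))
critic_update(phi_s, a, delta=0.5)
actor_update()
```
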
“…Recently, there have been efforts to automate feature selection in RL. These efforts include performing gradient descent on a manifold of features in the direction of minimising the mean squared Bellman error [6, 7], tuning feature parameters using a gradient‐based or a cross‐entropy‐based method [8, 9], constructing features with nonparametric techniques [10–13], and expanding the set of features by using the TD error [14]. However, these methods do not consider the tradeoff between approximation error and estimation error.…”
Section: Introduction (mentioning)
confidence: 99%
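
Of the feature-automation strategies listed above, the last one (growing the feature set where the TD error stays large) is easy to illustrate. The snippet below is a deliberately simplified, generic version of that idea, not the procedure of the cited reference [14]: the RBF form, the expansion threshold, and the example transition are all invented.

```python
import numpy as np

# Generic illustration of TD-error-driven feature expansion for a linear
# value function with 1-D Gaussian RBF features. Thresholds, widths, and the
# example transition are made up for the sketch.

gamma, width = 0.95, 0.1
centers = [0.25, 0.75]                 # initial RBF centres
weights = [0.0, 0.0]                   # matching linear value weights
alpha, expand_threshold = 0.1, 0.5

def features(s):
    return np.exp(-((s - np.array(centers)) ** 2) / (2 * width ** 2))

def value(s):
    return float(np.array(weights) @ features(s))

def td_update(s, r, s_next):
    global weights
    delta = r + gamma * value(s_next) - value(s)
    weights = list(np.array(weights) + alpha * delta * features(s))
    # Expansion rule: if the TD error at s is large and no existing centre is
    # close to s, add a new RBF centred at s with zero initial weight.
    if abs(delta) > expand_threshold and min(abs(s - c) for c in centers) > width:
        centers.append(s)
        weights.append(0.0)
    return delta

# Usage with one arbitrary transition (the MDP itself is not modelled here).
print(td_update(s=0.5, r=1.0, s_next=0.55))
print(len(centers))   # a third centre is added near s = 0.5 in this example
```
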