I. Grondman scite author profile

Policy-gradient-based actor-critic algorithms are amongst the most popular algorithms in the reinforcement learning framework. Their ability to search for optimal policies using low-variance gradient estimates has made them useful in several real-life applications, such as robotics, power control and finance. Although general surveys on reinforcement learning techniques already exist, none is dedicated specifically to actor-critic algorithms. This paper therefore describes the state of the art of actor-critic algorithms, with a focus on methods that can work in an online setting and use function approximation to deal with continuous state and action spaces. After a discussion of the concepts of reinforcement learning and the origins of actor-critic algorithms, the paper describes the workings of the natural gradient, which has made its way into many actor-critic algorithms in the past few years. A review of several standard and natural actor-critic algorithms follows, and the paper concludes with an overview of application areas and a discussion of open issues.
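As a rough illustration of the online, function-approximation setting the abstract refers to, the sketch below shows one temporal-difference actor-critic update with a linear critic and a Gaussian policy. It is a generic textbook-style step, not an algorithm taken from the survey; the feature map `features`, the step sizes, the discount factor and the fixed exploration width `sigma` are all assumptions chosen for the example.

```python
def features(s):
    # Hypothetical feature vector for a scalar state: bias, s, s^2.
    return [1.0, s, s * s]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def actor_critic_step(theta, w, s, a, r, s_next, sigma=0.5,
                      gamma=0.95, alpha_critic=0.1, alpha_actor=0.01):
    """One online TD(0) actor-critic update; returns (theta, w, delta)."""
    phi, phi_next = features(s), features(s_next)
    # Temporal-difference error computed from the critic's value estimates.
    delta = r + gamma * dot(theta, phi_next) - dot(theta, phi)
    # Critic: gradient step on the linear value function V(s) = theta . phi(s).
    theta = [t + alpha_critic * delta * p for t, p in zip(theta, phi)]
    # Actor: for a Gaussian policy with mean mu(s) = w . phi(s), the gradient
    # of log pi(a | s) with respect to w is (a - mu) / sigma^2 * phi(s).
    mu = dot(w, phi)
    scale = alpha_actor * delta * (a - mu) / sigma ** 2
    w = [wi + scale * p for wi, p in zip(w, phi)]
    return theta, w, delta
```

The critic's TD error serves as the learning signal for both updates, which is the basic structure that the standard and natural variants reviewed in the paper build on.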


Efficient Model Learning Methods for Actor–Critic Control

Grondman, Vaandrager, Buşoniu, et al. 2012. IEEE Trans. Syst., Man, Cybern. B

101


We propose two new actor-critic algorithms for reinforcement learning. Both algorithms use local linear regression (LLR) to learn approximations of the functions involved. A crucial feature of the algorithms is that they also learn a process model, and this, in combination with LLR, provides an efficient policy update for faster learning. The first algorithm uses a novel model-based update rule for the actor parameters. The second algorithm does not use an explicit actor but learns a reference model which represents a desired behavior, from which desired control actions can be calculated using the inverse of the learned process model. The two novel methods and a standard actor-critic algorithm are applied to the pendulum swing-up problem, in which the novel methods achieve faster learning than the standard algorithm.
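To make the local linear regression (LLR) idea concrete, here is a minimal sketch, assuming a scalar input and output and a plain list as the sample memory. The function name `llr_predict`, the neighbourhood size `k` and the nearest-neighbour rule are illustrative choices and are not taken from the paper, whose approximators map multi-dimensional states and actions.

```python
def llr_predict(memory, x_query, k=3):
    """Predict y at x_query from the k nearest stored (x, y) samples."""
    if not memory:
        raise ValueError("memory must contain at least one sample")
    # Select the k samples closest to the query point.
    neighbours = sorted(memory, key=lambda xy: abs(xy[0] - x_query))[:k]
    n = len(neighbours)
    mean_x = sum(x for x, _ in neighbours) / n
    mean_y = sum(y for _, y in neighbours) / n
    var_x = sum((x - mean_x) ** 2 for x, _ in neighbours)
    if var_x == 0.0:
        # Degenerate neighbourhood: fall back to the local average.
        return mean_y
    # Least-squares slope of the affine model fitted to the neighbours.
    slope = sum((x - mean_x) * (y - mean_y) for x, y in neighbours) / var_x
    # Evaluate the local affine model at the query point.
    return mean_y + slope * (x_query - mean_x)
```

For example, with samples drawn from y = 2x + 1 at x = 0, 1, 2, 3, 4, calling `llr_predict(samples, 2.5)` fits a line through the three nearest samples and recovers the underlying value. The same local fit could in principle approximate a process model from observed transitions, which is the role LLR plays in the abstract above.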


Model learning actor-critic algorithms: Performance evaluation in a motion control task

Grondman, Buşoniu, Babuška. 2012


Comparison of model-free and model-based methods for time optimal hit control of a badminton robot

et al. 2014


Learning rate free reinforcement learning for real-time motion control using a value-gradient based policy

2014


Model-free and model-based time-optimal control of a badminton robot

Liu, Depraetere, Pinte, et al. 2013


Actor-Critic Control with Reference Model Learning

Grondman, Buşoniu, Babuška, et al. 2011. IFAC Proceedings Volumes


Online Model Learning Algorithms for Actor-Critic Control

Grondman. 2015



I. Grondman

A Survey of Actor-Critic Reinforcement Learning: Standard and Natural Policy Gradients
