51st IEEE Conference on Decision and Control (CDC), 2012
DOI: 10.1109/cdc.2012.6426427
Model learning actor-critic algorithms: Performance evaluation in a motion control task

Cited by 26 publications (10 citation statements)
References 9 publications
“…For each existing local model
(3) Calculate model weight w according to (4)
(4) If w > activation limit w_act
(5) Update model parameters using RLS according to (6) and (7)
(6) Update the corresponding receptive field using (12) and (14)
(7) End
(8) End
Usually, the receptive field activation limit is set as w_act = 0.001. This parameter represents the weight limit for a local model to be updated according to the new data and to be included in the output estimation through a weighted average with another activated model. The pruning limit is usually set as w_prun = 0.7, which represents the highest acceptable overlap of neighbouring receptive fields.…”
Section: Receptive Field Weighted Regression (mentioning)
confidence: 99%
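The quoted update loop can be illustrated with a minimal sketch: each activated receptive field updates its local linear parameters by weighted recursive least squares, and the prediction is a weighted average over activated models. The Gaussian activation kernel, the class layout, and all names (LocalModel, rfwr_update, lam) are illustrative assumptions and do not reproduce equations (4)-(14) of the citing paper.

```python
import numpy as np

W_ACT = 0.001   # receptive field activation limit (as quoted)
W_PRUN = 0.7    # pruning limit: highest acceptable overlap of neighbouring fields

class LocalModel:
    """One receptive field with a local linear model (illustrative layout)."""
    def __init__(self, center, dim):
        self.c = np.asarray(center, dtype=float)  # receptive field centre
        self.D = np.eye(dim)                      # distance metric (shape of the field)
        self.beta = np.zeros(dim + 1)             # local linear parameters
        self.P = np.eye(dim + 1) * 1e3            # RLS covariance

    def weight(self, x):
        # Gaussian receptive-field activation, a common RFWR choice
        d = x - self.c
        return float(np.exp(-0.5 * d @ self.D @ d))

def rfwr_update(models, x, y, lam=0.999):
    """Update all activated local models with sample (x, y) and return the
    weighted-average prediction (sketch of the quoted loop)."""
    num, den = 0.0, 0.0
    for m in models:
        w = m.weight(x)
        if w <= W_ACT:
            continue                               # field not activated by this sample
        phi = np.append(x, 1.0)                    # local linear regressor [x; 1]
        # weighted recursive least squares step for the local parameters
        k = m.P @ phi / (lam / w + phi @ m.P @ phi)
        m.beta += k * (y - phi @ m.beta)
        m.P = (m.P - np.outer(k, phi @ m.P)) / lam
        num += w * (phi @ m.beta)
        den += w
    return num / den if den > 0 else None
```

The pruning limit W_PRUN would be used in a separate housekeeping step that removes one of two fields whose mutual activation exceeds 0.7; that step is omitted here for brevity.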
“…On the other hand, local regression is a well-established modelling approach for model-based RL agents, where the model is composed of local linear models, offering fast and computationally cheap approximation. There are several variants of local modelling methods; comprehensive examples of a grid-based local linear model structure and data-based local linear regression (LLR) are described in [11,12], respectively. Even though the use of local regression techniques within RL has been researched in the past, it was mainly based on simple, memory-based approximation methods such as the LLR, which is thoroughly described and examined in [13,14], and more complex incremental methods such as receptive field weighted regression (RFWR) [15,16] or locally weighted projection regression (LWPR) [17] were omitted, with the exception of [18], where the RFWR algorithm was used as a critic approximator.…”
Section: Introduction (mentioning)
confidence: 99%
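The memory-based LLR mentioned in this excerpt fits a linear model to the nearest stored samples at query time. A minimal sketch of that idea follows; the function name, the neighbour count k, and the plain least-squares fit are assumptions for illustration, not the exact method of [13,14].

```python
import numpy as np

def llr_predict(memory_x, memory_y, query, k=10):
    """Memory-based local linear regression: fit a linear model to the k
    nearest stored samples and evaluate it at the query point."""
    X = np.asarray(memory_x, dtype=float)
    y = np.asarray(memory_y, dtype=float)
    # pick the k samples closest to the query
    idx = np.argsort(np.linalg.norm(X - query, axis=1))[:k]
    Xk = np.hstack([X[idx], np.ones((len(idx), 1))])  # append bias term
    # least-squares fit of a local linear model on the neighbours
    beta, *_ = np.linalg.lstsq(Xk, y[idx], rcond=None)
    return np.append(query, 1.0) @ beta
```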
“…To accelerate learning convergence, Q-learning and Sarsa algorithms are modified using eligibility traces (Sutton and Barto, 1998; Grondman et al., 2012a, 2012b), which offer a better way to assign credit to visited states. The eligibility trace for the pair (z, f_d) at time step k is denoted by e_k(z, f_d): …”
Section: Position/Force Control Using Reinforcement Learning in Unknown Environment (mentioning)
confidence: 99%
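A minimal sketch of the eligibility-trace mechanism this excerpt refers to, using a tabular Sarsa(lambda) update with replacing traces; the array layout and all parameter values are illustrative assumptions, not the cited controller's settings.

```python
import numpy as np

def sarsa_lambda_step(Q, E, s, a, r, s_next, a_next,
                      alpha=0.1, gamma=0.95, lam=0.9):
    """One tabular Sarsa(lambda) update with replacing eligibility traces.
    Q and E are (n_states, n_actions) arrays."""
    delta = r + gamma * Q[s_next, a_next] - Q[s, a]  # TD error
    E[s, a] = 1.0                                    # replacing trace for the visited pair
    Q += alpha * delta * E                           # credit all recently visited pairs
    E *= gamma * lam                                 # decay every trace
    return Q, E
```

The trace matrix E spreads each TD error over recently visited state-action pairs, which is the credit-assignment speed-up the excerpt describes.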
“…Recent studies show that RL can calculate the optimal impedance model in time-varying environments (Wang et al., 2015). The actor-critic algorithm (Grondman et al., 2012a, 2012b, 2011) is applied to obtain the optimal impedance strength. The parameters of the impedance model are estimated by exponentially weighted least squares (Astrom and Wittenmark, 1989), but this needs a parameterization model for the impedance parameters (Chih and Huang, 2004).…”
Section: Introduction (mentioning)
confidence: 99%
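The exponentially weighted least-squares estimator mentioned above is, in essence, recursive least squares with a forgetting factor. A minimal sketch under that reading follows; the linear parameterization y = phi^T theta, the class name, and the forgetting factor value are assumptions for illustration.

```python
import numpy as np

class ForgettingRLS:
    """Recursive least squares with exponential forgetting (sketch)."""
    def __init__(self, n_params, lam=0.98):
        self.theta = np.zeros(n_params)   # estimated impedance parameters
        self.P = np.eye(n_params) * 1e3   # covariance matrix
        self.lam = lam                    # forgetting factor (0 < lam <= 1)

    def update(self, phi, y):
        phi = np.asarray(phi, dtype=float)
        k = self.P @ phi / (self.lam + phi @ self.P @ phi)   # gain
        self.theta += k * (y - phi @ self.theta)              # correct estimate
        self.P = (self.P - np.outer(k, phi @ self.P)) / self.lam
        return self.theta
```

The forgetting factor lam discounts old samples exponentially, which is what lets the estimator track the time-varying environment the excerpt mentions.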
“…The actor-critic (AC) algorithm was first introduced in [20]; many variants that approximate the value function and the policy by linear function approximation have been widely used in continuous-time systems since then [21-23]. By combining model learning and AC, Grondman et al. [24] proposed an improved learning method called Model Learning Actor-Critic (MLAC), which approximates the value function, the policy, and the process model by LLR. In MLAC, the gradient of the next state with respect to the current action is computed for updating the policy gradient, with the goal of improving the convergence rate of the whole algorithm.…”
Section: Introduction and Related Work (mentioning)
confidence: 99%
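The MLAC actor update described in this excerpt chains the value gradient through the learned process model. A minimal sketch of that chain rule is given below; all array names, shapes, and the learning rate are assumptions for illustration, not the exact update of [24].

```python
import numpy as np

def mlac_actor_update(theta_actor, dact_dtheta, dV_dsnext, dsnext_da, alpha_a=0.01):
    """Sketch of a model-based actor update: the policy gradient is obtained by
    chaining the value gradient through the learned process model,
    dV(s')/dtheta = (da/dtheta)^T (ds'/da)^T dV/ds'."""
    # gradient of the next-state value w.r.t. the current action (via the model)
    dV_da = dsnext_da.T @ dV_dsnext
    # chain through the actor parameterization and take a gradient-ascent step
    grad_theta = dact_dtheta.T @ dV_da
    return theta_actor + alpha_a * grad_theta
```

Here dsnext_da would come from the LLR process model and dV_dsnext from the LLR critic, which is the coupling that the excerpt credits with the faster convergence.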