51st IEEE Conference on Decision and Control (CDC), 2012
DOI: 10.1109/cdc.2012.6426427
Model learning actor-critic algorithms: Performance evaluation in a motion control task

Cited by 26 publications (10 citation statements)
References 9 publications
“…For each existing local model
(3) Calculate model weight w according to (4)
(4) If w > activation limit w_act
(5) Update model parameters using RLS according to (6) and (7)
(6) Update the corresponding receptive field using (12) and (14)
(7) End
(8) End
Usually, the receptive field activation limit is set as w_act = 0.001. This parameter represents the weight limit for a local model to be updated according to the new data and to be included in the output estimation through a weighted average with another activated model. The pruning limit is usually set as w_prun = 0.7, which represents the highest acceptable overlap of neighbouring receptive fields.…”
Section: Receptive Field Weighted Regression (mentioning)
confidence: 99%
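The quoted update loop can be illustrated with a minimal sketch: each activated receptive field updates its local linear parameters by weighted recursive least squares, and the prediction is a weighted average over activated models. The Gaussian activation kernel, the class layout, and all names (LocalModel, rfwr_update, lam) are illustrative assumptions and do not reproduce equations (4)-(14) of the citing paper.

```python
import numpy as np

W_ACT = 0.001   # receptive field activation limit (as quoted)
W_PRUN = 0.7    # pruning limit: highest acceptable overlap of neighbouring fields

class LocalModel:
    """One receptive field with a local linear model (illustrative layout)."""
    def __init__(self, center, dim):
        self.c = np.asarray(center, dtype=float)  # receptive field centre
        self.D = np.eye(dim)                      # distance metric (shape of the field)
        self.beta = np.zeros(dim + 1)             # local linear parameters
        self.P = np.eye(dim + 1) * 1e3            # RLS covariance

    def weight(self, x):
        # Gaussian receptive-field activation, a common RFWR choice
        d = x - self.c
        return float(np.exp(-0.5 * d @ self.D @ d))

def rfwr_update(models, x, y, lam=0.999):
    """Update all activated local models with sample (x, y) and return the
    weighted-average prediction (sketch of the quoted loop)."""
    num, den = 0.0, 0.0
    for m in models:
        w = m.weight(x)
        if w <= W_ACT:
            continue                               # field not activated by this sample
        phi = np.append(x, 1.0)                    # local linear regressor [x; 1]
        # weighted recursive least squares step for the local parameters
        k = m.P @ phi / (lam / w + phi @ m.P @ phi)
        m.beta += k * (y - phi @ m.beta)
        m.P = (m.P - np.outer(k, phi @ m.P)) / lam
        num += w * (phi @ m.beta)
        den += w
    return num / den if den > 0 else None
```

The pruning limit W_PRUN would be used in a separate housekeeping step that removes one of two fields whose mutual activation exceeds 0.7; that step is omitted here for brevity.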
“…On the other hand, local regression is a well-established modelling approach for model-based RL agents, where the model is composed of local linear models, offering fast and computationally cheap approximation. There are several variants of local modelling methods; comprehensive examples of a grid-based local linear model structure and data-based local linear regression (LLR) are described in [11,12], respectively. Even though the use of local regression techniques within RL has been researched in the past, it was mainly based on simple, memory-based approximation methods such as the LLR, which is thoroughly described and examined in [13,14], and more complex incremental methods such as receptive field weighted regression (RFWR) [15,16] or locally weighted projection regression (LWPR) [17] were omitted, with the exception of [18], where the RFWR algorithm was used as a critic approximator.…”
Section: Introduction (mentioning)
confidence: 99%
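The memory-based LLR mentioned in this excerpt fits a linear model to the nearest stored samples at query time. A minimal sketch of that idea follows; the function name, the neighbour count k, and the plain least-squares fit are assumptions for illustration, not the exact method of [13,14].

```python
import numpy as np

def llr_predict(memory_x, memory_y, query, k=10):
    """Memory-based local linear regression: fit a linear model to the k
    nearest stored samples and evaluate it at the query point."""
    X = np.asarray(memory_x, dtype=float)
    y = np.asarray(memory_y, dtype=float)
    # pick the k samples closest to the query
    idx = np.argsort(np.linalg.norm(X - query, axis=1))[:k]
    Xk = np.hstack([X[idx], np.ones((len(idx), 1))])  # append bias term
    # least-squares fit of a local linear model on the neighbours
    beta, *_ = np.linalg.lstsq(Xk, y[idx], rcond=None)
    return np.append(query, 1.0) @ beta
```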
“…To accelerate learning convergence, Q-learning and Sarsa algorithms are modified using eligibility traces (Sutton and Barto, 1998; Grondman et al., 2012a, 2012b), which offer a better way to assign credit to visited states. The eligibility trace for the pair (z, f_d) at time step k is denoted by e_k(z, f_d): …”
Section: Position/Force Control Using Reinforcement Learning in Unknown Environment (mentioning)
confidence: 99%
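A minimal sketch of the eligibility-trace mechanism this excerpt refers to, using a tabular Sarsa(lambda) update with replacing traces; the array layout and all parameter values are illustrative assumptions, not the cited controller's settings.

```python
import numpy as np

def sarsa_lambda_step(Q, E, s, a, r, s_next, a_next,
                      alpha=0.1, gamma=0.95, lam=0.9):
    """One tabular Sarsa(lambda) update with replacing eligibility traces.
    Q and E are (n_states, n_actions) arrays."""
    delta = r + gamma * Q[s_next, a_next] - Q[s, a]  # TD error
    E[s, a] = 1.0                                    # replacing trace for the visited pair
    Q += alpha * delta * E                           # credit all recently visited pairs
    E *= gamma * lam                                 # decay every trace
    return Q, E
```

The trace matrix E spreads each TD error over recently visited state-action pairs, which is the credit-assignment speed-up the excerpt describes.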
“…Recent studies show that RL can calculate the optimal impedance model in time-varying environments (Wang et al., 2015). The actor-critic algorithm (Grondman et al., 2012a, 2012b, 2011) is applied to obtain the optimal impedance strength. The parameters of the impedance model are estimated by exponentially weighted least squares (Astrom and Wittenmark, 1989), but this needs a parameterization model for the impedance parameters (Chih and Huang, 2004).…”
Section: Introduction (mentioning)
confidence: 99%
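The exponentially weighted least-squares estimator mentioned above is, in essence, recursive least squares with a forgetting factor. A minimal sketch under that reading follows; the linear parameterization y = phi^T theta, the class name, and the forgetting factor value are assumptions for illustration.

```python
import numpy as np

class ForgettingRLS:
    """Recursive least squares with exponential forgetting (sketch)."""
    def __init__(self, n_params, lam=0.98):
        self.theta = np.zeros(n_params)   # estimated impedance parameters
        self.P = np.eye(n_params) * 1e3   # covariance matrix
        self.lam = lam                    # forgetting factor (0 < lam <= 1)

    def update(self, phi, y):
        phi = np.asarray(phi, dtype=float)
        k = self.P @ phi / (self.lam + phi @ self.P @ phi)   # gain
        self.theta += k * (y - phi @ self.theta)              # correct estimate
        self.P = (self.P - np.outer(k, phi @ self.P)) / self.lam
        return self.theta
```

The forgetting factor lam discounts old samples exponentially, which is what lets the estimator track the time-varying environment the excerpt mentions.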
“…The actor-critic (AC) algorithm was first introduced in [20]; many variants that approximate the value function and the policy by linear function approximation have been widely used in continuous-time systems since then [21-23]. By combining model learning and AC, Grondman et al. [24] proposed an improved learning method called Model Learning Actor-Critic (MLAC), which approximates the value function, the policy, and the process model by LLR. In MLAC, the gradient of the next state with respect to the current action is computed for updating the policy gradient, with the goal of improving the convergence rate of the whole algorithm.…”
Section: Introduction and Related Work (mentioning)
confidence: 99%
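The MLAC actor update described in this excerpt chains the value gradient through the learned process model. A minimal sketch of that chain rule is given below; all array names, shapes, and the learning rate are assumptions for illustration, not the exact update of [24].

```python
import numpy as np

def mlac_actor_update(theta_actor, dact_dtheta, dV_dsnext, dsnext_da, alpha_a=0.01):
    """Sketch of a model-based actor update: the policy gradient is obtained by
    chaining the value gradient through the learned process model,
    dV(s')/dtheta = (da/dtheta)^T (ds'/da)^T dV/ds'."""
    # gradient of the next-state value w.r.t. the current action (via the model)
    dV_da = dsnext_da.T @ dV_dsnext
    # chain through the actor parameterization and take a gradient-ascent step
    grad_theta = dact_dtheta.T @ dV_da
    return theta_actor + alpha_a * grad_theta
```

Here dsnext_da would come from the LLR process model and dV_dsnext from the LLR critic, which is the coupling that the excerpt credits with the faster convergence.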