2018
DOI: 10.1109/tnnls.2018.2808203

Actor-Critic Learning Control Based on $\ell_{2}$-Regularized Temporal-Difference Prediction With Gradient Correction

Abstract: Actor-critic methods based on the policy gradient (PG-based AC) have been widely studied to solve learning control problems. To increase the data efficiency of learning prediction in the critic of PG-based AC, recent studies have investigated how to use recursive least-squares temporal-difference (RLS-TD) algorithms for policy evaluation. In such contexts, the critic RLS-TD evaluates an unknown mixed policy generated by a series of different actors, rather than one fixed policy generated by …
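The title refers to temporal-difference prediction with gradient correction (a TDC-style critic update) combined with an $\ell_{2}$ penalty. The sketch below illustrates what such a linear critic step can look like; the step sizes, the penalty coefficient eta, and the function name tdc_l2_update are assumptions for illustration and are not taken from the paper itself.

import numpy as np

# Illustrative sketch only: a linear TD-with-gradient-correction (TDC) critic
# step with an added l2 shrinkage term on the value weights. The exact
# regularized update used in the paper is not reproduced here.

def tdc_l2_update(theta, w, phi, phi_next, reward,
                  gamma=0.99, alpha=0.01, beta=0.05, eta=1e-3):
    """One critic step on a transition (phi -> phi_next, reward).

    theta : value-function weights (what the critic learns)
    w     : auxiliary weights used by the gradient-correction term
    phi, phi_next : feature vectors of the current and next state
    """
    delta = reward + gamma * phi_next @ theta - phi @ theta   # TD error
    # Main weights: TD step + gradient-correction term + l2 shrinkage (assumed)
    theta = theta + alpha * (delta * phi
                             - gamma * (phi @ w) * phi_next
                             - eta * theta)
    # Auxiliary weights track the expected TD error projected onto the features
    w = w + beta * (delta - phi @ w) * phi
    return theta, w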

Cited by 11 publications (1 citation statement) · References: 34 publications
“…Xu et al. propose an actor-critic algorithm using Recursive Least-Squares Temporal Difference (λ) as the critic, which is the recursive version of LSTD [20]. There are some other least-squares-based actor-critic algorithms [21,22]. However, to the best of our knowledge, most of these methods are designed for benchmark tasks with low-dimensional feature vectors (state inputs).…”
Section: Introduction (citation type: mentioning)
confidence: 99%
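The citation statement above describes RLS-TD(λ) as the recursive version of LSTD. For context, a compact sketch of that recursive idea follows: instead of solving the LSTD linear system in batch, the inverse matrix P is updated per transition with a Sherman-Morrison step. The forgetting factor mu and the function name are illustrative assumptions, not code from the cited papers.

import numpy as np

# Minimal sketch of an RLS-TD(lambda)-style update for a linear value function.

def rls_td_lambda_update(theta, P, z, phi, phi_next, reward,
                         gamma=0.99, lam=0.8, mu=1.0):
    z = gamma * lam * z + phi                 # eligibility trace
    d = phi - gamma * phi_next                # feature difference
    K = P @ z / (mu + d @ P @ z)              # gain vector
    theta = theta + K * (reward - d @ theta)  # value-weight update
    P = (P - np.outer(K, d @ P)) / mu         # recursive update of the inverse
    return theta, P, z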