2018
DOI: 10.1109/tnnls.2018.2808203

Actor-Critic Learning Control Based on $\ell_{2}$-Regularized Temporal-Difference Prediction With Gradient Correction

Abstract: Actor-critic methods based on the policy gradient (PG-based AC) have been widely studied to solve learning control problems. To increase the data efficiency of learning prediction in the critic of PG-based AC, recent studies have investigated how to use recursive least-squares temporal-difference (RLS-TD) algorithms for policy evaluation. In such contexts, the critic RLS-TD evaluates an unknown mixed policy generated by a series of different actors, rather than one fixed policy generated by …
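The title refers to temporal-difference prediction with gradient correction (a TDC-style critic update) combined with an $\ell_{2}$ penalty. The sketch below illustrates what such a linear critic step can look like; the step sizes, the penalty coefficient eta, and the function name tdc_l2_update are assumptions for illustration and are not taken from the paper itself.

import numpy as np

# Illustrative sketch only: a linear TD-with-gradient-correction (TDC) critic
# step with an added l2 shrinkage term on the value weights. The exact
# regularized update used in the paper is not reproduced here.

def tdc_l2_update(theta, w, phi, phi_next, reward,
                  gamma=0.99, alpha=0.01, beta=0.05, eta=1e-3):
    """One critic step on a transition (phi -> phi_next, reward).

    theta : value-function weights (what the critic learns)
    w     : auxiliary weights used by the gradient-correction term
    phi, phi_next : feature vectors of the current and next state
    """
    delta = reward + gamma * phi_next @ theta - phi @ theta   # TD error
    # Main weights: TD step + gradient-correction term + l2 shrinkage (assumed)
    theta = theta + alpha * (delta * phi
                             - gamma * (phi @ w) * phi_next
                             - eta * theta)
    # Auxiliary weights track the expected TD error projected onto the features
    w = w + beta * (delta - phi @ w) * phi
    return theta, w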

Cited by 11 publications (1 citation statement) · References: 34 publications
“…Xu et al. propose an actor-critic algorithm using Recursive Least-Squares Temporal Difference (λ) as the critic, which is the recursive version of LSTD [20]. There are some other least-squares-based actor-critic algorithms [21,22]. However, to the best of our knowledge, most of these methods are designed for benchmark tasks with low-dimensional feature vectors (state inputs).…”
Section: Introduction (citation type: mentioning)
confidence: 99%
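The citation statement above describes RLS-TD(λ) as the recursive version of LSTD. For context, a compact sketch of that recursive idea follows: instead of solving the LSTD linear system in batch, the inverse matrix P is updated per transition with a Sherman-Morrison step. The forgetting factor mu and the function name are illustrative assumptions, not code from the cited papers.

import numpy as np

# Minimal sketch of an RLS-TD(lambda)-style update for a linear value function.

def rls_td_lambda_update(theta, P, z, phi, phi_next, reward,
                         gamma=0.99, lam=0.8, mu=1.0):
    z = gamma * lam * z + phi                 # eligibility trace
    d = phi - gamma * phi_next                # feature difference
    K = P @ z / (mu + d @ P @ z)              # gain vector
    theta = theta + K * (reward - d @ theta)  # value-weight update
    P = (P - np.outer(K, d @ P)) / mu         # recursive update of the inverse
    return theta, P, z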