Actor–Critic Learning Control With Regularization and Feature Selection in Policy Gradient Estimation

Li, Luntong; Li, Dazi; Song, Taek Lyul; Xu, Xin

doi:10.1109/tnnls.2020.2981377

Cited by 15 publications

(3 citation statements)

References 22 publications

“…First, the impact of λ, β, θ, the number of SUs and the number of channels on the performance of the proposed scheme is analyzed. Then the performance of the proposed spectrum access algorithm Feilin is compared with DQN+RC [18], Q-learning [11], PG+RDA [17], and MPQ-L+DPG [38]. All results in the following scenarios are the average of 1000 independent experiments.…”

Section: A Experimental Setup

mentioning

confidence: 99%

“…Based on the deterministic policy [15], the deep deterministic policy gradient (DDPG) was developed for continuous control [16]. To obtain the solution of the minimization problem by learning stochastic and deterministic approximate optimal policies, a regularized dual-averaging policy gradient (RDA-PG) scheme was proposed [17]. However, the learning-based methods mentioned above depend on the centralized model training, which increases transmission overhead and degrades the real-time performance.…”

mentioning

confidence: 99%

Federated Deep Reinforcement Learning-Based Spectrum Access Algorithm With Warranty Contract in Intelligent Transportation Systems

Zhu, Liu, et al. 2023. IEEE Trans. Intell. Transport. Syst.

Cognitive radio (CR) offers an effective way to meet the large bandwidth requirements of intelligent transportation systems (ITS) by enabling secondary users (SUs) to access the idle spectrum of primary users (PUs). However, the high mobility of users and real-time service requirements result in additional transmission collisions and interference, which degrade the spectrum access rate and the quality of service (QoS) of users in ITS. This paper proposes a spectrum access algorithm (Feilin) based on federated deep reinforcement learning (FDRL) to improve the spectrum access rate. The algorithm maximizes a QoS reward function that accounts for the combined benefits of delay, transmission power, and utility of SUs. To guarantee the utility of SUs, a warranty contract is designed so that SUs obtain compensation for data transmission failures, which encourages SUs to compete for more spectrum resources. To meet the real-time requirements and improve QoS in ITS, a spectrum access model called FDQN-W is proposed based on a federated deep Q-network (DQN). It adopts the asynchronous federated weighted learning algorithm (AFWLA) to share and update the DQN weights across multiple agents, reducing time cost and accelerating convergence. Detailed simulation results show that, in the multiuser scenario and compared with existing methods, Feilin increases the spectrum access success rate by 15.1% and reduces the collision rate with SUs and the collision rate with PUs by 46.4% and 6.8%, respectively.

“…Making sophisticated statistical methods applicable to the existing market data and to provide better data analysis for investor decision-making have become a hot topic. In particular, in the fields of index tracking, portfolio management, and risk hedging, broad application platforms for feature selection methods arise [4], [5]. Index tracking is a significant investment strategy [2], [3] in fund management that aims to replicate the movements of a specific market index.…”

Section: Introduction a Related Work

mentioning

confidence: 99%

LSTM-DGMDH: High-Dimensional Index Tracking Based on LSTM and Adaptive Deep Evolutionary GMDH Neural Network

Tong, Liu, Liu, et al. 2023. IEEE Access

A stock index is an indicator that describes changes in the overall price level of the stock market. It is susceptible to many dynamic factors and is characterized by high dimensionality, uncertainty, non-linearity, time delay, and complexity, which result in abnormal and missing values in stock index data and can make a stock index tracking model unstable or unreliable. To address these problems, we take the historical stock index as the input, model the internal dynamic changes of features, and learn the change rule. First, we introduce an attention mechanism that assigns different weights to the hidden states of the long short-term memory network (LSTM) through mapping weights and learning parameters, and we propose a stock index data preprocessing model based on the LSTM with this attention mechanism. Second, the group method of data handling type neural network (GMDH-NN) is a self-organizing data mining technology that is especially suitable for modeling complex systems. We therefore choose a discrete form of the first-order Kolmogorov-Gabor (K-G) polynomial as the reference function of the GMDH-NN to establish the general relationship between input and output variables, and we present a deep evolutionary GMDH polynomial neural network (DGMDH) to perform stock index tracking. Moreover, for a high-dimensional stock index dataset, the traditional external criterion is no longer adequate, so we propose a tracking error external criterion (TEEC) for stock indices, which is based on the difference between allocation yield and target yield. The TEEC provides better information for selecting the optimal complex DGMDH model. Our experiments show the effectiveness of our methodology.

Adaptive Evolutionary Reinforcement Learning with Policy Direction

Dong, 2024. Neural Process Lett

Evolutionary Reinforcement Learning (ERL) has garnered widespread attention in recent years due to its inherent robustness and parallelism. However, the integration of Evolutionary Algorithms (EAs) and Reinforcement Learning (RL) remains relatively rudimentary and lacks dynamism, which can impact the convergence performance of ERL algorithms. In this study, a dynamic adaptive module is introduced to balance the Evolution Strategies (ES) and RL training within ERL. By incorporating elite strategies, this module leverages advantageous individuals to elevate the overall population's performance. Additionally, RL strategy updates often lack guidance from the population. To address this, we incorporate the strategies of the best individuals from the population, providing valuable policy direction. This is achieved through the formulation of a loss function that employs either L1 or L2 regularization to facilitate RL training. The proposed framework is referred to as Adaptive Evolutionary Reinforcement Learning (AERL). The effectiveness of our framework is evaluated by adopting Soft Actor-Critic (SAC) as the RL algorithm and comparing it with other algorithms in the MuJoCo environment. The results underscore the outstanding convergence performance of our proposed Adaptive Evolutionary Soft Actor-Critic (AESAC) algorithm. Furthermore, ablation experiments are conducted to emphasize the necessity of these two improvements. It is worth noting that the enhancements in AESAC are realized at the population level, enabling broader exploration and effectively reducing the risk of falling into local optima.
