2020
DOI: 10.1109/tnnls.2019.2959129
Reducing Estimation Bias via Triplet-Average Deep Deterministic Policy Gradient

Cited by 51 publications (44 citation statements)
References 11 publications
“…A novel DRL approach, combining TDD [34] and ND, is proposed to address the co-optimization problem. TDD-ND is a model-free, off-policy actor-critic algorithm, in which the triplet critics are used to limit estimation bias, and the exploration ND policy is used to improve the exploration in the algorithm.…”
Section: Proposed Triplet Deep Deterministic Policy Gradient With Exploration Noise Decay Approach
Confidence: 99%
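The statement above credits the triplet critics with limiting estimation bias. Below is a minimal PyTorch sketch of one way three target-critic estimates might be combined into a TD target: the minimum over two critics damps overestimation, and blending in a third critic tempers the resulting pessimism. The function name `triplet_target`, the weight `beta`, and the exact combination rule are illustrative assumptions, not the update from the cited paper.

```python
import torch

def triplet_target(q1, q2, q3, reward, not_done, gamma=0.99, beta=0.5):
    """q1, q2, q3: target-critic values Q_i(s', pi'(s')) for a batch.

    Hypothetical triplet-critic target in the spirit of the statement
    above; beta and the blending rule are assumptions.
    """
    pessimistic = torch.min(q1, q2)                    # clipped double-Q lower bound
    averaged = (1.0 - beta) * pessimistic + beta * q3  # temper under/overestimation
    return reward + gamma * not_done * averaged

# Example: a batch of 4 transitions with random critic outputs.
q = torch.rand(3, 4)
y = triplet_target(q[0], q[1], q[2],
                   reward=torch.ones(4), not_done=torch.ones(4))
```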
See 4 more Smart Citations
“…The TDD algorithm [34] is an off-policy RL algorithm which can be applied to solve optimization problems with a continuous state space as well as continuous actions [35,36]. TDD includes a single actor network (i.e., a deterministic policy network) π_φ and its target actor network π_φ′.…”
Section: Triplet Deep Deterministic Policy Gradient Algorithm
Confidence: 99%
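The statement mentions a single actor network π_φ paired with a target actor π_φ′. A minimal PyTorch sketch of such a pair with Polyak (soft) target updates follows; the layer sizes, `tau`, and the `soft_update` helper are assumptions for illustration, not details taken from the cited paper.

```python
import copy
import torch
import torch.nn as nn

class Actor(nn.Module):
    """Deterministic policy pi_phi mapping states to bounded actions."""
    def __init__(self, state_dim, action_dim, max_action=1.0):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
            nn.Linear(256, action_dim), nn.Tanh(),
        )
        self.max_action = max_action

    def forward(self, state):
        return self.max_action * self.net(state)

actor = Actor(state_dim=8, action_dim=2)
actor_target = copy.deepcopy(actor)  # pi_phi' starts as an exact copy

def soft_update(net, target, tau=0.005):
    # Polyak averaging keeps the target network a stable, lagged copy.
    with torch.no_grad():
        for p, tp in zip(net.parameters(), target.parameters()):
            tp.mul_(1.0 - tau).add_(tau * p)

soft_update(actor, actor_target)
```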
See 3 more Smart Citations