Real-Time Optimal Power Flow Using Twin Delayed Deep Deterministic Policy Gradient Algorithm

Woo, Jong Ha; Wu, Lei; Park, Jong-Bae; Roh, Jae Hyung

doi:10.1109/access.2020.3041007

Cited by 32 publications

(8 citation statements)

References 22 publications


“…Here, we focus on integrating our proposed training method with one of the state-of-the-art DRL algorithms, Twin Delayed Deep Deterministic policy gradient algorithm (TD3) [14]. We chose TD3 as it is a recent proposed algorithm which offers good performance in many tasks [27,45,56,55,22]. However, our proposed approach can be easily merged into other DRL algorithms as well and due to limited resources, we consider these alternatives outside of the scope of this paper.…”

Section: Proposed Methods (mentioning)

confidence: 99%

Dynamic Sparse Training for Deep Reinforcement Learning

Sokar, Elena, Mocanu et al. 2021

Preprint


Deep reinforcement learning has achieved significant success in many decision-making tasks in various fields. However, it requires a large training time of dense neural networks to obtain a good performance. This hinders its applicability on low-resource devices where memory and computation are strictly constrained. In a step towards enabling deep reinforcement learning agents to be applied to low-resource devices, in this work, we propose for the first time to dynamically train deep reinforcement learning agents with sparse neural networks from scratch. We adopt the evolution principles of dynamic sparse training in the reinforcement learning paradigm and introduce a training algorithm that optimizes the sparse topology and the weight values jointly to dynamically fit the incoming data. Our approach is easy to integrate into existing deep reinforcement learning algorithms and has many favorable advantages. First, it allows for significant compression of the network size, which reduces the memory and computation costs substantially. This would accelerate not only the agent inference but also its training process. Second, it speeds up the agent learning process and allows for reducing the number of required training steps. Third, it can achieve higher performance than training the dense counterpart network. We evaluate our approach on OpenAI gym continuous control tasks. The experimental results show the effectiveness of our approach in achieving higher performance than one of the state-of-the-art baselines with a 50% reduction in the network size and floating-point operations (FLOPs). Moreover, our proposed approach can reach the same performance achieved by the dense network with a 40-50% reduction in the number of training steps.
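The abstract above describes jointly adapting a sparse topology and its weights during training. As a rough illustration only, the sketch below shows one common form of dynamic sparse training: drop the smallest-magnitude active connections, then regrow the same number at random among previously inactive positions. It is not taken from the cited paper; the function name `prune_and_regrow`, the `drop_fraction` parameter, and the zero initialization of regrown weights are assumptions made for this example.

```python
import numpy as np

def prune_and_regrow(weights, mask, drop_fraction, rng):
    """One hypothetical topology update on a single sparse layer.

    weights: dense array holding the layer's weights
    mask:    0/1 integer array of the same shape (1 = active connection)
    """
    flat_mask = mask.ravel().copy()
    flat_w = (weights * mask).ravel().copy()

    active = np.flatnonzero(flat_mask)
    inactive = np.flatnonzero(flat_mask == 0)  # positions eligible for regrowth
    n_drop = int(drop_fraction * active.size)
    if n_drop == 0:
        return flat_w.reshape(weights.shape), flat_mask.reshape(mask.shape)

    # Drop: deactivate the n_drop active connections with the smallest magnitude.
    order = np.argsort(np.abs(flat_w[active]))
    dropped = active[order[:n_drop]]
    flat_mask[dropped] = 0
    flat_w[dropped] = 0.0

    # Grow: activate random positions that were inactive before this update,
    # so the number of active connections stays constant when enough are free.
    n_grow = min(n_drop, inactive.size)
    grown = rng.choice(inactive, size=n_grow, replace=False)
    flat_mask[grown] = 1
    flat_w[grown] = 0.0  # assumption: regrown connections start at zero

    return flat_w.reshape(weights.shape), flat_mask.reshape(mask.shape)
```

In a training loop, a step like this would typically run every few hundred gradient updates on each sparse layer of the actor and critic, with the mask reapplied after every optimizer step so that inactive weights stay at zero.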


“…Cost & Load Balance Mixed/NA [197], [198], [199] Cost & Comfort [200] Other/Mixed Residential A2C [201] HVAC, Fans, WH Cost Commercial A3C [202] P2P Trading Mixed/NA TD3 [203] HVAC, Fans, WH [204] Cost & Comfort [205] Other/Mixed Residential…”

Section: Reference (mentioning)

confidence: 99%

Reinforcement Learning: Theory and Applications in HEMS

Alani, Das 2022

Preprint


The twin capabilities of learning from experience and learning at higher levels of abstraction set reinforcement learning apart from other areas of machine learning and (within the broader context) all of artificial intelligence. It allows algorithmic agents to replace human beings in the real world, including in homes and buildings, in application domains that had hitherto been considered to be beyond today’s capabilities. This goal, specifically aimed at home energy automation, forms the backdrop of this article, which surveys the use of deep reinforcement learning in various HEMS applications. The article provides an overview of generic reinforcement learning. This is followed by discussions on the state-of-the-art methods for value-based, policy gradient, and actor-critic methods in deep reinforcement learning. In order to make published literature in reinforcement learning more accessible to HEMS researchers, verbal descriptions are accompanied by explanatory figures as well as mathematical expressions using the same terminology as the machine learning community. Next, a detailed survey of how reinforcement learning is used in different HEMS domains is presented. The survey also considers what kind of reinforcement learning algorithms are used in each HEMS application. The survey suggests that this research is still in its infancy.


“…Changes in operation status of the power system's elements, such as nodal voltage, are constructed by a sequence of real or complex numbers as a discrete time-domain signal. Rather than DQN, DDPG is appropriate for real-time changes in a discrete-time domain because DQN updates the neural network using a total reward in one episode unit, while DDPG updates the reward for each step [36]. Since the current and voltage data in discrete time-domain are changed with continuous form, unlike discrete movement such as top-bottom-left-right, the RL components can operate over continuous action spaces by using DDPG.…”

Section: DDPG Algorithm for Controller Design (mentioning)

confidence: 99%
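The statement above contrasts discrete moves with continuous action spaces. A minimal sketch of that difference in action selection follows: a DQN-style agent picks an index from a finite set of Q-values, while a DDPG-style agent evaluates a deterministic actor and clips the result to the action bounds. The helper names (`dqn_action`, `ddpg_action`, `make_linear_actor`), the tanh-squashed linear actor, and the use of Gaussian exploration noise are assumptions for illustration and are not taken from the cited paper.

```python
import numpy as np

def dqn_action(q_values):
    """Discrete control: return the index of the action with the largest Q-value."""
    return int(np.argmax(q_values))

def ddpg_action(actor, state, low, high, noise_std, rng):
    """Continuous control: deterministic actor output plus exploration noise.

    Gaussian noise is an assumption here; other noise processes are also used.
    """
    action = actor(state)
    action = action + rng.normal(0.0, noise_std, size=action.shape)
    return np.clip(action, low, high)  # keep the action inside its bounds

def make_linear_actor(W, max_action):
    """Hypothetical stand-in for an actor network: a tanh-squashed linear map."""
    return lambda s: max_action * np.tanh(W @ s)
```

For example, `dqn_action(np.array([0.1, 0.7, 0.2, 0.0]))` selects action index 1, whereas `ddpg_action` returns a real-valued vector such as a pair of current references bounded by `low` and `high`.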

“…The observation of RL system plays a vital role since it is a core of the agent, which enables the agent to receive the results from the action and the environments changes [36]. In this paper, the observations represent that 𝑠 = [𝑉𝑎𝑐 𝑟𝑒𝑓 , 𝑉𝑎𝑐, 𝑉𝑎𝑐 𝑑𝑖𝑓 ,…”

Section: The Proposed Approach to Control D-STATCOM (mentioning)

confidence: 99%

D-STATCOM d-q Axis Current Reference Control Applying DDPG Algorithm in the Distribution System

Woo, Lee et al. 2021

IEEE Access

Self Cite


The high penetration level of renewable energy in large-scale power systems could adversely affect power quality, such as voltage stability and harmonic pollution. This paper assesses the impacts of Distribution Static Compensator (D-STATCOM), one of the Flexible AC Transmission System (FACTS) devices, on power quality of 4.16 kV-level distribution systems via transient and steady-state analysis. Carrier-based Pulse Width Modulation (PWM) control in D-STATCOM generates d-q axis current reference via the PID (Proportional-Integral-Differential) controller to control d-q axis current and voltage. A new control method, via the Deep Deterministic Policy Gradient (DDPG) algorithm-based reinforcement learning (RL), is studied to create a new d-q axis current reference applying to the voltage control, which can improve voltage stability and transient response and derive fast convergence of current and voltage at the D-STATCOM bus. The real-time simulations on an IEEE 13-bus system show that the proposed approach can better control the D-STATCOM than the conventional control methods for enhancing voltage stability and transient performance.
