“…To reduce the revenue loss of wind power producers, [92] adopts the A3C algorithm for the strategic bidding of a wind power producer participating in the energy and reserve markets. Reference [93] formulates the joint bidding problem of energy volume and price as an MDP, which is then solved by the DDPG algorithm. An NN is used to learn a response function and extract the state transition pattern from historical data in a supervised learning manner.…”
With the growing integration of distributed energy resources (DERs), flexible loads, and other emerging technologies, modern power and energy systems face increasing complexity and uncertainty, which poses great challenges to their operation and control. Meanwhile, the deployment of advanced sensors and smart meters generates large volumes of data, creating opportunities for novel data-driven methods to tackle complicated operation and control issues. Among them, reinforcement learning (RL) is one of the most widely promoted methods for control and optimization problems. This paper provides a comprehensive literature review of RL in terms of basic ideas, various types of algorithms, and their applications in power and energy systems. Challenges and directions for future work are also discussed.
“…In (16), the chain rule is applied to calculate the gradient of the action value with respect to the weights of the actor network.…”
Section: B. DDPG Algorithm for Continuous Control
mentioning
confidence: 99%
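The chain rule referred to in the snippet above is the deterministic policy gradient at the heart of DDPG: the gradient of the action value with respect to the actor weights factors as (dQ/da)·(da/dW), evaluated at the actor's own action. The following minimal numpy sketch illustrates that factorization; the linear actor, toy quadratic critic, and all numeric values are illustrative assumptions, not the networks from the cited papers.

```python
import numpy as np

# Chain rule in the DDPG actor update:
#   grad_W J = grad_a Q(s, a) * grad_W mu(s), evaluated at a = mu(s).
# The actor mu(s) = W @ s is linear, and the critic
# Q(s, a) = -(a - a_star)^2 is a toy quadratic with a known best action
# a_star; both are illustrative stand-ins only.

s = np.array([0.2, -0.1, 0.4])          # example state (assumed values)
W = np.array([[0.3, 0.1, -0.2]])        # actor weights, 1-D action
a_star = 0.5                            # action maximizing the toy critic

def actor(W, s):
    return W @ s                        # mu(s)

def dQ_da(a):
    return -2.0 * (a - a_star)          # gradient of Q w.r.t. the action

# Chain rule: dJ/dW = dQ/da * da/dW, where da/dW_j = s_j for a linear actor.
a = actor(W, s)
grad_W = dQ_da(a)[:, None] * s[None, :]

# One gradient-ascent step moves the actor's action toward a_star.
W_new = W + 0.1 * grad_W
assert abs(actor(W_new, s)[0] - a_star) < abs(a[0] - a_star)
```

With a small step size the updated action is strictly closer to the critic's optimum, which is exactly what the actor update in (16) achieves at scale with neural networks.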
“…In power systems, the potential of implementing deep RL for demand-side energy management and electric vehicle charging/discharging scheduling is shown in [14], [15]. The deep deterministic policy gradient (DDPG) algorithm is applied to solve the bidding problems of a load-serving entity and GENCOs in [16], [17].…”
In this paper, a day-ahead electricity market bidding problem with multiple strategic generation company (GENCO) bidders is studied. The problem is formulated as a Markov game model, in which the GENCO bidders interact with each other to develop their optimal day-ahead bidding strategies. Considering the unobservable information in the problem, a model-free and data-driven approach, known as multi-agent deep deterministic policy gradient (MADDPG), is applied to approximate the Nash equilibrium (NE) of the above Markov game. The MADDPG algorithm has the advantage of generalization due to the automatic feature extraction ability of deep neural networks. The algorithm is tested on an IEEE 30-bus system with three competitive GENCO bidders in both an uncongested case and a congested case. Comparisons with a truthful bidding strategy and state-of-the-art deep reinforcement learning methods, including deep Q network (DQN) and deep deterministic policy gradient (DDPG), demonstrate that the applied MADDPG algorithm can find a superior bidding strategy for all market participants, with increased profit gains. In addition, a comparison with a conventional model-based method shows that the MADDPG algorithm has higher computational efficiency, making it feasible for real-world applications.
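The key structural idea behind MADDPG in the abstract above is centralized training with decentralized execution: each bidder acts on its own observation, but each critic is trained on all agents' observations and actions. The sketch below shows only this information flow; the random linear maps stand in for the actual networks, and all dimensions are assumptions for illustration.

```python
import numpy as np

# Structural sketch of MADDPG's centralized-training /
# decentralized-execution scheme for a three-bidder Markov game.
# Linear maps replace the actor/critic networks purely to show
# which inputs each component sees; dimensions are assumed.

rng = np.random.default_rng(1)
n_agents, obs_dim, act_dim = 3, 4, 1      # e.g., three GENCO bidders

# Decentralized actors: each bidder maps only its OWN observation to a bid.
actors = [rng.normal(size=(act_dim, obs_dim)) for _ in range(n_agents)]

# Centralized critics: during training, each critic sees ALL observations
# and ALL actions, which keeps the environment stationary from its view.
critic_in = n_agents * (obs_dim + act_dim)
critics = [rng.normal(size=(1, critic_in)) for _ in range(n_agents)]

obs = [rng.normal(size=obs_dim) for _ in range(n_agents)]
acts = [W @ o for W, o in zip(actors, obs)]     # execution: local info only

joint = np.concatenate(obs + acts)              # training: global info
q_values = [(V @ joint).item() for V in critics]  # one Q estimate per bidder

assert joint.shape == (critic_in,)
assert len(q_values) == n_agents
```

Because each critic conditions on the joint action, the other bidders' strategies no longer look like unexplained environment noise during training, which is what lets MADDPG approximate the Nash equilibrium where independent DDPG learners struggle.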
“…In [23], a model-free method based on DRL with a continuous action space was introduced to replace the traditional linear controller for load frequency control. Reference [27] applied the deep deterministic policy gradient algorithm to solve the joint bidding and pricing problems of a load-serving entity. Similarly, autonomous voltage control strategies based on DQN and DDPG were proposed in [24] to support grid operators in making effective control actions.…”
Section: B. DRL-Based Power System Management
mentioning
confidence: 99%
“…Unlike traditional reinforcement learning, DRL algorithms use powerful deep neural networks to approximate the value function (such as the Q-table), enabling automatic high-dimensional feature extraction and end-to-end learning. Recently, the advantages of DRL were recognized by the community, and some attempts were made to leverage DRL in various applications for the electrical grid, including operational control [21]-[24], electricity markets [25], [26], demand response [27], and energy management [28]. Although these applications presented advantageous results in their respective fields, several challenges were encountered.…”
Model uncertainties and heterogeneous energy states hinder the effective aggregation of electric vehicles (EVs), especially when coupled with the real-time frequency dynamics of the electrical grid. By integrating the advantages of deep learning and reinforcement learning, deep reinforcement learning shows potential to address this challenge, where an intelligent agent fully considers the individual state-of-charge (SOC) differences of the EVs and the grid state to optimize aggregation performance. However, existing deep reinforcement learning policies usually produce deterministic actions, and it is difficult for them to handle the increasing uncertainties and randomness in modern electrical systems. In this paper, a probability-based management strategy with a continuous action space is proposed based on deep reinforcement learning, which provides fine-grained energy management and simultaneously addresses the time-varying dynamics of the EVs and the electrical grid. Moreover, a proximal-policy-based optimization is further introduced to clip the policy update speed and enhance training stability. The effectiveness of the proposed energy management structure and policy optimization strategy is verified across various scenarios and uncertainties, demonstrating advantageous performance in SOC management and frequency maintenance. Beyond these performance merits, the training procedure is also presented, revealing the reasons for the evolution of the proposed approach. INDEX TERMS State of charge (SOC), deep reinforcement learning, hybrid framework, electric vehicle aggregator, multiple-input and multiple-output.
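The clipping of the policy update speed mentioned in the abstract above is the PPO-style clipped surrogate objective. The sketch below shows how that clipping bounds the incentive for large policy jumps; the probability ratios and advantages are made-up numbers for illustration, not data from the paper.

```python
import numpy as np

# PPO-style clipped surrogate objective:
#   L = min(r * A, clip(r, 1 - eps, 1 + eps) * A),
# where r = pi_new(a|s) / pi_old(a|s) and A is the advantage estimate.

def clipped_surrogate(ratio, advantage, eps=0.2):
    """Per-sample clipped objective; larger is better."""
    return np.minimum(ratio * advantage,
                      np.clip(ratio, 1.0 - eps, 1.0 + eps) * advantage)

ratio = np.array([0.5, 1.0, 1.5])       # pi_new / pi_old per sample (assumed)
advantage = np.array([1.0, 1.0, 1.0])   # positive advantages (assumed)

# With A > 0, ratios above 1 + eps are clipped, so moving the policy
# further in one update gains no extra objective value: the update
# speed is bounded, which stabilizes training.
obj = clipped_surrogate(ratio, advantage)
print(obj)  # [0.5 1.  1.2]
```

The `min` makes the bound one-sided in the pessimistic direction: improvements from large ratio changes are capped, while the unclipped term still penalizes moves that hurt the objective.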