2021
DOI: 10.1109/access.2021.3099904

Power System Load Frequency Active Disturbance Rejection Control via Reinforcement Learning-Based Memetic Particle Swarm Optimization

Abstract: Load frequency control (LFC) is necessary to guarantee the safe operation of power systems. To address the frequency and power stability problems caused by load disturbances in interconnected power systems, an active disturbance rejection control (ADRC) scheme was designed. An ADRC has eight parameters that need to be tuned, which is challenging to do manually and has limited the adoption of this approach in industrial applications. Regardless of the theory or application, there is still no unified an…
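The abstract motivates tuning the eight ADRC parameters with an optimizer rather than by hand. The following is a minimal sketch of that idea using a plain particle swarm optimizer, not the paper's reinforcement-learning-based memetic PSO; `lfc_cost` is a hypothetical objective standing in for a load-frequency-response simulation scored with the candidate ADRC gains.

```python
# Minimal sketch (not the paper's algorithm): tuning 8 ADRC gains with plain PSO.
# `lfc_cost` is a placeholder objective; in practice it would simulate the LFC
# loop with the candidate gains and return, e.g., the ITAE of the frequency error.
import numpy as np

def lfc_cost(params):
    # Placeholder cost so the sketch runs end to end.
    return float(np.sum((params - 1.0) ** 2))

def pso(cost, dim=8, n_particles=20, iters=100,
        w=0.7, c1=1.5, c2=1.5, bounds=(0.0, 10.0)):
    lo, hi = bounds
    rng = np.random.default_rng(0)
    x = rng.uniform(lo, hi, (n_particles, dim))   # candidate gain vectors
    v = np.zeros_like(x)                          # particle velocities
    pbest = x.copy()
    pbest_f = np.array([cost(p) for p in x])
    gbest = pbest[np.argmin(pbest_f)].copy()
    for _ in range(iters):
        r1, r2 = rng.random(x.shape), rng.random(x.shape)
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)
        x = np.clip(x + v, lo, hi)
        f = np.array([cost(p) for p in x])
        improved = f < pbest_f
        pbest[improved], pbest_f[improved] = x[improved], f[improved]
        gbest = pbest[np.argmin(pbest_f)].copy()
    return gbest, pbest_f.min()

best_gains, best_cost = pso(lfc_cost)
print(best_gains, best_cost)
```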

Cited by 10 publications (4 citation statements) · References 33 publications
“…The agent of Q-VMD-RLG continuously learns to find the optimal policy π based on the action-value function Q(S(t), a(t)) and continuously updates the action a(t) to maximize the value of Q [72–76], thus obtaining the optimal action-value function Q*(S(t), a(t)) [77, 78], which is updated as shown in Equation (16):

$$Q(S(t),a(t)) = (1-\alpha)\,Q(S(t),a(t)) + \alpha\left[R(t+1) + \gamma \max Q(S(t+1),a(t+1))\right]$$

where Q(S(t), a(t)) is the value of the action-value function at time t, and S(t) and a(t) denote the state and the action executed by the agent at time t.…”
Section: Methods
confidence: 99%
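For concreteness, a minimal sketch of the tabular Q-learning update quoted above (Equation (16)); states and actions are assumed to be discrete indices, alpha is the learning rate and gamma the discount factor. The names are illustrative, not taken from the paper.

```python
# Tabular Q-learning update: Q(s,a) <- (1-alpha)*Q(s,a) + alpha*(r + gamma*max_a' Q(s',a'))
import numpy as np

def q_update(Q, s, a, reward, s_next, alpha=0.1, gamma=0.9):
    Q[s, a] = (1 - alpha) * Q[s, a] + alpha * (reward + gamma * np.max(Q[s_next]))
    return Q

# Example: 5 states, 3 actions
Q = np.zeros((5, 3))
Q = q_update(Q, s=0, a=1, reward=1.0, s_next=2)
```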
“…The agent of Q-VMD-RLG continuously learns to find the optimal policy π based on the action-value function Q(S(t), a(t)) and continuously updates the action a(t) to maximize the value of Q [72–76], thus obtaining the optimal action-value function Q*(S(t), a(t)) [77, 78], which is updated as shown in Equation (16).…”
Section: Markov Dynamic Decision Process of Q-VMD-RLG Model
confidence: 99%
“…Based on the individual values of η_m(t), the j-th instance of η(t) is denoted as η_j(t). Consequently, the attacker has two choices for each of the m measurements: it carries either the real data y_m(t) or the compromised data y_ma(t), as given in Equation (7), where C_m denotes the m-th row of …”
Section: System and Threat Model
confidence: 99%
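Since Equation (7) itself is not reproduced in the excerpt, the following is only an illustration of the two-choice structure described above: a binary attack indicator decides, per measurement channel, whether the received value is the real measurement or the attacker's compromised value. All names are hypothetical.

```python
# Sketch: select real vs. compromised measurements based on a binary attack indicator.
import numpy as np

def received_measurements(y, y_a, eta):
    """Return y[m] where eta[m] == 0 and y_a[m] where eta[m] == 1."""
    eta = np.asarray(eta, dtype=bool)
    return np.where(eta, y_a, y)

y   = np.array([1.00, 0.98, 1.02])   # real measurements
y_a = np.array([1.00, 1.50, 1.02])   # attacker's injected values
eta = np.array([0, 1, 0])            # second channel is attacked
print(received_measurements(y, y_a, eta))   # -> [1.   1.5  1.02]
```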
“…Meanwhile, power systems operate in the presence of disturbances [6]. Disturbances caused by deviations of the load from its forecasted value have been considered in [7]. However, unlike the existing literature, the effect of disturbances is taken into consideration here during the design of both the control and attack strategies.…”
Section: Introduction
confidence: 99%