2022
DOI: 10.1007/s11063-022-10748-2
Off-Policy: Model-Free Optimal Synchronization Control for Complex Dynamical Networks

Cited by 5 publications (3 citation statements)
References 38 publications
“…Usually, the key issue in optimal control of nonlinear systems is the solution of the Hamilton-Jacobi-Bellman (HJB) equation, which is generally hard to solve analytically since it is a first-order nonlinear partial differential equation. To address this issue, Werbos introduced adaptive dynamic programming (ADP) theory, which provides a way to compute an approximate solution of the HJB equation online and has attracted a great deal of attention from researchers [14][15][16]. ADP-based methods typically employ neural networks to build an actor-critic architecture, in which a critic network approximates the performance index function and an actor network approximates the control policy [17,18].…”
Section: Introduction (citation type: mentioning, confidence: 99%)
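
The excerpt above describes the standard ADP actor-critic construction: a critic network approximates the performance index (value) function and an actor network approximates the control policy. The following is a minimal Python/PyTorch sketch of that general idea, not the cited paper's algorithm; the toy discrete-time dynamics, quadratic stage cost, network sizes, and learning rates are illustrative assumptions.

Sketch (Python):

# Minimal actor-critic ADP sketch. The plant, cost weights, and
# hyperparameters below are assumptions for illustration only.
import torch
import torch.nn as nn

state_dim, control_dim, gamma = 2, 1, 0.95

critic = nn.Sequential(nn.Linear(state_dim, 32), nn.Tanh(), nn.Linear(32, 1))
actor = nn.Sequential(nn.Linear(state_dim, 32), nn.Tanh(), nn.Linear(32, control_dim))
critic_opt = torch.optim.Adam(critic.parameters(), lr=1e-3)
actor_opt = torch.optim.Adam(actor.parameters(), lr=1e-3)

def dynamics(x, u):
    # Hypothetical discrete-time nonlinear plant (Euler step of a pendulum-like system).
    x1, x2 = x[:, :1], x[:, 1:]
    return x + 0.1 * torch.cat([x2, -torch.sin(x1) + u], dim=1)

def stage_cost(x, u):
    # Quadratic utility x'Qx + u'Ru with Q = R = I (an assumption).
    return (x ** 2).sum(dim=1, keepdim=True) + (u ** 2).sum(dim=1, keepdim=True)

for iteration in range(200):
    x = torch.randn(64, state_dim)            # sampled states
    u = actor(x)
    x_next = dynamics(x, u)

    # Critic step: enforce the Bellman relation V(x) ~ cost(x, u) + gamma * V(x').
    target = stage_cost(x, u).detach() + gamma * critic(x_next).detach()
    critic_loss = ((critic(x) - target) ** 2).mean()
    critic_opt.zero_grad(); critic_loss.backward(); critic_opt.step()

    # Actor step: improve the policy by minimizing the right-hand side of the
    # Bellman relation with respect to the control input.
    u = actor(x)
    actor_loss = (stage_cost(x, u) + gamma * critic(dynamics(x, u))).mean()
    actor_opt.zero_grad(); actor_loss.backward(); actor_opt.step()

Each critic update fits the value function to the one-step Bellman (discrete-time HJB-like) target, and each actor update pushes the policy toward the control that minimizes that target, which is the policy-evaluation / policy-improvement pattern the excerpt refers to.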
“…Off-policy learning, where an agent learns about its target policy while following a different behavior policy, allows the agent to accumulate a wealth of knowledge about the effects of different behavior policies and underpins many practical implementations of reinforcement learning (RL, [3]) [4,5,6]. In many cases, off-policy learning is preferable: to learn about the greedy policy while exploring [7,8], to evaluate multiple target policies simultaneously from data generated by one behavior policy [9,10], to improve data efficiency via experience replay [11], or to correct data discrepancies introduced by distributed computation [12].…”
Section: Introduction (citation type: mentioning, confidence: 99%)
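
The second excerpt contrasts the behavior policy that generates data with the target policy being learned. The sketch below illustrates that separation with tabular Q-learning on a hypothetical five-state chain: transitions come from a uniformly random behavior policy, while the max operator in the update evaluates the greedy target policy. The environment, reward, and hyperparameters are assumptions for illustration, not taken from the cited works.

Sketch (Python):

# Off-policy tabular Q-learning on a toy chain (illustrative assumption).
import numpy as np

n_states, n_actions = 5, 2            # small deterministic chain
alpha, gamma, episodes = 0.1, 0.9, 500
rng = np.random.default_rng(0)

def step(s, a):
    # Move left (a = 0) or right (a = 1); reward 1 for reaching the last state.
    s_next = max(0, s - 1) if a == 0 else min(n_states - 1, s + 1)
    reward = 1.0 if s_next == n_states - 1 else 0.0
    done = s_next == n_states - 1
    return s_next, reward, done

Q = np.zeros((n_states, n_actions))

for _ in range(episodes):
    s, done = 0, False
    while not done:
        a = rng.integers(n_actions)   # behavior policy: uniform random exploration
        s_next, r, done = step(s, a)
        # The max over next actions evaluates the greedy *target* policy,
        # even though the data were generated by the random behavior policy.
        Q[s, a] += alpha * (r + gamma * (0.0 if done else Q[s_next].max()) - Q[s, a])
        s = s_next

greedy_policy = Q.argmax(axis=1)      # learned target policy
print(greedy_policy)                  # expected: action 1 (move right) in non-terminal states

The same data could be replayed to evaluate several target policies or stored in a replay buffer, which is the data-efficiency argument made in the excerpt.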