2019 · DOI: 10.1002/rnc.4515
Dynamic intermittent Q‐learning–based model‐free suboptimal co‐design of ℒ2‐stabilization

Abstract: This paper proposes an intermittent model‐free learning algorithm for linear time‐invariant systems, where the control policy and transmission decisions are co‐designed simultaneously while also being subjected to worst‐case disturbances. The control policy is designed by introducing an internal dynamical system to further reduce the transmission rate and provide bandwidth flexibility in cyber‐physical systems. Moreover, a Q‐learning algorithm with two actors and a single critic structure is developed …

Cited by 35 publications (9 citation statements) · References 38 publications

Citation statements (ordered by relevance):
“…One such algorithm, the offline policy iteration algorithm, is stated as Algorithm 1. By iteratively solving the Lyapunov equations (29) and (30), which are linear in $P_1^i$ and $P_2^i$, and updating $K_i$ and $L_i$ by (31) and (32), the pair $(P_1^i, P_2^i)$ converges asymptotically to the solutions $(P_1^*, P_2^*)$ of the CAREs (16) and (17) [13].…”
Section: B. Coupled Algebraic Riccati Equations (mentioning; confidence: 99%)
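The equations (29)-(32) and the CAREs (16)-(17) referenced in this excerpt are not reproduced here. As a hedged illustration of the underlying pattern, below is a minimal sketch of classic Kleinman policy iteration for a single continuous-time ARE: each policy-evaluation step is a linear Lyapunov equation, and the gain update plays the role of the updates (31)-(32); the coupled two-player version iterates two such equations in parallel. The plant matrices and the function name are illustrative assumptions, not taken from the cited paper.

```python
import numpy as np
from scipy.linalg import solve_continuous_lyapunov

def kleinman_policy_iteration(A, B, Q, R, K0, tol=1e-9, max_iter=50):
    """Kleinman policy iteration for the CARE
    A'P + P A + Q - P B R^{-1} B' P = 0, from a stabilizing initial gain K0."""
    K, P_prev = K0, None
    for _ in range(max_iter):
        Ac = A - B @ K  # closed-loop matrix under the current policy
        # Policy evaluation: linear Lyapunov equation Ac' P + P Ac = -(Q + K' R K)
        P = solve_continuous_lyapunov(Ac.T, -(Q + K.T @ R @ K))
        # Policy improvement (the single-player analogue of the gain updates)
        K = np.linalg.solve(R, B.T @ P)
        if P_prev is not None and np.linalg.norm(P - P_prev) < tol:
            break
        P_prev = P
    return P, K

# Hypothetical stable 2-state plant, so K0 = 0 is an admissible initial policy.
A = np.array([[0.0, 1.0], [-1.0, -2.0]])
B = np.array([[0.0], [1.0]])
P, K = kleinman_policy_iteration(A, B, np.eye(2), np.eye(1), np.zeros((1, 2)))
```

The iterates $P^i$ are monotonically nonincreasing and converge to the stabilizing CARE solution, mirroring the asymptotic convergence of $(P_1^i, P_2^i)$ claimed in the excerpt.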
“…In a CPS, the plant, the actuators, and the sensors are spatially distributed, and the signal transmission channels are vulnerable to system uncertainty and adversarial environments. It is therefore important to model the imperfections caused by communication delays,9,10 packet dropouts,11,12 external disturbance inputs,13 and false‐data injection.14,15 There are mainly two types of techniques for modeling and analyzing such imperfections in CPSs.…”
Section: Introduction (mentioning; confidence: 99%)
“…However, most LMI‐based methods need to solve complex linear matrix inequalities and inevitably require system model information, which cannot be obtained accurately in many practical applications. In addition, as in H∞ control, the H∞ tracking control problem can be converted into a two‐player zero‐sum game according to game theory, in which the controller is the minimizing player and the disturbance is the maximizing player. In this process, finding the saddle point of the zero‐sum game by solving the game algebraic Riccati equation (GARE) is the key to solving the H∞ tracking control problem for a linear system.…”
Section: Introduction (mentioning; confidence: 99%)
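For readers unfamiliar with the terminology, the GARE for the state-feedback H∞ zero-sum game takes the standard form shown below; the specific tracking-problem variant used by the citing paper is not reproduced in this excerpt, so this is the generic regulation form, assuming a plant $\dot{x} = Ax + Bu + Dw$ and attenuation level $\gamma$:

$$A^{\top}P + PA + Q - PBR^{-1}B^{\top}P + \gamma^{-2}PDD^{\top}P = 0, \qquad u^{*} = -R^{-1}B^{\top}Px, \qquad w^{*} = \gamma^{-2}D^{\top}Px.$$

A stabilizing solution $P \succeq 0$ exists for all $\gamma$ above the optimal attenuation level, and $(u^{*}, w^{*})$ is the saddle point: the controller $u^{*}$ minimizes and the disturbance $w^{*}$ maximizes the quadratic game value $x_0^{\top}Px_0$.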
“…In addition, as in H∞ control, the H∞ tracking control problem can be converted into a two-person zero-sum game according to game theory, in which the controller is the minimizing player and the disturbance is the maximizing player.12,13 In this process, finding the saddle point of the zero-sum game by solving the game algebraic Riccati equation (GARE) is the key to solving the H∞ tracking control of a linear system.14 Reinforcement learning (RL), which emphasizes the use of environment-based feedback to modify behavior, was originally used by Werbos to solve the optimal regulator problem in the control field.…”
Section: Introduction (mentioning; confidence: 99%)
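Since this excerpt introduces RL for the optimal regulator problem, the following is a minimal, hedged sketch of Q-learning-style policy iteration for a discrete-time LQR, in the spirit of Bradtke-style adaptive LQR rather than the cited paper's continuous-time intermittent algorithm. The plant matrices are hypothetical and used only to generate data; the learner itself is model-free, estimating the quadratic Q-matrix H by least squares on the Bellman identity and improving the gain from its blocks.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical stable 2-state, 1-input plant (used only to simulate data).
A = np.array([[0.9, 0.1], [0.0, 0.8]])
B = np.array([[0.0], [0.1]])
Qc, Rc = np.eye(2), np.eye(1)
n, m = 2, 1

K = np.zeros((m, n))  # initial stabilizing policy u = -Kx
for _ in range(10):
    Phi, c = [], []
    x = rng.standard_normal(n)
    for t in range(200):
        u = -K @ x + 0.5 * rng.standard_normal(m)  # exploration noise
        xn = A @ x + B @ u
        z, zn = np.concatenate([x, u]), np.concatenate([xn, -K @ xn])
        # Bellman identity for Q(z) = z'Hz: Q(z) - Q(z_next) = stage cost.
        Phi.append(np.kron(z, z) - np.kron(zn, zn))
        c.append(x @ Qc @ x + u @ Rc @ u)
        x = xn
    h, *_ = np.linalg.lstsq(np.array(Phi), np.array(c), rcond=None)
    H = 0.5 * (h.reshape(n + m, n + m) + h.reshape(n + m, n + m).T)
    # Greedy improvement: u = -H_uu^{-1} H_ux x.
    K = np.linalg.solve(H[n:, n:], H[n:, :n])

print("learned gain K:", K)
```

With enough exploration, H converges to the blocks [[Qc + AᵀPA, AᵀPB], [BᵀPA, Rc + BᵀPB]] for the current policy's P, so the greedy update reproduces model-based policy iteration without ever using A or B in the learning loop.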