2020
DOI: 10.1109/tsmc.2019.2957000
Reinforcement Q-Learning Algorithm for H∞ Tracking Control of Unknown Discrete-Time Linear Systems

Cited by 47 publications (60 citation statements)
References: 46 publications
“…Then, from (22), (19), and (24), it can be concluded that both matrix H and matrix N satisfy (15). However, it has been shown in [30] that there is a unique matrix H for (15); thus a contradiction is generated.…”
Section: B. Output Feedback Control Design (mentioning)
confidence: 97%
“…Proof: We first show that there is a unique matrix H̄ satisfying (22), and that the optimal control policies and the worst-case disturbances obtained by (23) based on the matrix H̄ are also unique. If there are two different matrices H̄ and N̄ such that (22) holds, then we can obtain matrix H and the following matrix N according to (19).…”
Section: B. Output Feedback Control Design (mentioning)
confidence: 99%
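Both excerpts above turn on the uniqueness of the Q-function kernel matrix H. As context, a minimal sketch of the standard quadratic Q-function form used in discrete-time H∞ (zero-sum) Q-learning is given below; the partitioning, the weights Q_s, R, γ, and the saddle-point policy formulas are the generic textbook versions, not necessarily the exact equations (15), (22), (23) of the cited work.

```latex
% Generic quadratic Q-function for the discrete-time zero-sum (H-infinity) setting.
% z_k stacks state, control, and disturbance; H is the symmetric kernel matrix.
\begin{aligned}
Q(x_k,u_k,w_k) &= z_k^{\top} H z_k,\qquad
z_k=\begin{bmatrix}x_k\\ u_k\\ w_k\end{bmatrix},\qquad
H=\begin{bmatrix}H_{xx}&H_{xu}&H_{xw}\\ H_{ux}&H_{uu}&H_{uw}\\ H_{wx}&H_{wu}&H_{ww}\end{bmatrix},\\[2pt]
% Q-function Bellman equation; z_{k+1}^{*} uses the saddle-point policies below:
z_k^{\top} H z_k &= x_k^{\top} Q_s x_k + u_k^{\top} R u_k - \gamma^{2} w_k^{\top} w_k
                  + z_{k+1}^{*\top} H z_{k+1}^{*},\\[2pt]
% Policies read off the blocks of H (stationarity of Q in u and w):
u^{*}(x_k) &= -\bigl(H_{uu}-H_{uw}H_{ww}^{-1}H_{wu}\bigr)^{-1}
               \bigl(H_{ux}-H_{uw}H_{ww}^{-1}H_{wx}\bigr)x_k,\\
w^{*}(x_k) &= -\bigl(H_{ww}-H_{wu}H_{uu}^{-1}H_{uw}\bigr)^{-1}
               \bigl(H_{wx}-H_{wu}H_{uu}^{-1}H_{ux}\bigr)x_k.
\end{aligned}
```

Uniqueness of H is what makes this construction well posed: the control and disturbance gains are read directly from the blocks of H, so two distinct kernels satisfying the same Bellman equation would produce two distinct saddle-point policy pairs, which is the contradiction the excerpts exploit.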
“…[T]his property can result in difficulties for implementing the Q-learning technique for the continuous-time system. Additionally, compared with the data-driven method in [48], this work provides a neural-network-based technique to avoid the Kronecker product in estimating the actor/critic term. The actor/critic-based approaches have been discussed in [49], [50] for nonlinear affine systems using the residual error δ_hjb.…”
Section: ARL-Based Control Design for Independent Joints (mentioning)
confidence: 99%
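The Kronecker-product issue mentioned here refers to parameterizing a quadratic value/Q term through the regressor z ⊗ z, whose dimension grows quickly with the state size. A minimal, hypothetical sketch of the alternative idea, adapting critic weights by gradient descent on a Bellman/HJB-style residual δ_hjb over a chosen feature map, is shown below; the feature map phi, the update law, and the toy system are illustrative assumptions, not the cited papers' designs.

```python
# Minimal sketch (assumptions, not the cited papers' exact laws): approximate the
# value/Q term with a chosen feature map phi(x) and adapt the critic weights by
# gradient descent on a Bellman/HJB-style residual delta_hjb, instead of forming
# a Kronecker-product regressor over the full quadratic basis.
import numpy as np

def phi(x):
    """Hypothetical critic feature map (here: quadratic monomials of a 2-D state)."""
    x1, x2 = x
    return np.array([x1 * x1, x1 * x2, x2 * x2])

def critic_update(w, x, x_next, reward, lr=0.05, gamma=0.95):
    """One residual-based critic step: delta_hjb = phi(x)^T w - (reward + gamma * phi(x')^T w)."""
    delta_hjb = phi(x) @ w - (reward + gamma * (phi(x_next) @ w))
    # Semi-gradient step that shrinks the residual along phi(x).
    return w - lr * delta_hjb * phi(x)

# Toy usage on a stable linear system x_{k+1} = A x_k + B u_k under a fixed policy.
A = np.array([[0.9, 0.1], [0.0, 0.8]])
B = np.array([[0.0], [0.1]])
w = np.zeros(3)
x = np.array([1.0, -1.0])
for _ in range(200):
    u = np.array([-0.5 * x[1]])                     # fixed evaluation policy
    x_next = A @ x + B @ u
    reward = -(x @ x + 0.1 * float(u @ u))          # negative quadratic cost as reward
    w = critic_update(w, x, x_next, reward)
    x = x_next if np.linalg.norm(x_next) > 1e-3 else np.array([1.0, -1.0])
print("critic weights:", w)
```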
“…where the coefficients in (54) are mentioned in (55), (56), and (35). Let us consider the control structure (Figure 2), (10) with the ARL-based control scheme (46), (10), (39), and (30), the associated adjusting mechanisms (35) and (33) for the actual controller, and the constraint force control vector (48); then, (1) the actor-critic weight errors W_a and W_c are UUB; (2) the tracking effectiveness of not only 1) is also UUB; (3) the tracking of the constraint force coefficient vector λ and the remaining terms of the joint variables' vector η_{2n} = [ξ_{(2n−m)}, p_m] is also UUB.…”
Section: Convergence and Stability Analysis (mentioning)
confidence: 99%
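UUB is used here in its standard sense of uniform ultimate boundedness; the following is the textbook definition rather than anything specific to the cited analysis.

```latex
% Standard definition: e(t) is uniformly ultimately bounded (UUB) if its
% trajectories enter, and thereafter remain in, a ball of radius b after a
% finite time T that may depend on the size of the initial condition.
\|e(t_0)\| \le a \;\Longrightarrow\; \exists\, T(a,b)\ge 0:\quad
\|e(t)\| \le b \quad \forall\, t \ge t_0 + T .
```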
“…The same method has also been used to solve the OPFB LQT problem [24] by employing the VFA technique. In recent studies, the model-free state reconstruction technique was also used to develop an OPFB Q-learning PI scheme for the H∞ control problem [33], [34].…”
Section: Introduction (mentioning)
confidence: 99%
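For orientation, the class of algorithms being referenced, model-free Q-learning policy iteration (PI) for a discrete-time zero-sum / H∞-type problem, can be sketched as below. This is a generic state-feedback illustration: the system matrices, weights, exploration noise, and the least-squares estimation of the kernel H are assumptions made for the example, and the cited schemes additionally use output feedback with state reconstruction and a tracking formulation.

```python
# Generic sketch of model-free Q-learning policy iteration for a discrete-time
# zero-sum (H-infinity-type) problem with full state measurement. Each iteration
# fits the Q-function kernel H from data by least squares, then improves the
# control and disturbance policies from the blocks of H.
import numpy as np

rng = np.random.default_rng(0)
A = np.array([[0.9, 0.2], [0.0, 0.8]])      # used only to generate data; unknown to the learner
B = np.array([[0.0], [0.5]])
E = np.array([[0.1], [0.1]])
Qs, R, gamma2 = np.eye(2), np.eye(1), 4.0   # stage weights and gamma^2

n, m, q = 2, 1, 1
K = np.zeros((m, n))                         # control policy u = -K x
L = np.zeros((q, n))                         # disturbance policy w = L x

for it in range(10):
    # --- policy evaluation: fit vec(H) from the Q-function Bellman equation ---
    Phi, y = [], []
    x = rng.standard_normal(n)
    for k in range(200):
        u = -K @ x + 0.5 * rng.standard_normal(m)    # exploration noise for excitation
        w = L @ x + 0.5 * rng.standard_normal(q)
        x1 = A @ x + B @ u + E @ w
        z = np.concatenate([x, u, w])
        z1 = np.concatenate([x1, -K @ x1, L @ x1])   # next step follows current policies
        r = x @ Qs @ x + u @ R @ u - gamma2 * (w @ w)
        Phi.append(np.kron(z, z) - np.kron(z1, z1))
        y.append(r)
        x = x1
    vecH, *_ = np.linalg.lstsq(np.array(Phi), np.array(y), rcond=None)
    H = vecH.reshape(n + m + q, n + m + q)
    H = 0.5 * (H + H.T)                              # symmetrize the estimated kernel

    # --- policy improvement from the blocks of H ---
    Hux, Huu, Huw = H[n:n+m, :n], H[n:n+m, n:n+m], H[n:n+m, n+m:]
    Hwx, Hwu, Hww = H[n+m:, :n], H[n+m:, n:n+m], H[n+m:, n+m:]
    K = np.linalg.solve(Huu - Huw @ np.linalg.solve(Hww, Hwu),
                        Hux - Huw @ np.linalg.solve(Hww, Hwx))
    L = -np.linalg.solve(Hww - Hwu @ np.linalg.solve(Huu, Huw),
                         Hwx - Hwu @ np.linalg.solve(Huu, Hux))

print("learned control gain K:", K)
print("learned disturbance gain L:", L)
```

Each iteration evaluates the current policy pair by fitting H purely from measured data, then improves both the control and disturbance gains from the blocks of H; this is the PI structure the excerpt refers to, with the OPFB variants replacing the measured state by a reconstruction from input-output data.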