2015
DOI: 10.1109/tcyb.2014.2322116

Continuous-Time Q-Learning for Infinite-Horizon Discounted Cost Linear Quadratic Regulator Problems

Abstract: This paper presents a method of Q-learning to solve the discounted linear quadratic regulator (LQR) problem for continuous-time (CT) continuous-state systems. Most available methods in the existing literature for CT systems to solve the LQR problem generally need partial or complete knowledge of the system dynamics. Q-learning is effective for unknown dynamical systems, but has generally been well understood only for discrete-time systems. The contribution of this paper is to present a Q-learning methodology f…
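As a rough illustration of the problem class described in the abstract, the sketch below implements a minimal Q-learning policy iteration for a discounted LQR problem. It is not the paper's continuous-time algorithm: the continuous-time plant is replaced by an Euler-discretized stand-in, and the plant matrices, step size, cost weights, discount rate, exploration noise, and quadratic feature map are all illustrative assumptions.

```python
import numpy as np

# Minimal sketch (not the paper's CT method): Q-learning policy iteration for a
# discounted LQR problem on an Euler-discretized stand-in for a CT plant.
# All numbers below are illustrative assumptions.

np.random.seed(0)

A = np.array([[0.0, 1.0], [-1.0, -0.5]])      # hypothetical CT dynamics dx/dt = A x + B u
B = np.array([[0.0], [1.0]])                   # (unknown to the learner; used only to simulate)
dt = 0.05
Ad, Bd = np.eye(2) + dt * A, dt * B            # Euler discretization
Qc, Rc = dt * np.eye(2), dt * np.eye(1)        # stage cost x'Qc x + u'Rc u
gamma = np.exp(-0.1 * dt)                      # per-step discount factor

n, m = 2, 1
K = np.zeros((m, n))                           # initial feedback gain, u = -K x

def phi(x, u):
    """Quadratic features: upper-triangular entries of z z^T with z = [x; u]."""
    z = np.concatenate([x, u])
    return np.array([z[i] * z[j] for i in range(n + m) for j in range(i, n + m)])

for it in range(20):
    # Policy evaluation: fit Q_K(x,u) = cost + gamma * Q_K(x', -K x') by least squares.
    Phi, y = [], []
    x = np.random.randn(n)
    for _ in range(400):
        u = -K @ x + 0.5 * np.random.randn(m)  # current policy plus probing noise
        cost = x @ Qc @ x + u @ Rc @ u
        x_next = Ad @ x + Bd @ u
        Phi.append(phi(x, u) - gamma * phi(x_next, -K @ x_next))
        y.append(cost)
        x = x_next if np.linalg.norm(x_next) < 10 else np.random.randn(n)
    w = np.linalg.lstsq(np.array(Phi), np.array(y), rcond=None)[0]

    # Rebuild the symmetric kernel H of Q_K(x,u) = [x;u]^T H [x;u], then improve the policy.
    H = np.zeros((n + m, n + m))
    k = 0
    for i in range(n + m):
        for j in range(i, n + m):
            H[i, j] = H[j, i] = w[k] / (1.0 if i == j else 2.0)
            k += 1
    K = np.linalg.solve(H[n:, n:], H[n:, :n])  # greedy policy u = -H_uu^{-1} H_ux x

print("learned feedback gain K:\n", K)
```

The structural point shared with Q-learning treatments of LQR is that the Q-function of a linear policy is exactly quadratic in (x, u), so its kernel H can be fit from input-state data alone and the improved gain read off from the partition of H, with no use of A or B inside the learning loop.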

Cited by 83 publications (37 citation statements)
References 28 publications

“…Thus, Algorithm 2 is completely model-free (data-driven) and robust to inaccuracy in system modeling. Note that once the rank condition is met, the data of the state and behavior policies can be used repeatedly, which is more computationally efficient than the existing Q-learning methods [41,44,48,49], for which new experience data must be regenerated in each new iteration.…”
Section: Remark 4 (Though the Initially Admissible Feedback Gains…)
Citation type: mentioning, confidence: 99%
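One way to see the data-reuse point in this remark is through the batch least-squares form of policy evaluation. Assuming (not quoting the cited papers) a linear parameterization Q_{K_j}(x,u) = w_j^T φ(x,u), stage cost c, discount γ, and N stored transitions (x_k, u_k, x_k') collected once under exploratory behavior policies, each iteration j solves

\[
\underbrace{\begin{bmatrix}
\phi(x_1,u_1)^{\top} - \gamma\,\phi(x_1',\,-K_j x_1')^{\top}\\
\vdots\\
\phi(x_N,u_N)^{\top} - \gamma\,\phi(x_N',\,-K_j x_N')^{\top}
\end{bmatrix}}_{\Phi_j}\; w_j
=
\begin{bmatrix} c(x_1,u_1)\\ \vdots\\ c(x_N,u_N) \end{bmatrix},
\qquad
\operatorname{rank}(\Phi_j) = \dim(w_j).
\]

Only the greedy actions -K_j x_k' change between iterations, so the same stored samples rebuild Φ_j for every j once the rank condition holds; methods that instead evaluate each new policy on-policy must regenerate experience at every iteration, which is the contrast the remark draws.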
“…However, there are very few studies on continuous-time systems, because it is difficult to construct Q-functions for such systems. In [44], a Q-learning method was developed for infinite-horizon discounted cost linear quadratic regulator problems. The author of [31] proposed a model-free synchronous PI algorithm for linear nonzero-sum quadratic differential games by constructing a Q-function for each player.…”
Section: Introduction
Citation type: mentioning, confidence: 99%
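As background for why [44] and related work can make Q-learning tractable in the CT LQR case, the discounted LQR Q-function admits a quadratic form; the partition below is a standard way to write it and is stated here for orientation rather than quoted from the cited paper:

\[
Q(x,u) = \begin{bmatrix} x \\ u \end{bmatrix}^{\top}
\begin{bmatrix} H_{xx} & H_{xu} \\ H_{ux} & H_{uu} \end{bmatrix}
\begin{bmatrix} x \\ u \end{bmatrix},
\qquad
u^{*}(x) = -H_{uu}^{-1} H_{ux}\, x,
\]

so learning the kernel H from measured trajectories yields the optimal feedback directly. The difficulty alluded to in the quotation is that, without a discrete one-step transition, giving this object a Bellman-style fixed-point characterization in continuous time is not straightforward.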
“…When the state-action space is small, the approximate optimal solution can be obtained using any RL method. When the space is large or continuous, RL does not converge to the optimal solution, which is called the “curse” of dimensionality. There are two problems when RL is applied in robust control: RL overestimates the value functions using the minimum operator in the update rule.…”
Section: Introduction
Citation type: mentioning, confidence: 99%
“…Optimal control problems have been studied for more than half a century, and quite a number of methods have been presented, such as fuzzy logic theories [1][2][3][4] and approximate dynamic programming (ADP) algorithms [5][6][7][8][9][10][11][12][13][14][15][16][17]. With the rapid development of intelligent computation technologies in the last several decades, intelligent control methods have been applied in the analysis of nonlinear systems (see the works of Zhang et al [18], Zhao et al [19], and Huang and Chung [20]). For nonlinear systems, the optimal control is obtained by solving the Hamilton-Jacobi-Bellman (HJB) equation.…”
Section: Introduction
Citation type: mentioning, confidence: 99%
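For orientation on the HJB equation mentioned in this excerpt: for input-affine dynamics \(\dot{x} = f(x) + g(x)u\) and an infinite-horizon cost with running cost \(q(x) + u^{\top} R u\) (a standard setting assumed here, not taken from the quotation), the equation and the resulting optimal control read

\[
0 = \min_{u}\Bigl[\, q(x) + u^{\top} R\, u + \nabla V^{*}(x)^{\top}\bigl(f(x) + g(x)\,u\bigr) \Bigr],
\qquad
u^{*}(x) = -\tfrac{1}{2}\, R^{-1} g(x)^{\top} \nabla V^{*}(x).
\]

For general nonlinear f and g this partial differential equation has no closed-form solution, which is what motivates the ADP-style approximation schemes surveyed in the quotation.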