Proceedings of the 48th IEEE Conference on Decision and Control (CDC), held jointly with the 28th Chinese Control Conference, 2009
DOI: 10.1109/cdc.2009.5399753
Q-learning and Pontryagin's Minimum Principle

Cited by 89 publications (86 citation statements)
References 15 publications
“…The answers to this question have been examined in detail in [5] in a deterministic setting. The authors show that the Hamiltonian appearing in nonlinear control theory is essentially the same as the Q-function that is the object of interest in Q-learning.…”
Section: B. Mean Field H-learning
confidence: 99%
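The identification quoted above can be written out explicitly. The following is a minimal sketch, assuming for simplicity the undiscounted infinite-horizon deterministic setting, with dynamics x' = f(x,u), running cost c(x,u), and optimal value function J*:

```latex
\[
  Q(x,u) \;=\; H(x,u) \;:=\; c(x,u) + \nabla J^*(x)^{\top} f(x,u),
\]
\[
  \min_{u} Q(x,u) = 0 \quad \text{(HJB equation)}, \qquad
  u^*(x) \in \operatorname*{arg\,min}_{u} H(x,u) \quad \text{(minimum principle)}.
\]
```

Minimizing the Q-function over actions and minimizing the Hamiltonian over controls are then the same operation, which is the sense in which the two objects coincide.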
“…Value function approximation is a well known approach to computing suboptimal policies for complex dynamic problems [16], [4], [13]. The control theory community is also increasingly embracing techniques related to approximate dynamic programming for control of complex systems, as evidenced by a number of recent papers on the subject (for example [20], [10], [11], [5], to name a few). Ideas similar to those presented in this paper have also been developed in the recent literature on model-predictive control [3], [14], [2].…”
Section: Introduction
confidence: 99%
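To make the value function approximation idea concrete, here is a minimal sketch of Q-learning with a linear-in-the-parameters approximation Q_theta(x,u) = theta^T phi(x,u). The features, step sizes, and toy dynamics are illustrative assumptions, not taken from any of the cited papers.

```python
import numpy as np

def phi(x, u):
    """Hand-picked features for a scalar state and action."""
    return np.array([x, u, x * u, x**2, u**2, 1.0])

def q_learning(step, actions, x0, episodes=200, horizon=100,
               gamma=0.95, alpha=0.005, eps=0.1, seed=0):
    rng = np.random.default_rng(seed)
    theta = np.zeros(6)
    for _ in range(episodes):
        x = x0
        for _ in range(horizon):
            # Epsilon-greedy exploration over a finite action set.
            if rng.random() < eps:
                u = rng.choice(actions)
            else:
                u = min(actions, key=lambda a: theta @ phi(x, a))
            x_next, cost = step(x, u)
            # TD target: cost plus discounted minimum over next actions.
            q_next = min(theta @ phi(x_next, a) for a in actions)
            td_error = cost + gamma * q_next - theta @ phi(x, u)
            theta += alpha * td_error * phi(x, u)
            x = x_next
    return theta

# Toy deterministic scalar system x+ = 0.5x + u with quadratic cost.
theta = q_learning(lambda x, u: (0.5 * x + u, x**2 + u**2),
                   actions=[-1.0, -0.5, 0.0, 0.5, 1.0], x0=1.0)
```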
“…In stochastic systems, this is achieved using a randomized stationary policy (cf. [13], [20], [23]), whereas in deterministic systems, a probing noise is added to the derived control law (cf. [1]- [3], [7], [24]).…”
Section: B. Learning Based on Desired Behavior
confidence: 99%
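The two exploration styles contrasted in this quote can be illustrated in a few lines; the feedback gain, noise level, and probing frequencies below are placeholder assumptions, not values from the cited works.

```python
import numpy as np

rng = np.random.default_rng(1)

def randomized_policy(x, K, sigma=0.1):
    """Stochastic case: draw the input from a randomized stationary
    policy centered on the nominal feedback u = -Kx."""
    return -K @ x + sigma * rng.standard_normal(K.shape[0])

def probing_policy(x, t, K, amp=0.05, freqs=(1.0, 3.7, 7.3)):
    """Deterministic case: add a sum-of-sinusoids probing signal to
    the derived control law to keep the trajectory exciting."""
    probe = amp * sum(np.sin(w * t) for w in freqs)
    return -K @ x + probe

K = np.array([[1.0, 0.5]])          # illustrative feedback gain
x = np.array([0.2, -0.1])
u_stochastic = randomized_policy(x, K)
u_deterministic = probing_policy(x, t=0.3, K=K)
```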
“…In control theory, the desirable behavior is typically quantified using a cost function, and the control problem is formulated as the desire to find the optimal policy that minimizes the cumulative cost. Recently, various RL-based techniques have been developed to approximately solve optimal control problems for continuous-time and discrete-time deterministic systems [1]- [13]. The approximate solution is facilitated via value function approximation, where the value function is approximated using a linear-in-the-parameters (LP) approximation, and the optimal policy is computed based on the estimated value function.…”
Section: Introduction
confidence: 99%
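In generic form (a sketch of the setting the quote describes, assuming continuous-time deterministic dynamics x' = f(x,u) with running cost c(x,u)), the linear-in-the-parameters approximation and the policy computed from it read:

```latex
\[
  \hat V(x;\theta) \;=\; \theta^{\top}\varphi(x), \qquad
  \hat u(x) \;\in\; \operatorname*{arg\,min}_{u}
    \Big[\, c(x,u) + \nabla_x \hat V(x;\theta)^{\top} f(x,u) \,\Big],
\]
```

where φ(x) is a user-chosen basis of features and θ is the parameter vector estimated online.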