Proceedings of the 48th IEEE Conference on Decision and Control (CDC), held jointly with the 28th Chinese Control Conference, 2009
DOI: 10.1109/cdc.2009.5399753
Q-learning and Pontryagin's Minimum Principle

Cited by 89 publications (86 citation statements)
References 15 publications
“…The answers to this question have been examined in detail in [5] in a deterministic setting. The authors show that the Hamiltonian appearing in nonlinear control theory is essentially the same as the Q-function that is the object of interest in Q-learning.…”
Section: B. Mean Field H-learning
confidence: 99%
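The identification quoted above can be written out explicitly. The following is a minimal sketch, assuming for simplicity the undiscounted infinite-horizon deterministic setting, with dynamics x' = f(x,u), running cost c(x,u), and optimal value function J*:

```latex
\[
  Q(x,u) \;=\; H(x,u) \;:=\; c(x,u) + \nabla J^*(x)^{\top} f(x,u),
\]
\[
  \min_{u} Q(x,u) = 0 \quad \text{(HJB equation)}, \qquad
  u^*(x) \in \operatorname*{arg\,min}_{u} H(x,u) \quad \text{(minimum principle)}.
\]
```

Minimizing the Q-function over actions and minimizing the Hamiltonian over controls are then the same operation, which is the sense in which the two objects coincide.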
“…Value function approximation is a well known approach to computing suboptimal policies for complex dynamic problems [16], [4], [13]. The control theory community is also increasingly embracing techniques related to approximate dynamic programming for control of complex systems, as evidenced by a number of recent papers on the subject (for example [20], [10], [11], [5], to name a few). Ideas similar to those presented in this paper have also been developed in the recent literature on model-predictive control [3], [14], [2].…”
Section: Introduction
confidence: 99%
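To make the value function approximation idea concrete, here is a minimal sketch of Q-learning with a linear-in-the-parameters approximation Q_theta(x,u) = theta^T phi(x,u). The features, step sizes, and toy dynamics are illustrative assumptions, not taken from any of the cited papers.

```python
import numpy as np

def phi(x, u):
    """Hand-picked features for a scalar state and action."""
    return np.array([x, u, x * u, x**2, u**2, 1.0])

def q_learning(step, actions, x0, episodes=200, horizon=100,
               gamma=0.95, alpha=0.005, eps=0.1, seed=0):
    rng = np.random.default_rng(seed)
    theta = np.zeros(6)
    for _ in range(episodes):
        x = x0
        for _ in range(horizon):
            # Epsilon-greedy exploration over a finite action set.
            if rng.random() < eps:
                u = rng.choice(actions)
            else:
                u = min(actions, key=lambda a: theta @ phi(x, a))
            x_next, cost = step(x, u)
            # TD target: cost plus discounted minimum over next actions.
            q_next = min(theta @ phi(x_next, a) for a in actions)
            td_error = cost + gamma * q_next - theta @ phi(x, u)
            theta += alpha * td_error * phi(x, u)
            x = x_next
    return theta

# Toy deterministic scalar system x+ = 0.5x + u with quadratic cost.
theta = q_learning(lambda x, u: (0.5 * x + u, x**2 + u**2),
                   actions=[-1.0, -0.5, 0.0, 0.5, 1.0], x0=1.0)
```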
“…In stochastic systems, this is achieved using a randomized stationary policy (cf. [13], [20], [23]), whereas in deterministic systems, a probing noise is added to the derived control law (cf. [1]- [3], [7], [24]).…”
Section: B. Learning Based on Desired Behavior
confidence: 99%
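The two exploration styles contrasted in this quote can be illustrated in a few lines; the feedback gain, noise level, and probing frequencies below are placeholder assumptions, not values from the cited works.

```python
import numpy as np

rng = np.random.default_rng(1)

def randomized_policy(x, K, sigma=0.1):
    """Stochastic case: draw the input from a randomized stationary
    policy centered on the nominal feedback u = -Kx."""
    return -K @ x + sigma * rng.standard_normal(K.shape[0])

def probing_policy(x, t, K, amp=0.05, freqs=(1.0, 3.7, 7.3)):
    """Deterministic case: add a sum-of-sinusoids probing signal to
    the derived control law to keep the trajectory exciting."""
    probe = amp * sum(np.sin(w * t) for w in freqs)
    return -K @ x + probe

K = np.array([[1.0, 0.5]])          # illustrative feedback gain
x = np.array([0.2, -0.1])
u_stochastic = randomized_policy(x, K)
u_deterministic = probing_policy(x, t=0.3, K=K)
```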
“…In control theory, the desirable behavior is typically quantified using a cost function, and the control problem is formulated as the desire to find the optimal policy that minimizes the cumulative cost. Recently, various RL-based techniques have been developed to approximately solve optimal control problems for continuous-time and discrete-time deterministic systems [1]- [13]. The approximate solution is facilitated via value function approximation, where the value function is approximated using a linear-in-the-parameters (LP) approximation, and the optimal policy is computed based on the estimated value function.…”
Section: Introduction
confidence: 99%
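In generic form (a sketch of the setting the quote describes, assuming continuous-time deterministic dynamics x' = f(x,u) with running cost c(x,u)), the linear-in-the-parameters approximation and the policy computed from it read:

```latex
\[
  \hat V(x;\theta) \;=\; \theta^{\top}\varphi(x), \qquad
  \hat u(x) \;\in\; \operatorname*{arg\,min}_{u}
    \Big[\, c(x,u) + \nabla_x \hat V(x;\theta)^{\top} f(x,u) \,\Big],
\]
```

where φ(x) is a user-chosen basis of features and θ is the parameter vector estimated online.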