In this paper we discuss stochastic differential delay equations with Markovian switching. Such an equation can be regarded as the result of several stochastic differential delay equations switching from one to another according to the movement of a Markov chain. One of the main aims of this paper is to investigate the exponential stability of these equations.
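To make the object of study concrete, here is a minimal Euler–Maruyama simulation sketch of a scalar stochastic differential delay equation dx(t) = a(r(t)) x(t−τ) dt + b(r(t)) x(t) dW(t) whose coefficients switch with a two-state Markov chain r(t). All numerical values (drift/diffusion coefficients, switching rates, delay) are invented for illustration and are not taken from the paper.

```python
import math
import random

def simulate_sdde_switching(a, b, q, tau=0.5, x0=1.0, dt=0.01, T=5.0, seed=7):
    """Euler-Maruyama for dx = a[r] x(t - tau) dt + b[r] x(t) dW(t),
    where r(t) is a 2-state Markov chain with switching rates q[0], q[1].
    All parameters are hypothetical illustration values."""
    rng = random.Random(seed)
    n = int(T / dt)
    lag = int(tau / dt)
    xs = [x0] * (lag + 1)          # constant initial segment on [-tau, 0]
    r = 0                          # initial regime of the Markov chain
    for _ in range(n):
        # switch regime with probability ~ q[r] * dt (rate q[r])
        if rng.random() < q[r] * dt:
            r = 1 - r
        x = xs[-1]
        x_delay = xs[-1 - lag]     # x(t - tau)
        dw = rng.gauss(0.0, math.sqrt(dt))
        xs.append(x + a[r] * x_delay * dt + b[r] * x * dw)
    return xs

# both regimes have stabilizing drift, so the path should decay toward 0
path = simulate_sdde_switching(a=[-1.0, -0.5], b=[0.1, 0.2], q=[1.0, 2.0])
```

With both regime drifts negative and small multiplicative noise, the simulated path decays, which is the kind of exponential-stability behaviour the paper analyses rigorously.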
This paper deals with denumerable continuous-time Markov decision processes (MDPs) with constraints. The optimality criterion to be minimized is the expected discounted loss, while several constraints of the same type are imposed. The transition rates may be unbounded, the loss rates are allowed to be unbounded as well (from above and from below), and the policies may be history-dependent and randomized. Based on Kolmogorov's forward equation and Dynkin's formula, we recall the Bellman equation, introduce and study occupation measures, reformulate the optimization problem as a (primary) linear program, provide the form of optimal policies for the constrained optimization problem, and establish the duality between the convex analytic approach and dynamic programming. Finally, a series of examples is given to illustrate all of our results.
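As a small unconstrained illustration of the Bellman equation mentioned in the abstract, the sketch below solves the discounted continuous-time equation αV(x) = min_a [c(x,a) + Σ_y q(y|x,a) V(y) − q_x(a) V(x)] by uniformization and value iteration. The two-state model, its rates, and its costs are invented for illustration (the paper itself allows denumerable states, unbounded rates, and constraints, none of which this toy captures).

```python
def bellman_value_iteration(rates, costs, alpha=0.5, tol=1e-9, max_iter=10000):
    """Solve alpha*V(x) = min_a [ c(x,a) + sum_y q(y|x,a) (V(y) - V(x)) ]
    by uniformization + discrete value iteration.
    rates[x][a][y]: transition rate x -> y under action a (y != x);
    costs[x][a]: loss rate. All data here are hypothetical."""
    states = range(len(rates))
    Lam = max(sum(rates[x][a].values()) for x in states for a in rates[x])
    beta = Lam / (alpha + Lam)      # discrete discount factor after uniformization
    V = [0.0] * len(rates)
    for _ in range(max_iter):
        newV = []
        for x in states:
            vals = []
            for a in rates[x]:
                qx = sum(rates[x][a].values())
                # uniformized one-step transition: stay put with prob 1 - qx/Lam
                ev = (1 - qx / Lam) * V[x] + sum(
                    (r / Lam) * V[y] for y, r in rates[x][a].items())
                vals.append(costs[x][a] / (alpha + Lam) + beta * ev)
            newV.append(min(vals))
        if max(abs(u - v) for u, v in zip(newV, V)) < tol:
            return newV
        V = newV
    return V

# toy model: state 0 chooses a service speed, state 1 has a single action
rates = [{"fast": {1: 3.0}, "slow": {1: 1.0}}, {"wait": {0: 2.0}}]
costs = [{"fast": 2.0, "slow": 0.5}, {"wait": 1.0}]
V = bellman_value_iteration(rates, costs)
```

Uniformization converts the generator into a discrete-time contraction with modulus β = Λ/(α+Λ) < 1, so value iteration converges geometrically to the unique solution of the Bellman equation.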
This paper deals with discrete-time Markov decision processes (MDPs) under constraints where all the objectives have the same form of expected total cost over the infinite time horizon. The existence of an optimal control policy is discussed by using the convex analytic approach. We work under the assumptions that the state and action spaces are general Borel spaces, that the model is nonnegative and semicontinuous, and that there exists an admissible solution with finite cost for the associated linear program. It is worth noting that, in contrast to the classical results in the literature, our hypotheses do not require the MDP to be transient or absorbing. Our first result ensures the existence of an optimal solution to the linear program given by an occupation measure of the process generated by a randomized stationary policy. Moreover, it is shown that this randomized stationary policy provides an optimal solution to this Markov control problem. As a consequence, these results imply that the set of randomized stationary policies is a sufficient set for this optimal control problem. Finally, our last main result states that all optimal solutions of the linear program coincide on a special set with an optimal occupation measure generated by a randomized stationary policy. Several examples are presented to illustrate some theoretical issues and the possible applications of the results developed in the paper.
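The occupation-measure idea behind the linear-programming reformulation can be illustrated on a toy finite model: for a stationary policy π, the occupation measure over the transient states satisfies the flow equation μ = γ + P_πᵀ μ, and the expected total cost is Σ_x μ(x) c_π(x). The states, transition probabilities, and costs below are invented, the example is transient for simplicity (the paper's results do not require this), and instead of a full LP solver we simply enumerate the two stationary deterministic policies.

```python
def solve(A, b):
    """Tiny Gauss-Jordan elimination with partial pivoting for small systems."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for i in range(n):
        p = max(range(i, n), key=lambda r: abs(M[r][i]))
        M[i], M[p] = M[p], M[i]
        for r in range(n):
            if r != i:
                f = M[r][i] / M[i][i]
                M[r] = [mr - f * mi for mr, mi in zip(M[r], M[i])]
    return [M[i][n] / M[i][i] for i in range(n)]

def occupation_and_cost(P, c, gamma):
    """mu solves mu = gamma + P^T mu over the transient states;
    the expected total cost is sum_x mu(x) * c(x)."""
    n = len(gamma)
    A = [[(1.0 if i == j else 0.0) - P[j][i] for j in range(n)] for i in range(n)]
    mu = solve(A, gamma)
    return mu, sum(m * ci for m, ci in zip(mu, c))

# transient states 0 and 1; state 2 is absorbing with zero cost (not listed).
# the only action choice is at state 0: "a" or "b" (hypothetical numbers);
# each entry is (P restricted to {0, 1}, per-visit costs on {0, 1}).
policies = {
    "a": ([[0.0, 1.0], [0.3, 0.0]], [1.0, 2.0]),
    "b": ([[0.2, 0.0], [0.3, 0.0]], [3.0, 2.0]),
}
gamma = [1.0, 0.0]                 # initial distribution on the transient states
results = {pi: occupation_and_cost(P, c, gamma) for pi, (P, c) in policies.items()}
best = min(results, key=lambda pi: results[pi][1])
```

Here policy "a" yields μ = (10/7, 10/7) with total cost 30/7 ≈ 4.29, while policy "b" yields μ = (1.25, 0) with total cost 3.75, so "b" is optimal; the LP of the paper would search over all occupation measures at once rather than enumerating policies.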