Constrained Markov decision processes with compact state and action spaces: the average case

Kurano, Masami; Nakagami, Jun-ichi; Huang, Youqiang

doi:10.1080/02331930008844505

Cited by 15 publications

(24 citation statements)

References 11 publications


“…Further, such nonnegativeness assumption cannot be removed because it is required for the use of the standard weak convergence of probability measures. This in turn implies that the constrained optimality problem of minimizing nonnegative costs in [11,19,20] with constraints imposed on other nonnegative costs cannot be transformed to an equivalent optimality problem of maximizing bounded rewards as in [29] with constraints imposed on bounded costs. Hence, the constrained discrete and continuous time MDPs with Polish spaces, in which rewards (to be maximized) and costs (with constraints) may be unbounded from above and from below, have not been studied. On the other hand, as is known, continuous-time MDPs in Polish spaces have been studied in [11,12,16,27,34].…”

mentioning

confidence: 99%

Discounted continuous-time constrained Markov decision processes in Polish spaces

Guo, X.; Song, X.

2011

Ann. Appl. Probab.


This paper is devoted to studying constrained continuous-time Markov decision processes (MDPs) in the class of randomized policies depending on state histories. The transition rates may be unbounded, the reward and costs are allowed to be unbounded from above and from below, and the state and action spaces are Polish spaces. The optimality criterion to be maximized is the expected discounted reward, and the constraints can be imposed on the expected discounted costs. First, we give conditions for the nonexplosion of the underlying processes and the finiteness of the expected discounted rewards/costs. Second, using a technique of occupation measures, we prove that the constrained optimality problem for continuous-time MDPs can be transformed to an equivalent (optimality) problem over a class of probability measures. Based on the equivalent problem and a so-called w-weak convergence of probability measures developed in this paper, we show the existence of a constrained optimal policy. Third, by providing a linear programming formulation of the equivalent problem, we show the solvability of constrained optimal policies. Finally, we use two computable examples to illustrate our main results.

…[13, 15, 34, 36, 42], and (iv) constrained continuous-time MDPs with a Polish state space [11]. A review of these references shows that most of the related literature is concentrated on the first three groups. To the best of our knowledge, the fourth group is addressed only in [11], for the average criteria. Concerning group (i), results on the existence of, and algorithms for, constrained optimal policies are given in [6, 7, 8, 9, 10] for variant discounted criteria when states and actions are finite, in [1, 25, 37] for the discounted criteria and denumerable states, and in [1, 2, 23, 37, 38] for the average criteria and denumerable states.

Also, the existence of constrained optimal policies and a linear programming formulation for group (ii) are given in [19, 33] for the discounted criteria and in [20, 29, 33] for the average criteria. Although group (iii) has been studied in [13, 15, 34, 36, 42], these references deal with the case of a single constraint, the transition rates in [34] are assumed to be bounded, and the assumption of denumerable states in these references cannot be dropped. On the other hand, as mentioned above, constrained MDPs in Polish spaces are also studied in [19, 20, 29, 33] for the discrete-time case and in [11] for the continuous-time case. However, the reward and cost functions in [29] are all assumed to be bounded, and all cost functions in [11, 19, 20, 33] are assumed to be essentially nonnegative. Further, this nonnegativity assumption cannot be removed because it is required for the use of the standard weak convergence of probability measures. This in turn implies that the constrained optimality problem of minimizing nonnegative costs in [11, 19, 20] with constraints imposed on other nonnegative costs cannot be transformed to an equ...
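The occupation-measure step described in the abstract can be illustrated on a much simpler model. The sketch below is a finite-state, discrete-time analogue only, not the continuous-time, Polish-space setting of Guo and Song; the model data, the discount factor `alpha`, the initial law `nu`, and every function name are hypothetical. For one stationary randomized policy it computes the normalized discounted occupation measure mu(x, a) = (1 - alpha) Σ_t alpha^t Pr(x_t = x, a_t = a), checks the linear balance constraint that characterizes such measures, and confirms that the expected discounted reward equals <r, mu> / (1 - alpha).

```python
from typing import List

Matrix = List[List[float]]


def solve(A: Matrix, b: List[float]) -> List[float]:
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(A[i]) + [b[i]] for i in range(n)]  # augmented matrix [A | b]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for j in range(col, n + 1):
                M[r][j] -= factor * M[col][j]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(M[r][j] * x[j] for j in range(r + 1, n))
        x[r] = (M[r][n] - tail) / M[r][r]
    return x


def policy_kernel(P, policy) -> Matrix:
    """P_pi(x, y) = sum_a policy(a|x) P(y|x, a)."""
    n = len(P)
    return [[sum(policy[x][a] * P[x][a][y] for a in range(len(policy[x])))
             for y in range(n)] for x in range(n)]


def occupation_measure(P, policy, nu, alpha) -> Matrix:
    """mu(x, a) = (1 - alpha) * sum_t alpha^t Pr(x_t = x, a_t = a)."""
    n = len(P)
    Ppi = policy_kernel(P, policy)
    # State marginal m solves m(y) - alpha * sum_x m(x) P_pi(x, y) = (1 - alpha) nu(y).
    A = [[(1.0 if x == y else 0.0) - alpha * Ppi[x][y] for x in range(n)]
         for y in range(n)]
    m = solve(A, [(1.0 - alpha) * v for v in nu])
    return [[m[x] * policy[x][a] for a in range(len(policy[x]))] for x in range(n)]


def discounted_value(P, policy, f, nu, alpha) -> float:
    """E[sum_t alpha^t f(x_t, a_t)] from initial law nu, by policy evaluation."""
    n = len(P)
    Ppi = policy_kernel(P, policy)
    f_pi = [sum(policy[x][a] * f[x][a] for a in range(len(f[x]))) for x in range(n)]
    # V solves (I - alpha P_pi) V = f_pi.
    A = [[(1.0 if x == y else 0.0) - alpha * Ppi[x][y] for y in range(n)]
         for x in range(n)]
    V = solve(A, f_pi)
    return sum(nu[x] * V[x] for x in range(n))


def integrate(mu, f) -> float:
    """<f, mu> = sum_{x,a} f(x, a) mu(x, a)."""
    return sum(mu[x][a] * f[x][a] for x in range(len(mu)) for a in range(len(mu[x])))


def balance_residuals(P, mu, nu, alpha) -> List[float]:
    """Left minus right side of the linear constraint defining occupation measures."""
    n = len(P)
    return [sum(mu[y]) - (1.0 - alpha) * nu[y]
            - alpha * sum(P[x][a][y] * mu[x][a] for x in range(n) for a in range(len(mu[x])))
            for y in range(n)]


# Hypothetical 2-state, 2-action model; P[x][a][y] = Pr(next = y | state x, action a).
P = [[[0.9, 0.1], [0.2, 0.8]],
     [[0.5, 0.5], [0.0, 1.0]]]
r = [[1.0, 0.0], [2.0, -1.0]]      # reward to maximize (signs unrestricted)
c = [[0.0, 1.0], [0.5, 2.0]]       # cost entering a constraint
policy = [[0.5, 0.5], [1.0, 0.0]]  # stationary randomized policy pi(a|x)
nu, alpha = [1.0, 0.0], 0.9

mu = occupation_measure(P, policy, nu, alpha)
print("mu =", mu)
print("reward via mu:", integrate(mu, r) / (1 - alpha))
print("reward via policy evaluation:", discounted_value(P, policy, r, nu, alpha))
print("cost via mu:", integrate(mu, c) / (1 - alpha))
```

Under these toy assumptions, maximizing <r, mu> over nonnegative mu satisfying the balance constraint and <c, mu> <= (1 - alpha) b, for a cost budget b, is a finite linear program, which mirrors the reformulation the abstract describes; the sketch only evaluates one feasible point and does not solve that program.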


“…Additional results and generalizations of some of the results of [20] were given by Hernández-Lerma and González-Hernández [17]. Extensions of the LP framework to constrained MDPs were subsequently studied by Kurano et al. [31] for compact spaces and by Hernández-Lerma et al. [19] for non-compact spaces.…”

Section: Introduction

mentioning

confidence: 94%

“…Our results for unconstrained (resp., constrained) MDPs given in this paper can be compared with some of the prior results in [22, Chap. 12] and [23] (resp., [19] and [31]) for lower-semicontinuous models.…”

Section: Introduction

mentioning

confidence: 99%

On the Minimum Pair Approach for Average Cost Markov Decision Processes with Countable Discrete Action Spaces and Strictly Unbounded Costs

Yu

2020

SIAM J. Control Optim.


We consider the linear programming approach for constrained and unconstrained Markov decision processes (MDPs) under the long-run average cost criterion, where the MDPs in our study have Borel state spaces and countable discrete action spaces. Under a strict unboundedness condition on the one-stage costs and a recently introduced majorization condition on the state transition stochastic kernel, we study infinite-dimensional linear programs for the average-cost MDPs and prove the absence of a duality gap and other optimality results. Our results do not require a lower-semicontinuous MDP model and, as such, they can be applied to countable action space MDPs where the dynamics and one-stage costs are discontinuous in the state variable. The proofs of these results make use of the continuity property of Borel measurable functions asserted by Lusin's theorem.
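For the long-run average criterion, the linear-programming variables are again occupation measures, now built from a stationary distribution. The sketch below is a finite toy analogue only: Yu's setting has Borel states, countable actions, and strictly unbounded costs, none of which it captures, and the model, the policy, and all function names are hypothetical. It forms mu(x, a) = rho(x) pi(a|x) from the stationary law rho of the policy's transition kernel, checks the invariance constraint Σ_a mu(y, a) = Σ_{x,a} P(y | x, a) mu(x, a), and compares <c, mu> with a Cesàro average of expected one-stage costs over a finite horizon.

```python
from typing import List

Matrix = List[List[float]]


def solve(A: Matrix, b: List[float]) -> List[float]:
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(A[i]) + [b[i]] for i in range(n)]  # augmented matrix [A | b]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for j in range(col, n + 1):
                M[r][j] -= factor * M[col][j]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(M[r][j] * x[j] for j in range(r + 1, n))
        x[r] = (M[r][n] - tail) / M[r][r]
    return x


def policy_kernel(P, policy) -> Matrix:
    """P_pi(x, y) = sum_a policy(a|x) P(y|x, a)."""
    n = len(P)
    return [[sum(policy[x][a] * P[x][a][y] for a in range(len(policy[x])))
             for y in range(n)] for x in range(n)]


def stationary_distribution(Ppi: Matrix) -> List[float]:
    """rho with rho P_pi = rho and sum(rho) = 1; assumes this law is unique."""
    n = len(Ppi)
    # Invariance equations for states 0..n-2 (one is redundant), then normalization.
    A = [[(1.0 if x == y else 0.0) - Ppi[x][y] for x in range(n)] for y in range(n - 1)]
    A.append([1.0] * n)
    return solve(A, [0.0] * (n - 1) + [1.0])


def average_occupation_measure(P, policy) -> Matrix:
    """mu(x, a) = rho(x) * policy(a|x)."""
    rho = stationary_distribution(policy_kernel(P, policy))
    return [[rho[x] * policy[x][a] for a in range(len(policy[x]))] for x in range(len(P))]


def integrate(mu, f) -> float:
    """<f, mu> = sum_{x,a} f(x, a) mu(x, a)."""
    return sum(mu[x][a] * f[x][a] for x in range(len(mu)) for a in range(len(mu[x])))


def cesaro_average_cost(P, policy, c, x0, horizon) -> float:
    """(1/T) sum_{t<T} E[c(x_t, a_t)] from state x0, propagating state laws exactly."""
    n = len(P)
    Ppi = policy_kernel(P, policy)
    c_pi = [sum(policy[x][a] * c[x][a] for a in range(len(c[x]))) for x in range(n)]
    d = [1.0 if x == x0 else 0.0 for x in range(n)]  # law of x_t, starting at t = 0
    total = 0.0
    for _ in range(horizon):
        total += sum(d[x] * c_pi[x] for x in range(n))
        d = [sum(d[x] * Ppi[x][y] for x in range(n)) for y in range(n)]
    return total / horizon


# Hypothetical 2-state, 2-action model; P[x][a][y] = Pr(next = y | state x, action a).
P = [[[0.9, 0.1], [0.2, 0.8]],
     [[0.5, 0.5], [0.0, 1.0]]]
c = [[0.0, 1.0], [0.5, 2.0]]         # one-stage cost
policy = [[0.25, 0.75], [1.0, 0.0]]  # stationary randomized policy pi(a|x)

mu = average_occupation_measure(P, policy)
print("mu =", mu)
print("average cost via mu:", integrate(mu, c))
print("Cesaro average, T = 1000:", cesaro_average_cost(P, policy, c, 0, 1000))
```

Minimizing <c, mu> over all nonnegative mu of total mass one that satisfy the invariance constraint is a finite linear program, a toy counterpart of the infinite-dimensional programs the abstract studies; the sketch only evaluates one feasible point rather than solving that program.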


“…Recently, Kurano et al. [3] derived a saddle-point theorem for constrained MDPs with average reward criteria. For the utility treatment of MDPs and constrained MDPs, refer to [1, 2, 4-7] and their references.…”

Section: Introduction and Problem Formulation

mentioning

confidence: 99%

Discounted Markov decision processes with utility constraints

Kadota

Kurano

Yasuda

2006

Computers & Mathematics with Applications

Self Cite


We consider utility-constrained Markov decision processes. The expected utility of the total discounted reward is maximized subject to multiple expected utility constraints. By introducing a corresponding Lagrange function, a saddle-point theorem of the utility constrained optimization is derived. The existence of a constrained optimal policy is characterized by optimal action sets specified with a parametric utility.
