2006
DOI: 10.1016/j.camwa.2005.11.013
Discounted Markov decision processes with utility constraints

Abstract: We consider utility-constrained Markov decision processes. The expected utility of the total discounted reward is maximized subject to multiple expected utility constraints. By introducing a corresponding Lagrange function, a saddle-point theorem of the utility constrained optimization is derived. The existence of a constrained optimal policy is characterized by optimal action sets specified with a parametric utility.
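The Lagrangian approach described in the abstract can be illustrated on a toy problem. The sketch below is an assumption-laden illustration, not the paper's construction: it applies Lagrangian relaxation to a small constrained discounted MDP, solving the inner maximization by value iteration on the combined reward r + λc and adjusting the multiplier λ by projected subgradient steps. All of the data (P, r, c, b) and the loop parameters are made up.

```python
import numpy as np

# Illustrative sketch only: Lagrangian relaxation for a constrained
# discounted MDP, in the spirit of the saddle-point result in the abstract.
# The toy MDP data (P, r, c, b) and the dual loop are assumptions.

gamma = 0.9
n_s, n_a = 2, 2
# P[a, s, s'] = transition probability from s to s' under action a
P = np.array([[[0.8, 0.2],
               [0.3, 0.7]],
              [[0.5, 0.5],
               [0.9, 0.1]]])
r = np.array([[1.0, 0.0],   # r[s, a]: reward to maximize
              [0.5, 2.0]])
c = np.array([[0.0, 1.0],   # c[s, a]: constraint "utility"
              [1.0, 0.0]])
b = 3.0                      # constraint: discounted c-value at s0 >= b
s0 = 0

def value_iteration(reward, tol=1e-10):
    """Optimal value and greedy deterministic policy for a reward table."""
    V = np.zeros(n_s)
    while True:
        Q = reward + gamma * np.einsum('ast,t->sa', P, V)
        V_new = Q.max(axis=1)
        if np.abs(V_new - V).max() < tol:
            return V_new, Q.argmax(axis=1)
        V = V_new

def policy_value(reward, pi):
    """Exact evaluation of a deterministic policy: (I - gamma P_pi)^-1 r_pi."""
    P_pi = np.array([P[pi[s], s] for s in range(n_s)])
    r_pi = np.array([reward[s, pi[s]] for s in range(n_s)])
    return np.linalg.solve(np.eye(n_s) - gamma * P_pi, r_pi)

# Outer loop: projected subgradient descent on the multiplier lam of the
# Lagrangian L(pi, lam) = V_r(pi) + lam * (V_c(pi) - b).
lam, eta = 0.0, 0.05
for _ in range(500):
    _, pi = value_iteration(r + lam * c)      # inner maximization over pi
    v_c = policy_value(c, pi)[s0]
    lam = max(0.0, lam - eta * (v_c - b))     # raise lam if constraint violated

print("multiplier:", lam, "reward value:", policy_value(r, pi)[s0])
```

Note that over deterministic policies the dual iterates can oscillate between two policies near the saddle point; in constrained MDPs the constrained optimum may require a randomized (mixed) policy, which is the setting in which saddle-point results of this kind are typically stated.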

Cited by 34 publications (16 citation statements) · References 11 publications
“…To the best of our knowledge, the fourth group is addressed only in [11] for the average criteria. Concerning group (i), the existence and algorithms of constrained optimal policies are given in [6][7][8][9][10] for various discounted criteria when states and actions are finite, in [1,25,37] for the discounted criteria and denumerable states, and in [1,2,23,37,38] for the average criteria and denumerable states. Also, the existence of constrained optimal policies and a linear programming formulation for group (ii) are given in [19,33] for the discounted criteria and in [20,29,33] for the average criteria.…”
“…The MDP has constraints that the policy must satisfy. In these cases, the total sum of expected rewards must be maximized without violating the MDP's constraints [18,28]. These constraints capture some notion of risk, for example, that an error state is visited with probability less than α.…”
Section: Tratamento de Risco em MDP (Risk Handling in MDPs)
“…The optimization of effective bandwidth under a tight collision constraint can be formulated as a constrained risk-sensitive Markov decision process. The only existing results to our knowledge are [29], [30], where [29] considers the set of Markov policies, whereas [30] treats the general constrained Markov decision process with the objective function and the constraints given by general utility functions and establishes the existence of an optimal policy in the set of general history-dependent policies, without any structural property or computation procedure for the optimal policy. In this paper, we exploit the special form of the utility function and the associated linear constraints, which leads to a structured optimal policy.…”
Section: B. Related Work
“…Taking the expectation of Eq. (30) implies that the expected number of total successful transmissions is upper bounded by τt + c, and thus the expected number of total collisions is bounded by (τt + c)(1 − exp(−λT))/exp(−λT).…”
Section: Lemma
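One way to read the quoted bound: if each transmission attempt succeeds with probability exp(−λT) and the expected number of successes is at most τt + c, then the expected number of attempts is at most (τt + c)/exp(−λT), so the expected number of collisions (attempts minus successes) is at most (τt + c)(1 − exp(−λT))/exp(−λT). A small numeric sanity check of that identity, with made-up values standing in for the cited paper's τt, c, λ, and T:

```python
import math

# Made-up illustrative values; tau_t, c_, lam_, T stand in for the cited
# paper's tau*t, c, lambda, and T.
tau_t, c_, lam_, T = 100.0, 5.0, 0.5, 1.0

p = math.exp(-lam_ * T)            # per-attempt success probability
successes = tau_t + c_             # bound on expected successful transmissions
attempts = successes / p           # implied bound on expected attempts
collisions = attempts - successes  # bound on expected collisions

# Algebraically identical to the closed form quoted above
assert math.isclose(collisions, successes * (1 - p) / p)
print(round(collisions, 2))
```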