2006
DOI: 10.1016/j.camwa.2005.11.013
Discounted Markov decision processes with utility constraints

Abstract: We consider utility-constrained Markov decision processes. The expected utility of the total discounted reward is maximized subject to multiple expected utility constraints. By introducing a corresponding Lagrange function, a saddle-point theorem of the utility constrained optimization is derived. The existence of a constrained optimal policy is characterized by optimal action sets specified with a parametric utility.
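The Lagrangian approach described in the abstract can be illustrated on a toy problem. The sketch below is an assumption-laden illustration, not the paper's construction: it applies Lagrangian relaxation to a small constrained discounted MDP, solving the inner maximization by value iteration on the combined reward r + λc and adjusting the multiplier λ by projected subgradient steps. All of the data (P, r, c, b) and the loop parameters are made up.

```python
import numpy as np

# Illustrative sketch only: Lagrangian relaxation for a constrained
# discounted MDP, in the spirit of the saddle-point result in the abstract.
# The toy MDP data (P, r, c, b) and the dual loop are assumptions.

gamma = 0.9
n_s, n_a = 2, 2
# P[a, s, s'] = transition probability from s to s' under action a
P = np.array([[[0.8, 0.2],
               [0.3, 0.7]],
              [[0.5, 0.5],
               [0.9, 0.1]]])
r = np.array([[1.0, 0.0],   # r[s, a]: reward to maximize
              [0.5, 2.0]])
c = np.array([[0.0, 1.0],   # c[s, a]: constraint "utility"
              [1.0, 0.0]])
b = 3.0                      # constraint: discounted c-value at s0 >= b
s0 = 0

def value_iteration(reward, tol=1e-10):
    """Optimal value and greedy deterministic policy for a reward table."""
    V = np.zeros(n_s)
    while True:
        Q = reward + gamma * np.einsum('ast,t->sa', P, V)
        V_new = Q.max(axis=1)
        if np.abs(V_new - V).max() < tol:
            return V_new, Q.argmax(axis=1)
        V = V_new

def policy_value(reward, pi):
    """Exact evaluation of a deterministic policy: (I - gamma P_pi)^-1 r_pi."""
    P_pi = np.array([P[pi[s], s] for s in range(n_s)])
    r_pi = np.array([reward[s, pi[s]] for s in range(n_s)])
    return np.linalg.solve(np.eye(n_s) - gamma * P_pi, r_pi)

# Outer loop: projected subgradient descent on the multiplier lam of the
# Lagrangian L(pi, lam) = V_r(pi) + lam * (V_c(pi) - b).
lam, eta = 0.0, 0.05
for _ in range(500):
    _, pi = value_iteration(r + lam * c)      # inner maximization over pi
    v_c = policy_value(c, pi)[s0]
    lam = max(0.0, lam - eta * (v_c - b))     # raise lam if constraint violated

print("multiplier:", lam, "reward value:", policy_value(r, pi)[s0])
```

Note that over deterministic policies the dual iterates can oscillate between two policies near the saddle point; in constrained MDPs the constrained optimum may require a randomized (mixed) policy, which is the setting in which saddle-point results of this kind are typically stated.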

Cited by 34 publications (16 citation statements) · References 11 publications
“…To the best of our knowledge, the fourth group is addressed only in [11] for the average criteria. Concerning group (i), the existence and algorithms of constrained optimal policies are given in [6][7][8][9][10] for various discounted criteria when states and actions are finite, in [1,25,37] for the discounted criteria and denumerable states, and in [1,2,23,37,38] for the average criteria and denumerable states. Also, the existence of constrained optimal policies and a linear programming formulation for group (ii) are given in [19,33] for the discounted criteria and in [20,29,33] for the average criteria.…”
“…The MDP has constraints that the policy must satisfy. In these cases, the total sum of expected rewards must be maximized without violating the MDP's constraints [18,28]. These constraints capture some notion of risk, for example, that an error state is visited with probability less than α.…”
Section: Tratamento de Risco em MDP (Risk Handling in MDPs)
“…The optimization of effective bandwidth under a tight collision constraint can be formulated as a constrained risk-sensitive Markov decision process. The only existing results to our knowledge are [29], [30], where [29] considers the set of Markov policies, whereas [30] treats the general constrained Markov decision process with the objective function and the constraints given by general utility functions and establishes the existence of an optimal policy in the set of general history-dependent policies, without any structural property or computation procedure for the optimal policy. In this paper, we exploit the special form of the utility function and the associated linear constraints, which leads to a structured optimal policy.…”
Section: B. Related Work
“…Taking the expectation of Eq. (30) implies that the expected number of total successful transmissions is upper bounded by τt + c, and thus the expected number of total collisions is bounded by (τt + c)(1 − exp(−λT))/exp(−λT).…”
Section: Lemma
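One way to read the quoted bound: if each transmission attempt succeeds with probability exp(−λT) and the expected number of successes is at most τt + c, then the expected number of attempts is at most (τt + c)/exp(−λT), so the expected number of collisions (attempts minus successes) is at most (τt + c)(1 − exp(−λT))/exp(−λT). A small numeric sanity check of that identity, with made-up values standing in for the cited paper's τt, c, λ, and T:

```python
import math

# Made-up illustrative values; tau_t, c_, lam_, T stand in for the cited
# paper's tau*t, c, lambda, and T.
tau_t, c_, lam_, T = 100.0, 5.0, 0.5, 1.0

p = math.exp(-lam_ * T)            # per-attempt success probability
successes = tau_t + c_             # bound on expected successful transmissions
attempts = successes / p           # implied bound on expected attempts
collisions = attempts - successes  # bound on expected collisions

# Algebraically identical to the closed form quoted above
assert math.isclose(collisions, successes * (1 - p) / p)
print(round(collisions, 2))
```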