We consider utility-constrained Markov decision processes, in which the expected utility of the total discounted reward is maximized subject to multiple expected-utility constraints. By introducing a corresponding Lagrange function, a saddle-point theorem for the utility-constrained optimization is derived. The existence of a constrained optimal policy is characterized by optimal action sets specified through a parametric utility.
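The Lagrangian approach sketched in the abstract can be illustrated, in the simpler special case of an expected-reward objective with a single expected-cost constraint, by the following toy sketch; all MDP numbers, the bisection on the multiplier, and the function names are hypothetical and not taken from the paper:

```python
import numpy as np

# Toy 2-state, 2-action MDP (hypothetical numbers). Maximize expected
# discounted reward subject to an expected discounted cost constraint,
# via the Lagrangian reward r - lam*c and a bisection on the multiplier lam.
P = np.array([[[0.9, 0.1], [0.2, 0.8]],      # P[s, a, s']
              [[0.7, 0.3], [0.1, 0.9]]])
r = np.array([[1.0, 2.0], [0.5, 1.5]])       # reward r[s, a]
c = np.array([[0.0, 1.0], [0.0, 1.0]])       # cost   c[s, a]
gamma, budget = 0.9, 3.0                      # discount factor, cost budget

def solve(lam, n_iter=400):
    """Value iteration on the Lagrangian reward; returns the greedy policy."""
    V = np.zeros(2)
    for _ in range(n_iter):
        Q = r - lam * c + gamma * (P @ V)     # Q[s, a]
        V = Q.max(axis=1)
    return Q.argmax(axis=1)

def discounted(policy, signal, n_iter=400):
    """Expected discounted value of `signal` under a deterministic policy."""
    V = np.zeros(2)
    for _ in range(n_iter):
        V = np.array([signal[s, policy[s]] + gamma * P[s, policy[s]] @ V
                      for s in range(2)])
    return V[0]                               # value from start state 0

lo, hi = 0.0, 10.0                            # bisect on the multiplier
for _ in range(50):
    lam = 0.5 * (lo + hi)
    if discounted(solve(lam), c) > budget:
        lo = lam                              # constraint violated: raise lam
    else:
        hi = lam                              # feasible: try a smaller lam

pol = solve(hi)                               # feasible near-optimal policy
```

The bisection exploits the saddle-point structure: raising the multiplier penalizes the cost signal until the greedy policy becomes feasible. The paper's setting, with utilities applied to the total reward and multiple constraints, is substantially more general than this sketch.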
This paper is concerned with a general-utility optimal stopping problem for denumerable Markov chains. The validity of the one-step look-ahead (OLA) stopping time is established under a general utility criterion, developed from the viewpoints of optimality and of a "risk-averse" or "risk-seeking" characterization. The results are applied to the case of an exponential utility function and illustrated by a simple example.
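The OLA rule and the risk-averse exponential-utility case can be illustrated on a toy chain; the specific chain (a fair random walk with absorbing boundaries), the reward g(s) = s, and the parameter values below are hypothetical, not the paper's general denumerable setting:

```python
import numpy as np

# Toy stopping problem: fair random walk on {0,...,N}, absorbing at 0 and N,
# stopping reward g(s) = s, judged through a risk-averse exponential utility
# U(x) = -exp(-theta*x).
N, theta = 6, 0.5
states = np.arange(N + 1)
U = lambda x: -np.exp(-theta * x)
g = states.astype(float)

# Transition matrix of the walk.
P = np.zeros((N + 1, N + 1))
P[0, 0] = P[N, N] = 1.0
for s in range(1, N):
    P[s, s - 1] = P[s, s + 1] = 0.5

# OLA region: stop where the utility of stopping now is at least the
# expected utility of stopping after exactly one more step.
ola = {s for s in range(1, N) if U(g[s]) >= P[s] @ U(g)}

# Exact value by monotone value iteration: V = max(U(g), P V).
V = U(g).copy()
for _ in range(2000):
    V = np.maximum(U(g), P @ V)
stop = {s for s in range(1, N) if np.isclose(V[s], U(g[s]))}

# With a concave utility, Jensen's inequality makes immediate stopping
# one-step optimal at every interior state, and here the OLA set coincides
# with the optimal stopping set.
assert stop == ola == set(range(1, N))
```

Note that the inclusion "optimal stopping set ⊆ OLA set" always holds (stopping being optimal at s implies U(g(s)) = V(s) ≥ E[V(X₁)] ≥ E[U(g(X₁))]); the paper's contribution concerns conditions under which the OLA time is actually optimal.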
This paper is concerned with the average variance of Markov decision processes with countable states and finite actions. Sufficient conditions are given to ensure that there is a stationary deterministic policy which minimizes the average variance within the class of mean-optimal policies. This class of policies is determined by the number of actions that do not satisfy the mean-optimality equation.
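For a finite toy MDP the two-stage criterion can be made concrete by brute force: enumerate the stationary deterministic policies, keep those with maximal long-run average reward, and among them minimize the average variance Σ_s π(s)(r(s, a(s)) − ρ)², where π is the stationary distribution and ρ the average reward. All numbers below are hypothetical, and enumeration stands in for the paper's analytical conditions:

```python
import itertools
import numpy as np

# Toy 2-state, 2-action MDP (hypothetical numbers).
P = np.array([[[1.0, 0.0], [0.0, 1.0]],      # P[s, a, s']
              [[1.0, 0.0], [0.5, 0.5]]])
r = np.array([[1.0, 0.0], [2.0, 0.5]])       # reward r[s, a]
n_states, n_actions = r.shape

def stationary(policy):
    """Stationary distribution of the chain induced by a policy."""
    Ppi = np.array([P[s, policy[s]] for s in range(n_states)])
    A = np.vstack([Ppi.T - np.eye(n_states), np.ones(n_states)])
    b = np.concatenate([np.zeros(n_states), [1.0]])
    return np.linalg.lstsq(A, b, rcond=None)[0]   # solve pi P = pi, sum = 1

results = []
for policy in itertools.product(range(n_actions), repeat=n_states):
    pi = stationary(policy)
    rew = np.array([r[s, policy[s]] for s in range(n_states)])
    mean = pi @ rew                               # average reward rho
    var = pi @ (rew - mean) ** 2                  # average variance
    results.append((policy, mean, var))

best_mean = max(m for _, m, _ in results)
mean_optimal = [(p, m, v) for p, m, v in results
                if np.isclose(m, best_mean)]      # the mean-optimal class
policy, mean, var = min(mean_optimal, key=lambda t: t[2])
```

In this example several policies tie for the best average reward, so the variance criterion genuinely discriminates within the mean-optimal class, which is the situation the paper's sufficient conditions address for countable state spaces.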