Average Cost Markov Decision Processes with Weakly Continuous Transition Probabilities

Feinberg, Eugene A.; Kasyanov, Pavlo O.; Zadoianchuk, Nina V.

doi:10.1287/moor.1120.0555

Cited by 103 publications

(183 citation statements)

References 38 publications



“…Thus for an MDP an initial state x is considered instead of the initial distribution p. In fact, this MDP possesses a special property that action sets at all the states are equal. For MDPs, Feinberg et al [14] provides general conditions for the existence of optimal policies, validity of optimality equations, and convergence of value iterations. Here we formulate these conditions for an MDP whose action sets in all states are equal.…”

Section: R a F T (mentioning)

confidence: 99%


Partially Observable Total-Cost Markov Decision Processes with Weakly Continuous Transition Probabilities

Feinberg, Kasyanov, Zgurovsky

2016

Mathematics of OR

Self Cite

102


This paper describes sufficient conditions for the existence of optimal policies for Partially Observable Markov Decision Processes (POMDPs) with Borel state, observation, and action sets and with the expected total costs. Action sets may not be compact and one-step cost functions may be unbounded. The introduced conditions are also sufficient for the validity of optimality equations, semi-continuity of value functions, and convergence of value iterations to optimal values. Since POMDPs can be reduced to Completely Observable Markov Decision Processes (COMDPs), whose states are posterior state distributions, this paper focuses on the validity of the above mentioned optimality properties for COMDPs. The central question is whether transition probabilities for a COMDP are weakly continuous. We introduce sufficient conditions for this and show that the transition probabilities for a COMDP are weakly continuous, if transition probabilities of the underlying Markov Decision Process are weakly continuous and observation probabilities for the POMDP are continuous in the total variation. Moreover, the continuity in the total variation of the observation probabilities cannot be weakened to setwise continuity. The results are illustrated with counterexamples and examples.
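The reduction mentioned in this abstract, in which the state of the COMDP is the posterior distribution of the hidden state, can be illustrated on a toy case. The sketch below is not the paper's construction: it assumes finite state, action, and observation sets (the paper works with Borel spaces), and the names `belief_update`, `P`, and `Q` are hypothetical. It assumes `P[a][x][x_next]` is the transition probability and `Q[a][x_next][y]` is the probability of observing `y` after action `a` leads to state `x_next`.

```python
def belief_update(belief, action, observation, P, Q):
    """One Bayes step: map the current posterior to the next posterior.

    belief      -- list of probabilities over hidden states (sums to 1)
    action      -- index of the action taken
    observation -- index of the observation received afterwards
    P, Q        -- hypothetical finite transition and observation tables
    """
    n = len(belief)
    # Unnormalized posterior: predict with P, then weight by the
    # likelihood of the observation under Q.
    unnormalized = [
        Q[action][x_next][observation]
        * sum(belief[x] * P[action][x][x_next] for x in range(n))
        for x_next in range(n)
    ]
    total = sum(unnormalized)
    if total == 0.0:
        raise ValueError("observation has zero probability under this belief")
    return [w / total for w in unnormalized]


# Illustrative numbers only: two hidden states, one action, two observations.
P = [[[0.9, 0.1], [0.2, 0.8]]]
Q = [[[0.8, 0.2], [0.3, 0.7]]]
posterior = belief_update([0.5, 0.5], 0, 0, P, Q)
print(posterior)
```

In this finite setting the map from (belief, action, observation) to the next belief defines the transitions of the completely observable model; the continuity questions studied in the paper concern the analogous map on general Borel spaces.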


Section: R a F T (mentioning)

confidence: 99%

“…According to Feinberg et al [14, Corollary 3.2], the real-valued function $\psi(x) = \inf_{a \in A} c(x, a)$, $x \in X_{<+\infty}$, with values in $\mathbb{R}$, is inf-compact on $X_{<+\infty}$. Furthermore, (6.2) implies that $\int_{X_{<+\infty}} \psi(x)\, z^{(n)}(dx) \le \lambda$, $n = 1, 2, \ldots$…”

Section: The Inequalities (mentioning)

confidence: 99%

Partially Observable Total-Cost Markov Decision Processes with Weakly Continuous Transition Probabilities

Feinberg, Kasyanov, Zgurovsky

2016

Mathematics of OR

Self Cite

102


“…It is well known that the set of deterministic stationary policies contains optimal policies for a large class of infinite horizon discounted cost problems (see, e.g., [6], [14]) and average cost optimal control problems (see, e.g., [1], [14]). …”

Section: B(a) (mentioning)

confidence: 99%

Asymptotic Optimality and Rates of Convergence of Quantized Stationary Policies in Stochastic Control

Saldi, Linder, Yüksel

2015

IEEE Trans. Automat. Contr.


We consider the discrete approximation of stationary policies for a discrete-time Markov decision process with Polish state and action spaces under total, discounted, and average cost criteria. Deterministic stationary quantizer policies are introduced and shown to be able to approximate optimal deterministic stationary policies with arbitrary precision under mild technical conditions, thus demonstrating that one can search for ε-optimal policies within the class of quantized control policies. We also derive explicit bounds on the approximation error in terms of the quantization rate.
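As a rough illustration of what a quantized stationary policy looks like, the sketch below composes a deterministic stationary policy with a nearest-neighbor quantizer on a finite grid of actions. It is a toy under stated assumptions, not the construction or the error bounds of the paper: the action space is assumed to be the interval [0, 1], and `make_quantizer`, `quantize_policy`, and the example policy are hypothetical names introduced here.

```python
def make_quantizer(grid):
    """Return q: action -> nearest point of the finite grid (ties go to the earlier point)."""
    def q(a):
        return min(grid, key=lambda g: abs(g - a))
    return q


def quantize_policy(policy, grid):
    """Compose a stationary policy x -> a with the quantizer, giving x -> q(policy(x))."""
    q = make_quantizer(grid)
    return lambda x: q(policy(x))


# Illustrative setup: a uniform grid with 5 points on the assumed action space [0, 1].
grid = [i / 4 for i in range(5)]        # 0.0, 0.25, 0.5, 0.75, 1.0
policy = lambda x: 0.5 * x              # hypothetical stationary policy on states in [0, 2]
quantized = quantize_policy(policy, grid)

print(quantized(0.6))                   # policy gives 0.3, nearest grid point is 0.25
```

With a uniform grid of spacing 1/4, the quantized action differs from the original one by at most 1/8; refining the grid shrinks this gap, which is the intuition behind approximating a policy with arbitrary precision.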


“…It is well known that the set of deterministic stationary policies contains an optimal policy for a large class of infinite horizon discounted cost problems (see, e.g., [4], [7]) and average cost optimal control problems (see, e.g., [4]). …”

Section: Markov Decision Processes (mentioning)

confidence: 99%

Finite state approximations of Markov decision processes with general state and action spaces

Saldi, Linder, Yüksel

2015

2015 American Control Conference (ACC)


The purpose of this paper is to prove existence of an ε-equilibrium point in a dynamic Nash game with Borel state space and long-run time average cost criteria for the players. The idea of the proof is first to convert the initial game with ergodic costs to an "equivalent" game endowed with discounted costs for some appropriately chosen value of the discount factor, and then to approximate the discounted Nash game obtained in the first step with a countable state space game for which existence of a Nash equilibrium can be established. From the results of Whitt we know that if for any ε > 0 the approximation scheme is selected in an appropriate way, then Nash equilibrium strategies for the approximating game are also ε-equilibrium strategies for the discounted game constructed in the first step. It is then shown that these strategies constitute an ε-equilibrium point for the initial game with ergodic costs as well. The idea of canonical triples, introduced by Dynkin and Yushkevich in the control setting, is adapted here to the game situation.

1. Introduction. We are considering a two-person Markov game over an infinite time horizon. The state space $E$ of the process $\{x_t\}_{t=0}^{\infty}$ controlled by the players is taken to be a Borel space $E$ equipped with the Borel σ-algebra $\mathcal{E}$. The action spaces $U_1$ and $U_2$ of player 1 and 2, respectively, are compact subsets of some metric spaces. Let $\mathcal{U}_i$ denote the Borel σ-algebra on $U_i$, and let $P(U_i)$ denote the set of all probability measures on $(U_i, \mathcal{U}_i)$,

1991 Mathematics Subject Classification: Primary 90D10, 90D20; Secondary 90D05, 93E05.
