2012
DOI: 10.1007/s10957-012-0015-8

The Transformation Method for Continuous-Time Markov Decision Processes

Abstract: In this paper, we show that a discounted continuous-time Markov decision process in Borel spaces with randomized history-dependent policies, arbitrarily unbounded transition rates and a non-negative reward rate is equivalent to a discrete-time Markov decision process. Based on a completely new proof, which does not involve Kolmogorov's forward equation, it is shown that the value function for both models is given by the minimal non-negative solution to the same Bellman equation. A verifiable necessary and suffi…
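
The equivalence stated in the abstract can be illustrated on a small finite model. The sketch below is not taken from the paper: it assumes the standard construction in which a jump from x to y ≠ x under action a gets probability q(y|x,a)/(α + q_x(a)) and the one-step reward is r(x,a)/(α + q_x(a)), where α is the discount rate and q_x(a) is the total jump rate out of x. The function names `transform_ctmdp` and `minimal_solution` and the array layout are hypothetical. Iterating the Bellman operator from the zero function approximates the minimal non-negative solution.

```python
import numpy as np


def transform_ctmdp(q, r, alpha):
    """Map a finite discounted CTMDP to an undiscounted total-reward DTMDP.

    Assumed layout (hypothetical, for illustration only):
      q[a, x, y] : jump rate from x to y under action a (diagonal is ignored)
      r[a, x]    : non-negative reward rate
      alpha      : discount rate, alpha > 0
    Returns a substochastic kernel p[a, x, y] and one-step rewards c[a, x].
    """
    q = np.array(q, dtype=float)            # copy so the caller's data is untouched
    r = np.asarray(r, dtype=float)
    n = q.shape[1]
    q[:, np.arange(n), np.arange(n)] = 0.0  # keep only off-diagonal jump rates
    denom = alpha + q.sum(axis=2)           # alpha + q_x(a)
    p = q / denom[:, :, None]               # missing mass alpha/denom is "killed"
    c = r / denom
    return p, c


def minimal_solution(p, c, tol=1e-12, max_iter=100_000):
    """Iterate V <- max_a { c + p V } starting from V = 0."""
    v = np.zeros(p.shape[1])
    for _ in range(max_iter):
        new_v = (c + p @ v).max(axis=0)     # p @ v has shape (actions, states)
        if np.max(np.abs(new_v - v)) < tol:
            return new_v
        v = new_v
    return v


# Two states, two actions; state 1 is absorbing with zero reward.
q = [[[0.0, 1.0], [0.0, 0.0]],   # action 0: leave state 0 at rate 1
     [[0.0, 0.0], [0.0, 0.0]]]   # action 1: stay in state 0 forever
r = [[1.0, 0.0],                 # action 0: reward rate 1 in state 0
     [0.8, 0.0]]                 # action 1: reward rate 0.8 in state 0
p, c = transform_ctmdp(q, r, alpha=1.0)
v = minimal_solution(p, c)
```

In this toy model, action 0 yields 1/(1 + 1) = 0.5 from state 0 and action 1 yields 0.8/1 = 0.8, so the iteration settles on 0.8 for state 0 and 0 for state 1. Starting from zero matters when rates are unbounded, because the Bellman equation can then have several non-negative solutions.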

Cited by 17 publications (27 citation statements)
References 28 publications (66 reference statements)
“…For the first part of this lemma, it remains to recognize that the two equations (7) and (21) admit the same minimal nonnegative solution. Below, in spite that the argument is trivial, we briefly verify this relation because first, a similar relation between equation (7) and another equation similar to (21) was falsely claimed without proofs in [31], see equation (8) therein, and second, it is easy to construct examples to show that equations (7) and (21) are not equivalent; indeed, there can be solutions to (7), which do not satisfy (21). For brevity, we write (7) as…”
Section: Results
mentioning
confidence: 82%
“…The existence of a deterministic stationary optimal policy is proved under a different and general set of conditions as compared to the previous literature; the controlled process can be explosive, the transition rates can be arbitrarily unbounded and are weakly continuous, the multifunction defining the admissible action spaces can be neither compact-valued nor upper semi-continuous, and the cost rate is not necessarily inf-compact. Firstly, all the aforementioned works on CTMDPs [13,14,15,16,19,27,32,34] assume the underlying process to be non-explosive; and most of them achieve this by assuming the existence of a Lyapunov function bounding the growth of the transition rates. In the present article we remove this condition, and allow the transition rates to be essentially arbitrarily unbounded, and the controlled process to be possibly explosive. The development of the theory covering such CTMDPs was once regarded quite challenging in the survey [15]; for the discounted criteria it has been done in e.g., [7], see also [31]. Secondly, we assume the weak continuity on the underlying signed kernel defining the transition rates, while all the previous literature on average CTMDPs in Borel spaces is based on the strong continuity condition, except for [20], which establishes the existence of a randomized stationary optimal policy for the constrained CTMDPs. It is relevant to point out that recently the developments of the theory of average DTMDPs (discrete-time Markov decision processes) and SMDPs (semi-Markov decision processes) with weakly continuous (also called Feller) transition probabilities have received much attention from the research community [5,6,8,24,25,26]. In a nutshell, as compared to the strongly continuous case, the proofs with weakly continuous transition rates are more technical, and the construction of the solution to the optimality inequality would involve the notion of the generalized lower limit and the generalized Fatou's lemma.
Moreover, based on a neat generalization of the Berge theorem [9], which is partially summarized in Lemma 5.1 below, and as in [8] for the average DTMDP, we allow the multifunction defining the admissible action spaces to be neither compact-valued nor upper semi-continuous. If the state space is countable, then the concepts of weak and strong continuity coincide.…”
mentioning
confidence: 99%
“…For the first part of this lemma, it remains to recognize that the two equations (7) and (21) admit the same minimal nonnegative solution. Below, in spite of the argument being trivial, we briefly verify this relation because first, a similar relation between (7) and another equation similar to (21) was falsely claimed without proofs in [31] (see [31,Equation (8)]), and second, it is easy to construct examples to show that equations (7) and (21) are not equivalent; indeed, there can be solutions to (7), which do not satisfy (21). For brevity, we write (7) as (dy | x, a))/w(x)) + 1{x ∈ dy})}.…”
Section: Thus It Follows That
mentioning
confidence: 78%
“…Let x* be the optimal basic solution of (34). According to general results on SMDPs in Denardo [7, Section III], for each z ∈ Z′, there exists at most one a ∈ {0, 1} such that x*_{z,a} > 0.…”
Section: Computation Of An Average-optimal Policy
mentioning
confidence: 99%