Solution and Forecast Horizons for Infinite-Horizon Nonhomogeneous Markov Decision Processes

Cheevaprawatdomrong, Torpong; Schochetman, Irwin E.; Smith, Robert L.; García, Alfredo

doi:10.1287/moor.1060.0224

Cited by 19 publications

(16 citation statements)

References 23 publications

(20 reference statements)


“…This theorem only establishes the existence of iterations $K_n$ with the stated property - we cannot tell whether we have reached $K_n$ since it is not possible in general to finitely establish optimality of early decisions in nonstationary MDPs [13].…”

Section: Set (mentioning)

confidence: 98%

“…For each (n, s, a), let $J_n(s, a)$ be the set of nodes in stage n + 1 that are reachable on choosing action a in node s in stage n. That is, $J_n(s, a) = \{s' \in S : p_n(s' \mid s, a) > 0\}$ (13). Then, the hyperarc corresponding to action a ∈ A that emanates from the node representing state s ∈ S in stage n ∈ N has $|J_n(s, a)|$ "heads". Furthermore, the flow reaching from node s to node $s' \in J_n(s, a)$ equals $p_n(s' \mid s, a)\, x_n(s, a)$.…”

Section: A CILP Formulation of Nonstationary MDPs (mentioning)

confidence: 99%
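The reachable set and hyperarc flows described in the statement above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the dictionary layout, the function names `reachable_set` and `hyperarc_flows`, and the toy states and actions are hypothetical and do not come from the cited paper.

```python
from typing import Dict, Set, Tuple

# Hypothetical layout: for one fixed stage n, p_n[(s, a)] maps each next
# state s' to the transition probability p_n(s' | s, a).
Transition = Dict[Tuple[str, str], Dict[str, float]]


def reachable_set(p_n: Transition, s: str, a: str) -> Set[str]:
    """J_n(s, a): the states s' with p_n(s' | s, a) > 0."""
    return {s_next for s_next, prob in p_n[(s, a)].items() if prob > 0}


def hyperarc_flows(p_n: Transition, s: str, a: str, x_nsa: float) -> Dict[str, float]:
    """Flow to each head s' in J_n(s, a): p_n(s' | s, a) * x_n(s, a)."""
    return {
        s_next: p_n[(s, a)][s_next] * x_nsa
        for s_next in reachable_set(p_n, s, a)
    }


# Toy stage-0 data (made up for illustration only).
p_0: Transition = {
    ("low", "wait"): {"low": 0.5, "high": 0.5, "out": 0.0},
    ("low", "sell"): {"out": 1.0},
}

heads = reachable_set(p_0, "low", "wait")             # the hyperarc's heads
flows = hyperarc_flows(p_0, "low", "wait", x_nsa=0.5)  # flow split across heads
```

The hyperarc for ("low", "wait") has two heads here because the zero-probability state "out" is excluded, and the flows over the heads add up to the value of x_n(s, a) that entered the hyperarc.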

“…This may, at first sight, appear to be a weakness of our algorithm. However, it is in fact due to a feature of problem (D) itself, and more generally, of nonstationary infinite-horizon optimization problems - optimality of a given solution cannot be affirmed with finite computations (see [13]). Fortunately, this does not undermine the validity of Theorem 5.3 as its conclusions are trivially true if $x^k$ is optimal for some k and we simply repeat this solution for all subsequent k. The next lemma establishes that the Simplex algorithm produces an improving sequence of basic feasible solutions.…”

Section: Set (mentioning)

confidence: 99%

“…Nonstationary infinite-horizon Markov decision processes (MDPs) [13] (henceforth called nonstationary MDPs) are one of the most general sequential decision models studied in operations research. Nonstationary MDPs extend the more well-studied stationary MDPs [38,42] by relaxing the restrictive assumption that problem data do not change over time.…”

Section: Introduction (mentioning)

confidence: 99%

“…Nonstationary MDPs extend the more well-studied stationary MDPs [38,42] by relaxing the restrictive assumption that problem data do not change over time. From a practical viewpoint, nonstationary MDPs incorporate temporal changes in underlying economic and technological conditions into the decision-making process, and have been used to model problems such as asset selling [13] and stochastic inventory control [14]. They can be described as follows.…”

Section: Introduction (mentioning)

confidence: 99%


A Linear Programming Approach to Nonstationary Infinite-Horizon Markov Decision Processes

Ghate, Smith. Operations Research, 2013.

Self Cite


Nonstationary infinite-horizon Markov decision processes (MDPs) generalize the most well-studied class of sequential decision models in operations research, namely, that of stationary MDPs, by relaxing the restrictive assumption that problem data do not change over time. Linear programming (LP) has been very successful in obtaining structural insights and devising solution methods for stationary MDPs. However, an LP approach for nonstationary MDPs is currently missing. This is because the LP formulation of a nonstationary infinite-horizon MDP includes countably infinitely many variables and constraints, and research on such infinite-dimensional LPs has traditionally faced several hurdles. For instance, duality results may not hold; an extreme point may not be a basic feasible solution; and in the context of a Simplex algorithm, a pivot operation may require infinite data and computations, and a sequence of improving extreme points need not converge in value to optimal. In this paper, we tackle these challenges and establish (1) weak and strong duality, (2) complementary slackness, (3) a basic feasible solution characterization of extreme points, and (4) a one-to-one correspondence between extreme points and deterministic Markovian policies, and (5) devise a Simplex algorithm for an infinite-dimensional LP formulation of nonstationary infinite-horizon MDPs. Pivots in this Simplex algorithm use finite data, perform finite computations, and generate a sequence of improving extreme points that converges in value to optimal. Moreover, this sequence of extreme points gets arbitrarily close to the set of optimal extreme points. We also prove that decisions prescribed by these extreme points are eventually exactly optimal in all states of the nonstationary infinite-horizon MDP in early periods.


Infinite Horizon Problems

Ghate. Wiley Encyclopedia of Operations Research and Management Science, 2011.


The systems under consideration in (discrete‐time) sequential decision problems in operations research and the management sciences often do not have a predetermined time of extinction. Incorporating an arbitrary finite horizon can therefore introduce end‐of‐study distortions in early decisions. Such problems are therefore typically modeled over an unbounded horizon. A majority of the work in this area focuses on stationary models, which assume that the problem data do not change over time. There is also a considerable body of research on nonstationary problems. We briefly review some of the key concepts in nonstationary infinite horizon sequential decision making problems.


Structured Optimal Policies for Markov Decision Processes: Lattice Programming Techniques

Dragut. Wiley Encyclopedia of Operations Research and Management Science, 2011.


The structure (monotonicity, convexity, multimodularity, directional convexity and K‐convexity) of optimal policies is used to reduce the exponential growth with respect to the size of a sequential decision process and to compare different control systems. Monotonicity can be derived using lattice programming techniques, sample path analysis or discrete events dynamic programming. We cover the discrete and continuous‐time MDP and POMDP models with Borel, countable‐, and finite‐state and action spaces, as well as nonstationary ones. Applications include admission/control of (networks of) queues, maintenance, inventory, production and innovation.
