Total Expected Discounted Reward MDPs: Existence of Optimal Policies

Feinberg, Eugene A.

doi:10.1002/9780470400531.eorms0906

Cited by 5 publications (1 citation statement)

References 17 publications


“…Such arguments are standard in the literature on MDP and infinite-horizon inventory control problems (cf. Iglehart (1963), Sennott (1989), Schäl (1993), Fleischmann and Kuik (2003), Feinberg (2011), Huh, Janakiraman and Nagarajan (2011)). We note that the somewhat non-standard aspect here is that the demand in each period is distributed as D − r_L, and thus may be negative.…”

Section: Proof of Theorem (mentioning)
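As a rough illustration of the kind of infinite-horizon discounted argument the passage refers to, the sketch below runs value iteration on a small discounted-cost inventory MDP in which the per-period net demand D − r can be negative. Every modeling choice here is an assumption made for illustration and is not taken from Feinberg or from the citing paper: the truncated inventory range, the demand distribution, the constant order r = 1, the cost rates, and the discount factor 0.9. With finite state and action sets, the Bellman operator is a contraction with modulus equal to the discount factor, so value iteration converges, and a policy that is greedy with respect to the optimal value function is stationary, deterministic, and optimal; the code extracts the greedy policy from the converged approximation.

```python
# Hypothetical model parameters (illustrative assumptions, not from the source).
LOW, HIGH = -5, 5            # inventory truncated to [LOW, HIGH]; negative = backlog
MAX_ORDER = 4                # largest order quantity allowed per period
R = 1                        # constant order subtracted from demand (assumed)
DEMAND_PMF = {0: 0.2, 1: 0.3, 2: 0.3, 3: 0.2}
NET_PMF = {d - R: p for d, p in DEMAND_PMF.items()}  # net demand D - R: -1..2
ORDER_COST, HOLD, BACKLOG = 2.0, 1.0, 4.0
BETA = 0.9                   # discount factor

STATES = list(range(LOW, HIGH + 1))
ACTIONS = list(range(MAX_ORDER + 1))


def clip(x):
    """Keep the inventory level inside the truncated state space."""
    return max(LOW, min(HIGH, x))


def transition(x, q):
    """Expected one-period cost and next-state distribution for state x, order q."""
    y = clip(x + q)          # post-order level (units above HIGH are wasted)
    cost = ORDER_COST * q
    nxt = {}
    for d, p in NET_PMF.items():
        z = y - d            # end-of-period level before truncation
        cost += p * (HOLD * max(z, 0) + BACKLOG * max(-z, 0))
        s = clip(z)
        nxt[s] = nxt.get(s, 0.0) + p
    return cost, nxt


MODEL = {x: {q: transition(x, q) for q in ACTIONS} for x in STATES}


def q_value(v, x, q):
    """One-step lookahead: stage cost plus discounted expected continuation value."""
    cost, nxt = MODEL[x][q]
    return cost + BETA * sum(p * v[s] for s, p in nxt.items())


def value_iteration(tol=1e-9, max_iter=10_000):
    v = {x: 0.0 for x in STATES}
    for _ in range(max_iter):
        new_v = {x: min(q_value(v, x, q) for q in ACTIONS) for x in STATES}
        gap = max(abs(new_v[x] - v[x]) for x in STATES)
        v = new_v
        if gap < tol:
            break
    # Stationary deterministic policy that is greedy w.r.t. the converged values.
    policy = {x: min(ACTIONS, key=lambda q: q_value(v, x, q)) for x in STATES}
    return v, policy


if __name__ == "__main__":
    values, policy = value_iteration()
    for x in STATES:
        print(f"inventory {x:+d}: order {policy[x]}, discounted cost {values[x]:.3f}")
```

Negative net demand simply moves the inventory level up, so the only change it requires in this sketch is that the transition loop must allow z > y.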

Asymptotic Optimality of Tailored Base-Surge Policies in Dual-Sourcing Inventory Systems

Xin, Goldberg (2018), Management Science


Dual-sourcing inventory systems, in which one supplier is faster (i.e., express) and more costly, while the other is slower (i.e., regular) and cheaper, arise naturally in many real-world supply chains. These systems are notoriously difficult to optimize due to the complex structure of the optimal solution and the curse of dimensionality, having resisted solution for over 40 years. Recently, so-called Tailored Base-Surge (TBS) policies have been proposed as a heuristic for the dual-sourcing problem. Under such a policy, a constant order is placed at the regular source in each period, while the order placed at the express source follows a simple order-up-to rule. Numerical experiments by several authors have suggested that such policies perform well as the lead time difference between the two sources grows large, which is exactly the setting in which the curse of dimensionality leads to the problem becoming intractable. However, providing a theoretical foundation for this phenomenon has remained a major open problem. In this paper, we provide such a theoretical foundation by proving that a simple TBS policy is indeed asymptotically optimal as the lead time of the regular source grows large, with the lead time of the express source held fixed. Our main proof technique combines novel convexity and lower-bounding arguments, an explicit implementation of the vanishing discount factor approach to analyzing infinite-horizon Markov decision processes, and ideas from the theory of random walks and queues, significantly extending the methodology and applicability of a novel framework for analyzing inventory models with large lead times recently introduced by Goldberg and co-authors in the context of lost-sales models with positive lead times.
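The TBS rule described above (a constant regular order every period, plus an order-up-to rule at the express source) can be illustrated with a short simulation. This is a simplified sketch under assumptions that are ours, not the paper's: backlogged demand; linear ordering, holding, and backlog costs; a regular lead time L modeled as a FIFO pipeline that starts full of constant orders r; and an express lead time of zero, so the order-up-to level S applies directly to net inventory. The function name simulate_tbs and all of its parameters are hypothetical.

```python
import random
from collections import deque


def simulate_tbs(r, S, L, demand, periods, c_reg, c_exp, hold, backlog, seed=0):
    """Average per-period cost of a simplified Tailored Base-Surge policy.

    Regular source: a constant order r every period, delivered L periods later.
    Express source: zero lead time (assumption); order up to level S.
    demand: callable taking a random.Random instance and returning one demand.
    """
    rng = random.Random(seed)
    pipeline = deque([r] * L)    # regular orders in transit, assumed full at start
    x = S                        # net inventory; negative values are backlog
    total = 0.0
    for _ in range(periods):
        pipeline.append(r)       # place this period's constant regular order
        x += pipeline.popleft()  # receive the regular order placed L periods ago
        e = max(0, S - x)        # express order-up-to rule
        x += e
        x -= demand(rng)         # demand is realized; unmet demand is backlogged
        total += c_reg * r + c_exp * e + hold * max(x, 0) + backlog * max(-x, 0)
    return total / periods


if __name__ == "__main__":
    avg = simulate_tbs(r=1, S=4, L=10, demand=lambda g: g.randint(0, 4),
                       periods=10_000, c_reg=1.0, c_exp=3.0, hold=1.0, backlog=9.0)
    print(f"average cost per period: {avg:.3f}")
```

In this sketch the policy only needs to track net inventory, because the regular pipeline always holds the same L orders of size r; this structural simplicity is what makes TBS attractive when L is large and the optimal policy would depend on the whole pipeline.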


Deep Reinforcement Learning-Based Anti-Jamming Algorithm Using Dual Action Network

Chen, Ling, et al. (2023), IEEE Trans. Wireless Commun.


Information Relaxation Bounds for Infinite Horizon Markov Decision Processes

Brown, Haugh (2017), Operations Research


We consider the information relaxation approach for calculating performance bounds for stochastic dynamic programs (DPs), following Brown et al. [Brown DB, Smith JE, Sun P (2010) Information relaxations and duality in stochastic dynamic programs. Oper. Res. 58(4, Part 1):785-801]. This approach generates performance bounds by solving problems with relaxed nonanticipativity constraints and a penalty that punishes violations of these constraints. In this paper, we study infinite horizon DPs with discounted costs and consider applying information relaxations to reformulations of the DP. These reformulations use different state transition functions and correct for the change in state transition probabilities by multiplying by likelihood ratio factors. These reformulations can greatly simplify solutions of the information relaxations, both in leading to finite horizon subproblems and by reducing the number of states that need to be considered in these subproblems. We show that any reformulation leads to a lower bound on the optimal cost of the DP when used with an information relaxation and a penalty built from a broad class of approximate value functions. We refer to this class of approximate value functions as subsolutions, and this includes approximate value functions based on Lagrangian relaxations as well as those based on approximate linear programs. We show that the information relaxation approach, in theory, recovers a tight lower bound using any reformulation and is guaranteed to improve on the lower bounds from subsolutions. Finally, we apply information relaxations to an inventory control application with an autoregressive demand process, as well as dynamic service allocation in a multiclass queue. In our examples, we find that the information relaxation lower bounds are easy to calculate and are very close to the expected cost using simple heuristic policies, thereby showing that these heuristic policies are nearly optimal. 
Keywords: infinite horizon dynamic programs • information relaxations • Lagrangian relaxations • inventory control • multiclass queues
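The lower-bound property at the heart of the information relaxation approach can be seen in a toy finite-horizon example. The sketch below uses only the simplest relaxation, perfect information with a zero penalty, rather than the penalized, reformulated infinite-horizon relaxations studied in the paper; the purchasing problem, the price distribution, and the function names are hypothetical. A buyer must purchase one unit within T periods at i.i.d. random prices and, without foresight, can decide only from prices already seen. Revealing the whole price path can only help, so the expected path minimum is a lower bound on the optimal nonanticipative expected cost.

```python
import itertools

# Hypothetical i.i.d. price distribution and horizon (illustrative assumptions).
PRICES = {8.0: 0.3, 10.0: 0.4, 12.0: 0.3}
T = 3


def optimal_nonanticipative_cost(prices, horizon):
    """Backward induction: optimal expected cost when decisions use past prices only."""
    # In the last period the buyer must accept whatever price appears.
    cont = sum(p * c for c, p in prices.items())
    for _ in range(horizon - 1):
        # Earlier periods: buy now if the current price beats the continuation cost.
        cont = sum(p * min(c, cont) for c, p in prices.items())
    return cont


def perfect_information_bound(prices, horizon):
    """Zero-penalty perfect-information relaxation: buy at the path minimum."""
    bound = 0.0
    for path in itertools.product(prices.items(), repeat=horizon):
        prob = 1.0
        for _, p in path:
            prob *= p
        bound += prob * min(c for c, _ in path)
    return bound


if __name__ == "__main__":
    v = optimal_nonanticipative_cost(PRICES, T)
    lb = perfect_information_bound(PRICES, T)
    print(f"optimal cost {v:.4f} >= perfect-information bound {lb:.4f}")
```

With these numbers, backward induction gives 8.98 and the perfect-information bound gives 8.74. A penalty that charges for the use of future information, as in the approach described above, is what can shrink such a gap.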

