We consider a discrete-time infinite-horizon inventory system with non-stationary demand, full backlogging, and deterministic replenishment lead time. Demand arrives according to a probability distribution conditional on the state of the world that undergoes Markovian transitions over time. But the actual state of the world can only be imperfectly estimated based on past demand data. We model the inventory replenishment problem for this system as a Markov decision process (MDP) with an uncountable state space consisting of both the inventory position and the most recent belief, a conditional probability mass function, about the actual state of the world. Assuming that the state of the world evolves as an ergodic Markov chain, using the vanishing discount method along with a coupling argument, we prove the existence of an optimal average cost that is independent of the initial system state. For our linear cost structure, we also establish the average-cost optimality of a belief-dependent base-stock policy. We then discretize the uncountable belief space into a regular grid and observe that the average cost under our discretization converges to the optimal average cost as the number of grid points grows large. Finally, we conduct numerical experiments to evaluate the use of a myopic belief-dependent base-stock policy as a heuristic for our MDP with the uncountable state space. On a test bed of 108 instances, the average cost obtained from the myopic policy deviates by no more than a few percent from the best lower bound on the optimal average cost obtained from our discretization.
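The belief update and the myopic base-stock rule described above can be sketched as follows. This is an illustration under stated assumptions, not the paper's implementation: demand is assumed Poisson with a state-dependent rate, the belief is taken to be the distribution of the current state before that period's demand is observed, the lead time is taken to be zero, and the names `update_belief`, `myopic_base_stock`, `rates`, `P`, `h`, and `b` are hypothetical.

```python
from math import exp, factorial


def poisson_pmf(k, rate):
    """Probability of observing demand k under a Poisson distribution (assumed demand model)."""
    return exp(-rate) * rate**k / factorial(k)


def update_belief(belief, demand, rates, P):
    """One step of a hidden-Markov filter.

    Conditions the belief on the observed demand (Bayes' rule), then
    propagates it one period through the transition matrix P.
    """
    weighted = [p * poisson_pmf(demand, r) for p, r in zip(belief, rates)]
    total = sum(weighted)
    posterior = [w / total for w in weighted]
    n = len(belief)
    return [sum(posterior[i] * P[i][j] for i in range(n)) for j in range(n)]


def myopic_base_stock(belief, rates, h, b):
    """Smallest order-up-to level whose mixture-demand CDF reaches b / (h + b).

    h is the unit holding cost and b the unit backlogging cost (assumed names);
    a zero lead time is assumed, so only one period of demand matters.
    """
    target = b / (h + b)
    y, cdf = 0, 0.0
    while True:
        cdf += sum(p * poisson_pmf(y, r) for p, r in zip(belief, rates))
        if cdf >= target:
            return y
        y += 1


# Hypothetical two-state example: a low-demand and a high-demand state of the world.
rates = [2.0, 10.0]
P = [[0.9, 0.1], [0.2, 0.8]]
belief = update_belief([0.5, 0.5], 9, rates, P)
level = myopic_base_stock(belief, rates, h=1.0, b=9.0)
```

Observing a large demand shifts the belief toward the high-demand state, which in turn raises the myopic order-up-to level; this is the sense in which the base-stock level is belief-dependent.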
In their seminal 2004 paper, Glynn and Juneja formally and precisely established the rate-optimal, probability-of-incorrect-selection, replication allocation scheme for selecting the best of k simulated systems. In the case of independent, normally distributed outputs, this allocation has a simple form that depends in an intuitively appealing way on the true means and variances. Of course, the means and (typically) the variances are unknown, but the rate-optimal allocation provides a target for implementable, dynamic, data-driven policies to achieve. In this paper we compare the empirical behavior of four related replication-allocation policies: mCEI from Chen and Ryzhov and our new gCEI policy, both of which converge to the Glynn and Juneja allocation; AOMAP from Peng and Fu, which converges to the OCBA optimal allocation; and TTTS from Russo, which targets the rate of convergence of the posterior probability of incorrect selection. We find that these policies have distinctly different behavior in some settings.
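For reference, the normal-case form alluded to above is commonly written as below. This is a sketch from the standard large-deviations treatment, not a formula stated in this abstract, and the notation is assumed: system $b$ is the true best, $\mu_i$ and $\sigma_i^2$ are the true means and variances, and $\alpha_i$ is the fraction of replications allocated to system $i$.

```latex
\begin{align*}
G_i(\alpha_b,\alpha_i) &= \frac{(\mu_b-\mu_i)^2}
  {2\left(\sigma_b^2/\alpha_b+\sigma_i^2/\alpha_i\right)}, \quad i \neq b,\\
G_i(\alpha_b,\alpha_i) &= G_j(\alpha_b,\alpha_j) \quad \text{for all } i,j \neq b,\\
\frac{\alpha_b^2}{\sigma_b^2} &= \sum_{i\neq b}\frac{\alpha_i^2}{\sigma_i^2},
\qquad \sum_{i=1}^{k}\alpha_i = 1,\quad \alpha_i > 0.
\end{align*}
```

The first line is the exponential rate at which the probability of mistaking system $i$ for the best decays. The allocation equalizes these rates across the inferior systems, so systems that are close to the best in mean, or that have high variance, receive a larger share of the replications.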