Appendix A provides additional background that describes the multi-armed bandit problem and the relationship of the simulation selection problem to a stoppable version of the multi-armed bandit. It also provides a numerical example that shows that the few existing results that characterize optimal policies for stoppable bandits do not apply to the simulation selection problem. Appendix B motivates the free boundary equation whose solution approximates the optimal expected discounted reward when k = 1. Appendix C provides mathematical proofs of the claims in the main paper. Appendix D describes several technical extensions that expand the range of validity of the paper; it relaxes some assumptions about the independence of the output from a single system, as well as the duration of the replications for each alternative. Appendix E summarizes how the optimal expected discounted reward (OEDR) and stopping boundaries for the simulation selection problem with k = 1 alternative were computed. Appendix F specifies the simulation selection procedures that are used in §6.3.
Appendix A: Supplement: Multi-Armed Bandits and the Simulation Selection Problem

The simulation selection problem is closely related to a class of sequential decision problems known as the multi-armed bandit problem. In this section, we review relevant theory, and we apply it to demonstrate that simulation selection problems can be reduced to a variation of the multi-armed bandit that is called a stoppable bandit problem. We then present a numerical example indicating that well-known sufficient conditions, used to justify the optimality of index-based rules in stoppable bandit problems, do not hold in our case.
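As a concrete illustration of the index-based rules discussed above (not part of this paper's development), the following sketch computes a Gittins index for a finite Markov reward chain using the restart-in-state formulation of Katehakis and Veinott (1987): the index of state m equals (1 − β) times the value of the problem in which, at every stage, the decision-maker may either continue from the current state or restart from state m. Function and variable names are illustrative.

```python
import numpy as np

def gittins_index(P, r, beta, state, tol=1e-10, max_iter=100_000):
    """Gittins index of `state` for a Markov chain with transition matrix P,
    one-period expected rewards r, and discount factor beta in (0, 1).

    Uses value iteration on the restart-in-state Bellman equation
        V(s) = max( r[s] + beta * P[s] @ V,  r[state] + beta * P[state] @ V ),
    after which the index equals (1 - beta) * V(state).
    """
    V = np.zeros(len(r))
    for _ in range(max_iter):
        continue_val = r + beta * (P @ V)          # keep playing from each state s
        restart_val = continue_val[state]          # or restart the chain at `state`
        V_new = np.maximum(continue_val, restart_val)
        if np.max(np.abs(V_new - V)) < tol:
            V = V_new
            break
        V = V_new
    return (1.0 - beta) * V[state]

# Sanity check: a frozen chain (identity transitions) pays r[s] forever,
# so the Gittins index of each state is just its one-period reward.
idx = gittins_index(np.eye(2), np.array([1.0, 2.0]), beta=0.9, state=0)
```

In a k-armed bandit, one would compute the index of each arm's current state in this way and, as Gittins' theorem asserts, play the arm whose index is greatest.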
A.1. The Multi-Armed Bandit Problem

This section supplements the discussion in §3 by providing formal definitions of the multi-armed bandit problem and of optimal allocation index rules.

In the discounted multi-armed bandit problem, a decision-maker chooses repeatedly among a finite set of mutually independent Markov chains that are indexed i = 1, 2, . . . , k. A choice of chain i at stage t yields an expected reward that is specific to the state of chain i, and it initiates a state transition for chain i. The k − 1 chains not chosen at stage t remain in their current states and earn no rewards. The objective is to maximize the expected sum of discounted rewards over an infinite horizon (Gittins 1989). For the case in which expected one-period rewards are bounded for each chain, Gittins and co-workers proved that an index can be computed for each arm, independently of all other arms, such that it is optimal to select the arm whose index is greatest among all arms. This allocation index has come to be known as a "Gittins index."

Formally, we define the multi-armed bandit's parameters as follows. Markov chain i has state space Ω_{Θ_i}, with states Θ_i ∈ Ω_{Θ_i}. The state space has a σ-algebra, F_i, of subsets of Ω_{Θ_i}, which includes all elements Θ_i ∈ Ω_{Θ_i}. We define the product space of joint outcomes across all k Markov chains as (Ω, F). If cha...