“…Many extensions and variations of classical bandit problems have been proposed, including: bandits with a varying finite or infinite number of arms (Whittle [78] and Banks and Sundaram [3]), bandits where an adversary has control over the payoffs (Auer et al. [2]), bandits with dependent arms (Pandey, Chakrabarti, and Agarwal [57]), bandits where multiple arms can be chosen at the same time (Whittle [79]), bandits whose arms yield rewards even when they are inactive (Glazebrook, Kirkbride, and Ruiz-Hernandez [23]), and bandits with switching costs (Banks and Sundaram [4]). …”