1988
DOI: 10.2307/3214163

Restless bandits: activity allocation in a changing world

Abstract: We consider a population of n projects which in general continue to evolve whether in operation or not (although by different rules). It is desired to choose the projects in operation at each instant of time so as to maximise the expected rate of reward, under a constraint upon the expected number of projects in operation. The Lagrange multiplier associated with this constraint defines an index which reduces to the Gittins index when projects not being operated are static. If one is constrained to operate m pr…
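The index described in the abstract can be illustrated numerically for a single project by reading the Lagrange multiplier as a subsidy paid whenever the project is left passive: the index of a state is the subsidy at which operating and resting are equally attractive there. The sketch below is a minimal illustration under assumptions not taken from the abstract: a finite state space with known passive/active transition matrices `P0`/`P1` and rewards `r0`/`r1`, a discounted criterion with factor `beta` standing in for the reward-rate criterion the abstract uses, and indexability (the preferred action switches only once as the subsidy grows). The function names `_advantage` and `whittle_index` are hypothetical.

```python
from typing import List, Sequence, Tuple

# Sketch only: discounted criterion, finite states, indexability, and these
# function names are assumptions for illustration, not taken from Whittle (1988).
Matrix = Sequence[Sequence[float]]


def _advantage(P0: Matrix, P1: Matrix, r0: Sequence[float], r1: Sequence[float],
               x: int, lam: float, beta: float,
               tol: float = 1e-12, max_iter: int = 100_000) -> float:
    """Q_active(x) - Q_passive(x) when the passive action earns subsidy lam."""
    n = len(r0)

    def backup(V: List[float]) -> Tuple[List[float], List[float]]:
        # One Bellman backup per action: passive (plus subsidy) and active.
        q0 = [r0[s] + lam + beta * sum(P0[s][t] * V[t] for t in range(n)) for s in range(n)]
        q1 = [r1[s] + beta * sum(P1[s][t] * V[t] for t in range(n)) for s in range(n)]
        return q0, q1

    # Discounted value iteration with a relative stopping tolerance.
    V = [0.0] * n
    for _ in range(max_iter):
        q0, q1 = backup(V)
        V_new = [max(a, b) for a, b in zip(q0, q1)]
        scale = max(1.0, max(abs(v) for v in V_new))
        converged = max(abs(a - b) for a, b in zip(V_new, V)) < tol * scale
        V = V_new
        if converged:
            break
    q0, q1 = backup(V)
    return q1[x] - q0[x]


def whittle_index(P0: Matrix, P1: Matrix, r0: Sequence[float], r1: Sequence[float],
                  x: int, beta: float = 0.9, tol: float = 1e-8) -> float:
    """Subsidy at which acting and resting tie in state x (assumes indexability)."""

    def adv(lam: float) -> float:
        return _advantage(P0, P1, r0, r1, x, lam, beta)

    lo, hi = -1.0, 1.0
    # Widen the bracket until acting is weakly preferred at lo and resting at hi.
    for _ in range(60):
        if adv(lo) >= 0:
            break
        lo *= 2
    else:
        raise ValueError("could not bracket the index from below")
    for _ in range(60):
        if adv(hi) <= 0:
            break
        hi *= 2
    else:
        raise ValueError("could not bracket the index from above")
    # Bisection on the sign of the active-minus-passive advantage.
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if adv(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

With a static passive action (`P0` the identity, `r0` zero), the computed subsidy is the Gittins index expressed as a reward rate, which is the reduction the abstract mentions. For instance, `whittle_index([[1.0]], [[1.0]], [0.0], [5.0], x=0)` is approximately 5 for a one-state project that earns 5 per active period.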

Cited by 755 publications (229 citation statements); references: 1 publication.

“…That is, in each period the constraint $\sum_{s=1}^{S} u_s \le N$ is replaced by $\mathbb{E}\left[\sum_{s=1}^{S} u_s\right] \le N$, where the expectation is with respect to all possible states weighted by the probability of reaching each one of them under a given (primal) policy. This fact has been observed by several authors in various other settings (e.g., Whittle 1988, Castañon 1997), and a proof of this equivalence for the finite horizon multiarmed bandit is available in Caro (2005). We also point out that Adelman and Mersereau (2004) provide an alternative linear programming-based bound that is shown to be tighter (or not worse) than (7), but requires more extensive computations.…”
Section: Problem Decomposition and Upper Bound (mentioning, confidence: 60%)
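The relaxation in this excerpt also yields a computable upper bound: pricing each active project at a multiplier λ ≥ 0 decouples the S projects, so the relaxed value is λN/(1-β) plus the sum of single-project values, and for any λ ≥ 0 this bounds the constrained optimum by weak duality. The sketch below assumes things the excerpt does not state: a discounted infinite-horizon criterion with factor `beta`, finite per-project state spaces with known matrices, and a user-supplied `lam_hi` believed to exceed the minimizing multiplier. The function names are hypothetical. Because the bound is a maximum over policies of functions affine in λ, it is convex in λ, so a ternary search over [0, `lam_hi`] locates its minimum.

```python
from typing import List, Sequence, Tuple

# Sketch only: discounted criterion, finite states, lam_hi, and these function
# names are assumptions for illustration, not taken from the cited papers.
Matrix = Sequence[Sequence[float]]
# A project is (P0, P1, r0, r1): passive/active transition matrices and rewards.
Project = Tuple[Matrix, Matrix, Sequence[float], Sequence[float]]


def charged_value(project: Project, lam: float, beta: float,
                  tol: float = 1e-12, max_iter: int = 100_000) -> List[float]:
    """Discounted optimal value of one project when every active period costs lam."""
    P0, P1, r0, r1 = project
    n = len(r0)
    V = [0.0] * n
    for _ in range(max_iter):
        V_new = [
            max(
                r0[s] + beta * sum(P0[s][t] * V[t] for t in range(n)),        # passive
                r1[s] - lam + beta * sum(P1[s][t] * V[t] for t in range(n)),  # active, charged
            )
            for s in range(n)
        ]
        scale = max(1.0, max(abs(v) for v in V_new))
        converged = max(abs(a - b) for a, b in zip(V_new, V)) < tol * scale
        V = V_new
        if converged:
            break
    return V


def lagrangian_bound(projects: Sequence[Project], states: Sequence[int],
                     N: int, lam: float, beta: float) -> float:
    """lam*N/(1-beta) + sum of charged single-project values; an upper bound for lam >= 0."""
    total = lam * N / (1.0 - beta)
    for project, x in zip(projects, states):
        total += charged_value(project, lam, beta)[x]
    return total


def best_lagrangian_bound(projects: Sequence[Project], states: Sequence[int], N: int,
                          beta: float, lam_hi: float, iters: int = 80) -> Tuple[float, float]:
    """Ternary search for the multiplier minimising the convex bound on [0, lam_hi]."""
    lo, hi = 0.0, lam_hi
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if lagrangian_bound(projects, states, N, m1, beta) <= lagrangian_bound(projects, states, N, m2, beta):
            hi = m2  # a minimiser lies in [lo, m2]
        else:
            lo = m1  # a minimiser lies in [m1, hi]
    lam = 0.5 * (lo + hi)
    return lam, lagrangian_bound(projects, states, N, lam, beta)
```

As a check, take two static single-state projects earning 5 and 3 when active (0 when passive), with N = 1 and beta = 0.9: the bound equals 80 at λ = 0 and is flat at 50 for λ between 3 and 5, which matches the value of always operating the better project.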
“…The underlying concepts involved are similar to those of the well-established theory of duality for general nonlinear optimization problems (see Bertsekas 1999). The approach dates back to at least the late 1980s with the independent work done by Karmarkar (1987) on a finite horizon multilocation inventory problem and the seminal paper of Whittle (1988) on restless bandits. For more accounts of successful applications of this methodology, see Castañon (1997), Bertsimas and Mersereau (2004), and the references therein.…”
Section: Discussion (mentioning, confidence: 93%)