2013
DOI: 10.1017/s0001867800006194

Monotone Policies and Indexability for Bidirectional Restless Bandits

Abstract: Motivated by a wide range of applications, we consider a development of Whittle's restless bandit model in which project activation requires a state-dependent amount of a key resource, which is assumed to be available at a constant rate. As many projects may be activated at each decision epoch as resource availability allows. We seek a policy for project activation within resource constraints which minimises an aggregate cost rate for the system. Project indices derived from a Lagrangian relaxation of the orig…

Cited by 5 publications
(16 citation statements)
References 17 publications
“…These reinitializing bandits have some common features with models previously addressed: it is similar to the reward depletion and replenishment model presented in [4], and it also shares with bidirectional bandits in [5], the property that the active and passive actions produce opposite movements on the state space. Another related application is found in [7], where a new type of congestion control scheduling method based on a MARBP is proposed, motivated by the Internet flows behaving according to the Transmission Control Protocol, and thus admitting a reinitializing feature.…”
Section: Introduction
confidence: 82%
“…Denote by $V_\beta(\phi_0, \lambda, i)$ the expression (9) evaluated by setting $t^*(\phi_0, \lambda) = i$. Notice that, solving problem (4), that is, finding the states that belong to $A^*(\lambda)$, is therefore equivalent to finding the maximum positive integer $i$ such that it holds: (5) and (6), and given that $V^*_\beta(\phi_0, \lambda) = V_\beta(\phi_0, \lambda, i)$, using (4) we have that:…”
Section: DP Analysis and Proof of Theorem 31
confidence: 99%
“…The condition appears to be a natural condition which should be satisfied by all models, but that is not the case [11]. Sufficient conditions for indexability have been investigated under specific modeling assumptions (two-state fully or partially observed restless bandits [2], [6]; monotone bandits [2], [5], [13]; models with right-skip free transitions [1], [14]; models with monotone or convex cost/reward [2], [13], [14], [16]-[18]; models satisfying partial conservation laws [19], [20]). Indexability for models arising in specific applications has been investigated in [1], [5], [14]-[18].…”
Section: Introduction
confidence: 99%
“…In tackling this, a multi-armed bandit approach can be taken. For example, the bidirectional restless bandits (Glazebrook et al, 2013) could be a suitable fit for representing the learners' learning.…”
Section: Discussion
confidence: 99%