Some indexable families of restless bandit problems

Glazebrook, K. D.; Ruiz-Hernández, Diego; Kirkbride, C.

doi:10.1239/aap/1158684996

Cited by 65 publications

(68 citation statements)

References 17 publications

Supporting

Mentioning

Contrasting

Order By: Relevance

“…In the general context of RMBP, there is a rich literature on indexability. See [23] for the linear programming representation of conditions for indexability and [9] for examples of specific indexable restless bandit processes. Constant-factor approximation algorithms for RMBP have also been explored in the literature.…”

Section: Related Workmentioning

confidence: 99%

Indexability of Restless Bandit Problems and Optimality of Whittle Index for Dynamic Multichannel Access

Liu

Zhao

2010

IEEE Trans. Inform. Theory

319

317

View full text Add to dashboard Cite

We consider a class of restless multi-armed bandit problems (RMBP) that arises in dynamic multichannel access, user/server scheduling, and optimal activation in multi-agent systems. For this class of RMBP, we establish the indexability and obtain Whittle's index in closed-form for both discounted and average reward criteria. These results lead to a direct implementation of Whittle's index policy with remarkably low complexity. When these Markov chains are stochastically identical, we show that Whittle's index policy is optimal under certain conditions. Furthermore, it has a semi-universal structure that obviates the need to know the Markov transition probabilities. The optimality and the semi-universal structure result from the equivalency between Whittle's index policy and the myopic policy established in this work. For non-identical channels, we develop efficient algorithms for computing a performance upper bound given by Lagrangian relaxation. The tightness of the upper bound and the near-optimal performance of Whittle's index policy are illustrated with simulation examples. Index TermsOpportunistic access, dynamic channel selection, restless multi-armed bandit, Whittle's index, indexability, myopic policy.Proof: The upper bound of J is obtained from the upper bound of the optimal performance for generally non-identical channels as given in (43). The lower bound of J w is obtained from the structure of Whittle's index policy. See Appendix H for the complete proof.Corollary 2: Let η = Jw J be the approximation factor defined as the ratio of the performance by Whittle's index policy to the optimal performance. We have

show abstract

Section: Related Workmentioning

confidence: 99%

Indexability of Restless Bandit Problems and Optimality of Whittle Index for Dynamic Multichannel Access

Liu

Zhao

2010

IEEE Trans. Inform. Theory

319

317

View full text Add to dashboard Cite

show abstract

“…We consider Whittle's index and other index-type policies. However, the implementation of Whittle's index is nontrivial because i) Whittle's index is only applicable to a restricted class of problems that satisfy the so-called indexability property, which is difficult to check in general (Whittle 1988, Glazebrook et al 2006; ii) the computation of Whittle's index involves repeated solution of single-bandit dynamic programs, which is difficult for our case since each single bandit has a multi-dimensional state space.…”

Section: Research Questions and Key Resultsmentioning

confidence: 99%

Prioritizing Hepatitis C Treatment in U.S. Prisons

Ayer

Zhang

Bonifonte

et al. 2019

Operations Research

View full text Add to dashboard Cite

Hepatitis C virus (HCV) prevalence in prison systems is about 10 times higher than in the community. As such, prison systems offer a unique opportunity to control the HCV epidemic. New HCV-treatment drugs are very effective, but providing treatment to all inmates is prohibitively expensive unless prices fall. Current practice is to prioritize treatment based on disease severity and puts less emphasis on other factors such as the remaining sentence length and injection drug use behavior. In “Prioritizing Hepatitis C Treatment in U.S. Prisons,” T. Ayer, C. Zhang, A. Bonifonte, A. Spaulding, and J. Chhatwal analyze optimal approaches for treatment prioritization under resource constraints by developing a restless bandit modeling framework. They present an easy-to-implement closed-form index policy to support hepatitis C treatment prioritization decisions in U.S. prisons. They also test their proposed policy using a detailed, realistic agent-based simulation model and shed light on several controversial health policy decisions related to hepatitis C treatment prioritization.

show abstract

“…Because θ 1 (x, 1) = 1, Condition 2 is satisfied, and this version of Problem 2 is indexable. Glazebrook et al [11] formulated the same problem, slightly differently from us, as a restless bandit problem where one does not wait for the broken arm to be fixed before the reward stream is again available: if one plays a broken arm then he/she obtains its immediate reward minus the switching cost, and the arm is guaranteed to be available in the next period. However, the forms of their and our Whittle indices are the same.…”

Section: Proposition 5 Under Condition 2 Activating the Arm In Statmentioning

confidence: 99%

“…We have developed the Whittle indices for Problems 1 and 2 in the previous sections. Here we discuss how to compute the indices in (9) and (11). For this purpose, we develop the restartin problem representation of the indices.…”

Section: The Restart-in Problemmentioning

confidence: 99%

“…The restart-in problem representation of the Gittins index for the classical multiarmed bandit problem was introduced in [16]. The index in (11) is similar to the Gittins index; therefore, we first formulate it as a restart-in problem. We then propose a generalization of the restart-in problem representation for the Whittle index in (9).…”

Section: The Restart-in Problemmentioning

confidence: 99%

See 1 more Smart Citation

Index policies for discounted bandit problems with availability constraints

Dayanık

Powell

Yamazaki

2008

Advances in Applied Probability

View full text Add to dashboard Cite

Multi-armed bandit problem is studied when the arms are not always available. The arms are first assumed to be intermittently available with some state/action-dependent probabilities. It is proven that no index policy can attain the maximum expected total discounted reward in every instance of that problem. The Whittle index policy is derived, and its properties are studied.Then it is assumed that arms may break down, but repair is an option at some cost, and the new Whittle index policy is derived. Both problems are indexable. The proposed index policies cannot be dominated by any other index policy over all multi-armed bandit problems considered here. Whittle indices are evaluated for Bernoulli arms with unknown success probabilities.

show abstract

Some indexable families of restless bandit problems

Cited by 65 publications

References 17 publications

Indexability of Restless Bandit Problems and Optimality of Whittle Index for Dynamic Multichannel Access

Indexability of Restless Bandit Problems and Optimality of Whittle Index for Dynamic Multichannel Access

Prioritizing Hepatitis C Treatment in U.S. Prisons

Index policies for discounted bandit problems with availability constraints

Contact Info

Product

Resources

About