Some indexable families of restless bandit problems

Glazebrook, K. D.; Ruiz-Hernández, Diego; Kirkbride, C.

doi:10.1017/s000186780000121x

Cited by 12 publications

(10 citation statements)

References 16 publications


“…Further classes of indexable problems are the dual speed problem of Glazebrook, Nino-Mora, and Ansell [24], the maintenance models of Glazebrook, Ruiz-Hernandez, and Kirkbride [25], and the spinning plates and squad models of Glazebrook, Kirkbride, and Ruiz-Hernandez [23]. Our paper is in line with these works in that it trades indexability for specific structural conditions.…”

Section: Indexability (supporting)

confidence: 63%

Two-Armed Restless Bandits with Imperfect Information: Stochastic Control and Indexability

Fryer

Harms

2013


We present a two-armed bandit model of decision making under uncertainty where the expected return to investing in the "risky arm" increases when choosing that arm and decreases when choosing the "safe" arm. These dynamics are natural in applications such as human capital development, job search, and occupational choice. Using new insights from stochastic control, along with a monotonicity condition on the payoff dynamics, we show that optimal strategies in our model are stopping rules that can be characterized by an index which formally coincides with Gittins' index. Our result implies the indexability of a new class of restless bandit models.


“…Weber and Weiss (1990) prove that Whittle's index is asymptotically optimal when the ratio of the active number of bandits to the total number is fixed. Most papers (e.g., Ansell et al 2003; Glazebrook et al 2005, 2006) focus on proving indexability before proceeding to compute Whittle's index and numerically solving the problem. In our model, we prove that a much simpler myopic policy is optimal in special settings of the problem and use the policy to devise a heuristic for more general settings.…”

Section: Literature Review (mentioning)

confidence: 99%

“…At an abstract level, our model is most closely related to Glazebrook et al (2005), which studies the optimal allocation of repairmen to machines that deteriorate under usage. However, Glazebrook et al (2006) assume that the states of machines are known at every decision epoch. In our model, since a patient's health state is not observable without a diagnosis during the visit and every patient is not seen each period, we assume that the decision maker only knows the probability distribution over health states.…”

Section: Literature Review (mentioning)

confidence: 99%

Improving Health Outcomes Through Better Capacity Allocation in a Community-Based Chronic Care Model

Deo

Iravani

Jiang

et al. 2013

Operations Research


This paper studies a model of community-based healthcare delivery for a chronic disease. In this setting, patients periodically visit the healthcare delivery system, which influences their disease progression and consequently their health outcomes. We investigate how the provider can maximize community-level health outcomes through better operational decisions pertaining to capacity allocation across different patients. To do so, we develop an integrated capacity allocation model that incorporates clinical (disease progression) and operational (capacity constraint) aspects. Specifically, we model the provider's problem as a finite horizon stochastic dynamic program, where the provider decides which patients to schedule at the beginning of each period. Therapy is provided to scheduled patients, which may improve their health states. Patients that are not seen follow their natural disease progression. We derive a quantitative measure for comparison of patients' health states and use it to design an easy-to-implement myopic heuristic that is provably optimal in special cases of the problem. We employ the myopic heuristic in a more general setting and test its performance using operational and clinical data obtained from Mobile C.A.R.E. Foundation, a community-based provider of pediatric asthma care in Chicago. Our extensive computational experiments suggest that the myopic heuristic can improve the health gains at the community level by up to 15% over the current policy. The benefit is driven by the ability of our myopic heuristic to alter the duration between visits for patients with different health states depending on the tightness of the capacity and the health states of the entire patient population.


“…In reality, set-up times would impose a penalty upon Red for such a policy. Glazebrook, Kirkbride and Ruiz (2005) propose modifications to indices which take account of switching penalties and/or times. Such modifications can be applied to all of the models discussed in this paper, although strict optimality is no longer achieved.…”

Section: Model 1 - Red Learns About the Nature Of Blue Targets (mentioning)

confidence: 99%

Index Policies for Shooting Problems

Glazebrook

Kirkbride

Mitchell

et al. 2007

Operations Research

Self Cite


We consider a scenario in which a single Red wishes to shoot at a collection of Blue targets, one at a time, to maximise some measure of return obtained from Blues killed before Red's own (possible) demise. Such a situation arises in various military contexts such as the conduct of air defence by Red in the face of Blue SEAD (suppression of enemy air defences). A class of decision processes called multi-armed bandits has been previously deployed to develop optimal policies for Red in which she attaches a calibrating (Gittins) index to each Blue target and optimally shoots next at the Blue with largest index value. The current paper seeks to elucidate how a range of developments of index theory are able to accommodate features of such problems which are of practical military import. Such features include levels of risk to Red which are policy dependent, Red having imperfect information about the Blues she faces, an evolving population of Blue targets and the possibility of Red disengagement. The paper concludes with a numerical study which both compares the performance of (optimal) index policies to a range of competitors and also demonstrates the value to Red of (optimal) disengagement.

Subject terms: multi-armed bandits, Gittins indices, suppression of enemy air defense
