In 1988 Whittle introduced an important but intractable class of restless bandit problems which generalise the multiarmed bandit problems of Gittins by allowing state evolution for passive projects. Whittle's account deployed a Lagrangian relaxation of the optimisation problem to develop an index heuristic. Despite a developing body of evidence (both theoretical and empirical) which underscores the strong performance of Whittle's index policy, a continuing challenge to implementation is the need to establish that the competing projects all pass an indexability test. In this paper we employ Gittins' index theory to establish the indexability of (inter alia) general families of restless bandits which arise in problems of machine maintenance and stochastic scheduling problems with switching penalties. We also give formulae for the resulting Whittle indices. Numerical investigations testify to the outstandingly strong performance of the index heuristics concerned.
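As a rough illustration of what an indexability-based calculation involves, the sketch below computes a Whittle index numerically for a small finite-state project. It is not the closed-form approach of the paper summarised above. It assumes a discounted-reward formulation, assumes the project is indexable (so that bisection on the passive subsidy is valid), and uses hypothetical names (`subsidy_q_values`, `whittle_index`) and hypothetical search bounds.

```python
def subsidy_q_values(P_passive, P_active, r_passive, r_active, subsidy, beta, sweeps=500):
    """Value iteration for one project when the passive action earns an extra subsidy.

    Returns the passive and active action values for every state.
    """
    n = len(r_passive)

    def backup(v):
        q_pas = [r_passive[s] + subsidy
                 + beta * sum(P_passive[s][t] * v[t] for t in range(n))
                 for s in range(n)]
        q_act = [r_active[s]
                 + beta * sum(P_active[s][t] * v[t] for t in range(n))
                 for s in range(n)]
        return q_pas, q_act

    v = [0.0] * n
    for _ in range(sweeps):
        q_pas, q_act = backup(v)
        v = [max(a, b) for a, b in zip(q_pas, q_act)]
    return backup(v)


def whittle_index(state, P_passive, P_active, r_passive, r_active,
                  beta=0.9, lo=-100.0, hi=100.0, tol=1e-6):
    """Bisection on the subsidy at which `state` is indifferent between actions.

    Assumption: the project is indexable and the index lies in [lo, hi].
    """
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        q_pas, q_act = subsidy_q_values(P_passive, P_active,
                                        r_passive, r_active, mid, beta)
        if q_act[state] > q_pas[state]:
            lo = mid   # activity still preferred: the subsidy is too small
        else:
            hi = mid   # passivity is (weakly) preferred: the subsidy is large enough
    return 0.5 * (lo + hi)
```

A simple sanity check is a project whose state never changes under either action and whose passive reward is zero: the index in each state should then equal the active reward in that state, since that is the subsidy at which the two actions are equally attractive.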
We propose a general Markovian model for the optimal control of admissions and subsequent routing of customers for service provided by a collection of heterogeneous stations. Queue-length information is available to inform all decisions. Admitted customers will abandon the system if required to wait too long for service. The optimisation goal is the maximisation of reward rate earned from service completions, net of the penalties paid whenever admission is denied, and the costs incurred upon every customer loss through impatience. We show that the system is indexable under mild conditions on model parameters and give an explicit construction of an index policy for admission control and routing founded on a proposal of Whittle for restless bandits. We are able to gain insights regarding the strength of performance of the index policy from the nature of solutions to the Lagrangian relaxation used to develop the indices. These insights are strengthened by the development of performance bounds. Although we are able to assert the optimality of the index heuristic in a range of asymptotic regimes, the performance bounds are also able to identify instances where its performance is relatively weak. Numerical studies are used to illustrate and support the theoretical analyses.
We develop appropriately generalized notions of indexability for problems of dynamic resource allocation where the resource concerned may be assigned more flexibility than is allowed, for example, in classical multi-armed bandits. Most especially we have in mind the allocation of a divisible resource (manpower, money, equipment) to a collection of objects (projects) requiring it in cases where its over-concentration would usually be far from optimal. The resulting project indices are functions of both a resource level and a state. They have a simple interpretation as a fair charge for increasing the resource available to the project from the specified resource level when in the specified state. We illustrate ideas by reference to two model classes which are of independent interest. In the first, a pool of servers is assigned dynamically to a collection of service teams, each of which mans a service station. We demonstrate indexability under a natural assumption that the service rate delivered is increasing and concave in the team size. The second model class is a generalization of the spinning plates model for the optimal deployment of a divisible investment resource to a collection of reward generating assets. Asset indexability is established under appropriately drawn laws of diminishing returns for resource deployment. For both model classes numerical studies provide evidence that the proposed greedy index heuristic performs strongly.
This is a class of models concerned with the sequential allocation of effort, to be thought of as a single indivisible resource, to a collection of stochastic reward generating projects (or bandits as they are sometimes called). Gittins demonstrated that optimal project choices are those of highest index.
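The same highest-index idea carries over to the divisible-resource setting in the abstract above, where an index depends on both the current resource level and the state. The sketch below is one plausible reading of a greedy index heuristic, not the authors' stated algorithm: each unit of resource goes to the project whose index for one further unit is currently largest. The function name `greedy_allocate` and the callback `index_fn(project, level, state)` are hypothetical, and the index values must be supplied by the caller.

```python
import heapq


def greedy_allocate(index_fn, states, total_units):
    """Hand out `total_units` one at a time to the project with the largest index.

    index_fn(project, level, state) is assumed to return the fair charge for
    raising that project's resource from `level` to `level + 1` in `state`.
    """
    alloc = [0] * len(states)
    # Max-heap via negated indices; each entry is (negated index, project).
    heap = [(-index_fn(j, 0, s), j) for j, s in enumerate(states)]
    heapq.heapify(heap)
    for _ in range(total_units):
        _, j = heapq.heappop(heap)
        alloc[j] += 1
        # Re-insert the project with its index at the new resource level.
        heapq.heappush(heap, (-index_fn(j, alloc[j], states[j]), j))
    return alloc
```

For example, with an illustrative index that shrinks as the level grows, such as `state * (sqrt(level + 1) - sqrt(level))`, the allocation spreads across projects instead of concentrating on one, which matches the diminishing-returns assumptions mentioned in the abstract.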
There is no doubt that the idea that strongly performing policies are determined by simple, interpretable calibrations (i.e., indices) of decision options is an attractive and powerful one and offers crucial computational benefits. There is now a substantial literature describing extensions to and reformulations of Gittins' result. Some key contributions are cited in the recent survey of Mahajan and Teneketzis [14]. Whittle [21] introduced a class of restless bandit problems (RBPs) as a means of addressing a critical limitation of Gittins' MABs, namely, that projects should remain frozen while not in receipt of effort. In RBPs, projects may change state while active or passive, though according to different dynamics. However, this generalization is bought at great cost. In contrast to MABs, RBPs are almost certainly intractable, having been shown to be PSPACE-hard by Papadimitriou and Tsitsiklis [16]. Whittle [21] proposed an index heuristic for those RBPs which pass an indexability test. This heuristic reduces to Gittins' index policy in the MAB case. Whittle's index emerges from a Lagrangian relaxation of the original problem and has an interpretation as a fair charge for the allocation of effort to a particular project in a particular state. Weber and Weiss [20] established a fo...
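The Lagrangian relaxation mentioned above can be written out as follows. The notation is an assumption of this sketch and is not taken from the cited text: there are $N$ projects, $m$ of them are activated at each epoch, $x_n(t)$ and $a_n(t) \in \{0,1\}$ are the state and action of project $n$, $r_n$ is its reward, and a discount factor $\beta \in (0,1)$ is used for simplicity.

```latex
% Original problem: exactly m of the N projects are active at every epoch.
\max_{\pi}\; \mathbb{E}_{\pi}\Big[\sum_{t=0}^{\infty}\beta^{t}\sum_{n=1}^{N}
    r_n\big(x_n(t),a_n(t)\big)\Big]
\quad\text{subject to}\quad \sum_{n=1}^{N} a_n(t)=m \ \text{ for all } t .

% Relaxation: the constraint is only required to hold in discounted expectation.
\mathbb{E}_{\pi}\Big[\sum_{t=0}^{\infty}\beta^{t}\sum_{n=1}^{N}\big(1-a_n(t)\big)\Big]
    =\frac{N-m}{1-\beta} .

% Lagrangian with multiplier \lambda: the problem decouples into N
% single-project problems in which passivity earns a subsidy \lambda.
L(\lambda)=\sum_{n=1}^{N} V_n(\lambda)-\lambda\,\frac{N-m}{1-\beta},
\qquad
V_n(\lambda)=\max_{\pi_n}\mathbb{E}_{\pi_n}\Big[\sum_{t=0}^{\infty}\beta^{t}
    \Big\{ r_n\big(x_n(t),a_n(t)\big)+\lambda\big(1-a_n(t)\big)\Big\}\Big] .

% Whittle index of project n in state x (defined when the project is indexable,
% i.e. the set of states where passivity is optimal grows monotonically with \lambda).
W_n(x)=\inf\big\{\lambda : \text{the passive action is optimal in state } x
    \text{ for the problem } V_n(\lambda)\big\} .
```

Read this way, $W_n(x)$ is the subsidy at which the decision maker is indifferent between activating and resting project $n$ in state $x$, which is the "fair charge" interpretation given in the text.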
This paper concerns two families of Markov decision problem that fall within the family of (bi-directional) restless bandits, an intractable class of decision processes introduced by Whittle. The spinning plates problem concerns the optimal management of a portfolio of reward-generating assets whose yields grow with investment but otherwise tend to decline. In the model of asset exploitation called the squad system, the yield from an asset tends to decline when it is used but will recover when the asset is at rest. In all cases, simply stated conditions are given that guarantee indexability of the problem, together with conditions necessary and sufficient for its strict indexability. The index heuristics for asset activation that emerge from the analysis are assessed numerically and found to perform very strongly.
Motivated by a wide range of applications, we consider a development of Whittle's restless bandit model in which project activation requires a state-dependent amount of a key resource, which is assumed to be available at a constant rate. As many projects may be activated at each decision epoch as resource availability allows. We seek a policy for project activation within resource constraints which minimises an aggregate cost rate for the system. Project indices derived from a Lagrangian relaxation of the original problem exist provided the structural requirement of indexability is met. Verification of this property and derivation of the related indices is greatly simplified when the solution of the Lagrangian relaxation has a state monotone structure for each constituent project. We demonstrate that this is indeed the case for a wide range of bidirectional projects in which the project state tends to move in a different direction when it is activated from that in which it moves when passive. This is natural in many application domains in which activation of a project ameliorates its condition, which otherwise tends to deteriorate or deplete. In some cases the state monotonicity required is related to the structure of state transitions, while in others it is also related to the nature of costs. Two numerical studies demonstrate the value of the ideas for the construction of policies for dynamic resource allocation, most especially in contexts which involve a large number of projects.
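Once indices are available, a policy of the kind described above can be sketched as a greedy fill of the resource budget at each decision epoch. The code below is an illustrative assumption, not the construction in the paper: the name `activate_by_index` is hypothetical, the indices and state-dependent resource requirements are taken as given inputs, and a project that does not fit in the remaining budget is skipped so that lower-index projects may still be considered (stopping at the first project that does not fit would be an equally plausible variant).

```python
def activate_by_index(indices, requirements, capacity):
    """Choose projects to activate at one decision epoch.

    indices[j]      -- current index of project j (larger means more urgent)
    requirements[j] -- resource project j would consume in its current state
    capacity        -- total resource available at this epoch
    Returns the sorted list of activated project numbers.
    """
    order = sorted(range(len(indices)), key=lambda j: indices[j], reverse=True)
    active, used = [], 0.0
    for j in order:
        if used + requirements[j] <= capacity:
            active.append(j)
            used += requirements[j]
        # Otherwise skip project j and try the next one (assumed variant).
    return sorted(active)
```

Because the requirement depends on the project's state, the set of activated projects can change from epoch to epoch even when the ranking by index does not.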
Index policies for shooting problems
Abstract: We consider a scenario in which a single Red wishes to shoot at a collection of Blue targets, one at a time, to maximise some measure of return obtained from Blues killed before Red's own (possible) demise. Such a situation arises in various military contexts such as the conduct of air defence by Red in the face of Blue SEAD (suppression of enemy air defences). A class of decision processes called multi-armed bandits has been previously deployed to develop optimal policies for Red in which she attaches a calibrating (Gittins) index to each Blue target and optimally shoots next at the Blue with largest index value. The current paper seeks to elucidate how a range of developments of index theory are able to accommodate features of such problems which are of practical military import. Such features include levels of risk to Red which are policy dependent, Red having imperfect information about the Blues she faces, an evolving population of Blue targets and the possibility of Red disengagement. The paper concludes with a numerical study which both compares the performance of (optimal) index policies to a range of competitors and also demonstrates the value to Red of (optimal) disengagement.
Subject terms: multi-armed bandits, Gittins indices, suppression of enemy air defense. Number of pages: 27.