Monotone Policies and Indexability for Bidirectional Restless Bandits

Glazebrook, K. D.; Hodge, D. J.; Kirkbride, C.

doi:10.1017/s0001867800006194

Cited by 5 publications

(16 citation statements)

References 17 publications


“…These reinitializing bandits have some common features with models previously addressed: it is similar to the reward depletion and replenishment model presented in [4], and it also shares with bidirectional bandits in [5], the property that the active and passive actions produce opposite movements on the state space. Another related application is found in [7], where a new type of congestion control scheduling method based on a MARBP is proposed, motivated by the Internet flows behaving according to the Transmission Control Protocol, and thus admitting a reinitializing feature.…”

Section: Introduction (mentioning)

confidence: 82%

“…Denote by $V_\beta(\phi_0, \lambda, i)$ the expression (9) evaluated by setting $t^*(\phi_0, \lambda) = i$. Notice that, solving problem (4), that is, finding the states that belong to $A^*(\lambda)$, is therefore equivalent to finding the maximum positive integer $i$ such that it holds: (5) and (6), and given that $V^*_\beta(\phi_0, \lambda) = V_\beta(\phi_0, \lambda, i)$, using (4) we have that:…”

Section: DP Analysis and Proof of Theorem 31 (mentioning)

confidence: 99%

“…Furthermore, the indexability of special classes of MARBP has been specifically addressed and thoroughly studied using various approaches. These include some families of restless bandits which arise in machine maintenance and stochastic scheduling problems with switching costs, as those in Glazebrook, Ruiz-Hernandez, and Kirkbride [4], the bidirectional bandits introduced in Glazebrook, Hodge, and Kirkbride [5], the reinitializing bandits in Jacko and Sanso [7], and restless models in telecommunication and opportunistic spectrum access as in Liu and Zhao [10], among others. These papers are part of the body of literature that has contributed to a significant advance in the understanding of this property, yet as Liu, Weber, and Zhao [11] put it “[…] establishing indexability is still an open problem and often relies on numerical algorithms”.…”

Section: Introduction (mentioning)

confidence: 99%


Indexability and Optimal Index Policies for a Class of Reinitialising Restless Bandits

Villar

2015

Prob. Eng. Inf. Sci.


Motivated by a class of Partially Observable Markov Decision Processes with application in surveillance systems in which a set of imperfectly observed state processes is to be inferred from a subset of available observations through a Bayesian approach, we formulate and analyze a special family of multi-armed restless bandit problems. We consider the problem of finding an optimal policy for observing the processes that maximizes the total expected net rewards over an infinite time horizon subject to the resource availability. From the Lagrangian relaxation of the original problem, an index policy can be derived, as long as the existence of the Whittle index is ensured. We demonstrate that such a class of bandits, in which the projects' state deteriorates while active and resets to its initial state when passive until its completion, possesses the structural property of indexability, and we further show how to compute the index in closed form. In general, the Whittle index rule for restless bandit problems does not achieve optimality. However, we show that the proposed Whittle index rule is optimal for the problem under study in the case of stochastically heterogeneous arms under the expected total criterion, and it is further recovered by a simple tractable rule referred to as the rule. Moreover, we illustrate the significant suboptimality of another widely used heuristic, the Myopic index rule, by computing its suboptimality gap in closed form. We present numerical studies which illustrate, for the more general instances, the performance advantages of the Whittle index rule over other simple heuristics.


“…The condition appears to be a natural condition which should be satisfied by all models, but that is not the case [11]. Sufficient conditions for indexability have been investigated under specific modeling assumptions (two state fully or partially observed restless bandits [2], [6]; monotone bandits [2], [5], [13]; models with right-skip free transitions [1], [14]; models with monotone or convex cost/reward [2], [13], [14], [16]-[18]; models satisfying partial conservation laws [19], [20]). Indexability for models arising in specific applications has been investigated in [1], [5], [14]-[18].…”

Section: Introduction (mentioning)

confidence: 99%

Restless bandits with controlled restarts: Indexability and computation of Whittle index

Akbarzadeh, Mahajan

2019

2019 IEEE 58th Conference on Decision and Control (CDC)


Restless bandits are a class of sequential resource allocation problems concerned with allocating one or more resources among several alternative processes where the evolution of the process depends on the resource allocated to them. Such models capture the fundamental trade-offs between exploration and exploitation. In 1988, Whittle developed an index heuristic for restless bandit problems which has emerged as a popular solution approach due to its simplicity and strong empirical performance. The Whittle index heuristic is applicable if the model satisfies a technical condition known as indexability. In this paper, we present two general sufficient conditions for indexability and identify simpler to verify refinements of these conditions. We then present a general algorithm to compute Whittle index for indexable restless bandits. Finally, we present a detailed numerical study which affirms the strong performance of the Whittle index heuristic.
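As a rough illustration of the index heuristic described in this abstract, the sketch below activates, at one decision epoch, the arms whose current states carry the largest index values. It assumes the per-state indices have already been computed and stored in a table; the function name `whittle_index_rule` and its parameters `index_tables`, `states`, and `m` are hypothetical and do not come from any of the cited papers.

```python
from typing import List, Sequence


def whittle_index_rule(
    index_tables: Sequence[Sequence[float]],
    states: Sequence[int],
    m: int,
) -> List[int]:
    """Pick the m arms to activate at one decision epoch.

    index_tables[k][s] is an assumed, precomputed index of arm k in state s.
    states[k] is the current state of arm k.
    Returns the positions of the chosen arms in ascending order.
    """
    # Look up the index of each arm in its current state.
    current = [index_tables[k][states[k]] for k in range(len(states))]
    # Rank arms from largest to smallest current index; ties keep arm order.
    ranked = sorted(range(len(states)), key=lambda k: current[k], reverse=True)
    return sorted(ranked[:m])


# Three arms with three states each; the numbers are made up for illustration.
tables = [[0.1, 0.5, 0.9], [0.2, 0.4, 0.8], [0.0, 0.3, 0.7]]
chosen = whittle_index_rule(tables, states=[2, 0, 1], m=2)
print(chosen)
```

The sketch covers only the selection step. Checking indexability and computing the index values themselves are the hard parts, and they are not attempted here.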


“…In tackling this, a multi-armed bandit approach can be taken. For example, the bidirectional restless bandits (Glazebrook et al, 2013) could be a suitable fit for representing the learners' learning.…”

Section: Discussion (mentioning)

confidence: 99%

Capacity management for personalized services in education

Aslan


Operational
O.1 How many sessions to plan per activity?
O.2 Which teachers to assign to sessions?
O.3 Which classrooms to assign to sessions?
O.4 Which time blocks to assign to sessions?
O.5 Which learners to assign to sessions?

Tactical
T.1 How much teaching capacity to allocate to activities on each subject, or to learners?
T.2 How many classrooms to allocate to activities on each subject?

Strategic
S.1 How many teachers to hire per subject?
S.2 How many classrooms to build?
S.3 Which seating capacity to consider for classrooms?
S.4 How frequently to perform competency assessments?

In providing these capacity management tools, we use a variety of OR frameworks. Among these are meta-heuristics, simulation, integer programming, stochastic programming, model predictive control, queueing theory and Markov decision processes. In numerically validating and demonstrating the effectiveness of the presented planning tools, we mainly rely on the data coming from Dutch secondary education schools, through collaborating with the Zo.Leer.Ik! schools network.

In summary, this thesis has two overarching goals: showing the impact that an Operations Research perspective can make in the transition toward personalization of services in education, and introducing novel operational planning problems on capacity management to the OR literature and providing new solution methods for them.

Capacity management tools for service providers

When educational service providers manage their available resource capacity (i.e., teaching capacity, classrooms and time blocks) to serve the personalized learning demands of many learners, they face several logistics planning decision problems. We identify these in Table 1.1 at operational (O.1-O.4), tactical (T.1-T.2) and strategic (S.1-S.4) decision-making levels for a service system that takes a demand-driven planning approach.

The operational-level decision problems facilitate the organization of learning activities on subjects (e.g., Mathematics, English) to serve the personalized learning demands that are directly provided by learners. It must be noted that the problems O.2-O.4 are also relevant for one-size-fits-all systems. However, for PL systems these decisions cannot be fixed in the beginning of an academic year, instead they have to be remade depending on the changes in learning demands throughout the year. Moreover, the decision problems O.1 and O.5 are not relevant at all for one-size-fits-all systems.
