2019 Australian & New Zealand Control Conference (ANZCC)
DOI: 10.1109/anzcc47194.2019.8945748

Towards Q-learning the Whittle Index for Restless Bandits

Cited by 21 publications (39 citation statements)
References 13 publications
“…The second category contains different learning methods for RMABs. Fu et al. 2019 provide a Q-learning method where the Q value is defined based on the Whittle indices, states, and actions. However, they do not provide a proof of convergence to the optimal solution, and experimentally they do not learn (near-)optimal policies.…”
Section: Related Work
Mentioning confidence: 99%
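The statement above describes Q values indexed by a subsidy (the Whittle index candidate) in addition to the state and action. A minimal sketch of that idea follows, assuming a discretised subsidy grid; all names (lambda_grid, n_states, alpha, gamma) are illustrative assumptions, not the authors' code.

```python
import numpy as np

# Sketch: Q-learning with a subsidy-indexed Q-table, in the spirit of the
# approach described above. 0 = passive action, 1 = active action.
n_states, n_actions = 5, 2
lambda_grid = np.linspace(0.0, 1.0, 11)   # discretised subsidy values
Q = np.zeros((len(lambda_grid), n_states, n_actions))
alpha, gamma = 0.1, 0.95                  # learning rate, discount factor

def update(li, s, a, r, s_next):
    """One tabular Q-learning step under subsidy lambda_grid[li].
    The passive action earns the subsidy on top of the observed reward."""
    subsidy = lambda_grid[li] if a == 0 else 0.0
    target = r + subsidy + gamma * Q[li, s_next].max()
    Q[li, s, a] += alpha * (target - Q[li, s, a])

def whittle_index_estimate(s):
    """Smallest subsidy at which passivity is estimated to be as good as
    activity in state s -- a crude proxy for the Whittle index."""
    for li, lam in enumerate(lambda_grid):
        if Q[li, s, 0] >= Q[li, s, 1]:
            return lam
    return lambda_grid[-1]
```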
“…(2) AB [Avrachenkov and Borkar, 2020], (3) Fu [Fu et al., 2019], (4) Greedy: greedily chooses the top M arms with the highest difference in their observed average rewards between actions 1 and 0 at their current states, and (5) Random: chooses M arms uniformly at random at each step. We consider a numerical example and a maternal healthcare application to simulate RMAB instances using beneficiaries' behavioral patterns from the call-based program.…”
Section: Experimental Evaluation
Mentioning confidence: 99%
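A minimal sketch of the Greedy baseline described above: rank arms by the gap between their observed average rewards under the active and passive actions at their current states and pick the top M. The data structures (reward_sums, counts) are assumptions made for illustration.

```python
import numpy as np

def greedy_select(reward_sums, counts, current_states, M):
    """reward_sums, counts: arrays of shape (n_arms, n_states, 2) holding the
    running sum and count of rewards per (arm, state, action).
    Returns the indices of the M arms with the largest active-passive gap."""
    avg = reward_sums / np.maximum(counts, 1)   # empirical mean rewards
    n_arms = avg.shape[0]
    gaps = np.array([avg[i, current_states[i], 1] - avg[i, current_states[i], 0]
                     for i in range(n_arms)])
    return np.argsort(gaps)[-M:]
```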
“…Despite the recent successes of reinforcement learning (RL) for solving large-scale games [28,37], RL has so far seen little application to RMABs, except for a few recent works that learn Whittle indices for indexable binary-action RMABs using (i) deep RL [29] and (ii) Q-learning when states are observable [5,7] or when arms are homogeneous [4]. In contrast, our deep RL approach provides a more general solution to binary and multi-action RMAB domains that performs well regardless of indexability.…”
Section: Related Work
Mentioning confidence: 99%
“…Addressing this, Biswas et al. [5] give a Q-learning-based algorithm that acts on the arms that have the largest difference between their active and passive Q values. Fu et al. [8] take a related approach that adjusts the Q values by some 𝜆 and uses it to estimate the Whittle index. Similarly, Avrachenkov and Borkar [3] provide a two-timescale algorithm that learns the Q values as well as the index values over time.…”
Section: Related Work
Mentioning confidence: 99%
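A heavily simplified sketch of the two-timescale idea attributed to Avrachenkov and Borkar [3] above: Q values move with a fast step size, while a per-state index estimate moves with a slower one, nudged toward the subsidy that equalises the active and passive Q values. Step-size schedules and variable names here are assumptions; the published algorithm differs in detail.

```python
import numpy as np

n_states = 5
Q = np.zeros((n_states, 2))   # fast-timescale Q values for one arm
lam = np.zeros(n_states)      # slow-timescale Whittle index estimates
gamma = 0.95

def two_timescale_step(s, a, r, s_next, t):
    alpha = 1.0 / (1 + t) ** 0.6          # fast step size
    beta = 1.0 / (1 + t)                  # slower step size (beta/alpha -> 0)
    subsidy = lam[s] if a == 0 else 0.0   # passive action earns current index
    target = r + subsidy + gamma * Q[s_next].max()
    Q[s, a] += alpha * (target - Q[s, a])
    # Nudge the index toward the value equalising active and passive Q values.
    lam[s] += beta * (Q[s, 1] - Q[s, 0])
```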
“…To address this shortcoming in previous work, this paper presents the first algorithms for the online setting for multi-action RMABs. Indeed, the online setting for even binary-action RMABs has received only limited attention, in the works of Fu et al. [8], Avrachenkov and Borkar [3], and Biswas et al. [5,6]. These papers adopt variants of the Q-learning update rule [29,30], a well-studied reinforcement learning algorithm, for estimating the effect of each action across the changing dynamics of the systems.…”
Section: Introduction
Mentioning confidence: 99%
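For reference, the textbook tabular Q-learning update rule that the papers above adapt, shown as a short sketch (the alpha and gamma values are illustrative defaults):

```python
import numpy as np

# Standard tabular Q-learning update (Watkins-style): move Q(s, a) toward the
# bootstrapped target r + gamma * max_a' Q(s', a').
def q_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.95):
    Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
    return Q
```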