1987
DOI: 10.1287/moor.12.2.262

The Multi-Armed Bandit Problem: Decomposition and Computation

Abstract: The multi-armed bandit problem arises in sequentially allocating effort to one of N projects and sequentially assigning patients to one of N treatments in clinical trials. Gittins and Jones (1974) have shown that one optimal policy for the N-project problem, an N-dimensional discounted Markov decision chain, is determined by the following largest-index rule. There is an index for each state of each given project that depends only on the data of that project. In each period one allocates effort to a project with l…

Cited by 271 publications (168 citation statements)
References: 5 publications
“…We show that the optimal policy for the multi-armed under this generalized depreciation model is an index policy, where the indices are propitiously generalized restart in state indices, cf. Katehakis and Veinott [33] and Katehakis et al [29]; see also Sonin [43], Sonin [44] and Steinberg and Sonin [45]. Furthermore, the overall proof suggests a way of understanding the structure of the reward processes, relative to a "natural" time scale of possibly stochastic intervals of activation or "restart blocks", rather than steps of unit time.…”
Section: Introduction (mentioning)
confidence: 81%
“…We mention the contributions of Katehakis and Veinott (the restart-in-state method, see Katehakis and Veinott 1987), Varaiya, Walrand and Buyukkoc (the-largest-remaining-index method, see Varaiya et al 1985), and Chen and Katehakis (the linear programming method, see Chen and Katehakis 1986). In this article we present the parametric linear programming method proposed in Kallenberg (1986).…”
Section: Theorem 22 (mentioning)
confidence: 99%
“…The indices in (5), (6) are of Gittins type and are computable by a range of algorithms including the "restart-in-x" approach of Katehakis and Veinott (1987). When the state spaces Ω_j, 1 ≤ j ≤ N, are finite the "largest-to-smallest" algorithm of Robinson (1982) (equivalently, the adaptive greedy algorithm of Bertsimas and Niño-Mora (1996)) is available.…”
Section: A General Model For a Single Conflict With Disengagement (mentioning)
confidence: 99%
“…The authors recommend an adapted version of the "restart-in-(n, δ)" approach to index computation proposed by Katehakis and Veinott (1987). See Glazebrook and Greatrix (1995) and refer to the first author for full details.…”
Section: Model 3 - 'Shoot-look-shoot' For Red (mentioning)
confidence: 99%
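
Several of the citation statements above refer to the "restart-in-state" (or "restart-in-x") approach to index computation. As a rough illustration only, the sketch below assumes a single project with a finite state space, a known transition matrix, one-period rewards and a discount factor beta in (0, 1). In every state one may either continue the project or restart it in state x, and the index of x is taken as (1 - beta) times the optimal value of that restart problem at x. The function name, the argument layout and the use of plain value iteration as the solver are assumptions made for this example, not details taken from the paper or from the citing works.

```python
def restart_in_state_index(P, r, beta, x, tol=1e-12, max_iter=100_000):
    """Approximate the index of state x for one project (illustrative sketch).

    P    : list of rows, P[i][j] = transition probability from i to j
    r    : list of one-period rewards, r[i] earned when operating in state i
    beta : discount factor, assumed to satisfy 0 < beta < 1
    x    : the state whose index is wanted (hypothetical argument name)
    """
    n = len(r)
    V = [0.0] * n  # value estimates for the restart-in-x problem
    for _ in range(max_iter):
        # Value of continuing the project from each state under the estimate V.
        cont = [r[i] + beta * sum(P[i][j] * V[j] for j in range(n))
                for i in range(n)]
        # Restarting moves the project to x and continues from there.
        restart = cont[x]
        new_V = [max(c, restart) for c in cont]
        diff = max(abs(a - b) for a, b in zip(new_V, V))
        V = new_V
        if diff < tol:
            break
    # Scale the value at x so the index is in per-period reward units.
    return (1.0 - beta) * V[x]


# Hypothetical two-state project: state 0 moves to state 1, which is absorbing.
P_example = [[0.0, 1.0],
             [0.0, 1.0]]
index_reward_first = restart_in_state_index(P_example, [1.0, 0.0], 0.5, 0)
index_reward_later = restart_in_state_index(P_example, [0.0, 1.0], 0.5, 0)
```

Under a largest-index rule, one would compute such an index for the current state of every project and operate a project whose index is largest. Value iteration is used here only because it is short to write; it should not be read as the computational method of any of the cited works.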