1980
DOI: 10.1111/j.2517-6161.1980.tb01111.x

Multi-Armed Bandits and the Gittins Index

Abstract: A plausible conjecture (C) has the implication that a relationship (12) holds between the maximal expected rewards for a multi-project process and for a one-project process (F and φi respectively), if the option of retirement with reward M is available. The validity of this relation and the optimality of Gittins' index rule are verified simultaneously by dynamic programming methods. These results are partially extended to the case of so-called “bandit superprocesses”.
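The retirement option in the abstract suggests a way to compute a Gittins index numerically. The sketch below is illustrative only and is not taken from the paper. It assumes a single finite-state Markov project with transition matrix `P`, reward vector `r`, and discount factor `beta`; none of these appear in the abstract. It solves the one-project problem with retirement reward `M`, then uses bisection to find the retirement reward at which continuing and retiring are equally attractive in state `x`. Scaling that reward by `(1 - beta)` is one common convention for reporting the index. The function names are hypothetical.

```python
import numpy as np


def retirement_value(P, r, beta, M, tol=1e-10, max_iter=100_000):
    """Value iteration for phi(x, M) = max(M, r(x) + beta * sum_y P[x, y] * phi(y, M)).

    P, r, beta are assumptions of this sketch: a finite-state transition
    matrix, a reward vector, and a discount factor in (0, 1).
    """
    phi = np.full(len(r), float(M))
    for _ in range(max_iter):
        new_phi = np.maximum(M, r + beta * (P @ phi))
        if np.max(np.abs(new_phi - phi)) < tol:
            return new_phi
        phi = new_phi
    return phi


def gittins_index(P, r, beta, x, tol=1e-8):
    """Index of state x via the retirement formulation (hypothetical helper).

    Bisects on M for the point where the value of continuing from x
    equals the retirement reward M, then rescales by (1 - beta).
    """
    P = np.asarray(P, dtype=float)
    r = np.asarray(r, dtype=float)
    # The indifference point lies between these two bounds.
    lo = r.min() / (1.0 - beta)
    hi = r.max() / (1.0 - beta)
    while hi - lo > tol:
        M = 0.5 * (lo + hi)
        phi = retirement_value(P, r, beta, M)
        continue_value = r[x] + beta * (P[x] @ phi)
        if continue_value > M:
            lo = M  # continuing is still strictly better: raise M
        else:
            hi = M  # retiring is at least as good: lower M
    return (1.0 - beta) * 0.5 * (lo + hi)


if __name__ == "__main__":
    # Toy project: state 0 pays nothing and moves to state 1,
    # which is absorbing and pays 1 per step.
    P = np.array([[0.0, 1.0], [0.0, 1.0]])
    r = np.array([0.0, 1.0])
    for state in range(2):
        print(state, gittins_index(P, r, beta=0.9, x=state))
```

With several independent projects, an index rule of the kind the abstract refers to would compute such an index for the current state of each project and operate the one with the largest value.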

Cited by 310 publications (219 citation statements)
References 10 publications
“…The following is a simple consequence of Lemma 4 and a result due to Whittle [22]. THEOREM 5 (optimal replenishment, mixed model): If Condition 1 holds with the ?ik being the qualifying sequences, then in the state in which n_kl lifts of ammunition are available for combat aboard ship kl, 1 ≤ l ≤ M_k, 1 ≤ k ≤ g, the next lift in an optimal replenishment strategy should be to any ship ij such that n_ij < L_ij and where G,(El, .)…”
Section: Mixed Models (mentioning)
confidence: 98%
“…When this is the case, it will follow from Lemma 4 that there is a globally optimal strategy which … Although such a strategy always enjoys a constrained optimality property (Lemma 4) and would seem invariably to be a sensible heuristic, it nevertheless remains of interest to determine the conditions under which it is globally optimal. Results due to Whittle [22] and Glazebrook [8], which apply quite generally to a class of discounted Markov decision processes in parallel, yield the following prescription:…”
Section: Mixed Models (mentioning)
confidence: 99%
“…In this section, we formulate the node selection problem as a partially observed Markov decision process (POMDP) multi-armed bandit system [15], which has been widely studied in operations research in the context of infinite-horizon discounted-cost stochastic control problems [16,17]. The problem is to decide which arm of the multi-armed gambling machine to pull each time so as to maximize the total reward.…”
Section: Solving the Node Selection Problem (mentioning)
confidence: 99%
“…For independent projects, however, it was shown first by Gittins and Jones [6] that there exists a project-specific dynamic performance measure, later called the Gittins index of a project, such that optimal allocations are obtained from an index policy which (essentially) amounts to focussing at each point only on those projects which exhibit a maximal Gittins index. This celebrated result was subsequently extended from Gittins' and Jones' original discrete-time, Markovian framework to a completely general continuous-time setting; see, e.g., Whittle [15], Varaiya, Walrand and Buyukkoc [13], Mandelbaum [10], Weber [14], El Karoui and Karatzas [3,4], Kaspi and Mandelbaum [8,9].…”
Section: Introduction (mentioning)
confidence: 96%