2014
DOI: 10.1007/s10479-013-1523-0

Four proofs of Gittins’ multiarmed bandit theorem

Abstract: We survey four proofs that the Gittins index priority rule is optimal for alternative bandit processes. These include Gittins' original exchange argument, Weber's prevailing charge argument, Whittle's Lagrangian dual approach, and a proof based on generalized conservation laws and LP duality.

Cited by 52 publications (15 citation statements)
References 40 publications
“…The proof of this result has been obtained using a number of different perspectives, for example Weber's prevailing charge formulation [75] (which we consider in more detail below), Whittle's retirement option formulation [76] and its extension without a Markov assumption by El Karoui and Karatzas [32] (and [33] in continuous time). A review of the proofs in discrete time is given by Frostig and Weiss [39]. However, in all these cases, the objective to be optimized is the discounted expected gain/loss -in particular, we are assumed to have no risk-aversion or uncertainty-aversion.…”
Section: Multi-armed Bandits (mentioning)
confidence: 99%
“…The subtlety in the proof of Gittins' theorem is to give a tractable representation of the class of control policies available to the decision maker. In the original formulation (see, for example, [41,75,76,39]), the class considered is feedback controls, as in a standard Markovian stochastic control problem; i.e. the system of bandits is modelled as a single Markov process, and the controls alter its transition probabilities.…”
Section: General Problem Formulation (mentioning)
confidence: 99%
“…Central results are the existence and form of index-based policies for certain models that maximize the present value of expected rewards, cf. Gittins et al [19], Frostig and Weiss [18], Mahajan and Teneketzis [37], Kaspi and Mandelbaum [28], Ishikida and Varaiya [27], El Karoui and Karatzas [14], Gittins [22], Gittins [21] and Gittins et al [20].…”
Section: Introduction (mentioning)
confidence: 99%
“…The classical MAB problem has an optimal solution given by the Gittins index policy which associates a dynamic index to each arm and then plays the arm with the highest index in each period (see Frostig and Weiss [12] for several proofs of this result). In this section we define and analyze an index policy for the RMAB model in Equation (1).…”
Section: Robust Index Policy (mentioning)
confidence: 99%
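
The index rule quoted in the last statement (attach an index to each arm's current state, then play the arm with the highest index) can be illustrated with a small sketch. It is not taken from the surveyed paper. It assumes each arm is a finite-state Markov chain with known transition matrix `P`, reward vector `r`, and discount factor `beta`, and it computes indices via the retirement-option characterization attributed to Whittle in the statements above: the index of state x is (1 - beta) times the smallest lump-sum retirement reward M at which retiring immediately from x is optimal. The function names `retirement_values`, `gittins_indices`, and `choose_arm` are hypothetical.

```python
def retirement_values(P, r, beta, M, iters=1000):
    """Value iteration for one arm with the option to retire for a lump sum M.

    Solves V(x) = max(M, r[x] + beta * sum_y P[x][y] * V(y)) approximately.
    The fixed iteration count is an assumption that suits beta well below 1.
    """
    n = len(r)
    V = [M] * n
    for _ in range(iters):
        V = [
            max(M, r[x] + beta * sum(P[x][y] * V[y] for y in range(n)))
            for x in range(n)
        ]
    return V


def gittins_indices(P, r, beta, tol=1e-6):
    """Approximate the Gittins index of every state of a single arm.

    For each state x, bisect on M to find the smallest retirement reward
    at which retiring from x is optimal, then scale by (1 - beta).
    """
    n = len(r)
    lower = min(r) / (1.0 - beta)  # retiring is never better below this
    upper = max(r) / (1.0 - beta)  # retiring is always optimal at this value
    indices = []
    for x in range(n):
        lo, hi = lower, upper
        while hi - lo > tol:
            M = 0.5 * (lo + hi)
            V = retirement_values(P, r, beta, M)
            if V[x] > M + 1e-9:  # continuing is strictly better than retiring
                lo = M
            else:
                hi = M
        indices.append((1.0 - beta) * 0.5 * (lo + hi))
    return indices


def choose_arm(index_tables, states):
    """Index rule: play the arm whose current state has the largest index."""
    current = [table[s] for table, s in zip(index_tables, states)]
    return max(range(len(current)), key=lambda i: current[i])
```

As a sanity check, consider a two-state arm that pays 0 in state 0, then moves to an absorbing state 1 that pays 1 forever. Its state-0 index should be close to `beta`, since the best stopping rule never stops and earns a discounted average reward of `beta` per unit of discounted time. This brute-force bisection is meant only to make the definition concrete; the proofs surveyed in the paper do not depend on it.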