2002
DOI: 10.1137/s0097539701398375

The Nonstochastic Multiarmed Bandit Problem

Abstract: In the multiarmed bandit problem, a gambler must decide which arm of K nonidentical slot machines to play in a sequence of trials so as to maximize his reward. This classical problem has received much attention because of the simple model it provides of the trade-off between exploration (trying out each arm to find the best one) and exploitation (playing the arm believed to give the best payoff). Past solutions for the bandit problem have almost always relied on assumptions about the statistics of th…

Cited by 1,791 publications (2,312 citation statements). References 17 publications.
“…We also extend our results to the partial information model, also called the adversarial multiarmed bandit (MAB) problem in Auer et al. (2002a). In this model, the online algorithm only gets to observe the loss of the action actually selected, and does not see the losses of the actions not chosen.…”
Section: Introduction (mentioning)
confidence: 64%
“…To solve the multi-armed bandit problem, the exponential-weight algorithm for exploration and exploitation (Exp3) was proposed by Auer et al. [14] in 2002. Exp3 is based on a reinforcement learning scheme and it solves the following problem: "If there are many available actions with uncertain outcomes in a system, how should the system act to maximize the quality of the results over many trials?"…”
Section: Multi-armed Bandit Problem (mentioning)
confidence: 99%
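
The excerpt above describes Exp3 only informally, so a minimal sketch may help make the exponential-weight scheme concrete. This is an illustration under stated assumptions, not the paper's reference implementation: the reward callback pull(arm), the horizon T, and the fixed exploration rate gamma are placeholders introduced here (Auer et al. also analyze tuned and time-varying exploration rates).

import math
import random

def exp3(K, T, pull, gamma=0.1):
    """Minimal Exp3 sketch: K arms, T rounds, pull(arm) -> reward in [0, 1].

    pull and the fixed gamma are assumptions of this sketch.
    """
    weights = [1.0] * K
    total_reward = 0.0
    for _ in range(T):
        wsum = sum(weights)
        # Mix exponential weights with uniform exploration (the gamma/K term).
        probs = [(1.0 - gamma) * w / wsum + gamma / K for w in weights]
        arm = random.choices(range(K), weights=probs)[0]
        reward = pull(arm)  # bandit feedback: only this arm's reward is seen
        total_reward += reward
        # Importance-weighted estimate: unbiased despite partial feedback.
        estimate = reward / probs[arm]
        weights[arm] *= math.exp(gamma * estimate / K)
        # Rescale to avoid overflow on long horizons (leaves probs unchanged).
        wmax = max(weights)
        weights = [w / wmax for w in weights]
    return total_reward

if __name__ == "__main__":
    # Hypothetical Bernoulli arms for demonstration only.
    means = [0.2, 0.5, 0.7]
    total = exp3(K=3, T=10_000,
                 pull=lambda a: float(random.random() < means[a]))
    print(f"total reward over 10,000 rounds: {total:.0f}")

The importance-weighted estimate reward / probs[arm] is the key device: it is an unbiased estimate of the chosen arm's reward even though, as the earlier excerpt notes, only the selected action's payoff is observed in each round.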
“…in Cesa-Bianchi et al.'s CombBand [6], which is itself an adaptation of Auer et al.'s Exp3 [2] from the finite case to the structured combinatorial case. The distribution from which the actions π_t are drawn in the algorithm differs from the distribution used in CombBand, and gives rise to the technical difficulty of variance estimation, resolved in Lemma 2.…”
Section: Algorithm BanditRank and Its Guarantee (mentioning)
confidence: 99%