1991
DOI: 10.1214/aos/1176348382

One-Armed Bandit Problems with Covariates

Cited by 48 publications (38 citation statements)
References 13 publications
“…This is the first such algorithm for adapting exploration on-line. Previous research in bandit problems in general [2], [12], and for the one-armed bandit with covariates problem in particular [9], [13], has focused on finding policies for selecting arms that converge to optimal behaviour asymptotically, and not for designing policies that autonomously maximise reward in finite time by adapting to the type of problem faced. Moreover, -ADAPT can be generalised to control exploration in a variety of bandit problems -including problems with multiple arms and dynamic problems with changing reward structures, we discuss this aspect more in Section V.…”
Section: Introduction (mentioning)
confidence: 99%
“…The second inequality follows from the condition that the second coordinate of the estimate, ϑ = θ₂, and then extending the finite sum to the infinite sum. The third inequality follows from the definition of Cond3a and changing the time index to τ′, similarly to the reasoning in (19)-(20). By Sanov's theorem, each term is exponentially upper bounded w.r.t.…”
Section: Appendix V Proof Of Theorem (mentioning)
confidence: 83%
“…By Sanov's theorem on R (Theorem 9), the probability of each term in (20) is exponentially upper bounded w.r.t. τ′, which implies that the summation has bounded expectation.…”
Section: Appendix V Proof Of Theorem (mentioning)
confidence: 99%