2013 Asilomar Conference on Signals, Systems and Computers
DOI: 10.1109/acssc.2013.6810607

Achieving complete learning in Multi-Armed Bandit problems

Cited by 7 publications (9 citation statements)
References 6 publications
“…We design algorithms that exploit the prior information in all objectives simultaneously to rule out arms that are not lexicographic optimal. Our regret bounds match the ones in Garivier et al (2018) and improve the ones in Vakili and Zhao (2013) for the case with a single objective.…”
Section: Related Work (supporting, confidence: 62%)
“…Bubeck and Liu (2013) considers Thompson sampling and shows that its regret is uniformly bounded when μ* and a positive lower bound on Δ are known. On the other hand, Vakili and Zhao (2013) considers a weaker prior information model where the learner knows a near-optimal expected reward η, which can be computed using μ* and a positive lower bound on Δ. The proposed algorithm obtains ∑_a Δ_a/δ³ regret, where δ = μ* − η < Δ and Δ_a is the suboptimality gap of arm a.…”
Section: Related Work (mentioning, confidence: 99%)
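
The prior-information model quoted above lends itself to a short illustration. The Python sketch below is a minimal sketch under assumptions, not the actual algorithm of Vakili and Zhao (2013): an arm is dropped once its upper confidence bound falls below the known near-optimal reward η, so each suboptimal arm (whose mean lies below η) is pulled only finitely often and total regret stays bounded. The function name, Hoeffding-style bound, and parameter choices are illustrative.

    import numpy as np

    def eta_elimination(arms, eta, horizon, conf=0.01):
        """Sketch of a bandit rule exploiting a known near-optimal reward eta.

        `arms` is a list of callables returning rewards in [0, 1]; every
        suboptimal arm is assumed to have mean below eta while the best
        arm's mean lies above it.  An arm is dropped for good once its
        Hoeffding upper confidence bound falls below eta.
        """
        k = len(arms)
        counts, sums = np.zeros(k), np.zeros(k)
        active = set(range(k))
        total = 0.0
        t = 0
        while t < horizon:
            for a in list(active):
                if t >= horizon:
                    break
                r = arms[a]()
                counts[a] += 1
                sums[a] += r
                total += r
                t += 1
                ucb = sums[a] / counts[a] + np.sqrt(np.log(1.0 / conf) / (2.0 * counts[a]))
                if ucb < eta and len(active) > 1:
                    active.discard(a)
        return total, active

    # Example: Bernoulli arms with means (0.9, 0.5, 0.4) and eta = 0.8;
    # the two suboptimal arms are typically eliminated within a few dozen pulls.
    rng = np.random.default_rng(0)
    arms = [lambda m=m: float(rng.random() < m) for m in (0.9, 0.5, 0.4)]
    total, active = eta_elimination(arms, eta=0.8, horizon=2000)
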
“…There are standard techniques for such extensions by replacing the concentration result with the corresponding ones for light-tailed and heavy-tailed distributions (the latter also requires replacing sample means with truncated sample means). Similar extensions for classic MAB problems without side information are discussed in [4], [34]. To illuminate the main ideas without too much technicality, most existing work makes the even stronger assumption of bounded support in [0, 1] (see [2], [3], [24], etc.).…”
Section: Extensions To Other Distributions (mentioning, confidence: 99%)
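
As a hedged illustration of the truncated-sample-mean device mentioned above, the sketch below zeroes out samples that exceed an index-dependent threshold, assuming a known bound u on the (1 + ε)-th raw moment of the reward distribution. The threshold form follows the style of the heavy-tailed bandit literature; the function name and parameters are assumptions, not the exact construction used in the cited references.

    import numpy as np

    def truncated_mean(samples, u, eps, delta):
        """Truncated empirical mean for heavy-tailed rewards (sketch).

        Assumes a known bound u on the (1 + eps)-th raw moment,
        E[|X|^(1 + eps)] <= u.  The i-th sample is kept only when
        |X_i| <= (u * i / log(1/delta)) ** (1 / (1 + eps)); larger
        samples are zeroed out, trading a small bias for much lighter
        tails in the estimator.
        """
        x = np.asarray(samples, dtype=float)
        i = np.arange(1, len(x) + 1)
        threshold = (u * i / np.log(1.0 / delta)) ** (1.0 / (1.0 + eps))
        return float(np.where(np.abs(x) <= threshold, x, 0.0).sum() / len(x))
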
“…The challenges described above motivate us to focus on the cases when the learner has prior knowledge on expected rewards. Specifically, we consider two types of prior knowledge, which generalize the prior knowledge introduced in [8] and [9] to multidimensional rewards. In the first case, we assume that the expected rewards of a lexicographic optimal arm are known.…”
Section: Introduction (mentioning, confidence: 99%)
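
The first type of prior knowledge described above admits a simple elimination rule, sketched below under assumptions (this is illustrative, not the cited paper's algorithm): since a lexicographic optimal arm must attain the known expected reward μ*_j in every objective j, an arm whose upper confidence bound falls below μ*_j for some j can be ruled out.

    import numpy as np

    def rule_out(sums, counts, mu_star, conf=0.01):
        """Sketch: drop arms that cannot be lexicographic optimal.

        `sums` is a (num_arms, num_objectives) array of per-objective
        reward sums, `counts` the pulls per arm, and `mu_star` the known
        expected rewards of a lexicographic optimal arm.  Any
        lexicographic optimal arm must match mu_star in every objective,
        so an arm whose upper confidence bound drops below mu_star in
        any objective is eliminated.  Returns a boolean mask of arms
        that remain plausible.
        """
        n = np.maximum(counts, 1).astype(float)[:, None]
        ucb = sums / n + np.sqrt(np.log(1.0 / conf) / (2.0 * n))
        return np.all(ucb >= mu_star, axis=1)
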