2015
DOI: 10.48550/arxiv.1505.01918
Preprint
An Asymptotically Optimal Policy for Uniform Bandits of Unknown Support

Abstract: Consider the problem of a controller sampling sequentially from a finite number N ≥ 2 of populations, specified by random variables X_{i,k}, i = 1, …, N, and k = 1, 2, …, where X_{i,k} denotes the outcome from population i the k-th time it is sampled. It is assumed that for each fixed i, {X_{i,k}}_{k≥1} is a sequence of i.i.d. uniform random variables over some interval [a_i, b_i], with the support (i.e., a_i, b_i) unknown to the controller. The objective is to have a policy π for deciding, based on available d…
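As an illustrative aside (a minimal sketch, not the paper's actual policy): for a uniform distribution on an unknown interval [a, b], the natural estimate of the support from the observations is the pair (sample minimum, sample maximum), which is also the maximum-likelihood estimate. The true endpoints below are chosen arbitrarily for the demonstration.

```python
import random

def support_mle(samples):
    """MLE of the support [a, b] of a uniform law: the sample min and max."""
    return min(samples), max(samples)

random.seed(0)
a, b = 2.0, 5.0  # true support, unknown to the controller in the bandit setting
samples = [random.uniform(a, b) for _ in range(1000)]
a_hat, b_hat = support_mle(samples)

# The estimates always lie inside the true interval and tighten as more
# samples arrive (the expected gap at each endpoint shrinks like (b - a)/k).
assert a <= a_hat and b_hat <= b
```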

Cited by 5 publications (8 citation statements)
References 14 publications (15 reference statements)
“…However, bounded bandits can have an arbitrarily high kurtosis, so our settings are not directly comparable (and we think that bounded distributions with an unknown range is a more natural assumption). Cowan and Katehakis [14] study adaptation to the range but in the restricted case of uniform distributions; see also similar results by Cowan et al [15] for Gaussian distributions with unknown means and variances. Additional important references are discussed in Appendix A of the supplementary material.…”
Section: Introduction (supporting)
confidence: 68%
“…These policies form the basis for deriving logarithmic regret policies for more general models, cf. Auer et al (2002), Auer and Ortner (2010), Cowan et al (2015), Cowan and Katehakis (2015a).…”
Section: Introduction (mentioning)
confidence: 99%
“…The infimum over the KL divergence can be explicitly computed for any i ≠ 1 under uniform models (Cowan and Katehakis 2015) as…”
Section: Problem Formulation (mentioning)
confidence: 99%
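For context on that last statement: the KL divergence between uniform laws has a standard closed form (a known identity, stated here for illustration rather than quoted from the paper): KL(U[a,b] ‖ U[c,d]) = log((d − c)/(b − a)) when [a,b] ⊆ [c,d], and +∞ otherwise, since the divergence is infinite as soon as the first law puts mass outside the second's support. A minimal sketch:

```python
import math

def kl_uniform(a, b, c, d):
    """KL(U[a,b] || U[c,d]): log of the length ratio if [a,b] is contained
    in [c,d], infinite otherwise (mass outside the second support)."""
    if c <= a and b <= d:
        return math.log((d - c) / (b - a))
    return math.inf

kl_uniform(0.0, 1.0, 0.0, 2.0)  # log 2
kl_uniform(0.0, 2.0, 0.0, 1.0)  # inf: U[0,2] has mass outside [0,1]
```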