2015
DOI: 10.48550/arxiv.1504.05823
Preprint

Normal Bandits of Unknown Means and Variances: Asymptotic Optimality, Finite Horizon Regret Bounds, and a Solution to an Open Problem

Cited by 7 publications (11 citation statements) | References 0 publications
“…As such, this section essentially reproduces the result of Cowan et al [14] (presented therein in terms of classical regret) in the framework established herein. In this case the controller is interested in activating the bandit with maximum expected value as often as possible.…”
Section: Unknown Means and Unknown Variances: Maximizing Expected Value (supporting)
confidence: 74%
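The setting described above — activating the bandit with the maximum expected value as often as possible, when both means and variances are unknown — can be illustrated with a small simulation. The sketch below uses a generic variance-aware upper-confidence index in the spirit of UCB1-Normal; it is an illustration of the problem setting only, not the exact index policy of Cowan et al.

```python
import math
import random

def run(means, sds, horizon, seed=0):
    """Simulate a variance-aware UCB sketch on normal bandits with
    unknown means and variances. Illustrative only; the index below
    is a UCB1-Normal-style stand-in, not the paper's policy."""
    rng = random.Random(seed)
    k = len(means)
    counts = [0] * k
    sums = [0.0] * k
    sumsq = [0.0] * k

    def pull(i):
        x = rng.gauss(means[i], sds[i])
        counts[i] += 1
        sums[i] += x
        sumsq[i] += x * x

    # Sample each arm twice so an empirical variance exists.
    for i in range(k):
        pull(i)
        pull(i)

    for t in range(2 * k, horizon):
        best, best_idx = None, 0.0
        for i in range(k):
            n = counts[i]
            mean = sums[i] / n
            # Unbiased sample variance, floored to avoid a zero bonus.
            var = max((sumsq[i] - n * mean * mean) / (n - 1), 1e-12)
            index = mean + math.sqrt(16.0 * var * math.log(t) / n)
            if best is None or index > best_idx:
                best, best_idx = i, index
        pull(best)
    return counts

counts = run(means=[0.0, 1.0], sds=[1.0, 1.0], horizon=2000)
# The arm with the larger mean should receive most activations,
# with sub-optimal activations growing only slowly with the horizon.
```

Because the variance is estimated from data rather than assumed known, the exploration bonus adapts to each arm's observed spread, which is the feature that distinguishes this model from the known-variance normal bandit.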
“…These policies form the basis for deriving logarithmic regret policies for more general models, cf. Auer et al (2002), Auer and Ortner (2010), Cowan et al (2015), Cowan and Katehakis (2015a).…”
Section: Introduction (mentioning)
confidence: 99%
“…Policies that achieve this minimal asymptotic growth rate have been derived for specific parametric models in Lai and Robbins [9], Burnetas and Katehakis [4], Honda and Takemura [7], Honda and Takemura [6], Honda and Takemura [8], Cowan et al [5] and references therein. In general it is not always easy to obtain such optimal policies; thus, policies that satisfy the less strict requirement of Eq.…”
Section: Related Literature (mentioning)
confidence: 99%
“…In such instances, we may in fact conclude from the results presented herein, and standard results relating modes of convergence, that for the policies constructed here, for g(n) = O(ln n), the sequences of random variables R_{π_g^F}(n)/g(n), R_{π_g^O}(n)/g(n) are not uniformly integrable. An example as to how this can occur is given via the proof of Theorem 2 of Cowan et al [5], where, with a non-trivial probability, non-representative initial sampling of each bandit biases expected future activations of sub-optimal bandits super-logarithmically. This effect does not influence the long-term almost sure behavior of these policies.…”
Section: Related Literaturementioning
confidence: 99%
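For reference, the standard notion invoked in the statement above: a family of random variables (X_n) is uniformly integrable when

```latex
% Uniform integrability of a family (X_n):
\sup_n \, \mathbb{E}\bigl[\, |X_n| \, \mathbf{1}\{|X_n| > M\} \,\bigr] \;\longrightarrow\; 0
\qquad \text{as } M \to \infty .
```

Almost-sure convergence together with uniform integrability would yield convergence of expectations; its failure here is precisely why the almost-sure logarithmic rate of R(n)/g(n) need not transfer to the expected-regret rate.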