1996
DOI: 10.1006/aama.1996.0007

Optimal Adaptive Policies for Sequential Allocation Problems

Abstract: Consider the problem of sequential sampling from m statistical populations to maximize the expected sum of outcomes in the long run. Under suitable assumptions on the unknown parameters θ ∈ Θ, it is shown that there exists a class C_R of adaptive policies with the following properties: (i) The expected n-horizon reward […] Policies in C_R are specified via easily computable indices, defined as unique solutions to dual problems that arise naturally from the functional form of M(θ). In addition, the assumption…

Cited by 160 publications (167 citation statements)
References: 24 publications
“…We show however that this is possible. The constant before the logarithmic term consists of the ratio Δ_a/K_a that is also very similar to the known bounds for the expected regret ([6,22]), up to the constant c, that could definitely be reduced by a more careful analysis and parameter tuning (this is not the main focus of this work), and more importantly the constant 1 + ε_a. Theorem 1 holds for a larger class of distributions than the one considered e.g.…”
Section: Theorem (supporting)
confidence: 53%
“…Robbins' results were also obtained by Yakowitz and Lowe (1991), and by Burnetas and Katehakis (1996).…”
Section: Acknowledgments (mentioning)
confidence: 65%
“…13) times (which, from (1. Burnetas and Katehakis [37] extended this result to several classes P of multi-dimensional parametric distributions. By writing…”
Section: Lower Bounds (mentioning)
confidence: 83%
“…There are two types of lower bounds: (1) The problem-dependent bounds [81,37], which say that for a given problem, any "admissible" algorithm will suffer, asymptotically, a logarithmic regret with a constant factor that depends on the arm distributions. (2) The problem-independent bounds [41,30], which state that for any algorithm and any time horizon n, there exists an environment on which this algorithm will have a regret at least of order √(Kn).…”
Section: Lower Bounds (mentioning)
confidence: 99%
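
The two kinds of lower bound quoted above can be compared numerically. The following is a minimal sketch, assuming Bernoulli arms, for which the problem-dependent constant takes the well-known form of a sum over suboptimal arms of the gap divided by a Kullback-Leibler divergence. The arm means, the horizon, and all function names are hypothetical choices made for illustration; none of them come from the cited papers.

```python
import math


def bernoulli_kl(p, q):
    """KL divergence between Bernoulli(p) and Bernoulli(q), for 0 < p, q < 1."""
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))


def problem_dependent_constant(means):
    """Constant in front of log(n): sum of gap / KL over suboptimal arms.

    Assumes Bernoulli arms with means strictly between 0 and 1.
    """
    best = max(means)
    return sum((best - m) / bernoulli_kl(m, best) for m in means if m < best)


def problem_independent_order(num_arms, horizon):
    """Order of the problem-independent lower bound, sqrt(K * n), without constants."""
    return math.sqrt(num_arms * horizon)


# Hypothetical instance: three Bernoulli arms and a horizon of 10,000 rounds.
means = [0.5, 0.4, 0.3]
horizon = 10_000

constant = problem_dependent_constant(means)
log_regret = constant * math.log(horizon)
sqrt_regret = problem_independent_order(len(means), horizon)

print(f"problem-dependent constant: {constant:.3f}")
print(f"constant * log(n):          {log_regret:.1f}")
print(f"sqrt(K * n):                {sqrt_regret:.1f}")
```

The problem-dependent quantity grows as the gaps between arm means shrink, while the √(Kn) term ignores the gaps entirely, which is why the two bounds are reported separately.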