2016
DOI: 10.1017/s026996481600036x

Asymptotically Optimal Multi-Armed Bandit Policies Under a Cost Constraint

Abstract: We develop asymptotically optimal policies for the multi-armed bandit (MAB) problem under a cost constraint. This model is applicable in situations where each sample (or activation) from a population (bandit) incurs a known, bandit-dependent cost. Successive samples from each population are i.i.d. random variables with unknown distribution. The objective is to design a feasible policy for deciding from which population to sample, so as to maximize the expected sum of outcomes of n total samples or, equivalen…
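The setup in the abstract lends itself to a short simulation. The sketch below is an illustrative stand-in, not the paper's b-UCB policy: it assumes Bernoulli rewards, a known cost per activation of each bandit, and a cap on the running average cost per sample as the constraint form; the function and parameter names (`constrained_ucb`, `budget`, and so on) are hypothetical.

```python
import numpy as np

def constrained_ucb(means, costs, budget, n, seed=0):
    """Simulate a UCB-style policy under an average-cost budget.

    Illustrative sketch only -- this is NOT the paper's b-UCB policy.
    `means`  : true Bernoulli reward means (unknown to the policy).
    `costs`  : known, bandit-dependent cost per activation.
    `budget` : cap on the running average cost per sample (assumed
               constraint form, chosen here for simplicity).
    """
    rng = np.random.default_rng(seed)
    means = np.asarray(means, dtype=float)
    costs = np.asarray(costs, dtype=float)
    k = len(means)
    counts = np.zeros(k)          # activations of each bandit
    sums = np.zeros(k)            # accumulated reward of each bandit
    total_cost = 0.0
    total_reward = 0.0

    for t in range(1, n + 1):
        if t <= k:
            arm = t - 1           # initialization: sample each bandit once
        else:
            ucb = sums / counts + np.sqrt(2.0 * np.log(t) / counts)
            # Keep only bandits whose activation leaves the running
            # average cost within the budget this round.
            feasible = (total_cost + costs) / t <= budget
            if feasible.any():
                ucb[~feasible] = -np.inf
                arm = int(np.argmax(ucb))
            else:
                arm = int(np.argmin(costs))   # fallback: cheapest bandit
        reward = rng.binomial(1, means[arm])
        counts[arm] += 1
        sums[arm] += reward
        total_cost += costs[arm]
        total_reward += reward

    return total_reward, total_cost / n

total, avg_cost = constrained_ucb(
    means=[0.3, 0.5, 0.7], costs=[1.0, 2.0, 4.0], budget=2.5, n=10_000)
print(f"total reward: {total:.0f}, average cost per sample: {avg_cost:.3f}")
```

The feasibility mask is the only departure from plain UCB: bandits whose activation would push the running average cost over the budget are excluded for that round, with the cheapest bandit as a fallback so the policy always has a legal action.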

Cited by 5 publications (3 citation statements)
References 37 publications

“…We develop a class of feasible policies that are shown to be asymptotically optimal within a large class of good policies that are uniformly fast (UF) convergent, in the sense of Burnetas and Katehakis (1996) and Lai and Robbins (1985). The results in this paper extend the work in Burnetas et al (2017), which solved the case where there exists only one type of constraint for all bandits. Further, the class of block-UCB (b-UCB) feasible policies developed here, which achieve the asymptotic lower bound on the regret, has a simpler form and is easier to compute than those in Burnetas et al (2017).…”
Section: Introduction (supporting; confidence: 79%)
“…The results in this paper extend the work in Burnetas et al (2017), which solved the case where there exists only one type of constraint for all bandits. Further, the class of block-UCB (b-UCB) feasible policies developed here, which achieve the asymptotic lower bound on the regret, has a simpler form and is easier to compute than those in Burnetas et al (2017). We also refer to Burnetas and Kanavetas (2012), where a consistent policy (i.e., with regret o(n)) for the case of a single linear constraint was constructed.…”
Section: Introduction (supporting; confidence: 79%)
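For readers outside the bandit literature, the optimality criterion the quoted passages invoke can be stated in one line. The following is the standard definition from Lai and Robbins (1985) and Burnetas and Katehakis (1996); reading μ* as the per-sample value of the optimal feasible allocation in the constrained problem is my assumption about the benchmark, not a claim from the paper.

```latex
% Regret of a policy \pi after n samples:
R_n(\pi) = n\,\mu^{*} - \mathbb{E}_{\pi}\!\left[\sum_{t=1}^{n} X_t\right].
% \pi is uniformly fast (UF) convergent if, for every configuration of the
% unknown reward distributions and every a > 0,
R_n(\pi) = o(n^{a}),
% while a consistent policy, as in Burnetas and Kanavetas (2012), satisfies
% only the weaker requirement R_n(\pi) = o(n).
```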