2019
DOI: 10.1007/s10994-019-05784-4
Good arm identification via bandit feedback

Abstract: We consider a novel stochastic multi-armed bandit problem called good arm identification (GAI), where a good arm is defined as an arm with expected reward greater than or equal to a given threshold. GAI is a pure-exploration problem in which a single agent repeats a process of outputting an arm as soon as it is identified as good, before confirming that the remaining arms are actually not good. The objective of GAI is to minimize the number of samples for each process. We find that GAI faces a new kind of dilemma, the…
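As a rough illustration of the GAI setting described in the abstract, here is a minimal Python sketch assuming Bernoulli rewards and Hoeffding-style confidence bounds. The sampling rule, the confidence-radius constants, and the good_arm_identification helper are illustrative assumptions, not the algorithm analyzed in the paper: an arm is output as soon as its lower confidence bound clears the threshold, and discarded once its upper bound falls below it.

```python
import math
import random

def good_arm_identification(means, threshold, delta=0.05, max_pulls=100_000):
    """Illustrative GAI loop (not the paper's algorithm): output each arm as soon
    as its lower confidence bound clears the threshold, discard it once its
    upper confidence bound falls below the threshold."""
    k = len(means)
    counts = [0] * k
    sums = [0.0] * k
    undecided = set(range(k))
    good, bad = [], []

    def radius(n):
        # Hoeffding-style confidence radius; the log term is a common heuristic choice.
        return math.sqrt(math.log(4 * k * max(n, 1) ** 2 / delta) / (2 * max(n, 1)))

    for _ in range(max_pulls):
        if not undecided:
            break
        unexplored = [i for i in undecided if counts[i] == 0]
        if unexplored:
            # Pull every undecided arm once before trusting the bounds.
            i = unexplored[0]
        else:
            # UCB-style selection among the still-undecided arms.
            i = max(undecided, key=lambda j: sums[j] / counts[j] + radius(counts[j]))
        reward = 1.0 if random.random() < means[i] else 0.0  # Bernoulli feedback
        counts[i] += 1
        sums[i] += reward
        mean_hat = sums[i] / counts[i]
        r = radius(counts[i])
        if mean_hat - r >= threshold:    # confidently good: output immediately
            good.append(i)
            undecided.discard(i)
        elif mean_hat + r < threshold:   # confidently not good: discard
            bad.append(i)
            undecided.discard(i)
    return good, bad

print(good_arm_identification([0.9, 0.7, 0.4], threshold=0.6))
```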

Cited by 18 publications (15 citation statements). References 13 publications (28 reference statements).
“…The algorithm using our stopping condition stopped drawing an arm about two times faster than the algorithm using the conventional stopping condition when its loss mean is around the center of the thresholds. Our algorithm with arm selection policy APT_P always stopped faster than the algorithm using arm selection policy UCB (Auer and Cesa-Bianchi, 2002), as in HDoC (Kano et al., 2017), and our algorithm's stopping time was faster than or comparable to that of the algorithm using arm selection policy LUCB (Kalyanakrishnan et al., 2012) in our simulations using Bernoulli loss distributions with synthetically generated means and means generated from a real-world dataset.…”
mentioning
confidence: 58%
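For context on the arm selection policies compared in this citation statement, here is a brief Python sketch of how a UCB-style index and an APT-style index choose the next arm. The exact APT_P variant used in the cited work is not specified here, so the apt_index below follows the standard APT index (Locatelli et al., 2016) and both functions and the example numbers are illustrative assumptions only.

```python
import math

def ucb_index(mean_hat, pulls, t, c=2.0):
    """UCB-style index (Auer and Cesa-Bianchi, 2002): large for arms that are
    promising or under-explored; the arm with the LARGEST index is pulled."""
    return mean_hat + math.sqrt(c * math.log(t) / pulls)

def apt_index(mean_hat, pulls, threshold, eps=0.0):
    """APT-style index: small when the empirical mean is close to the threshold
    and the arm is under-sampled; the arm with the SMALLEST index is pulled."""
    return math.sqrt(pulls) * (abs(mean_hat - threshold) + eps)

# Illustrative per-arm statistics: (empirical mean, number of pulls).
stats = [(0.62, 40), (0.55, 25), (0.30, 10)]
t = sum(pulls for _, pulls in stats)
next_ucb = max(range(len(stats)), key=lambda i: ucb_index(stats[i][0], stats[i][1], t))
next_apt = min(range(len(stats)), key=lambda i: apt_index(stats[i][0], stats[i][1], 0.5))
print(next_ucb, next_apt)
```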
“…Remark 1: Identification is not needed for checking existence; however, in terms of asymptotic behavior as δ → +0, the expected sample complexity lower bounds shown for both tasks are the same: lim_{δ→+0} E[T]/ln(1/δ) ≥ 1/d(μ_1, θ_L) for both tasks in the case with some positive arms. The bounds are tight considering the shown upper bounds, so bad arm existence checking is not more difficult than good arm identification (Kano et al., 2017) with respect to asymptotic behavior as δ → +0.…”
Section: Preliminary
mentioning
confidence: 95%
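As a small numerical illustration of the lower bound quoted above, assuming Bernoulli arms so that d(·,·) is the Bernoulli KL divergence; the values of μ_1, θ_L, and δ below are made up for illustration and do not come from the cited works.

```python
import math

def bernoulli_kl(p, q):
    """KL divergence d(p, q) between Bernoulli(p) and Bernoulli(q)."""
    eps = 1e-12
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

# Illustrative values: best arm mean mu_1, lower threshold theta_L, confidence delta.
mu_1, theta_L, delta = 0.7, 0.5, 0.01
# Asymptotic scaling implied by the bound: E[T] >= ln(1/delta) / d(mu_1, theta_L).
print(math.log(1 / delta) / bernoulli_kl(mu_1, theta_L))
```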