2019
DOI: 10.1007/s10994-019-05784-4
Good arm identification via bandit feedback

Abstract: We consider a novel stochastic multi-armed bandit problem called good arm identification (GAI), where a good arm is defined as an arm with expected reward greater than or equal to a given threshold. GAI is a pure-exploration problem in which a single agent repeats a process of outputting an arm as soon as it is identified as good, before confirming that the remaining arms are actually not good. The objective of GAI is to minimize the number of samples for each process. We find that GAI faces a new kind of dilemma, the…
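As a rough illustration of the GAI setting described in the abstract, here is a minimal Python sketch assuming Bernoulli rewards and Hoeffding-style confidence bounds. The sampling rule, the confidence-radius constants, and the good_arm_identification helper are illustrative assumptions, not the algorithm analyzed in the paper: an arm is output as soon as its lower confidence bound clears the threshold, and discarded once its upper bound falls below it.

```python
import math
import random

def good_arm_identification(means, threshold, delta=0.05, max_pulls=100_000):
    """Illustrative GAI loop (not the paper's algorithm): output each arm as soon
    as its lower confidence bound clears the threshold, discard it once its
    upper confidence bound falls below the threshold."""
    k = len(means)
    counts = [0] * k
    sums = [0.0] * k
    undecided = set(range(k))
    good, bad = [], []

    def radius(n):
        # Hoeffding-style confidence radius; the log term is a common heuristic choice.
        return math.sqrt(math.log(4 * k * max(n, 1) ** 2 / delta) / (2 * max(n, 1)))

    for _ in range(max_pulls):
        if not undecided:
            break
        unexplored = [i for i in undecided if counts[i] == 0]
        if unexplored:
            # Pull every undecided arm once before trusting the bounds.
            i = unexplored[0]
        else:
            # UCB-style selection among the still-undecided arms.
            i = max(undecided, key=lambda j: sums[j] / counts[j] + radius(counts[j]))
        reward = 1.0 if random.random() < means[i] else 0.0  # Bernoulli feedback
        counts[i] += 1
        sums[i] += reward
        mean_hat = sums[i] / counts[i]
        r = radius(counts[i])
        if mean_hat - r >= threshold:    # confidently good: output immediately
            good.append(i)
            undecided.discard(i)
        elif mean_hat + r < threshold:   # confidently not good: discard
            bad.append(i)
            undecided.discard(i)
    return good, bad

print(good_arm_identification([0.9, 0.7, 0.4], threshold=0.6))
```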

Cited by 18 publications (15 citation statements). References 13 publications (28 reference statements).
“…The algorithm using our stopping condition stopped drawing an arm about two times faster than the algorithm using the conventional stopping condition when its loss mean is around the center of the thresholds. Our algorithm with arm selection policy APT_P always stopped faster than the algorithm using arm selection policy UCB (Auer and Cesa-Bianchi, 2002), as in HDoC (Kano et al., 2017), and our algorithm's stopping time was faster than or comparable to that of the algorithm using arm selection policy LUCB (Kalyanakrishnan et al., 2012) in our simulations using Bernoulli loss distributions with synthetically generated means and means generated from a real-world dataset.…”
mentioning
confidence: 58%
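For context on the arm selection policies compared in this citation statement, here is a brief Python sketch of how a UCB-style index and an APT-style index choose the next arm. The exact APT_P variant used in the cited work is not specified here, so the apt_index below follows the standard APT index (Locatelli et al., 2016) and both functions and the example numbers are illustrative assumptions only.

```python
import math

def ucb_index(mean_hat, pulls, t, c=2.0):
    """UCB-style index (Auer and Cesa-Bianchi, 2002): large for arms that are
    promising or under-explored; the arm with the LARGEST index is pulled."""
    return mean_hat + math.sqrt(c * math.log(t) / pulls)

def apt_index(mean_hat, pulls, threshold, eps=0.0):
    """APT-style index: small when the empirical mean is close to the threshold
    and the arm is under-sampled; the arm with the SMALLEST index is pulled."""
    return math.sqrt(pulls) * (abs(mean_hat - threshold) + eps)

# Illustrative per-arm statistics: (empirical mean, number of pulls).
stats = [(0.62, 40), (0.55, 25), (0.30, 10)]
t = sum(pulls for _, pulls in stats)
next_ucb = max(range(len(stats)), key=lambda i: ucb_index(stats[i][0], stats[i][1], t))
next_apt = min(range(len(stats)), key=lambda i: apt_index(stats[i][0], stats[i][1], 0.5))
print(next_ucb, next_apt)
```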
“…Remark 1: Identification is not needed for checking existence; however, in terms of asymptotic behavior as δ → +0, the expected sample complexity lower bounds shown for both tasks are the same: lim_{δ→+0} E[T]/ln(1/δ) ≥ 1/d(μ_1, θ_L) for both tasks in the case with some positive arms. The bounds are tight considering the shown upper bounds, so bad arm existence checking is not more difficult than good arm identification (Kano et al., 2017) with respect to asymptotic behavior as δ → +0.…”
Section: Preliminary
mentioning
confidence: 95%
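As a small numerical illustration of the lower bound quoted above, assuming Bernoulli arms so that d(·,·) is the Bernoulli KL divergence; the values of μ_1, θ_L, and δ below are made up for illustration and do not come from the cited works.

```python
import math

def bernoulli_kl(p, q):
    """KL divergence d(p, q) between Bernoulli(p) and Bernoulli(q)."""
    eps = 1e-12
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

# Illustrative values: best arm mean mu_1, lower threshold theta_L, confidence delta.
mu_1, theta_L, delta = 0.7, 0.5, 0.01
# Asymptotic scaling implied by the bound: E[T] >= ln(1/delta) / d(mu_1, theta_L).
print(math.log(1 / delta) / bernoulli_kl(mu_1, theta_L))
```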