Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence 2018
DOI: 10.24963/ijcai.2018/317

Combinatorial Pure Exploration with Continuous and Separable Reward Functions and Its Applications

Abstract: We study the Combinatorial Pure Exploration problem with Continuous and Separable reward functions (CPE-CS) in the stochastic multi-armed bandit setting. In a CPE-CS instance, we are given several stochastic arms with unknown distributions, as well as a collection of possible decisions. Each decision has a reward according to the distributions of arms. The goal is to identify the decision with the maximum reward, using as few arm samples as possible. The problem generalizes the combinatorial pure exploration p…
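To make the setting in the abstract concrete, here is a minimal sketch of a naive uniform-sampling baseline. It is not the paper's algorithm. It assumes Bernoulli arms, decisions given as tuples of arm indices, and a linear reward (the sum of the chosen arms' means), which is only one simple instance of a separable reward. The function and parameter names are hypothetical.

```python
import random


def uniform_sampling_best_decision(arm_means, decisions, samples_per_arm, seed=0):
    """Naive baseline: sample every arm equally often, then pick the best decision.

    arm_means: true Bernoulli means, hidden from the learner and used only
        to simulate arm pulls (assumption for this sketch).
    decisions: list of tuples of arm indices (hypothetical decision class).
    The reward of a decision is assumed to be the sum of its arms' means.
    """
    rng = random.Random(seed)
    estimates = []
    for mean in arm_means:
        # Draw samples_per_arm Bernoulli samples from this arm.
        pulls = [1 if rng.random() < mean else 0 for _ in range(samples_per_arm)]
        estimates.append(sum(pulls) / samples_per_arm)

    def estimated_reward(decision):
        # Plug the empirical means into the assumed linear reward.
        return sum(estimates[i] for i in decision)

    best = max(decisions, key=estimated_reward)
    return best, estimates
```

Uniform sampling spends the same budget on every arm. An adaptive method would instead try to concentrate samples on the arms that matter for separating the best decision from the rest, which is what "using as few arm samples as possible" asks for.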

Cited by 56 publications (13 citation statements)
References 15 publications
“…Moreover, even when $\Delta_2$ is large, the sample complexity depends on the maximum over $p_k / W^2$ and $(1 - p_k) / \max(W, \Delta_k)^2$, and hence $W$ primarily determines the sample complexity, as can be seen in the order notation above. This also explains why we do better than the pure super-arm exploration algorithm COCI (Huang et al 2018) in experiments.…”
Section: Saucb Algorithm (mentioning)
confidence: 67%
“…Unlike Hoeffding, the lil bound is time-uniform; that is, the lil bound holds for all timesteps (avoiding a naive union bound over time). While a number of other time-uniform concentration bounds exist in the literature (Huang et al 2018; Zhao et al 2016), in practice, the Hoeffding bound works much better for us than the lil bound (see experiments). Thus, we limit ourselves to just the Hoeffding bound and lil bound.…”
Section: Variants (mentioning)
confidence: 92%
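The "naive union bound over time" mentioned in this statement can be illustrated with a short sketch. It assumes rewards bounded in [0, 1] and uses the standard two-sided Hoeffding inequality. The allocation delta / (n (n + 1)) is one common choice, picked here for illustration, and is not taken from the citing paper. The lil bound itself is not reproduced, and the function names are hypothetical.

```python
import math


def hoeffding_radius(n, delta):
    """Two-sided Hoeffding radius for the mean of n samples in [0, 1].

    Valid with probability at least 1 - delta at a single, fixed n.
    """
    return math.sqrt(math.log(2.0 / delta) / (2.0 * n))


def union_bound_radius(n, delta):
    """Radius that holds simultaneously for all n >= 1 via a union bound.

    Sample size n is allotted failure probability delta / (n * (n + 1));
    these allotments sum to delta over n = 1, 2, ... (assumed schedule).
    """
    return hoeffding_radius(n, delta / (n * (n + 1)))
```

The union-bound radius is always wider than the fixed-time radius at the same n, which is the price paid for validity at every timestep.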