2019
DOI: 10.48550/arxiv.1908.10402
Preprint

A Near-Optimal Change-Detection Based Algorithm for Piecewise-Stationary Combinatorial Semi-Bandits

Abstract: We investigate the piecewise-stationary combinatorial semi-bandit problem. Compared to the original combinatorial semi-bandit problem, our setting assumes that the reward distributions of base arms may change in a piecewise-stationary manner at unknown time steps. We propose an algorithm, GLR-CUCB, which combines an efficient combinatorial semi-bandit algorithm, CUCB, with an almost parameter-free change-point detector, the Generalized Likelihood Ratio Test (GLRT). Our analysis shows that the regret of GLR-CUCB…
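The abstract pairs CUCB with a GLR change-point detector, but it is truncated before the details. The sketch below is a minimal, hypothetical illustration of that pairing, not the paper's reference implementation. Everything beyond "CUCB + GLRT" is an assumption: Bernoulli rewards, a top-K oracle (any K base arms form a super arm), the heuristic threshold ln(n^{3/2}/δ), forced exploration with probability `p_explore` over random K-subsets, and a global restart of all statistics when any base arm's test fires. The names `glr_detects_change`, `glr_cucb`, and `two_phase` are made up for this example.

```python
import math
import random
from typing import Callable, List, Tuple


def kl_bern(p: float, q: float, eps: float = 1e-12) -> float:
    """KL(Bernoulli(p) || Bernoulli(q)), with both means clipped away from 0 and 1."""
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))


def glr_detects_change(obs: List[float], delta: float) -> bool:
    """Bernoulli GLR test: is there a split s where the two segments' means differ?"""
    n = len(obs)
    if n < 2:
        return False
    threshold = math.log(n ** 1.5 / delta)  # heuristic threshold (assumption)
    total = sum(obs)
    mu_all = total / n
    prefix = 0.0
    for s in range(1, n):
        prefix += obs[s - 1]
        mu_left = prefix / s
        mu_right = (total - prefix) / (n - s)
        stat = s * kl_bern(mu_left, mu_all) + (n - s) * kl_bern(mu_right, mu_all)
        if stat >= threshold:
            return True
    return False


def glr_cucb(
    means_at: Callable[[int], List[float]],  # hypothetical: true Bernoulli means at step t
    n_arms: int,
    k: int,
    horizon: int,
    delta: float = 0.001,
    p_explore: float = 0.05,
    seed: int = 0,
) -> Tuple[List[int], float]:
    rng = random.Random(seed)
    history: List[List[float]] = [[] for _ in range(n_arms)]  # samples since last restart
    last_restart = 0
    restarts: List[int] = []
    total_reward = 0.0
    for t in range(horizon):
        if rng.random() < p_explore:
            # Forced exploration (assumption): play a uniformly random K-subset.
            super_arm = rng.sample(range(n_arms), k)
        else:
            tau = t - last_restart + 1  # steps since the last restart

            def ucb(i: int) -> float:
                n_i = len(history[i])
                if n_i == 0:
                    return math.inf  # untried base arms are played first
                return sum(history[i]) / n_i + math.sqrt(1.5 * math.log(tau) / n_i)

            # Top-K oracle: the super arm is the K base arms with the largest indices.
            super_arm = sorted(range(n_arms), key=ucb, reverse=True)[:k]
        means = means_at(t)
        changed = False
        for i in super_arm:  # semi-bandit feedback: every played base arm is observed
            x = 1.0 if rng.random() < means[i] else 0.0
            total_reward += x
            history[i].append(x)
            changed = glr_detects_change(history[i], delta) or changed
        if changed:
            # Global restart (assumption): discard all statistics.
            history = [[] for _ in range(n_arms)]
            last_restart = t + 1
            restarts.append(t)
    return restarts, total_reward


def two_phase(t: int) -> List[float]:
    # Illustrative schedule: the best pair of base arms swaps at t = 300.
    return [0.9, 0.8, 0.1, 0.1, 0.1] if t < 300 else [0.1, 0.1, 0.1, 0.9, 0.8]


restarts, reward = glr_cucb(two_phase, n_arms=5, k=2, horizon=600)
print("restart steps:", restarts, "total reward:", reward)
```

With this schedule a restart is expected shortly after step 300, because the previously best base arm starts returning mostly zeros and its GLR statistic quickly exceeds the threshold. The O(n) scan per check is chosen for readability; a practical detector would subsample candidate split points.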

Cited by 1 publication (4 citation statements)
References: 24 publications
“…where the dependence on N is improved to […]. In real-world applications, both L and T can be huge, for example, L and T are in the millions in web search, where the improvements are significant. Compared to recent works on piecewise-stationary MAB (Besson and Kaufmann, 2019) and combinatorial MAB (CMAB) (Zhou et al, 2019) that adopt GLRT as the change-point detector, the problem setting considered herein is different. In MAB, only one selected item rather than a list of items is allowed at each time.…”
Section: Discussion · Citation type: mentioning · Confidence: 99%
“…Notice that although CMAB (Combes et al, 2015;Cesa-Bianchi and Lugosi, 2012;Chen et al, 2016) also allow a list of items each time, they have full feedback on all K items under semi-bandit setting. Furthermore, we develop the analysis of both UCB-based and KL-UCB based algorithms for CB, whereas only one of them (either UCB-based or KL-UCB based algorithm) is analyzed in Besson and Kaufmann (2019) and Zhou et al (2019). We also observe one interesting fact that the regret upper bounds of our proposed algorithms and minimax regret lower bounds match their counterparts in piecewise-stationary combinatorial semi-bandits (Zhou et al, 2019), in which the agent has access to the realizations of base arms in the played super arm.…”
Section: Discussion · Citation type: mentioning · Confidence: 99%
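The excerpt above contrasts UCB-based and KL-UCB-based algorithms. As a hedged illustration, assuming Bernoulli rewards, the sketch below computes both kinds of index for a single base arm. The exploration constant 1.5 follows CUCB's bonus sqrt(3 ln t / (2n)), and the KL-UCB budget ln t omits the optional log-log term; the function names `ucb_index` and `klucb_index` are hypothetical.

```python
import math


def kl_bern(p: float, q: float, eps: float = 1e-12) -> float:
    """KL(Bernoulli(p) || Bernoulli(q)), with both means clipped away from 0 and 1."""
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))


def ucb_index(mean: float, n: int, t: int) -> float:
    """Hoeffding-style index: empirical mean plus sqrt(1.5 ln t / n)."""
    return mean + math.sqrt(1.5 * math.log(t) / n)


def klucb_index(mean: float, n: int, t: int, tol: float = 1e-9) -> float:
    """KL-UCB index: largest q in [mean, 1] with n * kl(mean, q) <= ln t, by bisection."""
    budget = math.log(t) / n
    lo, hi = mean, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        # kl(mean, q) increases in q for q >= mean, so bisection finds the boundary.
        if kl_bern(mean, mid) <= budget:
            lo = mid
        else:
            hi = mid
    return lo


for mean in (0.5, 0.95):
    print(mean, ucb_index(mean, n=100, t=1000), klucb_index(mean, n=100, t=1000))
```

The KL-UCB index always stays inside [0, 1] and, by Pinsker's inequality, never exceeds mean + sqrt(ln t / (2n)), so it is tighter than the Hoeffding-style index, most visibly for means near 0 or 1.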