Proceedings of the 26th Annual International Conference on Machine Learning 2009
DOI: 10.1145/1553374.1553524
Piecewise-stationary bandit problems with side observations

Abstract: We consider a sequential decision problem where the rewards are generated by a piecewise-stationary distribution. However, the different reward distributions are unknown and may change at unknown instants. Our approach uses a limited number of side observations on past rewards, but does not require prior knowledge of the frequency of changes. In spite of the adversarial nature of the reward process, we provide an algorithm whose regret, with respect to the baseline with perfect knowledge of the distributions a…


Publication Types

Select...
4
3
1

Relationship

0
8

Authors

Journals

Cited by 65 publications (55 citation statements)
References: 15 publications
“…Non-stationary environments have not been extensively studied in the bandit literature. For unstructured problems, the performance of algorithms based on UCB [3] has been analyzed in [12,22,37] under the assumption that the average rewards are abruptly changing. Here we consider more realistic scenarios where the average rewards smoothly evolve over time.…”
Section: Stochastic MAB Problems (mentioning, confidence: 99%)
“…Most works on non-stationary k-armed bandit problems try to detect when a change in the distributions occurs, and then re-learn with classical stationary approaches [45].…”
Section: The PRQ-Learning Algorithm (mentioning, confidence: 99%)
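To illustrate the detect-then-relearn pattern described in this citation, here is a minimal sketch: a standard UCB1 learner paired with a crude mean-shift test that discards all statistics once an arm's recent rewards drift away from its long-run average. The window size and detection threshold are illustrative assumptions, not the detector used in [45].

```python
import math

class RestartingUCB:
    """Sketch of the detect-change-then-relearn pattern:
    UCB1 plus a naive mean-shift test that triggers a full restart.
    Window size and threshold are arbitrary illustrative choices."""

    def __init__(self, n_arms, window=50, threshold=0.3):
        self.n_arms = n_arms
        self.window = window
        self.threshold = threshold
        self.reset()

    def reset(self):
        # Forget everything learned in the previous (presumed) regime.
        self.counts = [0] * self.n_arms
        self.means = [0.0] * self.n_arms
        self.recent = [[] for _ in range(self.n_arms)]
        self.t = 0

    def select(self):
        # Play each arm once, then pick the arm with the largest UCB1 index.
        for a in range(self.n_arms):
            if self.counts[a] == 0:
                return a
        return max(range(self.n_arms),
                   key=lambda a: self.means[a]
                   + math.sqrt(2 * math.log(self.t) / self.counts[a]))

    def update(self, arm, reward):
        self.t += 1
        self.counts[arm] += 1
        self.means[arm] += (reward - self.means[arm]) / self.counts[arm]
        self.recent[arm].append(reward)
        if len(self.recent[arm]) > self.window:
            self.recent[arm].pop(0)
        # Change detection: restart when the windowed mean of this arm
        # drifts far from its long-run mean.
        if len(self.recent[arm]) == self.window:
            recent_mean = sum(self.recent[arm]) / self.window
            if abs(recent_mean - self.means[arm]) > self.threshold:
                self.reset()
```

On a bandit whose arm means flip at unknown instants, a learner of this shape discards stale statistics after each detected change instead of averaging rewards across regimes.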
“…For example, the discounted/sliding-window UCB algorithm [25] assumes that the reward distribution is piecewise-stationary and that the number of changes is known. Similarly, [27] makes the easier piecewise-stationary assumption, and additionally assumes that retrospective rewards for un-pulled arms are available, which they are not in active learning. In [28], the authors proposed to measure the total statistical variance of the consecutive distributions at each time interval.…”
Section: Existing MAB Ensembles Are Not Robust to Non-stationarity (mentioning, confidence: 99%)
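To make the sliding-window idea in this citation concrete, here is a minimal sketch in the spirit of [25]: the UCB index is computed only from the most recent tau plays, so statistics from an old regime age out of the window automatically. The window length `tau` and exploration constant `c` are illustrative assumptions, not the tuned values analyzed in [25].

```python
import math
from collections import deque

class SlidingWindowUCB:
    """Sketch of a sliding-window UCB: indices use only the last `tau`
    (arm, reward) pairs. `tau` and `c` are illustrative assumptions."""

    def __init__(self, n_arms, tau=200, c=2.0):
        self.n_arms = n_arms
        self.tau = tau
        self.c = c
        self.history = deque()  # (arm, reward) pairs inside the window

    def select(self):
        # Recompute windowed counts and sums; O(tau) per step is fine
        # for a sketch.
        counts = [0] * self.n_arms
        sums = [0.0] * self.n_arms
        for arm, r in self.history:
            counts[arm] += 1
            sums[arm] += r
        # Play any arm unseen within the window, else maximize the
        # windowed UCB index.
        for a in range(self.n_arms):
            if counts[a] == 0:
                return a
        n = len(self.history)
        return max(range(self.n_arms),
                   key=lambda a: sums[a] / counts[a]
                   + math.sqrt(self.c * math.log(n) / counts[a]))

    def update(self, arm, reward):
        self.history.append((arm, reward))
        if len(self.history) > self.tau:
            self.history.popleft()  # rewards older than the window are forgotten
```

The design trade-off the citation points to is visible here: the window makes the learner track piecewise changes without an explicit detector, but choosing `tau` well implicitly requires knowing roughly how often the distribution changes.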