2021
DOI: 10.1007/978-3-030-86486-6_4

Exploiting History Data for Nonstationary Multi-armed Bandit

Abstract: The Multi-armed Bandit (MAB) framework has been applied successfully in many application fields. In recent years, the use of active approaches to tackle the nonstationary MAB setting, i.e., algorithms capable of detecting changes in the environment and automatically reconfiguring in response, has been widening the areas of application of MAB techniques. However, such approaches have the drawback of not reusing information in those settings where the same environment conditions recur over time. This paper …

Citations: cited by 3 publications (2 citation statements)
References: 18 publications (35 reference statements)
“…33 Seq(GP-UCB-CD), instead, employs an active change detection (CD) test which actively monitors for the presence of changes in the measured contaminant concentrations, providing alerts regarding pattern changes. 34 However, using this strategy, monitoring schemes are adapted only after the change has been detected. Both algorithms can select the sampling instant based on two different target value preferences.…”
Section: Methods (mentioning)
confidence: 99%
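
The active change detection idea in the statement above can be illustrated with a minimal sketch. This is not the CD test used by Seq(GP-UCB-CD), which the excerpt does not specify; the class name `WindowChangeDetector`, the two-window mean comparison, and the `window` and `threshold` values are all assumptions chosen for illustration.

```python
from collections import deque


class WindowChangeDetector:
    """Hypothetical detector: flags a change when the mean of the latest
    `window` samples drifts from a reference mean by more than `threshold`."""

    def __init__(self, window=20, threshold=0.5):
        self.window = window
        self.threshold = threshold
        self.reset()

    def reset(self):
        # Reference = first `window` samples seen after a reset.
        self.reference = []
        # Recent = sliding window over the latest samples.
        self.recent = deque(maxlen=self.window)

    def update(self, x):
        """Feed one sample; return True if a change is flagged."""
        if len(self.reference) < self.window:
            self.reference.append(x)
            return False
        self.recent.append(x)
        if len(self.recent) < self.window:
            return False
        ref_mean = sum(self.reference) / self.window
        rec_mean = sum(self.recent) / self.window
        if abs(rec_mean - ref_mean) > self.threshold:
            # Reconfigure: forget old statistics and relearn the reference.
            self.reset()
            return True
        return False


# Synthetic stream with an abrupt change at index 50 (illustrative data only).
stream = [0.0] * 50 + [1.0] * 50
detector = WindowChangeDetector(window=10, threshold=0.5)
alarms = [t for t, x in enumerate(stream) if detector.update(x)]
print(alarms)  # prints [55]
```

The alarm fires a few samples after the true change point, which matches the limitation noted in the statement: adaptation happens only once the change has been detected.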
“…, they have constant behaviour over time, recently, a new set of techniques for non-stationary MAB settings have been proposed and showed promising results in a wide range of applications in the Internet advertising and dynamic pricing fields, but not environmental monitoring. 31–34 This framework is usually described as a slot machine game with several arms characterized by different rewards, which in the non-stationary case might change as the game progresses. At the beginning of the game, the player will pull the arms randomly, not having any previous knowledge of the rewards, while, as the game progresses, they will focus on the most promising arm, pulling the others less frequently.…”
Section: Introduction (mentioning)
confidence: 99%
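
The slot machine description above can be made concrete with a short sketch. It uses UCB1 on stationary Bernoulli arms, which is an assumption made here for illustration: the citing text does not name a specific algorithm, and the function names `ucb1` and `bernoulli`, the reward probabilities, and the horizon are all hypothetical. Note that UCB1 begins by pulling each arm once rather than pulling at random.

```python
import math
import random


def ucb1(reward_fn, n_arms, horizon):
    """Run UCB1 for `horizon` rounds and return the pull count of each arm."""
    counts = [0] * n_arms   # number of pulls per arm
    means = [0.0] * n_arms  # running mean reward per arm
    for t in range(1, horizon + 1):
        if t <= n_arms:
            # No prior knowledge: try every arm once.
            arm = t - 1
        else:
            # Pick the arm with the highest optimistic estimate.
            arm = max(
                range(n_arms),
                key=lambda a: means[a] + math.sqrt(2 * math.log(t) / counts[a]),
            )
        reward = reward_fn(arm, t)
        counts[arm] += 1
        means[arm] += (reward - means[arm]) / counts[arm]
    return counts


rng = random.Random(0)
probs = [0.1, 0.3, 0.9]  # assumed success probability of each arm


def bernoulli(arm, t):
    """Stationary Bernoulli reward; `t` is unused but would drive a
    nonstationary variant in which `probs` changes over time."""
    return 1.0 if rng.random() < probs[arm] else 0.0


counts = ucb1(bernoulli, n_arms=3, horizon=5000)
print(counts)
```

Most pulls concentrate on the arm with the highest reward probability, while the other arms are still sampled occasionally. In a nonstationary setting, `reward_fn` could depend on `t`, and these statistics would need to be reset or discounted after a change.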