2021
DOI: 10.3390/e23030380

Non Stationary Multi-Armed Bandit: Empirical Evaluation of a New Concept Drift-Aware Algorithm

Abstract: The Multi-Armed Bandit (MAB) problem has been extensively studied in order to address real-world challenges related to sequential decision making. In this setting, an agent selects the best action to be performed at time-step t, based on the past rewards received from the environment. This formulation implicitly assumes that the expected payoff for each action is kept stationary by the environment through time. Nevertheless, in many real-world applications this assumption does not hold and the agent has to face …
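To make the stationarity assumption concrete, the following is a minimal sketch (not the paper's proposed algorithm) of an epsilon-greedy agent with plain running-mean estimates, run against an environment whose expected payoffs change abruptly mid-run; every name, constant, and the drift schedule here are illustrative assumptions.

```python
import random

# Illustrative sketch: an epsilon-greedy agent whose value estimates
# assume stationary rewards, facing an abrupt concept drift.
N_ARMS, HORIZON, DRIFT_AT, EPSILON = 3, 2000, 1000, 0.1

# Hypothetical expected payoff per arm before and after the drift.
means_before = [0.2, 0.5, 0.8]
means_after = [0.8, 0.5, 0.2]  # the best arm changes at DRIFT_AT

counts = [0] * N_ARMS       # pulls per arm
estimates = [0.0] * N_ARMS  # running mean reward per arm

for t in range(HORIZON):
    # Epsilon-greedy selection based on past rewards.
    if random.random() < EPSILON:
        arm = random.randrange(N_ARMS)
    else:
        arm = max(range(N_ARMS), key=lambda a: estimates[a])

    means = means_before if t < DRIFT_AT else means_after
    reward = 1.0 if random.random() < means[arm] else 0.0

    # Incremental mean: weights all history equally, so the estimate
    # adapts only slowly once the drift has occurred.
    counts[arm] += 1
    estimates[arm] += (reward - estimates[arm]) / counts[arm]

print("final estimates:", [round(e, 2) for e in estimates])
```

Because the running mean weights all past rewards equally, the agent keeps favouring the pre-drift best arm long after the payoffs change, which is the failure mode that motivates the drift-aware variants the abstract alludes to.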

Cited by 22 publications (8 citation statements) · References 42 publications
“…The task structures did not allow this value to be computed on the basis of changes in the stochasticity of the environments (cf. Cavenaghi et al., 2021), or on the basis of the presence of large or small errors (cf. McGuire et al., 2014).…”
Section: Methods
confidence: 99%
“…This means that a simple sliding window could be applied in real-world settings to discard data older than one month and keep the model up to date. We plan to explore how state-of-the-art non-stationary bandit techniques fare on the various types of concept drift [3], including an adaptive window size that takes into account how fast the environment changes. • This work assumes that the reward is immediate, i.e.…”
Section: Conclusion and Next Steps
confidence: 99%
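As a concrete illustration of the sliding-window idea in this statement, here is a minimal sketch that keeps only a fixed number of recent rewards per arm; the class name, window size, and `default` parameter are hypothetical, not taken from the cited work.

```python
from collections import deque

WINDOW = 500  # illustrative window size, e.g. "one month" of data


class SlidingWindowArm:
    """Per-arm value estimate over only the most recent rewards."""

    def __init__(self, window=WINDOW):
        # deque with maxlen evicts the oldest reward on append.
        self.rewards = deque(maxlen=window)

    def update(self, reward):
        self.rewards.append(reward)

    def estimate(self, default=0.0):
        # Mean over the window only; `default` covers unpulled arms.
        if not self.rewards:
            return default
        return sum(self.rewards) / len(self.rewards)
```

Because `deque(maxlen=...)` silently drops the oldest entry on each append once the window is full, the estimate automatically forgets observations that fall outside the window, which mirrors the discard behaviour the statement describes.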
“…As they are very relevant to many industry applications, contextual bandits have been widely studied, with many different algorithms proposed; see, for example, [4,25,26].…”
Section: Related Work
confidence: 99%