2021
DOI: 10.3390/e23030380

Non Stationary Multi-Armed Bandit: Empirical Evaluation of a New Concept Drift-Aware Algorithm

Abstract: The Multi-Armed Bandit (MAB) problem has been extensively studied in order to address real-world challenges related to sequential decision making. In this setting, an agent selects the best action to be performed at time-step t, based on the past rewards received from the environment. This formulation implicitly assumes that the expected payoff for each action is kept stationary by the environment through time. Nevertheless, in many real-world applications this assumption does not hold and the agent has to face …
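To make the stationarity assumption concrete, the following is a minimal sketch (not the paper's proposed algorithm) of an epsilon-greedy agent with plain running-mean estimates, run against an environment whose expected payoffs change abruptly mid-run; every name, constant, and the drift schedule here are illustrative assumptions.

```python
import random

# Illustrative sketch: an epsilon-greedy agent whose value estimates
# assume stationary rewards, facing an abrupt concept drift.
N_ARMS, HORIZON, DRIFT_AT, EPSILON = 3, 2000, 1000, 0.1

# Hypothetical expected payoff per arm before and after the drift.
means_before = [0.2, 0.5, 0.8]
means_after = [0.8, 0.5, 0.2]  # the best arm changes at DRIFT_AT

counts = [0] * N_ARMS       # pulls per arm
estimates = [0.0] * N_ARMS  # running mean reward per arm

for t in range(HORIZON):
    # Epsilon-greedy selection based on past rewards.
    if random.random() < EPSILON:
        arm = random.randrange(N_ARMS)
    else:
        arm = max(range(N_ARMS), key=lambda a: estimates[a])

    means = means_before if t < DRIFT_AT else means_after
    reward = 1.0 if random.random() < means[arm] else 0.0

    # Incremental mean: weights all history equally, so the estimate
    # adapts only slowly once the drift has occurred.
    counts[arm] += 1
    estimates[arm] += (reward - estimates[arm]) / counts[arm]

print("final estimates:", [round(e, 2) for e in estimates])
```

Because the running mean weights all past rewards equally, the agent keeps favouring the pre-drift best arm long after the payoffs change, which is the failure mode that motivates the drift-aware variants the abstract alludes to.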

Cited by 22 publications (8 citation statements) · References 42 publications
“…The task structures did not allow this value to be computed on the basis of changes in the stochasticity of the environments (cf. Cavenaghi et al., 2021), or on the basis of the presence of large or small errors (cf. McGuire et al., 2014).…”
Section: Methods
confidence: 99%
“…This means that a simple sliding window could be applied in real-world settings to discard data older than one month and keep the model up to date. We plan to explore how state-of-the-art non-stationary bandit techniques fare on the various types of concept drift [3], including an adaptive window size that takes into account how fast the environment changes. • This work assumes that the reward is immediate, i.e.…”
Section: Conclusion and Next Steps
confidence: 99%
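As a concrete illustration of the sliding-window idea in this statement, here is a minimal sketch that keeps only a fixed number of recent rewards per arm; the class name, window size, and `default` parameter are hypothetical, not taken from the cited work.

```python
from collections import deque

WINDOW = 500  # illustrative window size, e.g. "one month" of data


class SlidingWindowArm:
    """Per-arm value estimate over only the most recent rewards."""

    def __init__(self, window=WINDOW):
        # deque with maxlen evicts the oldest reward on append.
        self.rewards = deque(maxlen=window)

    def update(self, reward):
        self.rewards.append(reward)

    def estimate(self, default=0.0):
        # Mean over the window only; `default` covers unpulled arms.
        if not self.rewards:
            return default
        return sum(self.rewards) / len(self.rewards)
```

Because `deque(maxlen=...)` silently drops the oldest entry on each append once the window is full, the estimate automatically forgets observations that fall outside the window, which mirrors the discard behaviour the statement describes.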
“…As they are very relevant to many industry applications, contextual bandits have been widely studied, with many different algorithms proposed; see, for example, [4,25,26].…”
Section: Related Work
confidence: 99%