Learning in Volatile Environments With the Bayes Factor Surprise

Liakoni, Vasiliki; Modirshanechi, Alireza; Gerstner, Wulfram; Brea, Johanni

doi:10.1162/neco_a_01352

Cited by 21 publications

(76 citation statements)

References 68 publications


“…To adapt both model-based and model-free policies of the SurNoR algorithm, surprise is used in two different ways. First, high values of surprise systematically lead to a larger learning rate for the update of the world-model than smaller ones, consistent with earlier models [27,29]. Second, going beyond previous models of behavior [20,24–26,30], surprise also influences the learning rate of the model-free reinforcement learning branch.…”

Section: PLoS Computational Biology (supporting)

confidence: 76%
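
The statement above says that larger surprise leads to a larger learning rate. A minimal sketch of one way to realize this is given below, assuming a saturating map from surprise to a rate between 0 and 1. The function names, the parameter `m`, and its default value are hypothetical choices for illustration and are not taken from the quoted text.

```python
def adaptation_rate(surprise: float, m: float = 0.1) -> float:
    """Map a non-negative surprise value to a learning rate in [0, 1).

    Assumed form: m * S / (1 + m * S), which increases monotonically
    with S and saturates below 1. `m` is a hypothetical free parameter.
    """
    if surprise < 0 or m < 0:
        raise ValueError("surprise and m must be non-negative")
    return m * surprise / (1.0 + m * surprise)


def surprise_modulated_update(old: float, target: float,
                              surprise: float, m: float = 0.1) -> float:
    """Move an estimate toward a target, faster when surprise is high."""
    gamma = adaptation_rate(surprise, m)
    return (1.0 - gamma) * old + gamma * target
```

With this form, zero surprise leaves the estimate unchanged, and very large surprise makes the new estimate almost equal to the target.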

“…Our key findings can be summarized in three points: (i) We find that novelty-seeking explains participants' exploratory behavior better than alternative exploration strategies such as seeking surprise or uncertainty [42,43]; (ii) we observe that participants use their world-model only rarely for action planning and mainly to extract moments of surprise; and importantly, (iii) we show that surprise calculated by the world-model does not only modulate the learning of the world-model [24–26,29] but also the learning of model-free action-values. In particular, we show that such a modulation is necessary to explain participants' adaptive behavior.…”

Section: Introduction (mentioning)

confidence: 93%

“…To quantify expectations, we assume that participants build an internal model of the environment ('world-model'), i.e., we hypothesize that participants estimate the probability $p^{(t)}(s_{t+1} \mid s_t, a_t)$ of a transition from a given state $s_t$ to another state $s_{t+1}$ when performing action $a_t$. More precisely, we assume that the world-model counts transitions from state $s$ to $s'$ under action $a$ using either a leaky [23,49,50] or a surprise-modulated [28,29,51] counting procedure, described by the pseudo-count $C^{(t)}_{s,a \to s'}$. The conditional probability is then $p^{(t)}(s_{t+1} \mid s_t, a_t) = \frac{C^{(t)}_{s_t,a_t \to s_{t+1}} + \epsilon}{C^{(t)}_{s_t,a_t} + 11\epsilon}$…”

Section: PLoS Computational Biology (mentioning)

confidence: 99%
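
The count-based world-model described in the statement above can be sketched as follows, using the leaky counting variant. This is an illustrative sketch under stated assumptions, not the cited authors' implementation: the class name, the `leak` and `epsilon` parameters, and the reading of the factor 11 in the quoted formula as the number of states are all assumptions.

```python
from collections import defaultdict


class LeakyCountWorldModel:
    """Transition model built from leaky pseudo-counts (hypothetical sketch)."""

    def __init__(self, n_states: int = 11, epsilon: float = 1.0,
                 leak: float = 0.9):
        self.n_states = n_states      # assumed meaning of the factor 11
        self.epsilon = epsilon        # assumed prior pseudo-count per state
        self.leak = leak              # assumed decay applied to old counts
        self.counts = defaultdict(float)  # (s, a, s_next) -> pseudo-count

    def update(self, s: int, a: int, s_next: int) -> None:
        # Decay all counts for this state-action pair, then add the new one.
        for nxt in range(self.n_states):
            self.counts[(s, a, nxt)] *= self.leak
        self.counts[(s, a, s_next)] += 1.0

    def prob(self, s: int, a: int, s_next: int) -> float:
        # (count + epsilon) / (total count + n_states * epsilon)
        total = sum(self.counts[(s, a, nxt)] for nxt in range(self.n_states))
        numerator = self.counts[(s, a, s_next)] + self.epsilon
        return numerator / (total + self.n_states * self.epsilon)
```

Before any observation, every transition has probability `1 / n_states`. Each observed transition then shifts probability mass toward the observed successor state, while the leak gradually discounts older observations.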

“…Therefore, we consider the surprise of such a transition to be a decreasing function of $p^{(t)}(s_{t+1} \mid s_t, a_t)$. More precisely, we use a recent measure of surprise motivated by a Bayesian framework for learning in volatile environments, called the 'Bayes Factor' surprise [29]. The Bayes Factor surprise of the transition from state $s_t$ to state $s_{t+1}$ after taking action $a_t$ is $S_{\mathrm{BF}}^{(t+1)} = \frac{\text{const.}}{p^{(t)}(s_{t+1} \mid s_t, a_t)}$ (4)…”

Section: PLoS Computational Biology (mentioning)

confidence: 99%
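
The surprise measure quoted above is a constant divided by the estimated transition probability, so it grows as the observed transition becomes less expected. A minimal sketch, assuming the constant is supplied by the caller (its value is not given in the quoted text, and the function and parameter names are hypothetical):

```python
def bayes_factor_surprise(p_transition: float, const: float = 1.0) -> float:
    """Return const / p for a transition with estimated probability p.

    `const` is a placeholder for the unspecified constant in the quoted
    formula; the default of 1.0 is an arbitrary illustrative choice.
    """
    if not 0.0 < p_transition <= 1.0:
        raise ValueError("p_transition must lie in (0, 1]")
    return const / p_transition
```

For example, a transition estimated at probability 0.5 yields a surprise of 2.0 under the default constant, and rarer transitions yield proportionally larger values.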

“…Whereas the reward prediction error (RPE) is a mismatch between the expected reward and the actual reward, surprise is a mismatch between an expected observation and an actual observation. Behavioral experiments [20,24–27] and theories [27–30] suggest that surprise helps humans to adapt their behavior quickly to changes in the environment, potentially by modulating synaptic plasticity [31–33].…”

Section: Introduction (mentioning)

confidence: 99%


Novelty is not surprise: Human exploratory and adaptive behavior in sequential decision-making

Modirshanechi, Lehmann, et al. 2021, PLoS Comput Biol

Self Cite


Classic reinforcement learning (RL) theories cannot explain human behavior in the absence of external reward or when the environment changes. Here, we employ a deep sequential decision-making paradigm with sparse reward and abrupt environmental changes. To explain the behavior of human participants in these environments, we show that RL theories need to include surprise and novelty, each with a distinct role. While novelty drives exploration before the first encounter of a reward, surprise increases the rate of learning of a world-model as well as of model-free action-values. Even though the world-model is available for model-based RL, we find that human decisions are dominated by model-free action choices. The world-model is only marginally used for planning, but it is important to detect surprising events. Our theory predicts human action choices with high probability and allows us to dissociate surprise, novelty, and reward in EEG signals.
