2021
DOI: 10.1162/neco_a_01352

Learning in Volatile Environments With the Bayes Factor Surprise

Abstract: Surprise-based learning allows agents to rapidly adapt to nonstationary stochastic environments characterized by sudden changes. We show that exact Bayesian inference in a hierarchical model gives rise to a surprise-modulated trade-off between forgetting old observations and integrating them with the new ones. The modulation depends on a probability ratio, which we call the Bayes Factor Surprise, that tests the prior belief against the current belief. We demonstrate that in several existing approximate algorit…
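The abstract states only that the modulation depends on a probability ratio testing the prior belief against the current belief; it does not give the formula. As a minimal sketch, assume the ratio is the probability of a new observation under the prior belief divided by its probability under the current belief, and assume the trade-off weight takes the form γ = mS / (1 + mS), with m the odds of a change point. The Gaussian setting, the function names, and every parameter below are illustrative assumptions, not the authors' implementation.

```python
import math


def gaussian_pdf(y, mean, var):
    """Density of a normal distribution with the given mean and variance."""
    return math.exp(-((y - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def bayes_factor_surprise(y, prior_mean, prior_var, belief_mean, belief_var, noise_var):
    """Assumed form: P(y | prior belief) / P(y | current belief).

    Both beliefs are Gaussians over an unknown mean; observations carry
    Gaussian noise, so each predictive variance is belief variance + noise.
    """
    p_prior = gaussian_pdf(y, prior_mean, prior_var + noise_var)
    p_belief = gaussian_pdf(y, belief_mean, belief_var + noise_var)
    return p_prior / p_belief


def adaptation_rate(surprise, change_prob):
    """Assumed trade-off weight gamma = m*S / (1 + m*S), m = p_c / (1 - p_c).

    gamma near 0 favors integrating the observation with old ones;
    gamma near 1 favors forgetting old observations.
    """
    m = change_prob / (1.0 - change_prob)
    return m * surprise / (1.0 + m * surprise)


# Hypothetical numbers: a sharp current belief around 0, a broad prior.
params = dict(prior_mean=0.0, prior_var=1.0,
              belief_mean=0.0, belief_var=0.01, noise_var=0.1)
gamma_expected = adaptation_rate(bayes_factor_surprise(0.0, **params), change_prob=0.05)
gamma_surprising = adaptation_rate(bayes_factor_surprise(3.0, **params), change_prob=0.05)
```

Under these assumptions, an observation that matches the current belief yields a ratio below 1 and a small γ, while an observation far from the current belief yields a large ratio and a γ close to 1.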

Cited by 21 publications (76 citation statements)
References 68 publications
“…To adapt both model-based and model-free policies of the SurNoR algorithm, surprise is used in two different ways. First, high values of surprise systematically lead to a larger learning rate for the update of the world-model than smaller ones, consistent with earlier models [27,29]. Second, going beyond previous models of behavior [20,24–26,30], surprise also influences the learning rate of the model-free reinforcement learning branch.…”
Section: Plos Computational Biology (supporting)
confidence: 76%
“…Our key findings can be summarized in three points: (i) We find that novelty-seeking explains participants' exploratory behavior better than alternative exploration strategies such as seeking surprise or uncertainty [42,43]; (ii) we observe that participants use their world-model only rarely for action planning and mainly to extract moments of surprise; and importantly, (iii) we show that surprise calculated by the world-model does not only modulate the learning of the world-model [24–26,29] but also the learning of model-free action-values. In particular, we show that such a modulation is necessary to explain participants' adaptive behavior.…”
Section: Introduction (mentioning)
confidence: 93%