2020
DOI: 10.1101/2020.09.24.311084
Preprint

Novelty is not Surprise: Human exploratory and adaptive behavior in sequential decision-making

Abstract: Drivers of reinforcement learning (RL), beyond reward, are controversially debated. Novelty and surprise are often used equivocally in this debate. Here, using a deep sequential decision-making paradigm, we show that reward, novelty, and surprise play different roles in human RL. Surprise controls the rate of learning, whereas novelty and the novelty prediction error (NPE) drive exploration. Exploitation is dominated by model-free (habitual) action choices. A theory that takes these separate effects into accou…

citations
Cited by 8 publications
(41 citation statements)
references
References 65 publications
“…The latter approach provides higher flexibility to arbitrate between different policies and is hence more compatible with the rapid change of behavior observed in our human participants that is triggered by the reward value they find in episode 1 (Fig. 5B1-2 and Supplementary Materials); this approach is also more consistent with experimental evidence for partially separate neural pathways of novelty- and reward-induced behaviors 19,65,72–75 . Moreover, our 3rd observation shows that the relative importance of different policies is regulated by the degree of reward optimism – in line with the known influence of environmental variables on the preference for novelty 41,42,74 and curiosity-driven behavior 76,77 .…”
Section: Discussion (supporting)
confidence: 83%
“…Behavioral responses to novelty have been modeled in different ways across fields. Within the field of reinforcement learning, novelty is often thought of as either a rewarding outcome or a predictor of a potential reward, thereby prompting exploration before the first rewards are received (Kakade and Dayan, 2002; Xu et al, 2021). In this way, it can be incorporated into existing reinforcement learning frameworks.…”
Section: Introduction (mentioning)
confidence: 99%
“…Recent studies emphasized the computational difference between stimulus novelty and contextual novelty: the former refers to the quality of not being previously experienced or encountered, and the latter refers to the “surprise” when what is experienced does not match what was expected in time and/or context (i.e. prediction error) (Barto et al, 2013; Kumaran and Maguire, 2007; Ranganath and Rainer, 2003; Xu et al, 2021). Finally, novelty avoidance or neophobia is typically characterized as maladaptive behavior rather than a sensible action.…”
Section: Introduction (mentioning)
confidence: 99%
“…From a theoretical point of view, the roles of surprise and novelty in the learning process have been studied [Faraji et al, 2018; Xu et al, 2020]. Xu et al [2020] found that novelty and surprise play different roles in the learning process: novelty drives exploration while surprise modulates the learning rate. To investigate the impact of surprise, authors have proposed behavioral experiments where the feedback is corrupted, and thus unreliable [Varrier et al, 2019].…”
Section: Discussion (mentioning)
confidence: 99%
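
The distinction drawn in the statements above (novelty drives exploration, surprise modulates the learning rate) can be made concrete with a toy sketch. This is not the model from the preprint: the count-based novelty measure, the saturating rate function, and all names (`novelty`, `surprise_modulated_rate`, `update_estimate`, the parameter `m`) are assumptions chosen only to illustrate how the two signals can enter a learner in different places.

```python
import math
from collections import Counter


def novelty(counts, state, n_states):
    """Novelty of a state as the negative log of its smoothed visit frequency.

    Assumption: add-one smoothing over n_states known states. Rarely visited
    states get a higher value, which an agent could use as an exploration bonus.
    """
    total = sum(counts.values())
    p = (counts[state] + 1) / (total + n_states)
    return -math.log(p)


def surprise_modulated_rate(surprise, m=1.0):
    """Learning rate in [0, 1) that increases with surprise.

    Assumption: a simple saturating form, m * s / (1 + m * s), with s >= 0.
    """
    return m * surprise / (1.0 + m * surprise)


def update_estimate(old, target, surprise):
    """Move an estimate toward a target, faster when the observation was surprising."""
    rate = surprise_modulated_rate(surprise)
    return old + rate * (target - old)


# Hypothetical visit counts: state "A" is familiar, "C" has never been seen.
counts = Counter({"A": 8, "B": 1})
nov_familiar = novelty(counts, "A", n_states=3)
nov_unseen = novelty(counts, "C", n_states=3)
```

In this sketch novelty depends only on how often a state has been encountered, whereas surprise depends on a mismatch with expectations and only scales the size of the update, so the two quantities can dissociate: a frequently visited state can still produce a highly surprising outcome.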