Brain signals of a Surprise-Actor-Critic model: Evidence for multiple learning modules in human decision making

Liakoni, Vasiliki; Lehmann, Marco P.; Modirshanechi, Alireza; Brea, Johanni; Lutti, Antoine; Gerstner, Wulfram; Preuschoff, Kerstin

doi:10.1016/j.neuroimage.2021.118780

Cited by 5 publications

(12 citation statements)

References 119 publications


“…3A3). Moreover, the estimated parameters of our novelty-seeking algorithms confirm reliance of human participants on habit-formation instead of planning (Supplementary Materials), consistent with previous works on similar experimental paradigms 19,32.…”

Section: Results (supporting)

confidence: 88%

“…2-Fig. 4: We use a hybrid RL model 19,32,51,52 combining model-based planning 36 and model-free habit-formation 53 and allow our algorithms to assign different extrinsic reward values to different goal states (Methods). Our model-comparison results show that seeking novelty is the most probable model for the majority (∼ 60%) of human participants, followed by seeking information-gain as the 2nd most probable model for ∼ 30% of human participants (Fig.…”

Section: Results (mentioning)

confidence: 99%
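The snippets above treat novelty as an intrinsic reward without defining it. A common count-based choice, used here purely as an illustrative assumption and not necessarily the exact definition in the citing paper, scores a state's novelty as the negative log of a smoothed visit frequency, $-\log \frac{C_t(s)+1}{t+|S|}$, where $C_t(s)$ counts previous visits to $s$ and $|S|$ is an assumed, known number of states. The function name `novelty_rewards` is hypothetical.

```python
import math
from collections import Counter


def novelty_rewards(trajectory, n_states):
    """Count-based novelty for each visited state (illustrative sketch).

    Novelty at step t is -log p_t(s), with p_t(s) = (C_t(s) + 1) / (t + n_states),
    where C_t(s) counts visits to s before step t. The +1 pseudo-count per state
    is an assumption that makes p_t sum to 1 over all n_states states.
    """
    counts = Counter()
    rewards = []
    for t, state in enumerate(trajectory):
        p = (counts[state] + 1) / (t + n_states)
        rewards.append(-math.log(p))  # rarely visited states yield larger rewards
        counts[state] += 1
    return rewards


if __name__ == "__main__":
    # Revisiting "a" lowers its novelty; the first visit to "b" is highly novel.
    # Result equals [log 3, log 2, log 5], roughly [1.099, 0.693, 1.609].
    print(novelty_rewards(["a", "a", "b"], n_states=3))
```

Because the reward depends only on visit counts, it keeps rewarding rarely seen states even when they carry no information about rewards, which is the kind of distraction the citing work investigates.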

“…As a general choice for the RL algorithm in Fig. 1D, we consider a hybrid of model-based and model-free policy 19,32,52,53. The model-free (MF) component uses the sequence of states $s_{1:t}$, actions $a_{1:t}$, extrinsic rewards $r_{\mathrm{ext},1:t}$, and intrinsic rewards $r_{\mathrm{int},1:t}$ (in the two parallel branches in Fig.…”

Section: Methods (mentioning)

confidence: 99%
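A hybrid of model-based and model-free control can be pictured as a softmax over a weighted sum of the two systems' action values. The sketch below assumes this common form, $\pi(a \mid s) \propto \exp\big(\beta_{\mathrm{MB}} Q_{\mathrm{MB}}(s,a) + \beta_{\mathrm{MF}} Q_{\mathrm{MF}}(s,a)\big)$, with separate inverse temperatures; the citing paper's exact parameterization (for example, how the extrinsic and intrinsic branches are weighted) may differ, and `hybrid_policy` is a hypothetical name.

```python
import math


def hybrid_policy(q_mb, q_mf, beta_mb, beta_mf):
    """Action probabilities from a softmax over combined MB and MF values.

    q_mb, q_mf: per-action values from the model-based planner and the
    model-free learner (same action order). beta_mb, beta_mf: inverse
    temperatures controlling how strongly each system drives choice.
    """
    logits = [beta_mb * mb + beta_mf * mf for mb, mf in zip(q_mb, q_mf)]
    shift = max(logits)  # subtract the max for numerical stability
    weights = [math.exp(x - shift) for x in logits]
    total = sum(weights)
    return [w / total for w in weights]


if __name__ == "__main__":
    # With a large beta_mf, choices mostly follow the model-free (habitual) values.
    print(hybrid_policy(q_mb=[1.0, 0.0], q_mf=[0.0, 1.0], beta_mb=0.5, beta_mf=3.0))
```

Setting `beta_mb = 0` gives a purely model-free policy and `beta_mf = 0` a purely model-based one, so comparing fitted temperatures is one way to ask whether participants rely more on habit formation or on planning.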

“…Choosing new = 0 is equivalent to assuming there are no unknown states in the environment, for which the estimate in Eq. 6 is reduced to the classic Bayesian estimate of transition probabilities in bounded discrete environments 19,32. The transition probabilities are then used in a novel variant of prioritized sweeping [33][34][35] adapted to deal with an unknown number of states.…”

Section: Full Hybrid Model (mentioning)

confidence: 99%
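Under a symmetric Dirichlet prior with concentration $\epsilon$ over a known, finite set of $N$ states, the classic Bayesian (posterior-mean) estimate of transition probabilities is $\hat{p}(s' \mid s, a) = \frac{C(s,a,s') + \epsilon}{C(s,a) + N\epsilon}$, where $C(s,a,s')$ counts observed transitions and $C(s,a) = \sum_{s'} C(s,a,s')$. The class below is a minimal sketch of this bounded case only; it does not model the citing paper's extension to an unknown number of states, and the class and method names are hypothetical.

```python
from collections import defaultdict


class DirichletTransitionModel:
    """Posterior-mean transition estimates under a symmetric Dirichlet prior.

    Assumes states are integers 0..n_states-1 and that n_states is known
    in advance (the bounded, discrete case).
    """

    def __init__(self, n_states, epsilon):
        self.n_states = n_states
        self.epsilon = epsilon  # prior pseudo-count added to every successor state
        # (state, action) -> observed counts for each successor state
        self.counts = defaultdict(lambda: [0] * n_states)

    def update(self, s, a, s_next):
        """Record one observed transition s --a--> s_next."""
        self.counts[(s, a)][s_next] += 1

    def prob(self, s, a, s_next):
        """Estimate p(s_next | s, a) = (C(s,a,s') + eps) / (C(s,a) + N * eps)."""
        c = self.counts[(s, a)]
        return (c[s_next] + self.epsilon) / (sum(c) + self.n_states * self.epsilon)


if __name__ == "__main__":
    model = DirichletTransitionModel(n_states=3, epsilon=1.0)
    print(model.prob(0, 0, 1))  # no data yet: uniform 1/3
    model.update(0, 0, 1)
    model.update(0, 0, 1)
    print(model.prob(0, 0, 1))  # (2 + 1) / (2 + 3) = 0.6
```

Once some transitions from $(s, a)$ have been observed, a small $\epsilon$ makes the estimate approach the empirical frequencies, while a larger $\epsilon$ keeps unvisited successors plausible for longer.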

“…We first design an experimental paradigm for human participants that allows us to dissociate predictions of different intrinsically motivated RL algorithms. We employ a sequential decision-making paradigm [30][31][32] for navigation in an environment with 58 states plus three goal states (Fig. 1A-B).…”

Section: Experimental Paradigm (mentioning)

confidence: 99%


The curse of optimism: a persistent distraction by novelty

Modirshanechi, Lin, et al. 2022 · Preprint · Self Cite


Human curiosity has been interpreted as a drive for exploration and modeled by intrinsically motivated reinforcement learning algorithms. An unresolved challenge in machine learning is that these algorithms are prone to distraction by reward-independent stochastic stimuli. We ask whether humans exhibit the same distraction pattern in their behavior as the algorithms. To answer this question, we design a multi-step decision-making paradigm containing an unknown number of states in a stochastic part of the environment, combined with a reward manipulation that varies participants' optimism regarding the availability of rewards. We show that (i) participants exhibit a persistent distraction by novelty in the stochastic part; (ii) reward optimism increases this distraction; and (iii) the persistent distraction is consistent with the predictions of algorithms driven by novelty but not with ‘optimal’ algorithms driven by information-gain. Our results suggest that humans use sub-optimal but computationally cheap curiosity-driven policies for exploration in complex environments.


A taxonomy of surprise definitions

Modirshanechi, Brea, Gerstner 2022 · Journal of Mathematical Psychology · Self Cite


Neural substrates of parallel devaluation-sensitive and devaluation-insensitive Pavlovian learning in humans

Pool, Pauli, Cross, et al. 2023 · Nat Commun


We aim to differentiate the brain regions involved in the learning and encoding of Pavlovian associations sensitive to changes in outcome value from those that are not sensitive to such changes by combining a learning task with outcome devaluation, eye-tracking, and functional magnetic resonance imaging in humans. Contrary to theoretical expectation, voxels correlating with reward prediction errors in the ventral striatum and subgenual cingulate appear to be sensitive to devaluation. Moreover, regions encoding state prediction errors appear to be devaluation insensitive. We can also distinguish regions encoding predictions about outcome taste identity from predictions about expected spatial location. Regions encoding predictions about taste identity seem devaluation sensitive while those encoding predictions about an outcome’s spatial location seem devaluation insensitive. These findings suggest the existence of multiple and distinct associative mechanisms in the brain and help identify putative neural correlates for the parallel expression of both devaluation sensitive and insensitive conditioned behaviors.
