The computational roots of positivity and confirmation biases in reinforcement learning

Palminteri, Stefano; Lebreton, Maël

doi:10.1016/j.tics.2022.04.005

Cited by 62 publications

(89 citation statements)

References 95 publications

“…Moreover, we note that persistent distraction by the stochastic part of the environment observed in our participants (Fig. 5B3) is consistent with the known phenomena of optimism bias [85] and optimistic belief updating in humans [79,86,87].…”

Section: Discussion (supporting)

confidence: 90%

The curse of optimism: a persistent distraction by novelty

Modirshanechi, Lin, et al. 2022

Preprint

Human curiosity has been interpreted as a drive for exploration and modeled by intrinsically motivated reinforcement learning algorithms. An unresolved challenge in machine learning is that these algorithms are prone to distraction by reward-independent stochastic stimuli. We ask whether humans exhibit the same distraction pattern in their behavior as the algorithms. To answer this question, we design a multi-step decision-making paradigm containing an unknown number of states in a stochastic part of the environment, combined with a reward manipulation that varies the optimism of participants regarding the availability of rewards. We show that (i) participants exhibit a persistent distraction by novelty in the stochastic part; (ii) reward optimism increases this distraction; and (iii) the persistent distraction is consistent with the predictions of algorithms driven by novelty but not with ‘optimal’ algorithms driven by information gain. Our results suggest that humans use sub-optimal but computationally cheap curiosity-driven policies for exploration in complex environments.

“…After the separate streams are merged through the update of each policy parameter q as , the greediness parameter β controls the soft-max choice probabilities to determine the exploration and exploitation balance (see STAR Methods). The imbalance in the learning rate, α, has often been discussed in different contexts from OCD (Lefebvre et al, 2017; Palminteri and Lebreton, 2022), and an excessively large greediness parameter, β, has been discussed in relation to persistence of compulsion. Hence, we confirmed whether OCD-like behavior emerges because of the imbalance in the learning rate, α, or an excessively large β in the same situation shown in Figures 1E and 1F (see STAR Methods).…”

Section: Theoretical Model of OCD (mentioning)

confidence: 99%

Memory trace imbalance in reinforcement and punishment systems can reinforce implicit choices leading to obsessive-compulsive behavior

et al. 2022

“…After the separate streams are merged through the update of each policy parameter q as , the greediness parameter β controls the soft-max choice probabilities to determine the exploration and exploitation balance (see STAR Methods). The imbalance in the learning rate, α, has often been discussed in different contexts from OCD (Lefebvre et al, 2017; Palminteri and Lebreton, 2022), and an excessively large greediness parameter, β, has been discussed in relation to persistence of compulsion. Hence, we confirmed whether OCD-like behavior emerges because of the imbalance in the learning rate, α, or an excessively large β in the same situation shown in Figures 1e and 1f (see STAR Methods).…”

Section: Results (mentioning)

confidence: 99%

“…Imbalance for positive and negative prediction errors has been often discussed about the learning rate α± (e.g. (Lefebvre et al, 2017; Palminteri and Lebreton, 2022)). Although we showed that the learning-rate imbalance alone cannot cause OCD-like behavior in the supposed state-transition environments (Figure 1g), it is valuable to clarify the relation between the learning-rate and trace-factor imbalances.…”

Section: Discussion (mentioning)

confidence: 99%

Memory Trace Imbalance in Reinforcement and Punishment Systems Can Reinforce Implicit Choices Leading to Obsessive-Compulsive Behavior

Sakai, Abe, Narumoto, et al. 2020

Preprint

Nobody wants to experience anxiety. However, anxiety may be induced by our own implicit choices that are mis-reinforced by some imbalance in reinforcement learning. Here we focused on obsessive-compulsive disorder (OCD) as a candidate for implicitly learned anxiety. Simulations in the reinforcement learning framework showed that agents implicitly learn to become anxious when the memory trace signal for past actions decays differently for positive and negative prediction errors. In empirical data, we confirmed that OCD patients showed extremely imbalanced traces, which were normalized by serotonin enhancers. We also used fMRI to identify the neural signature of OCD and healthy participants with imbalanced traces. Beyond the spectrum of clinical phenotypes, these behavioral and neural characteristics can be generalized to variations in the healthy population.
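The citation statements above refer to two ingredients: a learning rate α that differs for positive and negative prediction errors, and a greediness parameter β that controls soft-max choice probabilities. The sketch below illustrates both in a generic two-armed bandit. It is not the model of any of the cited papers; the function names, the task, and all parameter values are assumptions chosen only for illustration.

```python
import math
import random


def softmax_probs(q_values, beta):
    """Soft-max choice probabilities; a larger beta gives greedier choices."""
    m = max(q_values)  # subtract the maximum for numerical stability
    exps = [math.exp(beta * (q - m)) for q in q_values]
    total = sum(exps)
    return [e / total for e in exps]


def asymmetric_update(q, reward, alpha_pos, alpha_neg):
    """Update one value estimate with a sign-dependent learning rate."""
    delta = reward - q  # prediction error
    alpha = alpha_pos if delta > 0 else alpha_neg
    return q + alpha * delta


def simulate(n_trials=200, p_reward=(0.7, 0.3),
             alpha_pos=0.3, alpha_neg=0.1, beta=5.0, seed=0):
    """Run a hypothetical two-armed bandit and return the final value estimates."""
    rng = random.Random(seed)
    q = [0.0, 0.0]
    for _ in range(n_trials):
        probs = softmax_probs(q, beta)
        choice = 0 if rng.random() < probs[0] else 1
        reward = 1.0 if rng.random() < p_reward[choice] else 0.0
        q[choice] = asymmetric_update(q[choice], reward, alpha_pos, alpha_neg)
    return q


if __name__ == "__main__":
    print(simulate())
```

With alpha_pos greater than alpha_neg, positive prediction errors move the estimate more than negative ones of the same size, which is the kind of learning-rate imbalance the statements mention. Setting the two rates equal recovers a standard symmetric update.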
