2022
DOI: 10.1016/j.tics.2022.04.005

The computational roots of positivity and confirmation biases in reinforcement learning

Cited by 62 publications (89 citation statements)
References 95 publications
“…Moreover, we note that persistent distraction by the stochastic part of the environment observed in our participants (Fig. 5B3) is consistent with the known phenomena of optimism bias [85] and optimistic belief updating in humans [79,86,87].…”
Section: Discussion
Citation type: supporting
confidence: 90%
“…After the separate streams are merged through the update of each policy parameter q as , the greediness parameter β controls the soft-max choice probabilities to determine the exploration and exploitation balance (see STAR Methods). The imbalance in the learning rate, α, has often been discussed in different contexts from OCD (Lefebvre et al., 2017; Palminteri and Lebreton, 2022), and an excessively large greediness parameter, β, has been discussed in relation to persistence of compulsion. Hence, we confirmed whether OCD-like behavior emerges because of the imbalance in the learning rate, α, or an excessively large β in the same situation shown in Figures 1E and 1F (see STAR Methods).…”
Section: Theoretical Model of OCD
Citation type: mentioning
confidence: 99%
“…After the separate streams are merged through the update of each policy parameter q as , the greediness parameter β controls the soft-max choice probabilities to determine the exploration and exploitation balance (see STAR Methods). The imbalance in the learning rate, α, has often been discussed in different contexts from OCD (Lefebvre et al., 2017; Palminteri and Lebreton, 2022), and an excessively large greediness parameter, β, has been discussed in relation to persistence of compulsion. Hence, we confirmed whether OCD-like behavior emerges because of the imbalance in the learning rate, α, or an excessively large β in the same situation shown in Figures 1e and 1f (see STAR Methods).…”
Section: Results
Citation type: mentioning
confidence: 99%
“…Imbalance for positive and negative prediction errors has been often discussed about the learning rate α± (e.g., Lefebvre et al., 2017; Palminteri and Lebreton, 2022). Although we showed that the learning-rate imbalance alone cannot cause OCD-like behavior in the supposed state-transition environments (Figure 1g), it is valuable to clarify the relation between the learning-rate and trace-factor imbalances.…”
Section: Discussion
Citation type: mentioning
confidence: 99%
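
Note: the statements above refer to a learning-rate imbalance α± for positive and negative prediction errors and to a greediness parameter β in a soft-max choice rule. As a rough illustration only, the two ideas can be sketched in a generic value-learning setting. This is not the model used in any of the citing papers; the function names (`softmax_probs`, `asymmetric_update`) and the parameter names (`alpha_pos`, `alpha_neg`) are hypothetical and chosen for this sketch.

```python
import math


def softmax_probs(q_values, beta):
    """Soft-max choice probabilities; a larger beta gives greedier choices."""
    m = max(q_values)  # subtract the maximum for numerical stability
    exps = [math.exp(beta * (q - m)) for q in q_values]
    total = sum(exps)
    return [e / total for e in exps]


def asymmetric_update(q, reward, alpha_pos, alpha_neg):
    """Update a value estimate with a sign-dependent learning rate.

    alpha_pos applies to positive prediction errors and alpha_neg to
    non-positive ones (assumed names, standing in for alpha+ and alpha-).
    """
    delta = reward - q  # prediction error
    alpha = alpha_pos if delta > 0 else alpha_neg
    return q + alpha * delta
```

With `alpha_pos > alpha_neg`, good outcomes move the estimate more than bad ones, which is one common way a positivity bias is expressed in such models; raising `beta` concentrates the choice probabilities on the highest-valued option.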