“…After the separate streams are merged through the update of each policy parameter q as , the greediness parameter β controls the soft-max choice probabilities, which set the balance between exploration and exploitation (see STAR Methods). An imbalance in the learning rate, α, has often been discussed in contexts other than OCD (Lefebvre et al., 2017; Palminteri and Lebreton, 2022), and an excessively large greediness parameter, β, has been discussed in relation to the persistence of compulsion. Hence, we examined whether OCD-like behavior emerges from an imbalance in the learning rate, α, or from an excessively large β in the same situation shown in Figures 1e and 1f (see STAR Methods).…”
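To make the role of the two parameters concrete, here is a minimal sketch, not the authors' model. It assumes the standard soft-max form P(a_i) = exp(β q_i) / Σ_j exp(β q_j), with β ≥ 0 acting as the greediness (inverse temperature). For the learning-rate imbalance it uses a generic asymmetric rule with separate rates for positive and negative prediction errors, in the spirit of Lefebvre et al. (2017). It does not reproduce the merged-stream policy-parameter update defined in STAR Methods. The function names `softmax_probs` and `asymmetric_update` and the parameters `alpha_pos` and `alpha_neg` are hypothetical.

```python
import math


def softmax_probs(q, beta):
    """Soft-max choice probabilities with greediness beta (assumed beta >= 0).

    P(a_i) = exp(beta * q_i) / sum_j exp(beta * q_j)
    """
    m = max(q)  # subtract the max before exponentiating for numerical stability
    exps = [math.exp(beta * (qi - m)) for qi in q]
    total = sum(exps)
    return [e / total for e in exps]


def asymmetric_update(q, action, reward, alpha_pos, alpha_neg):
    """Hypothetical generic rule: one learning rate for positive and one for
    negative prediction errors (an 'imbalance' means alpha_pos != alpha_neg)."""
    delta = reward - q[action]              # prediction error for the chosen action
    alpha = alpha_pos if delta > 0 else alpha_neg
    new_q = list(q)                         # return a copy; leave the input unchanged
    new_q[action] += alpha * delta
    return new_q


if __name__ == "__main__":
    q = [1.0, 0.8]
    # beta = 0 gives uniform (purely exploratory) choice; a large beta makes
    # the higher-valued action almost deterministic (exploitative).
    for beta in (0.0, 1.0, 20.0):
        print(beta, [round(p, 3) for p in softmax_probs(q, beta)])
```

With β = 0 both actions are chosen equally often. As β grows, choice collapses onto the action with the larger q, so even a small value difference can be locked in. This is the intuition behind linking an excessively large β to persistent behavior.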