Why do we repeat choices that we know are bad for us? Decision making is characterized by the parallel engagement of two distinct systems, goal-directed and habitual, thought to arise from two computational learning mechanisms, model-based and model-free. The habitual system is a candidate source of pathological fixedness. Using a decision task that measures the contribution to learning of either mechanism, we show a bias towards model-free (habit) acquisition in disorders involving both natural (binge eating) and artificial (methamphetamine) rewards, and obsessive-compulsive disorder. This favoring of model-free learning may underlie the repetitive behaviors that ultimately dominate in these disorders. Further, we show that the habit formation bias is associated with lower gray matter volumes in caudate and medial orbitofrontal cortex. Our findings suggest that the dysfunction in a common neurocomputational mechanism may underlie diverse disorders involving compulsion.
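The balance between the two systems described above is often modeled as a weighted mixture of model-free (habitual) and model-based (goal-directed) action values. The sketch below illustrates that idea in a minimal tabular form; the function names, the mixing weight `w`, and the single-state setting are illustrative assumptions, not the authors' fitted model.

```python
import numpy as np

def hybrid_values(q_mf, q_mb, w):
    """Weighted mixture of action values.

    w = 0 is purely model-free (habitual); w = 1 is purely model-based.
    A bias toward habit corresponds to a lower w.
    """
    return w * q_mb + (1.0 - w) * q_mf

def td_update(q_mf, action, reward, alpha=0.2):
    """Model-free TD update: nudge the chosen action's cached value
    toward the obtained reward, ignoring task structure."""
    q_mf = q_mf.copy()
    q_mf[action] += alpha * (reward - q_mf[action])
    return q_mf
```

In studies of this kind, `w` is typically estimated per participant from choice data, and a shift toward lower values indexes the habit-formation bias.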
To make decisions, animals must evaluate candidate choices by accessing memories of relevant experiences. Yet little is known about which experiences are considered or ignored during deliberation, which ultimately governs choice. We propose a normative theory predicting which memories should be accessed at each moment to optimize future decisions. Using nonlocal “replay” of spatial locations in hippocampus as a window into memory access, we simulate a spatial navigation task where an agent accesses memories of locations sequentially, ordered by utility: how much extra reward would be earned due to better choices. This prioritization balances two desiderata: the need to evaluate imminent choices versus the gain from propagating newly encountered information to preceding locations. Our theory offers a simple explanation for numerous findings about place cells; unifies seemingly disparate proposed functions of replay, including planning, learning, and consolidation; and posits a mechanism whose dysfunction may underlie pathologies like rumination and craving.
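The prioritization described above can be sketched as scoring each candidate memory by the product of a "gain" term (how much the local policy would improve if that memory were replayed) and a "need" term (how often the agent expects to revisit that state). The implementation below is a hedged illustration under those assumptions; the function names, the softmax policy, and its temperature are not from the original text.

```python
import numpy as np

def _softmax(q, beta=5.0):
    """Softmax policy over action values with inverse temperature beta."""
    e = np.exp(beta * (q - q.max()))
    return e / e.sum()

def gain(q_state, a, q_new, beta=5.0):
    """Improvement in expected value at one state if replaying a memory
    would revise action a's value to q_new (policy re-evaluated after)."""
    pi_old = _softmax(q_state, beta)
    q_post = q_state.copy()
    q_post[a] = q_new
    pi_new = _softmax(q_post, beta)
    return pi_new @ q_post - pi_old @ q_post

def evb(gain_value, need_value):
    """Utility (expected value of backup) of replaying one memory:
    gain at the replayed state times expected future occupancy ("need").
    Memories are accessed in decreasing order of this quantity."""
    return gain_value * need_value
```

High need with low gain favors evaluating imminent choices near the agent; high gain with diffuse need favors propagating surprising new information backward, capturing the trade-off the theory balances.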
Dopaminergic (DA) neurons in the midbrain provide rich, topographic innervation of the striatum and are central to learning and to generating actions. Despite the importance of this DA innervation, it remains unclear if and how DA neurons are specialized based on the location of their striatal target. Thus, we sought to compare the function of subpopulations of DA neurons that target distinct striatal subregions in the context of an instrumental reversal learning task. We identified key differences in the encoding of reward and choice in dopamine terminals in dorsal versus ventral striatum: DA terminals in ventral striatum responded more strongly to reward consumption and reward-predicting cues, whereas DA terminals in dorsomedial striatum responded more strongly to contralateral choices. In both cases the terminals encoded a reward prediction error. Our results suggest that the DA modulation of the striatum is spatially organized to support the specialized function of the targeted subregion.
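Both terminal populations above are described as encoding a reward prediction error, the core TD quantity. As a reminder of what that signal computes, here is the standard one-line definition; the function name and discount value are illustrative, not taken from the study.

```python
def rpe(reward, v_next, v_curr, gamma=0.95):
    """Temporal-difference reward prediction error:
    delta = r + gamma * V(s') - V(s).
    Positive when outcomes are better than predicted, negative when worse."""
    return reward + gamma * v_next - v_curr
```

The regional differences reported concern what drives this signal most strongly (reward consumption and cues ventrally, contralateral choice dorsomedially), not its basic form.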
Humans and animals are capable of evaluating actions by considering their long-run future rewards through a process described using model-based reinforcement learning (RL) algorithms. The mechanisms by which neural circuits perform the computations prescribed by model-based RL remain largely unknown; however, multiple lines of evidence suggest that neural circuits supporting model-based behavior are structurally homologous to and overlapping with those thought to carry out model-free temporal difference (TD) learning. Here, we lay out a family of approaches by which model-based computation may be built upon a core of TD learning. The foundation of this framework is the successor representation, a predictive state representation that, when combined with TD learning of value predictions, can produce a subset of the behaviors associated with model-based learning, while requiring less decision-time computation than dynamic programming. Using simulations, we delineate the precise behavioral capabilities enabled by evaluating actions using this approach, and compare them to those demonstrated by biological organisms. We then introduce two new algorithms that build upon the successor representation while progressively mitigating its limitations. Because this framework can account for the full range of observed putatively model-based behaviors while still utilizing a core TD framework, we suggest that it represents a neurally plausible family of mechanisms for model-based evaluation.
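The successor representation at the heart of this framework can be written compactly: a matrix M whose entry M[s, s'] estimates the discounted expected future occupancy of state s' starting from s, learned by TD, so that values follow as V = M @ R. The tabular sketch below illustrates this under assumed names and a toy two-state setting; it is a minimal instance of the idea, not the paper's full algorithms.

```python
import numpy as np

def sr_td_update(M, s, s_next, gamma=0.95, alpha=0.1):
    """TD(0) update of the successor representation after observing
    the transition s -> s_next: move M[s] toward the one-hot vector
    for s plus the discounted successor row of s_next."""
    onehot = np.zeros(M.shape[0])
    onehot[s] = 1.0
    M[s] += alpha * (onehot + gamma * M[s_next] - M[s])
    return M

def values(M, R):
    """State values from the SR and current reward estimates: V = M @ R.
    Reward revaluation only requires updating R, with no decision-time
    search, which yields a subset of model-based behavior cheaply."""
    return M @ R
```

Because changes to R propagate to V immediately while changes to transition structure require relearning M, this sketch also makes concrete the characteristic limitation that the paper's two further algorithms progressively mitigate.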
Changing one’s mind on the basis of new evidence is a hallmark of cognitive flexibility. To revise our confidence in a previous decision, new evidence should be used to update beliefs about choice accuracy, but how this process unfolds in the human brain remains unknown. Here we manipulated whether additional sensory evidence supports or negates a previous motion direction discrimination judgment while recording markers of neural activity in the human brain using fMRI. A signature of post-decision evidence (change in log-odds correct) was selectively observed in the activity of posterior medial frontal cortex (pMFC). In contrast, distinct activity profiles in anterior prefrontal cortex (aPFC) mediated the impact of post-decision evidence on subjective confidence, independently of changes in decision value. Together our findings reveal candidate neural mediators of post-decisional changes of mind in the human brain, and indicate possible targets for ameliorating deficits in cognitive flexibility.
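The "change in log-odds correct" signature tracked above has a simple Bayesian form: post-decision evidence adds a log-likelihood ratio to the prior log-odds that the initial choice was correct. The sketch below shows that update under assumed names; how the per-sample LLR is derived from motion evidence is the modeling step, not shown here.

```python
import math

def update_log_odds(log_odds_prior, llr_new_evidence):
    """Bayesian belief update about choice accuracy: posterior log-odds
    equal prior log-odds plus the log-likelihood ratio of the new
    evidence (positive if it supports the earlier choice, negative
    if it negates it)."""
    return log_odds_prior + llr_new_evidence

def p_correct(log_odds):
    """Convert log-odds back to the probability the choice was correct."""
    return 1.0 / (1.0 + math.exp(-log_odds))
```

In these terms, the pMFC finding concerns the increment (the change in log-odds itself), while aPFC activity relates to how that quantity is read out into subjective confidence.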