A. Ross Otto scite author profile

Accounts of decision-making have long posited the operation of separate, competing valuation systems in the control of choice behavior. Recent theoretical and experimental advances suggest that this classic distinction between habitual and goal-directed (or more generally, automatic and controlled) choice may arise from two computational strategies for reinforcement learning, called model-free and model-based learning. Popular neurocomputational accounts of reward processing emphasize the involvement of the dopaminergic system in model-free learning and prefrontal, central executive-dependent control systems in model-based choice. Here we hypothesized that the hypothalamic-pituitaryadrenal (HPA) axis stress response-believed to have detrimental effects on prefrontal cortex function-should selectively attenuate model-based contributions to behavior. To test this, we paired an acute stressor with a sequential decision-making task that affords distinguishing the relative contributions of the two learning strategies. We assessed baseline working-memory (WM) capacity and used salivary cortisol levels to measure HPA axis stress response. We found that stress response attenuates the contribution of model-based, but not model-free, contributions to behavior. Moreover, stress-induced behavioral changes were modulated by individual WM capacity, such that low-WM-capacity individuals were more susceptible to detrimental stress effects than high-WMcapacity individuals. These results enrich existing accounts of the interplay between acute stress, working memory, and prefrontal function and suggest that executive function may be protective against the deleterious effects of acute stress.A number of accounts of human and animal decision-making posit the coexistence of separate valuation systems that control choice (1-4), which, broadly speaking, represent automatic or habitual vs. deliberative or controlled modes. The circumstances under which one system may dominate over the other and thereby exert control over behavior has been a question of interest in both neuroscience and psychology, in part because of the implications of such differential control for disorders of compulsion such as drug abuse (5, 6). Acute stress may afford unique leverage in isolating the properties of these systems, because it is believed to prompt a shift from more cognitive or deliberative processes to more automatic processes presumed to be underpinned by phylogenetically older brain structures (7).Accordingly, a spate of recent work suggests that acute stressindexed by changes in levels of cortisol, a neuroendocrine marker of stress response-engenders reliance on putative habitual and/ or automatic processes in human decision-making (8-13), consistent with the assumption that the physiological stress response impairs central executive functions subserving more deliberative choice. However, distinguishing such processes is both experimentally and theoretically fraught, because in dual process theories, which system controls a particular behavior is t...

show abstract

Model-based learning protects against forming habits

Gillan

Otto

Phelps

et al. 2015

Cogn Affect Behav Neurosci

262

232

View full text Add to dashboard Cite

Studies in humans and rodents have suggested that behavior can at times be “goal-directed”—that is, planned, and purposeful—and at times “habitual”—that is, inflexible and automatically evoked by stimuli. This distinction is central to conceptions of pathological compulsion, as in drug abuse and obsessive-compulsive disorder. Evidence for the distinction has primarily come from outcome devaluation studies, in which the sensitivity of a previously learned behavior to motivational change is used to assay the dominance of habits versus goal-directed actions. However, little is known about how habits and goal-directed control arise. Specifically, in the present study we sought to reveal the trial-by-trial dynamics of instrumental learning that would promote, and protect against, developing habits. In two complementary experiments with independent samples, participants completed a sequential decision task that dissociated two computational-learning mechanisms, model-based and model-free. We then tested for habits by devaluing one of the rewards that had reinforced behavior. In each case, we found that individual differences in model-based learning predicted the participants’ subsequent sensitivity to outcome devaluation, suggesting that an associative mechanism underlies a bias toward habit formation in healthy individuals.Electronic supplementary materialThe online version of this article (doi:10.3758/s13415-015-0347-6) contains supplementary material, which is available to authorized users.

show abstract

Retrospective revaluation in sequential decision making: A tale of two systems.

Gershman¹,

Markman²,

Otto³

2014

Journal of Experimental Psychology: General

204

210

View full text Add to dashboard Cite

Recent computational theories of decision making in humans and animals have portrayed 2 systems locked in a battle for control of behavior. One system--variously termed model-free or habitual--favors actions that have previously led to reward, whereas a second--called the model-based or goal-directed system--favors actions that causally lead to reward according to the agent's internal model of the environment. Some evidence suggests that control can be shifted between these systems using neural or behavioral manipulations, but other evidence suggests that the systems are more intertwined than a competitive account would imply. In 4 behavioral experiments, using a retrospective revaluation design and a cognitive load manipulation, we show that human decisions are more consistent with a cooperative architecture in which the model-free system controls behavior, whereas the model-based system trains the model-free system by replaying and simulating experience.

show abstract

The Curse of Planning

et al. 2013

View full text Add to dashboard Cite

A number of accounts of human and animal behavior posit the operation of parallel and competing valuation systems in the control of choice behavior. Along these lines, a flexible but computationally expensive model-based reinforcement learning system has been contrasted with a less flexible but more efficient model-free reinforcement learning system. The factors governing which system controls behavior—and under what circumstances—are still unclear. Based on the hypothesis that model-based reinforcement learning requires cognitive resources, we demonstrate that having human decision-makers perform a demanding secondary task engenders increased reliance on a model-free reinforcement learning strategy. Further, we show that across trials, people negotiate this tradeoff dynamically as a function of concurrent executive function demands and their choice latencies reflect the computational expenses of the strategy employed. These results demonstrate that competition between multiple learning systems can be controlled on a trial-by-trial basis by modulating the availability of cognitive resources.

show abstract

From Creatures of Habit to Goal-Directed Learners

et al. 2016

View full text Add to dashboard Cite

Theoretical models distinguish two decision-making strategies that have been formalized in reinforcement-learning theory. A model-based strategy leverages a cognitive model of potential actions and their consequences to make goal-directed choices, whereas a model-free strategy evaluates actions based solely on their reward history. Research in adults has begun to elucidate the psychological mechanisms and neural substrates underlying these learning processes and factors that influence their relative recruitment. However, the developmental trajectory of these evaluative strategies has not been well characterized. In this study, children, adolescents, and adults, performed a sequential reinforcement-learning task that enables estimation of model-based and model-free contributions to choice. Whereas a model-free strategy was evident in choice behavior across all age groups, evidence of a model-based strategy only emerged during adolescence and continued to increase into adulthood. These results suggest that recruitment of model-based valuation systems represents a critical cognitive component underlying the gradual maturation of goal-directed behavior.

show abstract

scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.

Contact Info

hi@scite.ai

10624 S. Eastern Ave., Ste. A-614

Henderson, NV 89052, USA

Blog Terms and Conditions API Terms Privacy Policy Contact Cookie Preferences Do Not Sell or Share My Personal Information

Made with 💙 for researchers

Part of the Research Solutions Family.

A. Ross Otto

Working-memory capacity protects model-based learning from stress

Model-based learning protects against forming habits

Retrospective revaluation in sequential decision making: A tale of two systems.

The Curse of Planning

From Creatures of Habit to Goal-Directed Learners

Contact Info

Product

Resources

About