Where Does Value Come From? (2019)
DOI: 10.1016/j.tics.2019.07.012

Cited by 112 publications (112 citation statements)
References 51 publications
“…However, the key conceptual difference of the DopAct framework is that it assumes that animals aim to achieve a desired level of reserves (Buckley et al., 2017; Hull, 1952; Stephan et al., 2016), rather than always maximizing resource acquisition. It has been proposed that when physiological state is taken into account, the reward an animal aims to maximize can be defined as the reduction in distance between the current and desired levels of reserves (Juechems and Summerfield, 2019; Keramati and Gutkin, 2014). Under this definition, a resource equals such subjective reward only if consuming it would not push the animal beyond its optimal reserve level.…”
Section: Discussion (mentioning)
confidence: 99%
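The reward definition this statement cites, reward as the reduction in distance between current and desired reserve levels, can be made concrete in a few lines. Below is a minimal sketch assuming the power-law drive function of Keramati and Gutkin (2014); the function names, the exponents m and n, and the example reserve values are illustrative choices, not the cited papers' implementation.

```python
import numpy as np

def drive(h, h_star, m=3.0, n=4.0):
    # Convex distance of reserves h from the setpoint h*; the power-law
    # form follows Keramati and Gutkin (2014), with illustrative m, n.
    return np.sum(np.abs(h_star - h) ** n) ** (m / n)

def homeostatic_reward(h, h_star, intake):
    # Reward of consuming `intake` = how much it reduces the drive,
    # i.e., the distance between current and desired reserve levels.
    return drive(h, h_star) - drive(h + intake, h_star)

h_star = np.array([10.0])  # desired reserve level (illustrative)

# Consuming moves reserves toward the setpoint: positive reward.
print(homeostatic_reward(np.array([6.0]), h_star, np.array([2.0])))  # > 0

# Consuming overshoots the optimal level: the same resource is punished.
print(homeostatic_reward(np.array([9.0]), h_star, np.array([4.0])))  # < 0
```

The second call makes the statement's caveat explicit: a resource counts as subjective reward only while it moves reserves toward, not past, the optimal level.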
“…20 These feature weights could be encoded as part of a generic latent state representation, such as the one thought to be encoded in orbitofrontal cortex, 21,22 or in brain regions specific to representing physiological needs, such as the hypothalamus 23 or the insula. 24 Such a perspective can help resolve the "reward paradox", 25 a key challenge of applying RL as a theory of human and animal learning, which typically assumes an external reward function that does not exist in natural environments. This view predicts that inducing different motivational states (for example, hunger, thirst, sleepiness) would correspond to naturalistically varying the feature weights w.…”
Section: Discussion (mentioning)
confidence: 99%
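The feature-weight idea in this statement can be sketched directly: subjective reward as a dot product between outcome features and state-dependent weights w, so that changing the motivational state re-weights the same outcome. This is a toy illustration; the feature names and weight values are assumptions, not quantities from the cited work.

```python
import numpy as np

def subjective_reward(phi, w):
    # Reward as a weighted sum of outcome features: r = w . phi(s).
    return float(np.dot(w, phi))

# phi(s): hypothetical features of one outcome [calories, water, warmth]
phi = np.array([0.8, 0.1, 0.0])

# Motivational states expressed as different feature-weight vectors w
w_hungry  = np.array([1.0, 0.2, 0.1])   # hunger up-weights calories
w_thirsty = np.array([0.2, 1.0, 0.1])   # thirst up-weights water

print(subjective_reward(phi, w_hungry))   # 0.82: valued highly when hungry
print(subjective_reward(phi, w_thirsty))  # 0.26: same outcome, when thirsty
```

On this view there is no external reward function to assume: varying w naturalistically (hunger, thirst, sleepiness) changes which outcomes are rewarding, which is the prediction the statement draws.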
“…To tackle this, we incorporate an update mechanism that learns from both simulated and real experience to guide future search toward more promising regions of the hypothesis space (21). This is formally defined as a Gaussian mixture model policy over the three tools and their positions, π(s), which represents the model's belief about high-value actions for each tool.…”
Section: SSUP Model (mentioning)
confidence: 99%
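The statement describes the SSUP model's policy only at a high level. Below is a minimal sketch of one way a Gaussian-mixture policy over three tools and their 2-D placements could be sampled and updated from either simulated or real outcomes; the class name, the learning rule, and all parameters are assumptions made for illustration, not the model's actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

class MixturePolicy:
    # Toy mixture policy: a categorical belief over tools, plus, for each
    # tool, a Gaussian over where to place it.
    def __init__(self, n_tools=3, dim=2):
        self.logits = np.zeros(n_tools)      # preference per tool
        self.mu = np.zeros((n_tools, dim))   # mean placement per tool
        self.sigma = 1.0                     # shared placement spread

    def sample(self):
        # Softmax over tool preferences, then a Gaussian placement draw.
        p = np.exp(self.logits - self.logits.max())
        p /= p.sum()
        tool = rng.choice(len(p), p=p)
        pos = rng.normal(self.mu[tool], self.sigma)
        return tool, pos

    def update(self, tool, pos, value, lr=0.5):
        # Shift belief toward promising actions; `value` may come from a
        # simulated rollout or a real attempt, as in the quoted passage.
        self.logits[tool] += lr * value
        self.mu[tool] += lr * value * (pos - self.mu[tool])

policy = MixturePolicy()
tool, pos = policy.sample()
policy.update(tool, pos, value=0.7)    # e.g., simulated success signal
next_tool, next_pos = policy.sample()  # search now biased toward high value
```

The point of the mixture form is that one distribution carries both decisions at once: which tool to pick (mixture weights) and where to place it (component means), so a single value signal can sharpen both.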