2020
DOI: 10.1038/s41586-019-1924-6

A distributional code for value in dopamine-based reinforcement learning

Abstract: Since its introduction, the reward prediction error (RPE) theory of dopamine has explained a wealth of empirical phenomena, providing a unifying framework for understanding the representation of reward and value in the brain [1-3]. According to the now canonical theory, reward predictions are represented as a single scalar quantity, which supports learning about the expectation, or mean, of stochastic outcomes. In the present work, we propose a novel account of dopamine-based reinforcement learning. Inspired by…
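To make the contrast in the abstract concrete, the sketch below sets the canonical scalar RPE rule (a single value converging to the mean of a stochastic reward) next to a distributional variant in which a population of predictors applies asymmetric learning rates to positive versus negative prediction errors, yielding a spread from "pessimistic" to "optimistic" value estimates. This is a minimal illustration, not the paper's implementation; the reward distribution, learning rates, and asymmetry values are assumptions chosen for clarity.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed stochastic reward: a 50/50 mixture of a small and a large outcome.
def sample_reward():
    return 1.0 if rng.random() < 0.5 else 9.0

# Canonical scalar RPE rule: one value tracks the mean outcome.
V = 0.0
alpha = 0.05
for _ in range(5_000):
    rpe = sample_reward() - V   # reward prediction error
    V += alpha * rpe            # converges toward E[reward] = 5.0

# Distributional variant (illustrative): many predictors, each scaling
# positive and negative RPEs differently. A unit with large tau is
# "optimistic" (weights positive errors more); small tau is "pessimistic".
taus = np.linspace(0.1, 0.9, 9)   # assumed asymmetries, one per unit
Vs = np.zeros_like(taus)
for _ in range(20_000):
    rpe = sample_reward() - Vs
    scale = np.where(rpe > 0.0, taus, 1.0 - taus)
    Vs += alpha * scale * rpe     # each unit settles at a different statistic

print(f"scalar value (mean): {V:.2f}")
print("population of value predictions:", np.round(np.sort(Vs), 2))
```

Under these assumptions the scalar value settles near the mean reward (5.0), while the asymmetric units spread out between the low and high outcomes, which is the sense in which the population carries information about the whole reward distribution rather than only its mean.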

Cited by 314 publications (451 citation statements). References 33 publications (3 reference statements).
“…5K). This is in line with the recent idea of a distributed population code for dopamine neurons (Dabney et al., 2020).…”
Section: Results (supporting)
confidence: 90%
“…Dopamine neurons signal prediction errors by transient, sub-second changes in their firing rates. A recent study implied that prediction error coding in dopamine neurons might occur in a distributed manner, thereby creating a range from "optimistic" to "pessimistic" cellular coding schemes (Dabney et al., 2020). For many years, transient cue- or reward-associated increases in firing rate, from a low tonic background frequency (about 1-8 Hz) into the beta and gamma range (15-30 Hz and 40-100 Hz, respectively), were thought to encode positive reward prediction errors (RPEs), while short rate reductions or pauses in baseline firing were believed to encode negative RPEs, induced for instance by the omission of an expected reward.…”
Section: Main (mentioning)
confidence: 99%
“…These questions are of interest to both the neuroscience and the machine learning communities. To neuroscience, RNNs are neurally-plausible mechanistic models that can serve as a good comparison with animal behavior and neural data, as well as a source of scientific hypotheses [15,16,31,5,27,10,8]. To machine learning, we build on prior work reverse engineering how RNNs solve tasks [30,28,16,15,3,20,14,19], by studying a complicated task that nevertheless has exact Bayesian baselines for comparison, and by contributing task-agnostic analysis techniques.…”
Section: Introduction (mentioning)
confidence: 99%
“…Negative PEs in our study correspond to better-than-expected outcomes. We note that many dopaminergic midbrain neurons encode better-than-expected outcomes in increased firing rates, and worse-than-expected outcomes in reduced firing rates, and this reduction is often less pronounced than the increase [33], despite variability between individual neurons [29].…”
Section: Discussion (mentioning)
confidence: 77%
“…As a second possible reason, biased PE encoding in individual neurons can, when integrated on the population level, afford probabilistic learning [29]. This study addressed variability of reward PE encoding bias in neurons within one region, but the same mechanism could also act across regions.…”
Section: Discussion (mentioning)
confidence: 99%
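The last statement above appeals to the idea that individually biased prediction-error coding, pooled across a population, supports probabilistic learning. The sketch below illustrates one such read-out under simple assumptions: units with different biases follow quantile-regression-style updates, and the resulting population of predictions approximates the reward distribution well enough to decode quantities such as P(reward > threshold). The bimodal reward, number of units, and threshold are illustrative choices, not taken from the cited study.

```python
import numpy as np

rng = np.random.default_rng(1)

# Assumed bimodal reward distribution, unknown to the learner.
def sample_reward():
    if rng.random() < 0.7:
        return rng.normal(2.0, 0.5)   # frequent small reward
    return rng.normal(8.0, 0.5)       # rare large reward

# Population of predictors with different biases toward positive vs.
# negative prediction errors (quantile-regression-style updates).
taus = np.linspace(0.05, 0.95, 19)    # per-unit bias, assumed
values = np.zeros_like(taus)
alpha = 0.02
for _ in range(50_000):
    r = sample_reward()
    # Step up by alpha*tau on positive errors and down by alpha*(1 - tau)
    # on negative errors, so each unit converges near its tau-quantile.
    values += alpha * np.where(r > values, taus, taus - 1.0)

# Population read-out: the sorted predictions trace the reward
# distribution, so probabilistic quantities can be decoded from them.
print("learned quantile-like values:", np.round(np.sort(values), 2))
print("decoded P(reward > 5):", np.mean(values > 5.0))   # roughly 0.3
```

Sorting the converged predictions gives a coarse picture of the reward distribution, which is the sense in which a population of individually biased prediction-error signals can afford probabilistic learning.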