2006
DOI: 10.1073/pnas.0505220103

Operant matching is a generic outcome of synaptic plasticity based on the covariance between reward and neural activity

Abstract: The probability of choosing an alternative in a long sequence of repeated choices is proportional to the total reward derived from that alternative, a phenomenon known as Herrnstein's matching law. This behavior is remarkably conserved across species and experimental conditions, but its underlying neural mechanisms still are unknown. Here, we propose a neural explanation of this empirical law of behavior. We hypothesize that there are forms of synaptic plasticity driven by the covariance between reward and neu…
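As a rough illustration of the matching law stated in the abstract (choice probability proportional to total reward from each alternative), the following minimal sketch compares choice fractions with reward fractions. The function names and the session numbers are hypothetical and are not taken from the paper; this only checks the behavioral relation and does not model the covariance-based plasticity the authors propose.

```python
from typing import List, Sequence


def fractions(values: Sequence[float]) -> List[float]:
    """Normalize non-negative values so that they sum to 1."""
    total = sum(values)
    if total <= 0:
        raise ValueError("values must have a positive sum")
    return [v / total for v in values]


def matching_deviation(choices: Sequence[float], rewards: Sequence[float]) -> float:
    """Largest absolute gap between choice fractions and reward fractions.

    A value of zero means the choices match the rewards exactly, i.e. each
    alternative is chosen in proportion to the total reward it yielded.
    """
    if len(choices) != len(rewards):
        raise ValueError("choices and rewards must have the same length")
    choice_frac = fractions(choices)
    reward_frac = fractions(rewards)
    return max(abs(c - r) for c, r in zip(choice_frac, reward_frac))


# Hypothetical session (made-up numbers): alternative A chosen 300 times
# for 60 rewards, alternative B chosen 100 times for 20 rewards.
print(matching_deviation([300, 100], [60, 20]))  # 0.0, exact matching
```

A nonzero result, for example from `matching_deviation([300, 100], [40, 40])`, indicates that choices are not allocated in proportion to obtained reward.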

Citations: cited by 97 publications (138 citation statements)
References: 31 publications (35 reference statements)
“…Importantly, our framework does not require specialized network structures, such as stimulus-locked tagged delay lines or phase-locked oscillators, used in many previous models (6,18). Our RDE model combines elements of reinforcement learning (19,20), reward modulated plasticity (21–24) and models of recurrent network dynamics (25–28). The framework is able to qualitatively account for the most prevalent class of reward-timing sensitive neurons recorded by Shuler and Bear (9) and can serve, in principle, as a general model of how reward timing can be learned locally in different brain regions.…”
Section: Discussion
confidence: 99%
“…Demonstrations of reward dependent sustained activity in somatosensory cortex (11) and sustained responses in auditory cortex (8) might indicate that temporal and reward processing occur in lower order areas of the brain than previously thought. Additionally, because local neural populations throughout the cortex meet our model's minimal requirements, the fundamental concept of using an external signal to modulate plasticity (23,24) could be the basis of elementary mechanisms used throughout the brain to process time. Our RDE framework conceptualizes and formalizes how such networks can reliably learn temporal representations and leads to predictions that can be tested experimentally.…”
Section: Discussion
confidence: 99%
“…These findings have led to the idea that the traditional Hebbian view of synaptic plasticity driven by pre- and postsynaptic activity has to be augmented by reward prediction error as a third factor [80]. The current tasks of computational neuroscience are to compare the learning rules suggested by the top-down approach with experimental data, and generalize existing concepts in order to evaluate if and under which conditions 3-factor learning rules [82-84] of reward-modulated Hebbian synaptic plasticity can be useful at the macroscopic level of networks [85,86] and behavior [87,88]. Reinforcement learning is a textbook example for a synergistic interaction of theory and simulation. Most algorithms are based on mathematical theory, but without an actual implementation and simulation, it is difficult to predict how they perform under realistic conditions.…”
confidence: 99%
“…Operant behaviour is closely related to the release of DA in the central nervous system (CNS) and the matching paradigm (Herrnstein, 1997), developed within operant psychology, permits detailed experimental analysis of the effects of different rates of reinforcement on choice (Di Chiara, 2002a;Loewenstein and Seung, 2006). 'Matching' refers to the tendency of individual organisms to allocate responses among alternatives in proportion to the reinforcement obtained from each, and is a well-documented phenomenon of both nonhuman and human responding in experimental contexts (Davison and McCarthy, 1988).…”
Section: Rationality and Emotion
confidence: 99%