The 2012 International Joint Conference on Neural Networks (IJCNN)
DOI: 10.1109/ijcnn.2012.6252834

Learning from positive and negative rewards in a spiking neural network model of basal ganglia

Abstract: Despite the vast amount of experimental findings on the role of the basal ganglia in reinforcement learning, there is still a general lack of network models that use spiking neurons and plausible plasticity mechanisms to demonstrate network-level reward-based learning. In this work we extend a recent spiking actor-critic network model of the basal ganglia, aiming to create a minimal realistic model of learning from both positive and negative rewards. We hypothesize and implement in the model segregation of not o…

Cited by 8 publications (11 citation statements)
References: 34 publications
“…Moreover, our toolchain is particularly well suited for studying closed-loop scenarios, where the neural network receives stimuli from a complex environment and produces an output, which in turn causes the robotic agent to perform actions within that environment. For example, a robotic agent can be placed in a classic experimental set-up like a T-maze and the behavior of the robot adapted by a neurally implemented reinforcement learner (Potjans et al, 2011 ; Jitsev et al, 2012 ; Frémaux et al, 2013 ; Friedrich and Lengyel, 2016 ). Here, there is a clear advantage over studying such questions just using neural simulators, as the representation of an external environment as a collection of neural recorders and stimulators is complex, and difficult to either generalize or customize.…”
Section: Discussion (mentioning)
confidence: 99%
“…This has the disadvantage that virtual experiments are complex and time consuming to develop and adapt. More importantly, tasks defined in this way are rather artificial (Potjans et al, 2011 ; Jitsev et al, 2012 ; Frémaux et al, 2013 ; Legenstein and Maass, 2014 ; Friedrich and Lengyel, 2016 ). Whereas there is certainly value in investigating very simplified tasks and sensory representations, it is also vital to be able to check that proposed neural architectures are capable of handling richer, noisier, and more complex scenarios.…”
Section: Introduction (mentioning)
confidence: 99%
“…The rate of this population is taken as an approximation of the value of the current active cluster and projects to a population of 1000 Poissonian spiking neurons representing the reward prediction error (RPE), which in turn produces the dopaminergic concentration D(t) as described above. The instantaneous change of the value signal is implemented (as in Jordan et al, 2017) as a double connection from the critic to the RPE, where one connection is excitatory with a small delay of 1 ms and the second is inhibitory with a larger delay of 20 ms (Potjans et al, 2009; Jitsev et al, 2012). Note that no claim is made for the biological plausibility of this circuit; it is simply a minimal circuit model that generates an adequate reward prediction error to enable the investigation of the role of clustered structure in generating useful representations for reinforcement learning tasks.…”
Section: Methods (mentioning)
confidence: 99%
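
The double-connection scheme quoted above (a fast excitatory and a slowly delayed inhibitory projection from the critic to the RPE population) effectively lets the RPE read out the temporal change of the critic's value signal, plus any delivered reward. The following rate-based sketch only illustrates that arithmetic; the value trace, delays, and variable names are illustrative assumptions, not the spiking circuit of the cited models.

import numpy as np

# Hypothetical value trace V(t), one sample per millisecond (illustrative).
t = np.arange(0, 500)                           # time in ms
value = np.clip((t - 100) / 200.0, 0.0, 1.0)    # value ramps up after t = 100 ms

# Reward delivered as a brief pulse at t = 400 ms (illustrative).
reward = np.zeros_like(value)
reward[400] = 1.0

delay_exc = 1    # ms, fast excitatory connection from critic to RPE
delay_inh = 20   # ms, slow inhibitory connection from critic to RPE

# Delayed copies of the value signal, as seen through the two connections.
v_exc = np.concatenate([np.zeros(delay_exc), value[:-delay_exc]])
v_inh = np.concatenate([np.zeros(delay_inh), value[:-delay_inh]])

# Fast excitation minus slow inhibition approximates dV/dt over a ~19 ms window,
# so the RPE drive behaves like a TD-style error R(t) + dV/dt.
rpe = reward + (v_exc - v_inh) / ((delay_inh - delay_exc) * 1e-3)

print(rpe[150], rpe[400])   # positive during the value ramp and at reward delivery

In the spiking implementation the subtraction is realized by the synaptic delays themselves rather than by explicit buffering; the sketch only makes the underlying computation explicit.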
“…Modelling studies have employed a variety of strategies to allow the system to internally represent the relevant input features (also referred to as environmental states, in the context of reinforcement learning). This can be done, for example, by manually selecting neuronal receptive fields (Potjans et al, 2009, 2011; Jitsev et al, 2012; Frémaux et al, 2013) according to a pre-specified partition of the environment or by spreading the receptive fields uniformly, in order to cover the entire input space (Frémaux et al, 2013; Jordan et al, 2017). These example solutions have major conceptual drawbacks.…”
Section: Introduction (mentioning)
confidence: 99%
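
As a concrete illustration of the second strategy mentioned in the quote above (spreading receptive fields uniformly to cover the entire input space), the sketch below tiles a 2D state space with a grid of Gaussian tuning curves and converts a state into input firing rates. The grid size, field width, and peak rate are arbitrary illustrative choices, not parameters taken from the cited studies.

import numpy as np

# Uniform grid of Gaussian receptive-field centres covering the unit square.
n_per_dim = 5
centres = np.array([(x, y)
                    for x in np.linspace(0.0, 1.0, n_per_dim)
                    for y in np.linspace(0.0, 1.0, n_per_dim)])   # shape (25, 2)

sigma = 0.15        # receptive-field width (illustrative)
peak_rate = 50.0    # peak firing rate in Hz (illustrative)

def encode_state(state):
    """Return the firing rate of every input neuron for a 2D state."""
    sq_dist = np.sum((centres - np.asarray(state)) ** 2, axis=1)
    return peak_rate * np.exp(-sq_dist / (2.0 * sigma ** 2))

rates = encode_state([0.3, 0.7])
print(rates.round(1))   # centres near (0.3, 0.7) fire strongly, distant ones barely at all

The conceptual drawback noted in the quote is visible even in this toy version: the designer has to fix the number, placement, and width of the fields in advance, so the representation cannot adapt to the structure of the task.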
“…Computational models based on the Actor-Critic framework and using TD learning have tried to reproduce the functional and architectural features of BG (for reviews: Gillies and Arbuthnott, 2000 ; Joel et al, 2002 ; Doya, 2007 ; Cohen and Frank, 2009 ; Samson et al, 2010 ; Schroll and Hamker, 2013 ). Additionally, most of the computational models of the BG have either focused on biological plausibility (Lindahl et al, 2013 ; Gurney et al, 2015 ) or functional reproduction of the behavior during learning or action selection (Limousin et al, 1995 ; Gurney et al, 2001 ; Frank, 2006 ; O'Reilly and Frank, 2006 ; Ito and Doya, 2009 ; Potjans et al, 2009 ; Stocco et al, 2010 ; Jitsev et al, 2012 ; Stewart et al, 2012 ; Collins and Frank, 2014 ). As a result, there has been limited focus directed toward implementing functional spike-based models, specifically those that can also simulate dopamine depletion (but see Potjans et al, 2011 ).…”
Section: Introduction (mentioning)
confidence: 99%