Recent modeling of spike-timing-dependent plasticity indicates that plasticity involves as a third factor a local dendritic potential, besides pre- and postsynaptic firing times. We present a simple compartmental neuron model together with a non-Hebbian, biologically plausible learning rule for dendritic synapses where plasticity is modulated by these three factors. In functional terms, the rule seeks to minimize discrepancies between somatic firings and a local dendritic potential. Such prediction errors can arise in our model from stochastic fluctuations as well as from synaptic input, which directly targets the soma. Depending on the nature of this direct input, our plasticity rule subserves supervised or unsupervised learning. When a reward signal modulates the learning rate, reinforcement learning results. Hence a single plasticity rule supports diverse learning paradigms.
Population coding is widely regarded as a key mechanism for achieving reliable behavioral responses in the face of neuronal variability. But in standard reinforcement learning a flip-side becomes apparent. Learning slows down with increasing population size since the global reinforcement becomes less and less related to the performance of any single neuron. We show that, in contrast, learning speeds up with increasing population size if feedback about the population response modulates synaptic plasticity in addition to global reinforcement. The two feedback signals (reinforcement and population-response signal) can be encoded by ambient neurotransmitter concentrations which vary slowly, yielding a fully online plasticity rule where the learning of a stimulus is interleaved with the processing of the subsequent one. The assumption of a single additional feedback mechanism therefore reconciles biological plausibility with efficient learning.
Changes of synaptic connections between neurons are thought to be the physiological basis of learning. These changes can be gated by neuromodulators that encode the presence of reward. We study a family of reward-modulated synaptic learning rules for spiking neurons on a learning task in continuous space inspired by the Morris Water maze. The synaptic update rule modifies the release probability of synaptic transmission and depends on the timing of presynaptic spike arrival, postsynaptic action potentials, as well as the membrane potential of the postsynaptic neuron. The family of learning rules includes an optimal rule derived from policy gradient methods as well as reward modulated Hebbian learning. The synaptic update rule is implemented in a population of spiking neurons using a network architecture that combines feedforward input with lateral connections. Actions are represented by a population of hypothetical action cells with strong mexican-hat connectivity and are read out at theta frequency. We show that in this architecture, a standard policy gradient rule fails to solve the Morris watermaze task, whereas a variant with a Hebbian bias can learn the task within 20 trials, consistent with experiments. This result does not depend on implementation details such as the size of the neuronal populations. Our theoretical approach shows how learning new behaviors can be linked to reward-modulated plasticity at the level of single synapses and makes predictions about the voltage and spike-timing dependence of synaptic plasticity and the influence of neuromodulators such as dopamine. It is an important step towards connecting formal theories of reinforcement learning with neuronal and synaptic properties.
Mathematically, the calculation of elementary fluxes amounts to characterizing the space of solutions to a mixed system of linear equalities, given by the stoichiometry matrix, and linear inequalities, arising from the irreversibility of some or all of the reactions in the network. Previous approaches to this problem have iteratively solved for the equalities while satisfying the inequalities throughout the process. In an extension of previous work, here we consider the complementary approach and derive an algorithm which satisfies the inequalities one by one while staying in the space of solution of the equality constraints. Benchmarks on different subnetworks of the central carbon metabolism of Escherichia coli show that this new approach yields a significant reduction in the execution time of the calculation. This reduction arises since the odds that an intermediate elementary flux already fulfills an additional inequality are larger than when having to satisfy an additional equality constraint.
Hence, we propose to instead focus on the conversion cone, a projection of the flux cone, which describes the interaction of the metabolism with its external chemical environment. We present a direct method for calculating the elementary vectors of this cone and, by studying the metabolism of Saccharomyces cerevisiae, we demonstrate that such an analysis is computationally feasible even for genome scale networks.
In learning from trial and error, animals need to relate behavioral decisions to environmental reinforcement even though it may be difficult to assign credit to a particular decision when outcomes are uncertain or subject to delays. When considering the biophysical basis of learning, the credit-assignment problem is compounded because the behavioral decisions themselves result from the spatio-temporal aggregation of many synaptic releases. We present a model of plasticity induction for reinforcement learning in a population of leaky integrate and fire neurons which is based on a cascade of synaptic memory traces. Each synaptic cascade correlates presynaptic input first with postsynaptic events, next with the behavioral decisions and finally with external reinforcement. For operant conditioning, learning succeeds even when reinforcement is delivered with a delay so large that temporal contiguity between decision and pertinent reward is lost due to intervening decisions which are themselves subject to delayed reinforcement. This shows that the model provides a viable mechanism for temporal credit assignment. Further, learning speeds up with increasing population size, so the plasticity cascade simultaneously addresses the spatial problem of assigning credit to synapses in different population neurons. Simulations on other tasks, such as sequential decision making, serve to contrast the performance of the proposed scheme to that of temporal difference-based learning. We argue that, due to their comparative robustness, synaptic plasticity cascades are attractive basic models of reinforcement learning in the brain.
The analysis of metabolic networks has become a major topic in biotechnology in recent years. Applications range from the enhanced production of selected outputs to the prediction of genotype-phenotype relationships. The concepts used are based on the assumption of a pseudo steady-state of the network, so that for each metabolite inputs and outputs are balanced. The stoichiometric network analysis expands the steady state into a combination of nonredundant subnetworks with positive coefficients called extremal currents. Based on the unidirectional representation of the system these subnetworks form a convex cone in the flux-space. A modification of this approach allowing for reversible reactions led to the definition of elementary modes. Extreme pathways are obtained with the same method but splitting up internal reactions into forward and backward rates. In this study, we explore the relationship between these concepts. Due to the combinatorial explosion of the number of elementary modes in large networks, we promote a further set of metabolic routes, which we call the minimal generating set. It is the smallest subset of elementary modes required to describe all steady states of the system. For large-scale networks, the size of this set is of several magnitudes smaller than that of elementary modes and of extreme pathways.
In the last decade dendrites of cortical neurons have been shown to nonlinearly combine synaptic inputs by evoking local dendritic spikes. It has been suggested that these nonlinearities raise the computational power of a single neuron, making it comparable to a 2-layer network of point neurons. But how these nonlinearities can be incorporated into the synaptic plasticity to optimally support learning remains unclear. We present a theoretically derived synaptic plasticity rule for supervised and reinforcement learning that depends on the timing of the presynaptic, the dendritic and the postsynaptic spikes. For supervised learning, the rule can be seen as a biological version of the classical error-backpropagation algorithm applied to the dendritic case. When modulated by a delayed reward signal, the same plasticity is shown to maximize the expected reward in reinforcement learning for various coding scenarios. Our framework makes specific experimental predictions and highlights the unique advantage of active dendrites for implementing powerful synaptic plasticity rules that have access to downstream information via backpropagation of action potentials.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
hi@scite.ai
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.