Positive reinforcement helps to control the acquisition of learned behaviours. Here we report a cellular mechanism in the brain that may underlie the behavioural effects of positive reinforcement. We used intracranial self-stimulation (ICSS) as a model of reinforcement learning, in which each rat learns to press a lever that applies reinforcing electrical stimulation to its own substantia nigra. The outputs from neurons of the substantia nigra terminate on neurons in the striatum in close proximity to inputs from the cerebral cortex on the same striatal neurons. We measured the effect of substantia nigra stimulation on these inputs from the cortex to striatal neurons and also on how quickly the rats learned to press the lever. We found that stimulation of the substantia nigra (with the optimal parameters for lever-pressing behaviour) induced potentiation of synapses between the cortex and the striatum, which required activation of dopamine receptors. The degree of potentiation within ten minutes of the ICSS trains was correlated with the time taken by the rats to learn ICSS behaviour. We propose that stimulation of the substantia nigra when the lever is pressed induces a similar potentiation of cortical inputs to the striatum, positively reinforcing the learning of the behaviour by the rats.
Behavioral conditioning of cue-reward pairing results in a shift of midbrain dopamine (DA) cell activity from responding to the reward to responding to the predictive cue. However, the precise time course and mechanism underlying this shift remain unclear. Here, we report a combined single-unit recording and temporal difference (TD) modeling approach to this question. The data from recordings in conscious rats showed that DA cells retain responses to predicted reward after responses to conditioned cues have developed, at least early in training. This contrasts with previous TD models that predict a gradual stepwise shift in latency with responses to rewards lost before responses develop to the conditioned cue. By exploring the TD parameter space, we demonstrate that the persistent reward responses of DA cells during conditioning are only accurately replicated by a TD model with long-lasting eligibility traces (nonzero values for the parameter λ) and low learning rate (α). These physiological constraints for TD parameters suggest that eligibility traces and low per-trial rates of plastic modification may be essential features of neural circuits for reward learning in the brain. Such properties enable rapid but stable initiation of learning when the number of stimulus-reward pairings is limited, conferring significant adaptive advantages in real-world environments.
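The contrast drawn in this abstract, between TD models with and without eligibility traces, can be illustrated with a minimal textbook TD(λ) sketch. This is not the authors' model: the one-state-per-time-step representation, the function name `simulate_td_lambda`, the discount factor, the cue-to-reward interval, and the parameter values are all assumptions chosen for illustration. The sketch tracks the prediction error at cue onset and at reward delivery on each trial.

```python
def simulate_td_lambda(n_trials, alpha, lam, gamma=0.98, cue_to_reward=10):
    """Hypothetical TD(lambda) sketch of cue-reward conditioning.

    Returns two lists with one entry per trial: the TD error at cue onset
    and the TD error at reward delivery. All parameter values are
    illustrative assumptions, not values taken from the abstract.
    """
    n_states = cue_to_reward      # one state per time step after cue onset
    w = [0.0] * n_states          # learned value of each post-cue state
    cue_err, rew_err = [], []
    for _ in range(n_trials):
        e = [0.0] * n_states      # eligibility traces, reset on every trial
        # Cue onset: transition from an unpredictive baseline (value 0)
        # into the first post-cue state.
        cue_err.append(gamma * w[0])
        for s in range(n_states):
            last = s == n_states - 1
            r = 1.0 if last else 0.0            # reward ends the trial
            v_next = 0.0 if last else w[s + 1]
            delta = r + gamma * v_next - w[s]   # TD prediction error
            # Decay every trace, then mark the current state as eligible.
            e = [gamma * lam * x for x in e]
            e[s] += 1.0
            # Each state learns in proportion to its eligibility.
            w = [wi + alpha * delta * ei for wi, ei in zip(w, e)]
            if last:
                rew_err.append(delta)
    return cue_err, rew_err


# Long-lasting traces with a low learning rate versus no traces at all.
cue_traces, rew_traces = simulate_td_lambda(50, alpha=0.05, lam=0.9)
cue_plain, rew_plain = simulate_td_lambda(50, alpha=0.05, lam=0.0)
```

Under these assumptions, a nonzero λ lets the cue state acquire value from the first rewarded trial onward, while the low α keeps the error at reward delivery large for many trials, so both signals coexist early in training. With λ = 0, value has to propagate backward one state per trial before any error appears at cue onset.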
Motor thalamus (Mthal) is implicated in the control of movement because it is strategically located between motor areas of the cerebral cortex and motor-related subcortical structures, such as the cerebellum and basal ganglia (BG). The role of BG and cerebellum in motor control has been extensively studied but how Mthal processes inputs from these two networks is unclear. Specifically, there is considerable debate about the role of BG inputs on Mthal activity. This review summarizes anatomical and physiological knowledge of the Mthal and its afferents and reviews current theories of Mthal function by discussing the impact of cortical, BG and cerebellar inputs on Mthal activity. One view is that Mthal activity in BG and cerebellar-receiving territories is primarily “driven” by glutamatergic inputs from the cortex or cerebellum, respectively, whereas BG inputs are modulatory and do not strongly determine Mthal activity. This theory is steeped in the assumption that the Mthal processes information in the same way as sensory thalamus, through interactions of modulatory inputs with a single driver input. Another view, from BG models, is that BG exert primary control on the BG-receiving Mthal so it effectively relays information from BG to cortex. We propose a new “super-integrator” theory where each Mthal territory processes multiple driver or driver-like inputs (cortex and BG, cortex and cerebellum), which are the result of considerable integrative processing. Thus, BG and cerebellar Mthal territories assimilate motivational and proprioceptive motor information previously integrated in cortico-BG and cortico-cerebellar networks, respectively, to develop sophisticated motor signals that are transmitted in parallel pathways to cortical areas for optimal generation of motor programmes. Finally, we briefly review the pathophysiological changes that occur in the BG in parkinsonism and generate testable hypotheses about how these may affect processing of inputs in the Mthal.
Midbrain dopamine (DA) neurons respond to sensory cues that predict reward. We tested the hypothesis that projections from the pedunculopontine tegmental nucleus (PPTg) are involved in driving this DA cell activity. First, the activity of PPTg and DA neurons was compared in a cued-reward associative learning paradigm. The majority of PPTg neurons showed phasic responses to the onset of sensory cues, at significantly shorter latency than DA cells, consistent with a PPTg-to-DA transmission of information. However, unlike DA cells, PPTg responses were almost entirely independent of whether signals were associated with rewards. Second, DA neuron responses to the cues were recorded in free-moving rats during reversible inactivation of the PPTg by microinfusion of local anesthetic. The results showed clear suppression of conditioned sensory responses of DA neurons after PPTg inactivation that was not seen after saline infusion or in non-DA cells. We propose that the PPTg relays information about the precise timing of attended sensory events, which is integrated with information about reward context by DA neurons.
The striatum is the major input nucleus of the basal ganglia. It is thought to play a key role in learning on the basis of positive reinforcement and in action selection. One view of the striatum conceives it as comprising a reiterated matrix of processing units that perform common operations in different striatal regions, namely synaptic plasticity according to a three-factor rule, and lateral inhibition. These operations are required for reinforcement learning and selection of previously reinforced actions. Analysis of the behavioral effects of circumscribed lesions of the striatum, however, suggests regional specialization of learning and decision-making operations. We consider how a basic processing unit may be modified by regional variations in neurochemical parameters, for example, by the gradient in density of dopamine terminals from dorsal to ventral striatum. These variations suggest subtle differences between dorsolateral and ventromedial striatal regions in the temporal properties of dopamine signaling, which are superimposed on regional differences in connectivity. We propose that these variations make sense in relation to the temporal structure of activity in striatal inputs from different regions, and the requirements of different learning operations. Dorsolateral striatal (DLS) regions may be subject to brief, precisely timed pulses of dopamine, whereas ventromedial striatal regions integrate dopamine signals over a longer time course. These differences may be important for understanding regional variations in the contribution to reinforcement of habits, versus incentive processes that are sensitive to the value of expected rewards.
Striatal cholinergic interneurons, also known as tonically active neurons (TANs), acquire a pause in firing during learning of stimulus-reward associations. This pause response to a sensory stimulus emerges after repeated pairing with a reward. The conditioned pause is dependent on dopamine from the substantia nigra, but its underlying cellular mechanism is unknown. Using in vivo intracellular recording, we found that both subthreshold and suprathreshold depolarizations in cholinergic interneurons induced a prolonged afterhyperpolarization (AHP) associated with a pause in their tonic firing. The AHP duration was dependent on the level of depolarization, whether elicited by intracellular current injection or by activation of excitatory inputs from the cortex. High-frequency stimulation of the substantia nigra induced potentiation of the cortically evoked excitation and increased the prolonged AHP after the stimulus. These findings from anesthetized animals suggest that a substantia nigra-induced AHP produces stimulus-associated firing pauses in cholinergic interneurons. This mechanism may underlie the acquisition of the pause response in TANs recorded from behaving animals during learning.