Learning to perform a behavioural procedure as a well-ingrained habit requires extensive repetition of the behavioural sequence, and learning not to perform such behaviours is notoriously difficult. Yet regaining a habit can occur quickly, with even one or a few exposures to cues previously triggering the behaviour. To identify neural mechanisms that might underlie such learning dynamics, we made long-term recordings from multiple neurons in the sensorimotor striatum, a basal ganglia structure implicated in habit formation, in rats successively trained on a reward-based procedural task, given extinction training and then given reacquisition training. The spike activity of striatal output neurons, nodal points in cortico-basal ganglia circuits, changed markedly across multiple dimensions during each of these phases of learning. First, new patterns of task-related ensemble firing successively formed, reversed and then re-emerged. Second, task-irrelevant firing was suppressed, then rebounded, and then was suppressed again. These changing spike activity patterns were highly correlated with changes in behavioural performance. We propose that these changes in task representation in cortico-basal ganglia circuits represent neural equivalents of the explore-exploit behaviour characteristic of habit learning.
Encoding time is universally required for learning and structuring motor and cognitive actions, but how the brain keeps track of time is still not understood. We searched for time representations in cortico-basal ganglia circuits by recording from thousands of neurons in the prefrontal cortex and striatum of macaque monkeys performing a routine visuomotor task. We found that a subset of neurons exhibited time-stamp encoding strikingly similar to that required by models of reinforcement-based learning: they responded with spike activity peaks that were distributed at different time delays after single task events. Moreover, the temporal evolution of the population activity allowed robust decoding of task time by perceptron models. We suggest that time information can emerge as a byproduct of event coding in cortico-basal ganglia circuits and can serve as a critical infrastructure for behavioral learning and performance.
Keywords: population encoding | TD learning | time-stamped representation
Timing of movements on short time-scales, on the order of hundreds of milliseconds, is essential for everyday behavior such as walking up stairs and, famously, for the highly skilled movement control required by behaviors such as playing the piano. Distributed sets of brain regions, especially including cortico-basal ganglia circuits, have been implicated in temporal representation across intervals of time (1-5). How such representations are achieved is not known. Influential models have suggested schemes using time-stamp codes, in which individual neurons have single-peaked responses distributed across multiple delays to specific events (6), or schemes using neuronal population codes (3-5, 7-11). These theories naturally link timing to learning, now recognized as a major function of cortico-basal ganglia circuits (12-14).
Keeping track of time is critical for solving the "credit assignment problem" in reinforcement-based learning, because the time delay between an event and the reward that it leads to must be encoded (15-17). Time-stamp coding of events has been explicitly incorporated in temporal difference models of reinforcement learning in basal ganglia circuits (15-16). However, evidence of time-stamp coding has not been found in neural recordings (3, 18), and evidence for population coding is also still largely restricted to responses to particular trained intervals (19-22).
We reasoned that if there is a cortico-basal ganglia timing system that builds temporal representations, it should be possible to decode time from the activity of neurons recorded in the neocortex and striatum of animals performing a simple sensorimotor task. Moreover, time-stamp encoding might be more evident with tasks not involving interval training, because interval training could force population activity toward the trained intervals rather than broad coverage of short time (21). We therefore trained macaque monkeys in a visually guided sequential saccade task that had temporal structure but did not explicitly require precise timing of ...
The tilt aftereffect (TAE) is a visual illusion in which prolonged adaptation to an oriented stimulus causes shifts in subsequent perceived orientations. Historically, neural models of the TAE have explained it as the outcome of response suppression of neurons tuned to the adapting orientation. Recent physiological studies of neurons in primary visual cortex (V1) have confirmed that such response suppression exists. However, it was also found that the preferred orientations of neurons shift away from the adapting orientation. Here we show that adding this second factor to a population coding model of V1 improves the correspondence between neurophysiological data and TAE measurements. According to our model, the shifts in preferred orientation have the opposite effect to that of response suppression, reducing the magnitude of the TAE.
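The two factors named in this abstract can be explored with a toy population-vector decoder. The following is a minimal sketch, not the authors' model: the von Mises-style tuning curves, the `sin`-shaped shift profile, the decoder that keeps pre-adaptation labels, and every parameter value (`suppression`, `shift_deg`, `kappa_tuning`, `kappa_adapt`) are illustrative assumptions that are not fitted to any data.

```python
import cmath
import math

def circ_bump(delta_deg, kappa):
    """Von Mises-style bump over orientation (period 180 deg), equal to 1 at delta = 0."""
    return math.exp(kappa * (math.cos(math.radians(2.0 * delta_deg)) - 1.0))

def decode_orientation(test_deg, adapt_deg=0.0, suppression=0.0, shift_deg=0.0,
                       n_neurons=180, kappa_tuning=2.0, kappa_adapt=2.0):
    """Population-vector estimate of test_deg after adapting at adapt_deg.

    Hypothetical parameterisation: gain drops by `suppression` at the adapter,
    and each neuron's true preferred orientation moves away from the adapter
    by up to `shift_deg`, while the decoder keeps the pre-adaptation labels.
    """
    vector = 0j
    for i in range(n_neurons):
        label = -90.0 + 180.0 * i / n_neurons  # pre-adaptation preferred orientation
        # Factor 1: response suppression, strongest for neurons tuned to the adapter.
        gain = 1.0 - suppression * circ_bump(label - adapt_deg, kappa_adapt)
        # Factor 2: preferred orientation shifts away from the adapter.
        shifted = label + shift_deg * math.sin(math.radians(2.0 * (label - adapt_deg)))
        rate = gain * circ_bump(test_deg - shifted, kappa_tuning)
        # Orientation is 180-deg periodic, so angles are doubled before summing.
        vector += rate * cmath.exp(1j * math.radians(2.0 * label))
    return math.degrees(cmath.phase(vector)) / 2.0

def tilt_aftereffect(test_deg, **kwargs):
    """Decoded minus true orientation in degrees; positive means repulsion when test > adapter."""
    return decode_orientation(test_deg, **kwargs) - test_deg

suppression_only = tilt_aftereffect(15.0, suppression=0.5)
with_shift = tilt_aftereffect(15.0, suppression=0.5, shift_deg=5.0)
print(suppression_only, with_shift)
```

Comparing the two printed values shows how suppression and tuning shifts interact under these assumed profiles. The size and even the direction of the shift contribution can depend on the chosen shift profile and tuning width, so the numbers should be read as qualitative only.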
Whether general principles can explain the layouts of cortical maps remains unresolved. In primary visual cortex of ferret, the relationships between the maps of visual space and response features are predicted by a "dimension-reduction" model. The representation of visual space is anisotropic, with the elevation and azimuth axes having different magnification. This anisotropy is reflected in the orientation, ocular dominance, and spatial frequency domains, which are elongated such that their directions of rapid change, or high-gradient axes, are orthogonal to the high-gradient axis of the visual map. The feature maps are also strongly interdependent: their high-gradient regions avoid one another and intersect orthogonally where essential, so that overlap is minimized. Our results demonstrate a clear influence of the visual map on each feature map. In turn, the local representation of visual space is smooth, as predicted when many features are mapped within a cortical area.
Many animals produce vocal sequences that appear complex. Most researchers assume that these sequences are well characterized as Markov chains (i.e. that the probability of a particular vocal element can be calculated from the history of only a finite number of preceding elements). However, this assumption has never been explicitly tested. Furthermore, it is unclear how language could evolve in a single step from a Markovian origin, as is frequently assumed, as no intermediate forms have been found between animal communication and human language. Here, we assess whether animal taxa produce vocal sequences that are better described by Markov chains, or by non-Markovian dynamics such as the 'renewal process' (RP), characterized by a strong tendency to repeat elements. We examined vocal sequences of seven taxa: Bengalese finches Lonchura striata domestica, Carolina chickadees Poecile carolinensis, free-tailed bats Tadarida brasiliensis, rock hyraxes Procavia capensis, pilot whales Globicephala macrorhynchus, killer whales Orcinus orca and orangutans Pongo spp. The vocal systems of most of these species are more consistent with a non-Markovian RP than with the Markovian models traditionally assumed. Our data suggest that non-Markovian vocal sequences may be more common than Markov sequences, which must be taken into account when evaluating alternative hypotheses for the evolution of signalling complexity, and perhaps human language origins.
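One way to see the distinction drawn here: a first-order Markov chain with self-transition probability p implies geometrically distributed repeat lengths, P(run = k) = p^(k-1)(1 - p), whereas a strong tendency to repeat a fixed number of times does not. The sketch below estimates transition probabilities from a sequence and compares observed repeat lengths with that geometric expectation. The function names and the toy sequence are made up for illustration, and this is not the statistical procedure used in the study.

```python
from collections import Counter, defaultdict
from itertools import groupby

def transition_probabilities(seq):
    """First-order Markov estimate of P(next | current) from bigram counts."""
    counts = defaultdict(Counter)
    for cur, nxt in zip(seq, seq[1:]):
        counts[cur][nxt] += 1
    return {cur: {nxt: n / sum(c.values()) for nxt, n in c.items()}
            for cur, c in counts.items()}

def repeat_run_lengths(seq, element):
    """Lengths of each consecutive run of `element` in the sequence."""
    return [len(list(group)) for elem, group in groupby(seq) if elem == element]

def markov_run_length_pmf(p_repeat, k):
    """P(run length == k) under a first-order Markov chain with self-transition p_repeat."""
    return (p_repeat ** (k - 1)) * (1.0 - p_repeat)

# Toy sequence (hypothetical): element "a" is always repeated exactly three times.
song = list("aaabbbbcaaabbbbc")
probs = transition_probabilities(song)
p_aa = probs["a"]["a"]
observed = Counter(repeat_run_lengths(song, "a"))
expected_share_of_3 = markov_run_length_pmf(p_aa, 3)
print(p_aa, observed, expected_share_of_3)
```

In this toy case every run of "a" has length 3, while the fitted Markov chain assigns only a modest probability to runs of exactly that length. A mismatch of this kind is the sort of signature that would point away from a first-order Markov description.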
In the adult visual cortex, multiple feature maps exist and have characteristic spatial relationships with one another. The relationships can be reproduced by "dimension-reduction" computational models, suggesting that the principles of continuity and coverage may underlie cortical map organization. However, the mechanisms responsible for establishing these relationships are unknown. We explored whether removing one feature map during development causes a coordinated reorganization of the remaining maps or whether the remaining maps are unaffected. We removed the ocular dominance map by monocular enucleation in newborn ferrets, so that single eye stimulation drove the cortex in a more spatially uniform manner in adult monocular animals compared with normal animals. Maps of orientation, spatial frequency, and retinotopy formed in monocular ferrets, but their structures and spatial relationships differed from those in normal ferrets. The wavelength of the orientation map increased, so that the average orientation gradient across the cortex decreased. The decrease in the orientation gradient in monocular animals was most prominent in the high gradient regions of the spatial frequency map, indicating a coordinated reorganization between these two maps. In monocular animals, the orthogonal relationship between the orientation and spatial frequency maps was preserved, and the orthogonal relationship between the orientation and retinotopic maps became more pronounced. These results were consistent with detailed predictions of a dimension-reduction model of cortical organization. Thus, the number of feature maps in a cortical area influences the relationships between them, and inputs to the cortex have a significant role in generating these relationships.
Habits and rituals are expressed universally across animal species. These behaviors are advantageous in allowing sequential behaviors to be performed without cognitive overload, and appear to rely on neural circuits that are relatively benign but vulnerable to takeover by extreme contexts, neuropsychiatric sequelae, and processes leading to addiction. Reinforcement learning (RL) is thought to underlie the formation of optimal habits. However, this theoretic formulation has principally been tested experimentally in simple stimulus-response tasks with relatively few available responses. We asked whether RL could also account for the emergence of habitual action sequences in realistically complex situations in which no repetitive stimulus-response links were present and in which many response options were present. We exposed naïve macaque monkeys to such experimental conditions by introducing a unique free saccade scan task. Despite the highly uncertain conditions and no instruction, the monkeys developed a succession of stereotypical, self-chosen saccade sequence patterns. Remarkably, these continued to morph for months, long after session-averaged reward and cost (eye movement distance) reached asymptote. Prima facie, these continued behavioral changes appeared to challenge RL. However, trial-by-trial analysis showed that pattern changes on adjacent trials were predicted by lowered cost, and RL simulations that reduced the cost reproduced the monkeys' behavior. Ultimately, the patterns settled into stereotypical saccade sequences that minimized the cost of obtaining the reward on average. 
These findings suggest that brain mechanisms underlying the emergence of habits, and perhaps unwanted repetitive behaviors in clinical disorders, could follow RL algorithms capturing extremely local explore/exploit tradeoffs.
Keywords: free-viewing | naïve monkey | reinforcement learning | saccade
Reinforcement learning (RL) theory formalizes the process by which rewards and punishments can shape the behaviors of a goal-seeking agent (person, animal, or robot) toward optimality (1). RL algorithms have been widely applied in neuroscience to characterize neural activity in animals and human subjects, most famously for the dopamine-containing systems of the brain and related brain regions (2-5). These ideas have also been influential in the study of habit learning, in which habits are typically thought to arise when behaviors, through repetition, eventually become reinforcement-independent, stimulus-response (S-R) associations that can be executed in a semiautomatic manner (6).
In most learning experiments designed to test these ideas, a small range of relationships between actions and reward is imposed, cost-benefit ratios are explicit, and fixed and usually limited numbers of response choices are available, as for example when human subjects are asked to move a cursor in one direction to receive a monetary reward in a computer game, or when rodents are trained to press one or a small set of levers to receive a food reward. RL algorithms of va...
Variable motor sequences of animals are often structured and can be described by probabilistic transition rules between action elements. Examples include the songs of many songbird species such as the Bengalese finch, which consist of stereotypical syllables sequenced according to probabilistic rules (song syntax). The neural mechanisms behind such rules are poorly understood. Here, we investigate where the song syntax is encoded in the brain of the Bengalese finch by rapidly and reversibly manipulating the temperature in the song production pathway. Cooling the premotor nucleus HVC (proper name) slows down the song tempo, consistent with the idea that HVC controls moment-to-moment timings of acoustic features in the syllables. More importantly, cooling HVC alters the transition probabilities between syllables. Cooling HVC reduces the number of repetitions of long-repeated syllables and increases the randomness of syllable sequences. In contrast, cooling the downstream motor area RA (robust nucleus of the arcopallium), which is critical for singing, does not affect the song syntax. Unilateral cooling of HVC shows that control of syllables is mostly lateralized to the left HVC, whereas transition probabilities between the syllables can be affected by cooling HVC in either hemisphere to varying degrees. These results show that HVC is a key site for encoding song syntax in the Bengalese finch. HVC is thus involved both in encoding timings within syllables and in sequencing probabilistic transitions between syllables. Our finding suggests that probabilistic selections and fine-grained timings of action elements can be integrated within the same neural circuits.
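One common way to put a number on the "randomness" of syllable transitions is the Shannon entropy of the distribution of syllables that follow a given syllable. The sketch below assumes that measure; the abstract does not state which measure the authors used, and the function name and example sequences are hypothetical.

```python
import math
from collections import Counter

def transition_entropy(seq, syllable):
    """Shannon entropy (bits) of the distribution of syllables that follow `syllable`."""
    followers = Counter(nxt for cur, nxt in zip(seq, seq[1:]) if cur == syllable)
    total = sum(followers.values())
    if total == 0:
        return 0.0  # syllable never occurs with a successor
    return -sum((n / total) * math.log2(n / total) for n in followers.values())

# Made-up sequences: "a" is always followed by "b", versus "b" or "c" equally often.
deterministic = transition_entropy(list("abababab"), "a")
branching = transition_entropy(list("abacabac"), "a")
print(deterministic, branching)
```

Under this measure, an entropy of 0 bits means the next syllable is fully determined, and higher values mean more variable transitions, so an increase after a manipulation would correspond to more random sequencing at that branch point.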