Learning to make adaptive decisions depends on exploring options, experiencing their consequences, and reassessing one's strategy for the future. Although several studies have analyzed various aspects of value-based decision-making, most have focused on decisions in which gratification is cued and immediate. By contrast, how the brain gauges delayed consequences for decision-making remains poorly understood. To investigate this, we designed a novel decision-making task in which each decision altered the future options to decide upon. The task was organized in groups of interdependent trials, and participants were instructed to maximize the cumulative reward value within each group. In the absence of any explicit performance feedback, participants had to test and internally assess specific criteria to make decisions. The absence of explicit feedback was key to studying specifically how the assessment of consequence forms and influences decisions as learning progresses. We formalized this operation mathematically by means of a multi-layered decision-making model. It uses a mean-field approximation to describe the dynamics of two populations of neurons that characterize the binary decision-making process. The resulting decision-making policy is dynamically modulated by an internal oversight mechanism based on the prediction of consequence, and this policy is reinforced by rewarding outcomes. The model was validated by fitting each individual participant's behavior, and it faithfully predicted non-trivial patterns of decision-making, regardless of performance level. These findings provide an explanation of how delayed consequences may be computed and incorporated into the neural dynamics of decision-making, and of how adaptation occurs in the absence of explicit feedback.
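The core idea of a reward-reinforced decision policy without explicit feedback can be illustrated with a minimal sketch: an internally assessed outcome value updates option values via a delta rule, which in turn bias a softmax choice. All names, parameters, and the reward scheme below are illustrative assumptions, not the paper's actual model.

```python
import math
import random

def softmax_choice(values, beta=3.0, rng=random):
    """Pick option 0 or 1 with probability proportional to exp(beta * value)."""
    z = [math.exp(beta * v) for v in values]
    r = rng.random() * sum(z)
    return 0 if r < z[0] else 1

def update(values, choice, internal_reward, alpha=0.2):
    """Delta-rule update of the chosen option's value toward the
    internally assessed reward (no external feedback signal)."""
    values[choice] += alpha * (internal_reward - values[choice])
    return values

# Toy run: option 0 yields a higher internally assessed value (1.0 vs 0.2),
# so its learned value should end up higher after repeated choices.
values = [0.0, 0.0]
rng = random.Random(0)
for _ in range(200):
    c = softmax_choice(values, rng=rng)
    update(values, c, 1.0 if c == 0 else 0.2)
print(values)
```

In this toy setting the policy gradually concentrates on the option whose internally assessed consequence is better, which is the qualitative behavior the abstract describes for learning without explicit feedback.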
We present a new AdEx mean-field framework to model two networks of excitatory and inhibitory neurons, representing two cortical columns interconnected by excitatory connections contacting both Regularly Spiking (excitatory) and Fast Spiking (inhibitory) cells. This connection scheme is biophysically plausible, since it is based on intercolumnar excitation and intracolumnar excitation-inhibition. The configuration introduces bicolumnar competition, which is sufficient for choosing between two alternatives. Each column represents a pool of neurons voting for one of two choices indicated by two stimuli presented on a monitor in human and macaque experiments. The task also requires maximizing the cumulative reward over each episode, which consists of a certain number of trials; the cumulative reward depends on the consistency between the choices of the participant/model and a strategy preset in the experiment. We endow the model with a reward-driven learning mechanism that allows it to capture the implemented strategy, as well as to model individual exploratory behavior. We compare the simulation results to the behavioral data obtained from the human and macaque experiments in terms of performance and reaction time. This model provides a biophysical grounding for simpler phenomenological models proposed for similar decision-making tasks and can be applied to neurophysiological data obtained from the macaque brain. Finally, it can be embedded in whole-brain simulators, such as The Virtual Brain (TVB), to study decision-making in terms of large-scale brain dynamics.
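The winner-take-all dynamics that bicolumnar competition produces can be sketched with a much simpler rate model than the AdEx mean-field equations: two populations with self-excitation and mutual inhibition, where the population receiving the stronger input wins. The weights, time constant, and rectified-linear dynamics below are illustrative assumptions, not the framework's actual equations.

```python
import numpy as np

def simulate_competition(I1, I2, w_self=0.6, w_cross=-0.5,
                         tau=0.02, dt=0.001, steps=2000, seed=0):
    """Integrate two firing rates with self-excitation and mutual
    inhibition (a stand-in for intercolumnar excitation of the other
    column's inhibitory cells). Returns the final rates."""
    rng = np.random.default_rng(seed)
    r = np.zeros(2)
    I = np.array([I1, I2])
    W = np.array([[w_self, w_cross],
                  [w_cross, w_self]])
    for _ in range(steps):
        drive = W @ r + I + 0.01 * rng.standard_normal(2)  # small noise
        r += dt / tau * (-r + np.maximum(drive, 0.0))      # rectified rates
    return r

# The column with the stronger input should win the competition.
r = simulate_competition(I1=1.2, I2=1.0)
print("choice:", 1 if r[0] > r[1] else 2)
```

Because `w_self - w_cross > 1`, the difference between the two rates grows while their sum stays bounded, so the network settles into a state where one column is active and the other is suppressed, implementing a binary choice.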
Non-human primate (NHP) movement kinematics have been decoded from spikes and local field potentials (LFPs) recorded during motor tasks. However, the potential of LFPs to provide network-like characterizations of neural dynamics during the planning and execution of sequential movements requires further exploration. Is the aggregate nature of LFPs suitable for constructing informative brain-state descriptors of movement preparation and execution? To investigate this, we developed a framework to process LFPs based on machine-learning classifiers and analyzed LFPs from a primate implanted with several microelectrode arrays covering the premotor cortex in both hemispheres and the primary motor cortex on one side. The primate performed a reach-and-grasp task consisting of five consecutive states, starting from rest until a rewarding target (food) was attained. We used this five-state task to characterize brain activity and connectivity within eight frequency bands, using spectral amplitude and pair-wise correlations across electrodes as features. Our results show that we could best distinguish all five movement-related states using the highest frequency band (200–500 Hz), yielding 87% accuracy with spectral amplitude and 60% with pair-wise electrode correlation. Further analyses characterized each movement-related state, showing differential neuronal population activity at above-gamma frequencies during the various stages of movement. In summary, our results show that the concerted use of machine-learning techniques with coarse-grained, broadband signals such as LFPs can successfully track and decode fine aspects of reach-and-grasp movements across several brain regions.
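The spectral-amplitude feature pipeline can be sketched as follows: compute the mean spectral amplitude of each trace within a set of frequency bands, then classify the resulting feature vectors. The synthetic traces, the specific bands, the assumed 1 kHz sampling rate, and the nearest-centroid classifier below are illustrative stand-ins, not the paper's recording setup or classifiers.

```python
import numpy as np

FS = 1000                                  # assumed sampling rate (Hz)
BANDS = [(12, 30), (30, 80), (200, 500)]   # beta, gamma, above-gamma (Hz)

def band_amplitudes(lfp, fs=FS):
    """Mean spectral amplitude of one trace in each frequency band."""
    spec = np.abs(np.fft.rfft(lfp))
    freqs = np.fft.rfftfreq(lfp.size, d=1.0 / fs)
    return np.array([spec[(freqs >= lo) & (freqs < hi)].mean()
                     for lo, hi in BANDS])

def fit_centroids(X, y):
    """Per-class mean feature vector (nearest-centroid training)."""
    return {c: X[y == c].mean(axis=0) for c in np.unique(y)}

def predict(centroids, x):
    """Assign x to the class whose centroid is closest."""
    return min(centroids, key=lambda c: np.linalg.norm(x - centroids[c]))

# Synthetic demo: 'movement' trials (state 1) carry extra 300 Hz
# (above-gamma) power on top of white noise.
rng = np.random.default_rng(1)
def synth(state, n=1024):
    t = np.arange(n) / FS
    return rng.standard_normal(n) + 3.0 * state * np.sin(2 * np.pi * 300 * t)

y = np.array([0, 1] * 20)
X = np.array([band_amplitudes(synth(s)) for s in y])
centroids = fit_centroids(X, y)
print(predict(centroids, band_amplitudes(synth(1))))
```

The demo separates states through the highest band only, mirroring the finding that above-gamma spectral amplitude was the most discriminative feature; in the real analysis one feature per band per electrode would be concatenated across the arrays.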
Learning to make decisions depends on exploring options, experiencing their consequences, and reassessing one's strategy. Several studies have analyzed various aspects of value-based decision-making, focusing on cued and immediate gratification. By contrast, how the brain gauges delayed consequences for decision-making remains poorly understood. We designed a decision-making task in which decisions altered future options. In the absence of any explicit performance feedback, participants had to test and internally assess specific criteria to make optimal decisions. This task was designed to specifically study how the assessment of consequence forms and influences decisions as learning progresses. We analyzed behavioral results to characterize individual differences in reaction times, decision strategies, and learning rates. We formalized this operation mathematically by means of a multi-layered decision-making model. The first layer described the dynamics of two populations of neurons characterizing the decision-making process; the other two layers modulated the decision-making policy by dynamically adapting an oversight learning mechanism. The model was validated by fitting individual participants' behavior, and it faithfully predicted non-trivial patterns of decision-making. These findings provide an explanation of how delayed consequences may be computed and incorporated into the neural dynamics of decision-making, and of how learning occurs in the absence of explicit feedback.