When learning the value of actions in volatile environments, humans often make seemingly irrational decisions that fail to maximize expected value. We reasoned that these 'non-greedy' decisions, instead of reflecting information seeking during choice, may be caused by computational noise in the learning of action values. Here, using reinforcement learning (RL) models of behavior and multimodal neurophysiological data, we show that the majority of non-greedy decisions stems from this learning noise. The trial-to-trial variability of sequential learning steps and their impact on behavior could be predicted both by BOLD responses to obtained rewards in the dorsal anterior cingulate cortex (dACC) and by phasic pupillary dilation, suggestive of neuromodulatory fluctuations driven by the locus coeruleus-norepinephrine (LC-NE) system. Together, these findings indicate that most behavioral variability, rather than reflecting human exploration, is due to the limited computational precision of reward-guided learning.
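The distinction between choice noise and learning noise can be illustrated with a small simulation. This is a sketch under stated assumptions, not the authors' model: a two-armed bandit with hypothetical reward probabilities (0.7 and 0.3, reversed every 50 trials), a delta-rule learner with learning rate `alpha`, and learning noise whose standard deviation is `zeta` times the size of each update (a hypothetical Weber-like scaling). The agent always chooses greedily on its noisy values, yet some of its choices look non-greedy when scored against a noise-free learner that saw the same choices and rewards.

```python
import random

def non_greedy_fraction(n_trials=400, alpha=0.3, zeta=0.5, seed=0):
    """Fraction of choices (greedy on noisy values) that are non-greedy on noise-free values.

    All parameters and the noise scaling are illustrative assumptions.
    """
    rng = random.Random(seed)
    p_reward = [0.7, 0.3]   # hypothetical reward probabilities of arms 0 and 1
    q_noisy = [0.5, 0.5]    # action values learned with computation noise
    q_exact = [0.5, 0.5]    # same choices and rewards, learned without noise
    non_greedy = 0
    for t in range(1, n_trials + 1):
        choice = 0 if q_noisy[0] >= q_noisy[1] else 1   # strictly greedy on noisy values
        if q_exact[choice] < q_exact[1 - choice]:
            non_greedy += 1                              # looks exploratory from outside
        reward = 1.0 if rng.random() < p_reward[choice] else 0.0
        step = alpha * (reward - q_noisy[choice])        # delta-rule update
        # Learning noise: SD proportional to the size of the update (assumption).
        q_noisy[choice] += step + rng.gauss(0.0, zeta * abs(step))
        q_exact[choice] += alpha * (reward - q_exact[choice])
        if t % 50 == 0:
            p_reward.reverse()                           # volatility: reversal every 50 trials
    return non_greedy / n_trials
```

With `zeta=0.0` the two learners coincide and the function returns 0.0; any positive `zeta` yields apparently exploratory choices even though the choice rule itself contains no exploration term.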
Random noise in information processing systems is widely seen as detrimental to function. But despite the large trial-to-trial variability of neural activity and behavior, humans and other animals show a remarkable adaptability to unexpected adverse events occurring during task execution. This cognitive ability, described as constitutive of general intelligence, is missing from current artificial intelligence (AI) systems, which feature exact (noise-free) computations. Here we show that implementing computation noise in recurrent neural networks boosts their cognitive resilience to a variety of adverse conditions entirely unseen during training, in a way that resembles human and animal cognition. In contrast to artificial agents with exact computations, noisy agents exhibit hallmarks of Bayesian inference acquired in a 'zero-shot' fashion, that is, without prior experience with conditions that require these computations for maximizing rewards. We further demonstrate that these cognitive benefits result from free-standing regularization of activity patterns in noisy neural networks. Together, these findings suggest that intelligence may ride on computation noise to promote near-optimal decision-making in adverse conditions without any engineered cognitive sophistication.
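A minimal sketch of what implementing computation noise in a recurrent network can mean, assuming a vanilla tanh RNN with additive Gaussian noise of standard deviation `sigma` on each unit's pre-activation. The function names, the weights, and the placement of the noise are hypothetical illustrations, not the architecture used in the study; setting `sigma=0` recovers the exact, noise-free network.

```python
import math
import random

def noisy_rnn_step(h, x, W, U, sigma, rng):
    """One step of a vanilla tanh RNN with additive Gaussian noise (std sigma)
    on each unit's pre-activation; sigma = 0 gives the exact network."""
    new_h = []
    for i in range(len(h)):
        pre = sum(W[i][j] * h[j] for j in range(len(h)))   # recurrent input
        pre += sum(U[i][k] * x[k] for k in range(len(x)))  # external input
        pre += rng.gauss(0.0, sigma)                       # computation noise
        new_h.append(math.tanh(pre))
    return new_h

def run_rnn(inputs, W, U, sigma, seed=0):
    """Run the network over a sequence of input vectors and return the final state."""
    rng = random.Random(seed)
    h = [0.0] * len(W)
    for x in inputs:
        h = noisy_rnn_step(h, x, W, U, sigma, rng)
    return h

# Hypothetical 2-unit network driven by a 1-dimensional input.
W = [[0.5, -0.3], [0.2, 0.4]]
U = [[1.0], [-1.0]]
inputs = [[1.0], [0.0], [0.5]]
exact = run_rnn(inputs, W, U, sigma=0.0)
noisy = run_rnn(inputs, W, U, sigma=0.3, seed=1)
```

Because only the noise term depends on the seed, runs with different seeds agree exactly when `sigma=0` and diverge when `sigma>0`.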
A key challenge in neuroscience is understanding how neurons in hundreds of interconnected brain regions integrate sensory inputs with prior expectations to initiate movements. It has proven difficult to meet this challenge when different laboratories apply different analyses to different recordings in different regions during different behaviours. Here, we report a comprehensive set of recordings from 115 mice in 11 labs performing a decision-making task with sensory, motor, and cognitive components, obtained with 547 Neuropixels probe insertions covering 267 brain areas in the left forebrain and midbrain and the right hindbrain and cerebellum. We provide an initial appraisal of this brain-wide map, assessing how neural activity encodes key task variables. Representations of visual stimuli appeared transiently in classical visual areas after stimulus onset and then spread to ramp-like activity in a collection of mid- and hindbrain regions that also encoded choices. Neural responses correlated with motor action almost everywhere in the brain. Responses to reward delivery and consumption versus reward omission were also widespread. Representations of objective prior expectations were weaker, found in sparse sets of neurons from restricted regions. This publicly available dataset represents an unprecedented resource for understanding how computations distributed across and within brain areas drive behaviour.
The neural representations of prior information about the state of the world are poorly understood. To investigate this issue, we examined brain-wide Neuropixels recordings and widefield calcium imaging collected by the International Brain Laboratory. Mice were trained to indicate the location of a visual grating stimulus, which appeared on the left or right with prior probability alternating between 0.2 and 0.8 in blocks of variable length. We found that mice estimate this prior probability and thereby improve their decision accuracy. Furthermore, we report that this subjective prior is encoded in at least 20% to 30% of brain regions which, remarkably, span all levels of processing, from early sensory areas (LGd, VISp) to motor regions (MOs, MOp, GRN) and high-level cortical regions (ACCd, ORBvl). This widespread representation of the prior is consistent with a neural model of Bayesian inference involving loops between areas, as opposed to a model in which the prior is incorporated only in decision-making areas. This study offers the first brain-wide perspective on prior encoding at cellular resolution, underscoring the importance of using large-scale recordings on a single standardized task.
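The block structure of the task, and one simple way an observer could track the resulting prior, can be sketched as follows. The leaky (exponentially weighted) estimate of P(right) with decay `gamma`, and the uniform block lengths of 20 to 100 trials, are assumptions chosen for illustration; they are not necessarily the estimator or block-length distribution used in the study.

```python
import random

def simulate_prior_estimate(n_trials=2000, gamma=0.9, seed=1):
    """Simulate blocks with P(right) = 0.2 or 0.8 and a leaky estimate of that prior.

    Returns a list of (true block prior, estimate available before the stimulus).
    Block lengths and the leaky estimator are hypothetical choices.
    """
    rng = random.Random(seed)
    p_right = rng.choice([0.2, 0.8])
    remaining = rng.randint(20, 100)          # hypothetical block length
    estimate = 0.5                            # start agnostic about the side
    trace = []
    for _ in range(n_trials):
        if remaining == 0:
            p_right = 0.2 if p_right == 0.8 else 0.8   # alternate between blocks
            remaining = rng.randint(20, 100)
        remaining -= 1
        trace.append((p_right, estimate))
        stim_right = 1.0 if rng.random() < p_right else 0.0
        estimate = gamma * estimate + (1.0 - gamma) * stim_right  # leaky integration
    return trace

def mean_estimate(trace, prior):
    """Average subjective estimate over trials belonging to blocks with the given prior."""
    vals = [est for p, est in trace if p == prior]
    return sum(vals) / len(vals)

trace = simulate_prior_estimate()
```

Averaging the estimate separately over 0.8 and 0.2 blocks shows that even this crude observer carries a block-dependent prior, lagging by a few trials after each switch.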
Everyday life features uncertain and ever-changing situations. In such environments, optimal adaptive behavior requires higher-order inferential capabilities to grasp the volatility of external contingencies. These capabilities, however, involve complex computations that rapidly become intractable, so it remains poorly understood how humans develop efficient adaptive behaviors in such environments. Here we demonstrate a counterintuitive result: simple, low-level inferential processes involving imprecise computations that conform to the psychophysical Weber law actually lead to near-optimal adaptive behavior, regardless of environmental volatility. Using volatile experimental settings, we further show that such imprecise, low-level inferential processes account for observed human adaptive performance, unlike optimal adaptive models involving higher-order inferential capabilities, their biologically more plausible algorithmic approximations, and non-inferential adaptive models such as reinforcement learning. Thus, minimal inferential capabilities may have evolved along with imprecise neural computations as they contribute to near-optimal adaptive behavior in real-life environments, while leading humans to make suboptimal choices in canonical decision-making tasks.
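A minimal sketch of a low-level inference process with Weber-law imprecision, under assumptions that are ours rather than the authors': a binary hidden state that switches with a fixed, known hazard rate `hazard`, binary observations that match the state with probability `q`, and Gaussian noise on each belief update whose standard deviation is `weber` times the size of that update (the Weber-law scaling). The observer never infers the volatility itself, which is what makes it low-level in this sketch.

```python
import math
import random

def logit(p):
    return math.log(p / (1.0 - p))

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def weber_filter(observations, hazard=0.1, q=0.7, weber=0.0, seed=0):
    """Belief that the hidden state is 'A' after each binary observation (1 favours 'A').

    Exact two-state Bayesian update for an assumed known hazard rate, plus Gaussian
    noise whose standard deviation is `weber` times the size of the log-odds update.
    """
    rng = random.Random(seed)
    llr = math.log(q / (1.0 - q))        # evidence carried by one observation
    L = 0.0                              # log-odds of state 'A'; start at 50/50
    beliefs = []
    for obs in observations:
        b = sigmoid(L)
        b_prior = (1.0 - hazard) * b + hazard * (1.0 - b)      # state may have switched
        target = logit(b_prior) + (llr if obs == 1 else -llr)  # exact posterior log-odds
        delta = target - L                                     # size of this belief update
        L = L + delta + rng.gauss(0.0, weber * abs(delta))     # Weber-scaled noise
        beliefs.append(sigmoid(L))
    return beliefs
```

With `weber=0.0` this reduces to the exact two-state Bayesian filter for the assumed hazard rate; increasing `weber` makes large belief revisions proportionally less precise than small ones.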