Dynamic Bayesian inference allows a system to infer the environmental state under conditions of limited sensory observation. Using a goal-reaching task, we found that posterior parietal cortex (PPC) and adjacent posteromedial cortex (PM) implemented the two fundamental features of dynamic Bayesian inference: prediction of hidden states using an internal state transition model and updating of the prediction with new sensory evidence. We optically imaged the activity of neurons in mouse PPC and PM layers 2, 3 and 5 in an acoustic virtual-reality system. As mice approached a reward site, anticipatory licking increased even when sound cues were intermittently presented; this anticipation was disrupted by PPC silencing. Probabilistic population decoding revealed that neurons in PPC and PM represented goal distances during sound omission (prediction), particularly in PPC layers 3 and 5, and that the prediction improved with the observation of cue sounds (updating). Our results illustrate how the cerebral cortex realizes mental simulation using an action-dependent dynamic model.
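The two operations named above, prediction via a state-transition model and updating with sensory evidence, are the steps of a recursive Bayes filter. A minimal sketch with made-up numbers (the discretization, transition probabilities, and observation model are illustrative assumptions, not values from the study) shows how the belief over goal distance keeps advancing even on steps where the cue sound is omitted:

```python
import numpy as np

n_states = 10                                # discretized goal distances (assumed)
# Transition model: on each step the animal tends to advance one state.
T = np.zeros((n_states, n_states))
for s in range(n_states):
    T[min(s + 1, n_states - 1), s] += 0.8    # advance
    T[s, s] += 0.2                           # stay

def likelihood(obs):
    # Cue sound gives noisy evidence about the true state (Gaussian, assumed).
    lik = np.exp(-0.5 * (np.arange(n_states) - obs) ** 2)
    return lik / lik.sum()

belief = np.full(n_states, 1.0 / n_states)   # uniform prior over states
for obs in [0, 1, None, None, 4, 5]:         # None = sound omitted on that step
    belief = T @ belief                      # prediction (transition model)
    if obs is not None:                      # updating (sensory evidence)
        belief = belief * likelihood(obs)
        belief = belief / belief.sum()
```

During the two omitted observations the belief is carried forward by the transition model alone, which is the "prediction" the decoding analysis detects; the next cue then sharpens it again ("updating").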
In perceptual decision-making, prior knowledge of action outcomes is essential, especially when sensory inputs are insufficient for proper choices. Signal detection theory (SDT) shows that the optimal choice bias depends not only on the prior but also on the sensory uncertainty; however, it is unclear how animals integrate sensory inputs of varying uncertainty with reward expectations to optimize choices. We developed a tone-frequency discrimination task for head-fixed mice in which we randomly presented either a long or a short sound stimulus and biased the choice outcomes. Choices were less accurate and more biased toward the large-reward side in short- than in long-stimulus trials. Analysis with SDT found that mice did not use separate, optimal choice thresholds for the two sound durations. Instead, mice updated a single threshold for both short and long stimuli with a simple reinforcement-learning rule. Our task in head-fixed mice helps in understanding how the brain integrates sensory inputs with prior knowledge.
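The SDT point above, that the optimal criterion depends on sensory uncertainty as well as on the reward asymmetry, can be made concrete with the standard equal-variance Gaussian model. In this sketch the stimulus strength `d`, noise levels `sigma`, and reward ratio are illustrative assumptions, not parameters fitted to the mouse data:

```python
import math

def optimal_criterion(d, sigma, reward_ratio):
    """Evidence value above which choosing the small-reward side pays off.

    Equal-variance Gaussian SDT: evidence x ~ N(-d/2, sigma^2) or N(+d/2,
    sigma^2). Maximizing expected reward means choosing the large-reward
    side unless the likelihood ratio exceeds reward_ratio, which puts the
    criterion at x* = (sigma^2 / d) * ln(reward_ratio).
    """
    return (sigma ** 2 / d) * math.log(reward_ratio)

# Long stimulus -> low sensory noise; short stimulus -> high noise (assumed).
long_c  = optimal_criterion(d=1.0, sigma=0.5, reward_ratio=3.0)
short_c = optimal_criterion(d=1.0, sigma=1.0, reward_ratio=3.0)
```

The noisier short-stimulus condition calls for a larger criterion shift toward the large-reward side (`short_c > long_c`); using one shared threshold for both durations, as the mice did, is therefore suboptimal in exactly the way the abstract describes.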
The estimation of reward outcomes for action candidates is essential for decision making. In this study, we examined whether and how the uncertainty in reward-outcome estimation affects action choice and the learning rate. We designed a choice task in which rats selected either the left-poking or right-poking hole and stochastically received a food-pellet reward. The reward probabilities of the left and right holes were chosen from six settings (high, 100% vs. 66%; mid, 66% vs. 33%; low, 33% vs. 0% for the left vs. right holes, and the opposites) every 20–549 trials. We used Bayesian Q-learning models to estimate the time course of the probability distribution of action values and tested whether they explain the behaviors of rats better than standard Q-learning models, which estimate only the mean of action values. Model comparison by cross-validation revealed that a Bayesian Q-learning model with an asymmetric update for reward and non-reward outcomes fit the choice time course of the rats best. In the action-choice equation of the Bayesian Q-learning model, the estimated coefficient for the variance of the action value was positive, meaning that the rats were uncertainty-seeking. Further analysis of the Bayesian Q-learning model suggested that uncertainty increased the effective learning rate. These results suggest that rats consider uncertainty in action-value estimation, with an uncertainty-seeking action policy and uncertainty-dependent modulation of the effective learning rate.
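A toy agent of the kind compared above can make the three ingredients concrete: a distributional (Gaussian) belief per action, a positive variance coefficient in the choice rule (uncertainty seeking), and an uncertainty-dependent effective learning rate via a Kalman-style gain. All parameter values here are illustrative assumptions, not the fitted values from the study, and the variance update is a crude stand-in for a full posterior update:

```python
import math
import random

class BayesianQ:
    def __init__(self, n_actions=2, uncertainty_bonus=1.0,
                 obs_var=0.25, k_reward=1.0, k_noreward=0.5, beta=5.0):
        self.mu  = [0.5] * n_actions   # posterior means of action values
        self.var = [0.25] * n_actions  # posterior variances
        self.c = uncertainty_bonus     # positive coefficient on variance
        self.obs_var = obs_var         # assumed outcome-observation noise
        self.k_r, self.k_n = k_reward, k_noreward
        self.beta = beta               # softmax inverse temperature

    def choose(self):
        # Softmax over mean + c * variance: uncertainty raises an action's
        # choice value, i.e. an uncertainty-seeking policy.
        u = [m + self.c * v for m, v in zip(self.mu, self.var)]
        w = [math.exp(self.beta * x) for x in u]
        r = random.random() * sum(w)
        acc = 0.0
        for a, x in enumerate(w):
            acc += x
            if r <= acc:
                return a
        return len(w) - 1

    def update(self, action, reward):
        # Kalman-style gain: larger posterior variance -> larger effective
        # learning rate (uncertainty-dependent modulation).
        gain = self.var[action] / (self.var[action] + self.obs_var)
        # Asymmetric update: reward outcomes weighted more than non-reward.
        alpha = gain * (self.k_r if reward else self.k_n)
        self.mu[action] += alpha * (reward - self.mu[action])
        self.var[action] *= (1.0 - gain)
```

Because the gain shrinks as an action's variance shrinks, a rarely sampled (uncertain) action is both more attractive to choose and learned about faster once chosen, mirroring the two effects the analysis attributes to uncertainty.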