Summary
The state of the environment often changes suddenly. For a decision maker, accurate prediction and detection of change points are crucial for optimizing performance. It remains unclear, however, whether rodents simply react to reinforcement or can proactively estimate future change points during value-based decision making. In this study, we characterize head-fixed mice performing a two-armed bandit task with probabilistic reward reversals. Choice behavior deviates from classic reinforcement learning and instead suggests a strategy involving belief updating, consistent with anticipating change points to exploit the task structure. Excitotoxic lesion and optogenetic inactivation implicate the anterior cingulate and premotor regions of medial frontal cortex. Specifically, over-estimation of the hazard rate arises from an imbalance across frontal hemispheres during the time window before the choice is made. Collectively, the results demonstrate that mice can capitalize on their knowledge of task regularities, and this estimation of future changes in the environment may be a main computational function of the rodent dorsal medial frontal cortex.
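To make the idea of belief updating with a hazard rate concrete, the following is a minimal sketch of one common formulation, not the model fitted in the study. It assumes a two-state task in which exactly one side is the high-reward side; the function name `update_belief`, the reward probabilities `p_high` and `p_low`, and the default `hazard` value are all hypothetical choices made for illustration.

```python
def update_belief(belief_left, choice, rewarded,
                  p_high=0.7, p_low=0.1, hazard=0.05):
    """Return the updated probability that the left side is the high-reward side.

    All parameter values are illustrative assumptions, not values from the study.
    """
    # Reward probability for the chosen side under each hypothesis.
    p_if_left = p_high if choice == "left" else p_low    # left is the high side
    p_if_right = p_low if choice == "left" else p_high   # right is the high side

    # Likelihood of the observed outcome under each hypothesis.
    like_left = p_if_left if rewarded else 1.0 - p_if_left
    like_right = p_if_right if rewarded else 1.0 - p_if_right

    # Bayes' rule: combine the prior belief with the outcome likelihood.
    evidence = like_left * belief_left + like_right * (1.0 - belief_left)
    posterior = like_left * belief_left / evidence

    # Anticipate a possible reversal before the next trial.
    return posterior * (1.0 - hazard) + (1.0 - posterior) * hazard


if __name__ == "__main__":
    print(update_belief(0.5, "left", True))               # low assumed hazard
    print(update_belief(0.5, "left", True, hazard=0.3))   # high assumed hazard
```

In this sketch, a larger assumed hazard rate pulls the belief back toward 0.5 after every trial, so an agent that over-estimates the hazard rate behaves as if reversals were more imminent than they are.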
In a competitive game involving an animal and an opponent, the outcome is contingent on the choices of both players. To succeed, the animal must continually adapt to competitive pressure, or else risk being exploited and losing out on rewards. In this study, we demonstrate that head-fixed mice can be trained to play the iterative competitive game ‘matching pennies’ against a virtual computer opponent. We find that the animals’ performance is well described by a hybrid computational model that includes Q-learning and choice kernels. Comparing matching pennies with a non-competitive two-armed bandit task, we show that the tasks encourage animals to operate in different regimes of reinforcement learning. To understand the involvement of neuromodulatory mechanisms, we measure fluctuations in pupil size and use multiple linear regression to relate the trial-by-trial transient pupil responses to decision-related variables. The analysis reveals that pupil responses are modulated by observable variables, including choice and outcome, as well as latent variables for value updating, but not action selection. Collectively, these results establish a paradigm for studying competitive decision-making in head-fixed mice and provide insights into the role of arousal-linked neuromodulation in the decision process.
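As an illustration of what a hybrid of Q-learning and choice kernels can look like, here is a minimal sketch of a generic textbook-style formulation for a two-choice task. It is not the model fitted in the study, whose exact update rules are not given here; the function names and the parameters `alpha`, `alpha_k`, `beta`, and `beta_k` are assumptions introduced for this example.

```python
import math


def prob_left(q, k, beta, beta_k):
    """Softmax probability of choosing left (index 0) over right (index 1).

    q holds action values, k holds choice-kernel values; beta and beta_k are
    hypothetical inverse-temperature weights for each component.
    """
    z = beta * (q[0] - q[1]) + beta_k * (k[0] - k[1])
    return 1.0 / (1.0 + math.exp(-z))


def update(q, k, choice, reward, alpha, alpha_k):
    """Return updated copies of q and k after one trial.

    choice is 0 (left) or 1 (right); reward is 0 or 1.
    """
    q = list(q)
    k = list(k)
    # Q-learning: move the chosen action's value toward the obtained reward.
    q[choice] += alpha * (reward - q[choice])
    # Choice kernel: track recent choices regardless of outcome.
    for a in (0, 1):
        target = 1.0 if a == choice else 0.0
        k[a] += alpha_k * (target - k[a])
    return q, k


if __name__ == "__main__":
    q, k = update([0.0, 0.0], [0.0, 0.0], choice=0, reward=1,
                  alpha=0.5, alpha_k=0.2)
    print(q, k, prob_left(q, k, beta=2.0, beta_k=1.0))
```

The value term captures reward-driven learning, while the kernel term captures a tendency to repeat (or, with a negative `beta_k`, to avoid) recent choices independent of reward. The relative weight of the two terms is one way such a model can express different regimes of reinforcement learning across tasks.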