Summary
Central to the organization of behavior is the ability to predict the values of outcomes to guide choices. The accuracy of such predictions is honed by a teaching signal that indicates how incorrect a prediction was (‘reward prediction error’, RPE). In several reinforcement learning contexts, such as Pavlovian conditioning and decisions guided by reward history, this RPE signal is provided by midbrain dopamine neurons. In many situations, however, the stimuli predictive of outcomes are perceptually ambiguous. Perceptual uncertainty is known to influence choices, but it has been unclear whether or how dopamine neurons factor it into their teaching signal. To account for this uncertainty, we extended a reinforcement learning model with a belief state about the perceptually ambiguous stimulus; this model generates an estimate of the probability of choice correctness, termed decision confidence. We show that dopamine responses in monkeys performing a perceptually ambiguous decision task comply with the model’s predictions. Consequently, dopamine responses did not simply reflect a stimulus’ average expected reward value, but were predictive of the trial-to-trial fluctuations in perceptual accuracy. These confidence-dependent dopamine responses emerged prior to monkeys’ choice initiation, raising the possibility that dopamine impacts impending decisions, in addition to encoding a post-decision teaching signal. Finally, by manipulating reward size, we found that dopamine neurons reflect both the upcoming reward size and the confidence in achieving it. Together, our results show that dopamine responses convey teaching signals that are also appropriate for perceptual decisions.
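The idea of a confidence-scaled prediction error can be illustrated with a minimal sketch. It is not the authors' model: the Gaussian percept noise, the flat prior over the signed stimulus, and every name here (`simulate_trial`, `sigma`, `reward_size`) are assumptions made for illustration only. The sketch treats the belief state as a posterior over stimulus sign, takes decision confidence to be the probability that the chosen side is correct, and computes the outcome RPE as the reward received minus confidence times reward size.

```python
import math
import random


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def simulate_trial(stimulus, sigma, reward_size, rng):
    """Simulate one trial of a two-alternative perceptual decision.

    Assumptions (hypothetical, not taken from the abstract):
    - `stimulus` is a nonzero signed evidence strength; its sign is the correct side.
    - The percept is the stimulus corrupted by Gaussian noise with s.d. `sigma`.
    - With a flat prior, the belief that the stimulus is positive is Phi(percept / sigma).
    """
    percept = rng.gauss(stimulus, sigma)
    choice = 1 if percept >= 0 else -1

    # Decision confidence: probability that the chosen side is correct,
    # given only the noisy percept (always between 0.5 and 1).
    confidence = normal_cdf(abs(percept) / sigma)

    # Value predicted before the outcome scales with both confidence and reward size.
    predicted_value = confidence * reward_size

    correct = (choice == 1) == (stimulus > 0)
    reward = reward_size if correct else 0.0

    # Reward prediction error at outcome: received minus predicted.
    outcome_rpe = reward - predicted_value
    return confidence, correct, outcome_rpe


if __name__ == "__main__":
    rng = random.Random(1)
    for strength in (0.2, 1.0, 2.0):
        trials = [simulate_trial(strength, 1.0, 1.0, rng) for _ in range(5000)]
        mean_conf = sum(c for c, _, _ in trials) / len(trials)
        accuracy = sum(1 for _, ok, _ in trials if ok) / len(trials)
        print(f"strength={strength}: mean confidence={mean_conf:.2f}, accuracy={accuracy:.2f}")
```

Under these assumptions, the predicted value varies from trial to trial even for a fixed stimulus, because it depends on the noisy percept rather than on the stimulus's average reward value. Correct trials yield a non-negative outcome RPE that shrinks as confidence grows, and error trials yield a non-positive one.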
Midbrain dopamine neurons respond to reward-predictive stimuli. In the natural environment, reward-predictive stimuli are often perceptually complicated. Thus, to discriminate one stimulus from another, elaborate sensory processing is necessary. Given that previous studies have used simpler types of reward-predictive stimuli, it remains unclear whether and, if so, how dopamine neurons obtain reward information from perceptually complicated stimuli. To investigate this, we recorded the activity of dopamine neurons in monkeys while the animals discriminated between two coherent motion directions in random-dot motion stimuli. These coherent directions were paired with different magnitudes of reward. We found that dopamine neurons showed reward-predictive responses to random-dot motion stimuli. Moreover, dopamine neurons showed temporally extended activity correlated with changes in reward prediction (i.e., reward prediction error) from coarse to fine scales between initial motion detection and subsequent motion discrimination phases. Notably, dopamine reward-predictive responses became differential in a later phase than previously reported. This response pattern was consistent with the time course of processing required for the estimation of expected reward value that parallels the motion direction discrimination processing. The results demonstrate that dopamine neurons are able to reflect the reward value of perceptually complicated stimuli, and suggest that dopamine neurons use the moment-to-moment reward prediction associated with environmental stimuli to compute a reward prediction error.
The lateral prefrontal cortex (LPFC) has been implicated in visuospatial processing, especially when it is required to hold spatial information during a delay period. It has also been reported that the LPFC receives information about expected reward outcome. However, the interaction between visuospatial processing and reward processing is still unclear because the two types of processing could not be dissociated in conventional delayed response tasks. To examine this, we used a memory-guided saccade task with an asymmetric reward schedule and recorded 228 LPFC neurons. The position of the target cue indicated the spatial location for the following saccade, and the color of the target cue indicated the reward outcome for a correct saccade. LPFC activity was classified into three main types: S-type activity carried only spatial signals, R-type activity carried only reward signals, and SR-type activity carried both. Therefore, only SR-type cells were potentially involved in both visuospatial processing and reward processing. SR-type activity was enhanced (SR+) or depressed (SR-) by the reward expectation. The spatial discriminability, as expressed by the transmitted information, was improved by reward expectation in SR+ type activity. In contrast, when reward information was coded by an increase of activity in the reward-absent condition (SR- type), it did not improve the spatial representation. This SR- type activity appeared to be involved in gaze fixation. These results extend previous findings suggesting that the LPFC exerts dual influences based on predicted reward outcome: improvement of memory-guided saccades (when reward is expected) and suppression of inappropriate behavior (when reward is not expected).