Computational and learning theory models propose that behavioral control reflects value that is both cached (computed and stored during previous experience) and inferred (estimated on-the-fly based on knowledge of the causal structure of the environment). The latter is thought to depend on the orbitofrontal cortex. Yet, some accounts propose that the orbitofrontal cortex contributes to behavior by signaling “economic” value, regardless of the associative basis of the information. We found that the orbitofrontal cortex is critical for both value-based behavior and learning when value must be inferred but not when a cached value is sufficient. The orbitofrontal cortex is thus fundamental for accessing model-based representations of the environment to compute value rather than for signaling value per se.
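The cached-versus-inferred distinction above can be illustrated with a toy sketch (all names and values are hypothetical, not from the study): a cached value is a scalar stored from past experience, while an inferred value is recomputed on the fly from a model linking a cue to its predicted outcome and that outcome's current worth.

```python
# Hypothetical sketch of cached vs. inferred value (illustrative names only).
cached_value = {"cue_A": 1.0}              # scalar stored during past experience
cue_predicts = {"cue_A": "banana_pellet"}  # model: cue -> expected outcome identity
current_worth = {"banana_pellet": 1.0}     # outcome's present desirability

def cached(cue):
    # model-free: just look up the stored scalar
    return cached_value[cue]

def inferred(cue):
    # model-based: derive value from predicted outcome identity + current worth
    return current_worth[cue_predicts[cue]]

# After the outcome is devalued (e.g., satiation on banana pellets), the
# inferred estimate updates immediately; the cached one lags until re-learning.
current_worth["banana_pellet"] = 0.0
print(cached("cue_A"), inferred("cue_A"))  # 1.0 0.0
```

On this sketch, an OFC-dependent system corresponds to `inferred`: behavior guided by it adjusts as soon as the model changes, whereas a purely cached system keeps responding at the old value.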
In many cases, learning is thought to be driven by differences between the value of rewards we expect and rewards we actually receive. Yet learning can also occur when the identity of the reward we receive is not as expected, even if its value remains unchanged. Learning from changes in reward identity implies access to an internal model of the environment, from which information about the identity of the expected reward can be derived. As a result, such learning is not easily accounted for by model-free reinforcement learning theories such as temporal difference reinforcement learning (TDRL), which predicate learning on changes in reward value, but not identity. Here, we used unblocking procedures to assess learning driven by value- versus identity-based prediction errors. Rats were trained to associate distinct visual cues with different food quantities and identities. These cues were subsequently presented in compound with novel auditory cues, and the reward quantity or identity was selectively changed. Unblocking was assessed by presenting the auditory cues alone in a probe test. Consistent with neural implementations of TDRL models, we found that the ventral striatum was necessary for learning in response to changes in reward value. However, this area, along with orbitofrontal cortex, was also required for learning driven by changes in reward identity. This observation requires that existing models of TDRL in the ventral striatum be modified to include information about the specific features of expected outcomes derived from model-based representations, and that the role of orbitofrontal cortex in these models be clearly delineated.
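The two error signals contrasted above can be sketched in a few lines (a hypothetical illustration; the function names and feature vectors are not from the study). A model-free TDRL error tracks only scalar value, so it is silent when flavor changes at matched value; an identity error over outcome feature vectors is not.

```python
import numpy as np

def value_error(received_value, predicted_value):
    # classic TDRL-style scalar prediction error: nonzero only if value changes
    return received_value - predicted_value

def identity_error(received_features, predicted_features):
    # model-based error: nonzero whenever any outcome feature changes,
    # even at matched value (flavors coded as one-hot vectors here)
    return np.linalg.norm(np.asarray(received_features, dtype=float)
                          - np.asarray(predicted_features, dtype=float))

# Unblocking by quantity: value changes, so the scalar error is nonzero.
print(value_error(2.0, 1.0))           # 1.0
# Unblocking by identity: flavor switches at equal value.
print(value_error(1.0, 1.0))           # 0.0 -> no model-free learning signal
print(identity_error([0, 1], [1, 0]))  # ~1.41 -> identity error can drive learning
```

The behavioral result reported above maps onto this sketch as follows: ventral striatum lesions abolish learning driven by `value_error`, while ventral striatum and OFC are both required when only `identity_error` is nonzero.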
The orbitofrontal cortex (OFC) has long been implicated in associative learning. Early work by Mishkin and Rolls showed that the OFC was critical for rapid changes in learned behavior, a role that was reflected in the encoding of associative information by orbitofrontal neurons. Over the years, new data—particularly neurophysiological data—have increasingly emphasized a role for the OFC in signaling value. These signals have been reported to vary according to internal preferences and judgments, and even to be completely independent of the sensory qualities of predictive cues, the actual rewards, and the responses required to obtain them. At the same time, increasingly sophisticated behavioral studies have shown that the OFC is often unnecessary for simple value-based behavior and instead seems critical when information about specific outcomes must be used to guide behavior and learning. Here, we review these data and suggest a theory that potentially reconciles these two ideas, value versus specific outcomes, and the two bodies of work on the OFC that they represent.
The best way to respond flexibly to changes in the environment is to anticipate them. Such anticipation benefits us most when we can infer that a change has occurred before we have actually experienced its effects. Here we test for neural correlates of this process by recording single-unit activity in the orbitofrontal cortex of rats performing a choice task in which the available rewards changed across blocks of trials. Consistent with the proposal that the orbitofrontal cortex signals inferred information, firing changes at the start of each new block as if predicting the not-yet-experienced reward. This change occurs whether the new reward differs in number of drops, requiring signaling of a new value, or in flavor, requiring signaling of a new sensory feature. These results show that orbitofrontal neurons provide a behaviorally relevant signal that reflects inferences about both value-relevant and value-neutral information about impending outcomes.
Patients with damage to the orbitofrontal cortex (OFC) display various impairments in cognitive and affective function, including a reduced ability to use information about the consequences of their actions to guide their behavior. In this study, rats with neurotoxic lesions of the OFC failed to use specific expectancies about outcomes to guide their learning of an instrumental discrimination task. In contrast, lesioned rats were unimpaired in a measure of learned motivational function, the potentiation of feeding under conditions of food satiation, by a conditioned stimulus that had been paired with food while the rats were food deprived. Notably, performance of both of these tasks has been shown to depend on the function of the basolateral amygdala (BLA), a region that is richly interconnected with the OFC. Thus, the present results are consistent with the view that the acquisition and use of specific outcome expectancies to guide behavior critically involve a neural system that includes the BLA and the OFC, but they indicate that certain motivational properties acquired by cues on the basis of appetitive learning involve BLA circuitry apart from the OFC.
The orbitofrontal cortex (OFC) has been described as signaling outcome expectancies or value. Evidence for the latter comes from studies showing that neural signals in the OFC correlate with value across features. Yet features can co-vary with value, and individual units may participate in multiple ensembles coding different features. Here we used unblocking to test whether OFC neurons would respond to a predictive cue signaling a ‘valueless’ change in outcome flavor. Neurons were recorded as the rats learned about cues that signaled either an increase in reward number or a valueless change in flavor. We found that OFC neurons acquired responses to both predictive cues. This activity exceeded that exhibited to a ‘blocked’ cue and was correlated with activity to the actual outcome. These results show that OFC neurons fire to cues with no value, independent of what can be inferred through features of the predicted outcome. DOI: http://dx.doi.org/10.7554/eLife.02653.001
Learning is proposed to occur when there is a discrepancy between reward prediction and reward receipt. At least two separate systems are thought to exist: one in which predictions are proposed to be based on model-free or cached values; and another in which predictions are model-based. A basic neural circuit for model-free reinforcement learning has already been described. In the model-free circuit the ventral striatum (VS) is thought to supply a common-currency reward prediction to midbrain dopamine neurons that compute prediction errors and drive learning. In a model-based system, predictions can include more information about an expected reward, such as its sensory attributes or current, unique value. This detailed prediction allows for both behavioral flexibility and learning driven by changes in sensory features of rewards alone. Recent evidence from animal learning and human imaging suggests that, in addition to model-free information, the VS also signals model-based information. Further, there is evidence that the orbitofrontal cortex (OFC) signals model-based information. Here we review these data and suggest that the OFC provides model-based information to this traditional model-free circuitry and offer possibilities as to how this interaction might occur.
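The circuit sketched above can be caricatured in code (a minimal, hypothetical illustration; the variable names and parameters are not from the review). A cached scalar value stands in for the VS prediction updated by a dopamine-like error, and a learned outcome-feature vector stands in for the kind of model-based prediction the OFC might contribute.

```python
import numpy as np

alpha = 0.1                    # learning rate (illustrative)
V = {"cue": 0.0}               # cached scalar prediction (VS-like)
F = {"cue": np.zeros(2)}       # predicted outcome features, e.g. flavor (OFC-like)

def update(state, reward, outcome_features):
    # scalar, dopamine-like prediction error over value
    delta_v = reward - V[state]
    V[state] += alpha * delta_v
    # vector prediction error over outcome identity (model-based channel)
    delta_f = outcome_features - F[state]
    F[state] += alpha * delta_f
    return delta_v, delta_f

# Repeated pairings of the cue with one drop of flavor A ([1, 0]):
for _ in range(50):
    update("cue", 1.0, np.array([1.0, 0.0]))

# Switching to flavor B at equal value ([0, 1]) yields a near-zero value
# error but a large feature error:
dv, df = update("cue", 1.0, np.array([0.0, 1.0]))
```

In this caricature, `dv` alone cannot report the flavor switch; only the feature channel `df` does, which is the sense in which model-based information must be added to the traditional model-free circuit.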
When exposed to pairings of a visual stimulus with food delivery, rats normally acquire both conditioned orienting responses directed toward the visual stimulus and conditioned food-related responses. Consistent with the results of previous lesion studies, reversible inactivation of amygdala central nucleus function before each conditioning session prevented the acquisition of conditioned orienting responses, whereas food-related behaviors were acquired normally. By contrast, neither inactivation nor neurotoxic lesions of central nucleus affected the expression of previously acquired conditioned orienting responses. Thus, the central nucleus is apparently not critical to the maintenance of information required for conditioned orienting, but instead is necessary for memory storage elsewhere. Specialized roles for components of a circuit for conditioned orienting, which includes the central nucleus, the substantia nigra, and dorsolateral striatum, are discussed.