Encoding of Action History in the Rat Ventral Striatum

Kim, Yun Bok; Huh, Namjung; Lee, Hyunjung; Baeg, Eun Ha; Lee, Dongsoo; Jung, Min Whan

doi:10.1152/jn.00310.2007

Cited by 37 publications

(36 citation statements)

References 46 publications


“…The activity of NAc neurons modulated by different actions during the execution of the poking (action-coding neurons) might code either differences in the physical movements or differences in the spatial position of rats. This notion is consistent with previous reports showing that the responses of a subset of NAc neurons changed with different choices of actions in discrimination tasks and a spatial-delayed matching-to-sample task (Chang et al, 2002; Kim et al, 2007; Taha et al, 2007). In the current study, we found evidence that information related to action lasted beyond the timing of reward delivery after the choice.…”

Section: Modeling Rats' Choice Behavior (supporting)

confidence: 93%

Validation of Decision-Making Models and Analysis of Decision Variables in the Rat Basal Ganglia

Ito, Doya

2009

J. Neurosci.

253

352


Reinforcement learning theory plays a key role in understanding the behavioral and neural mechanisms of choice behavior in animals and humans. Especially, intermediate variables of learning models estimated from behavioral data, such as the expectation of reward for each candidate choice (action value), have been used in searches for the neural correlates of computational elements in learning and decision making. The aims of the present study are as follows: (1) to test which computational model best captures the choice learning process in animals and (2) to elucidate how action values are represented in different parts of the corticobasal ganglia circuit. We compared different behavioral learning algorithms to predict the choice sequences generated by rats during a free-choice task and analyzed associated neural activity in the nucleus accumbens (NAc) and ventral pallidum (VP). The major findings of this study were as follows: (1) modified versions of an action-value learning model captured a variety of choice strategies of rats, including win-stay-lose-switch and persevering behavior, and predicted rats' choice sequences better than the best multistep Markov model; and (2) information about action values and future actions was coded in both the NAc and VP, but was less dominant than information about trial types, selected actions, and reward outcome. The results of our model-based analysis suggest that the primary role of the NAc and VP is to monitor information important for updating choice behaviors. Information represented in the NAc and VP might contribute to a choice mechanism that is situated elsewhere.
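As a rough illustration of what scoring an action-value learning model against a choice sequence can look like, the sketch below computes the log-likelihood of a binary choice sequence under a basic action-value model with a softmax choice rule. It is not the modified model from the abstract above; the function name `q_learning_loglik` and the parameters `alpha` (learning rate) and `beta` (inverse temperature) are assumptions introduced here for illustration.

```python
import math


def q_learning_loglik(choices, rewards, alpha=0.3, beta=3.0):
    """Log-likelihood of a binary choice sequence under a basic action-value model.

    choices: sequence of 0/1 actions; rewards: sequence of 0/1 outcomes.
    alpha and beta are illustrative (hypothetical) parameter values.
    """
    q = [0.0, 0.0]  # action values for the two options
    loglik = 0.0
    for choice, reward in zip(choices, rewards):
        # Softmax over two options reduces to a logistic function of the value difference
        p_one = 1.0 / (1.0 + math.exp(-beta * (q[1] - q[0])))
        p_choice = p_one if choice == 1 else 1.0 - p_one
        loglik += math.log(p_choice)
        # Only the chosen action's value moves, by a fraction of the reward prediction error
        q[choice] += alpha * (reward - q[choice])
    return loglik
```

A higher (less negative) log-likelihood means the model assigns more probability to the observed choices, which is one common basis for comparing candidate models on the same sequence.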


“…This idea is also consistent with "actor-critic" models of ventral striatal areas during reinforcement learning, in which the striatum contributes to the estimation of internal state values that are compared against incoming sensory information on future trials in order to generate appropriate error signals (O'Doherty et al 2004). Consistent with this model, ventral striatal neurons thought to be connected to medial prefrontal and orbitofrontal areas have been shown to modulate their firing patterns based on choices made in the previous trial (Kim et al 2007), suggesting that these neurons retain a memory trace of previous decision outcomes in order to update expected outcomes on the current trial. It remains unclear whether this form of temporal continuity in firing rates is also present in dorsal striatal neurons near the region that was active in the present study.…”

Section: Discussion (supporting)

confidence: 62%

The organization and dynamics of corticostriatal pathways link the medial orbitofrontal cortex to future behavioral responses

Verstynen

2014

Journal of Neurophysiology


Verstynen TD. The organization and dynamics of corticostriatal pathways link the medial orbitofrontal cortex to future behavioral responses. J Neurophysiol 112: 2457-2469. First published August 20, 2014; doi:10.1152/jn.00221.2014. Accurately making a decision in the face of incongruent options increases the efficiency of making similar congruency decisions in the future. Contextual factors like reward can modulate this adaptive process, suggesting that networks associated with monitoring previous success and failure outcomes might contribute to this form of behavioral updating. To evaluate this possibility, a group of healthy adults (n = 30) were tested with functional MRI (fMRI) while they performed a color-word Stroop task. In a conflict-related region of the medial orbitofrontal cortex (mOFC), stronger BOLD responses predicted faster response times (RTs) on the next trial. More importantly, the degree of behavioral adaptation of RTs was correlated with the magnitude of mOFC-RT associations on the previous trial, but only after accounting for network-level interactions with prefrontal and striatal regions. This suggests that congruency sequencing effects may rely on interactions between distributed corticostriatal circuits. This possibility was evaluated by measuring the convergence of white matter projections from frontal areas into the striatum with diffusion-weighted imaging. In these pathways, greater convergence of corticostriatal projections correlated with stronger functional mOFC-RT associations that, in turn, provided an indirect pathway linking anatomical structure to behavior. Thus distributed corticostriatal processing may mediate the orbitofrontal cortex's influence on behavioral updating, even in the absence of explicit rewards.


“…As another possibility, which is not mutually exclusive to the scenario above, the estimate of reward-based arming probability (i.e., action value function estimated according to a simple RL algorithm) and the latest run length might be separately computed before being combined to estimate the final stacked arming probability. Physiological studies have found neural signals that are related to action value functions that were computed based on a simple RL algorithm (Samejima et al 2005; Seo and Lee 2007) and neural signals that are related to animal's previous choice (Kim et al 2007; Seo and Lee 2008) or the number of self-executed actions (Sawamura et al 2002) in cortical and subcortical brain structures. The latter may represent the latest run length in the DAWH task.…”

Section: Discussion (mentioning)

confidence: 99%

Model-based reinforcement learning under concurrent schedules of reinforcement in rodents

Huh, Jo, Kim et al.

2009

Learn. Mem.

Self Cite


Reinforcement learning theories postulate that actions are chosen to maximize a long-term sum of positive outcomes based on value functions, which are subjective estimates of future rewards. In simple reinforcement learning algorithms, value functions are updated only by trial-and-error, whereas they are updated according to the decision-maker's knowledge or model of the environment in model-based reinforcement learning algorithms. To investigate how animals update value functions, we trained rats under two different free-choice tasks. The reward probability of the unchosen target remained unchanged in one task, whereas it increased over time since the target was last chosen in the other task. The results show that goal choice probability increased as a function of the number of consecutive alternative choices in the latter, but not the former task, indicating that the animals were aware of time-dependent increases in arming probability and used this information in choosing goals. In addition, the choice behavior in the latter task was better accounted for by a model-based reinforcement learning algorithm. Our results show that rats adopt a decision-making process that cannot be accounted for by simple reinforcement learning models even in a relatively simple binary choice task, suggesting that rats can readily improve their decision-making strategy through the knowledge of their environments.

Animals must continually update their behavioral strategies according to changes in an environment in order to optimize their choices. Reinforcement learning (RL) models (Sutton and Barto 1998) provide a powerful theoretical framework for understanding choice behavior in humans and animals in a dynamic environment. In theories of RL, future actions are chosen so as to maximize a long-term sum of positive outcomes, and this can be accomplished by a set of value functions that represent the amount of expected reward that is associated with particular states or actions. The value functions are continually updated based on the reward prediction error, which is the difference between the expected and actual rewards. This way, even without prior knowledge about an uncertain and dynamically changing environment, an animal can discover the structure of the environment that can be exploited for optimal choice by trial-and-error. Not surprisingly, human and monkey choice behaviors in various tasks are well described by reinforcement learning algorithms (e.g., O'Doherty et al. 2003; Barraclough et al. 2004; Lee et al. 2004; Samejima et al. 2005; Daw et al. 2006; Pessiglione et al. 2006).

The updating of value functions can be achieved in two fundamentally different ways. In simple or direct RL algorithms, value functions are updated only by trial-and-error. In other words, only the value function that is associated with the chosen action is updated, and those that are associated with uncommitted actions remain unchanged. On the other hand, in indirect or model-based RL algorithms, the value functions might also change according to the decis...
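To make the time-dependent task in the abstract above concrete, the sketch below shows one way an unchosen target's arming probability could grow with run length. The text states only that the probability increased over time since the target was last chosen; the specific rule used here (each trial arms the target independently with a fixed base probability, and an armed reward stays available until collected) is an assumption, as are the names `arming_probability`, `base_p`, and `run_length`.

```python
def arming_probability(base_p, run_length):
    """Probability that a target is armed after `run_length` trials without being chosen.

    Assumed rule (not confirmed by the source): the target is armed independently
    with probability `base_p` on every trial and stays armed until it is chosen.
    """
    if not 0.0 <= base_p <= 1.0:
        raise ValueError("base_p must lie in [0, 1]")
    if run_length < 0:
        raise ValueError("run_length must be non-negative")
    # The target is unarmed only if every one of the (run_length + 1) arming draws failed
    return 1.0 - (1.0 - base_p) ** (run_length + 1)
```

Under this assumed rule the probability rises monotonically with the number of consecutive choices of the other target, which is the kind of structure a model-based learner could exploit and a simple trial-and-error learner would not represent directly.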
