2021
DOI: 10.1016/j.cobeha.2021.04.020
Value-free reinforcement learning: policy optimization as a minimal model of operant behavior

Abstract: Reinforcement learning is a powerful framework for modelling the cognitive and neural substrates of learning and decision making. Contemporary research in cognitive neuroscience and neuroeconomics typically uses value-based reinforcement-learning models, which assume that decision-makers choose by comparing learned values for different actions. However, another possibility is suggested by a simpler family of models, called policy-gradient reinforcement learning. Policy-gradient models learn by optimizing a beha…
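The contrast the abstract draws, choosing by comparing learned action values versus adjusting a behavioral policy directly, can be illustrated with a toy two-armed bandit. The sketch below is not the paper's implementation: the reward probabilities, learning rates, and variable names are assumptions made for illustration. The first learner updates action values Q with a prediction error and chooses by passing them through a softmax; the second (a REINFORCE-style policy-gradient learner) nudges action preferences h along the gradient of expected reward and never stores values.

```python
# Minimal sketch of value-based vs policy-gradient learning on a two-armed
# bandit. Illustrative assumptions throughout: reward probabilities,
# learning rate alpha, inverse temperature beta, and trial count.
import numpy as np

rng = np.random.default_rng(0)
p_reward = np.array([0.8, 0.2])          # assumed reward probability per action
alpha, beta, n_trials = 0.1, 3.0, 1000

def softmax(x):
    e = np.exp(x - np.max(x))
    return e / e.sum()

# Value-based learner: delta-rule update of action values Q,
# choice by softmax comparison of the learned values.
Q = np.zeros(2)
for _ in range(n_trials):
    a = rng.choice(2, p=softmax(beta * Q))
    r = float(rng.random() < p_reward[a])
    Q[a] += alpha * (r - Q[a])           # reward-prediction-error update

# Policy-gradient learner (REINFORCE): action preferences h are moved
# directly along the gradient of expected reward; no values are stored.
h, baseline = np.zeros(2), 0.0
for _ in range(n_trials):
    pi = softmax(h)
    a = rng.choice(2, p=pi)
    r = float(rng.random() < p_reward[a])
    grad = -pi                           # d log pi(a) / dh_j = 1{j=a} - pi_j
    grad[a] += 1.0
    h += alpha * (r - baseline) * grad
    baseline += alpha * (r - baseline)   # running-average reward as baseline

print("learned values Q:", Q)
print("learned preferences h:", h, "-> policy:", softmax(h))
```

Both learners come to prefer the better arm, but only the first carries an explicit estimate of each action's value; the preferences h are meaningful only relative to one another.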

Cited by 29 publications (30 citation statements) | References 63 publications
“…[ 25 ]) rather than via action value representations as we have modeled here. In policy learning, generalization between odor trial types would be limited as alternative actions are grouped together, separating forced-choice from free-choice trials through the presence of the unrewarded choice option in these trial-types [ 26 ].…”
Section: Discussion (mentioning)
confidence: 99%
“…Although the actor-critic model is only one of a wider class of RL models, the findings of the present study and the framework of the analysis may be applicable to other models. For example, the results may have implications for algorithms that perform value estimations and policy updates in different systems, such as policy-gradient approaches, which have attracted attention (Mongillo et al 2014; Bennett et al 2021). When considering the online learning of continuous actions, such as in determining response vigor, a policy-gradient method based on the REINFORCE algorithm (Williams 1992) is often used (Niv 2007; Lindström et al 2021).…”
Section: Discussion (mentioning)
confidence: 99%
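The excerpt above points to REINFORCE-based policy-gradient methods for learning continuous actions such as response vigor. A hedged sketch of that idea, assuming a Gaussian policy over a single scalar action and a made-up reward function peaked at an arbitrary target, is shown below; it is meant only to show the form of the update, not the models used in the cited studies.

```python
# REINFORCE-style policy-gradient update for one continuous action
# (e.g., a response-vigor parameter). All quantities here are illustrative
# assumptions: the Gaussian policy, the quadratic reward, and the step sizes.
import numpy as np

rng = np.random.default_rng(1)
mu, sigma = 0.0, 1.0          # policy: action ~ Normal(mu, sigma)
alpha, baseline = 0.01, 0.0
target = 2.0                  # assumed reward-maximizing action

for _ in range(20000):
    a = rng.normal(mu, sigma)
    r = -(a - target) ** 2                  # assumed reward, peaked at target
    grad_mu = (a - mu) / sigma ** 2         # d log N(a; mu, sigma) / d mu
    mu += alpha * (r - baseline) * grad_mu  # REINFORCE update on the mean
    baseline += 0.05 * (r - baseline)       # running-average baseline

print("learned mean action:", mu)           # drifts toward roughly 2.0
```

Subtracting the running-average baseline from the reward does not bias the gradient estimate but reduces its variance, which is why actor-critic variants pair a learned critic with this kind of policy update.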
“…In such models, the action values are directly translated into weights for corresponding actions such that the higher the action value, the more likely the action is to be chosen. However, it has been noted that many psychological and neuroscientific findings are concisely explained by policy-based RL models in which the preference for each action is represented independently of the reward expectations (for a review, see Mongillo et al 2014; Bennett et al 2021).…”
Section: Introduction (mentioning)
confidence: 99%
“…In assessing the relative reliability of each dimension in generating the target response, it is useful to interpret model output, and thus error, in terms of response probabilities (Bennett et al, 2021). Recapitulating eqs 6 & 7, model output for each dimension D in isolation is passed through a softmax function…”
Section: Learned Attention For Cognitive Control (mentioning)
confidence: 99%
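The softmax step this last excerpt refers to is a generic normalization, sketched below with made-up numbers; the dimension outputs and the inverse-temperature parameter beta are assumptions for illustration, not the authors' equations 6 and 7.

```python
# Softmax over a single dimension's outputs, yielding response probabilities.
# The outputs and beta below are hypothetical values for illustration.
import numpy as np

def softmax(x, beta=1.0):
    z = beta * np.asarray(x, dtype=float)
    z -= z.max()                 # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()

dim_output = [1.2, 0.3, -0.5]    # hypothetical output of one dimension D
print(softmax(dim_output))       # response probabilities summing to 1
```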