Reward-related cues are an important part of our daily life as they often influence and guide our actions. This paper reviews one of the experimental paradigms used to study the effects of cues, the Pavlovian to Instrumental Transfer paradigm. In this paradigm, cues associated with rewards through Pavlovian conditioning alter motivation and choice of instrumental actions. The first transfer experiments date back to the 1940s, but only in the last decade has it been fully recognised that there are two types of transfer, specific and general. This paper presents a systematic review of both the neural substrates and the behavioral factors affecting both types of transfer. It also examines the recent application of the paradigm to study the effect of cues on human participants, both in normal and pathological conditions, and the interactions of transfer with drugs of abuse. Finally, the paper analyses the theoretical aspects of transfer to build an overall picture of the phenomenon, from early theories to recent hierarchical accounts.
Balancing habitual and deliberate forms of choice entails a comparison of their respective merits: the former is faster but inflexible, and the latter slower but more versatile. Here, we show that arbitration between these two forms of control can be derived from first principles within an Active Inference scheme. We illustrate our arguments with simulations that reproduce rodent spatial decisions in T-mazes. In this context, deliberation has been associated with vicarious trial and error (VTE) behavior (i.e., the fact that rodents sometimes stop at decision points as if deliberating between choice alternatives), whose neurophysiological correlates are "forward sweeps" of hippocampal place cells in the arms of the maze under consideration. Crucially, forward sweeps arise early in learning and disappear shortly after, marking a transition from deliberative to habitual choice. Our simulations show that this transition emerges as the optimal solution to the trade-off between policies that maximize reward or extrinsic value (habitual policies) and those that also consider the epistemic value of exploratory behavior (deliberative or epistemic policies), the latter requiring VTE and the retrieval of episodic information via forward sweeps. We thus offer a novel perspective on the optimality principles that engender forward sweeps and VTE, and on their role in deliberate choice.
Substantial evidence indicates that animal behavior is determined both by deliberative processes (i.e., based on predictions of future outcomes and rewards) and by habitual reflexes (i.e., based on stimulus-response associations; Balleine and Dickinson 1998).
The former are more resource intensive and sensitive to changes in task contingencies, while the latter are cheaper but inflexible; hence whether it is optimal to call on deliberative or habitual choice depends on the trade-off between the advantage of flexibility and computational costs (Balleine and Dickinson 1998; Dolan and Dayan 2013; Lee et al. 2014). In this paper, we try to understand the contextualization of behavior and the trade-off between deliberative and habitual choice from first principles, using Active Inference and Markov decision process models of exploitation and exploration (Friston et al. 2013, 2014; Pezzulo et al. 2015). We focus specifically on vicarious trial and error (VTE) behavior, which is considered a hallmark of deliberation (Muenzinger 1938; Tolman 1938, 1939). This is based on the observation that, when rodents have to remember or search for the correct route to a reward in a maze (e.g., a T-maze), they sometimes stop at choice points, to look left and right before choosing which direction to go. This has been interpreted as a signature of cognitive search and deliberation between the two choices (i.e., going right or left). In keeping with a role of VTE behavior for deliberation, it occurs early in learning and decreases or disappears after significant experience (Tolman 1939; van der Meer and Redish 2010; van der Meer et al. 2012) but it can incr...
Pavlovian conditioned stimuli can influence instrumental responding, an effect called Pavlovian-instrumental transfer (PIT). During the last decade, PIT has been subdivided into two types: specific PIT and general PIT, each having its own neural substrates. Specific PIT happens when a conditioned stimulus (CS) associated with a reward enhances an instrumental response directed to the same reward. Under general PIT, instead, the CS enhances a response directed to a different reward. While important progress has been made in identifying the neural substrates, the function of specific and general PIT and how they interact with instrumental responses are still not clear. In the experimental paradigm that distinguishes specific and general PIT, an effect of PIT inhibition has also been observed and still awaits an explanation. Here we propose a hypothesis that links these three PIT effects (specific PIT, general PIT and PIT inhibition) to three aspects of action evaluation. These three aspects, which we call “principles of action”, are: context, efficacy, and utility. In goal-directed behavior, an agent has to evaluate whether the context is suitable to accomplish the goal, the efficacy of its action in achieving the goal, and the utility of the goal itself: we suggest that each of the three PIT effects is related to one of these aspects of action evaluation. In particular, we link specific PIT with the estimation of efficacy, general PIT with the evaluation of utility, and PIT inhibition with the adequacy of context. We also provide a latent cause Bayesian computational model that exemplifies this hypothesis. This hypothesis and the model provide a new framework and new predictions to advance knowledge about PIT functioning and its role in animal adaptation.
Autonomous multi-task learning is a fundamental capability for developing versatile artificial agents that can act in complex environments. In real-world scenarios, tasks may be interrelated (or "hierarchical") so that a robot has to first learn to achieve some of them to set the preconditions for learning other ones. Even though different strategies have been used in robotics to tackle the acquisition of interrelated tasks, in particular within the developmental robotics framework, autonomous learning in this kind of scenario is still an open question. Building on previous research in the framework of intrinsically motivated open-ended learning, in this work we describe how this question can be addressed at the level of task selection, in particular by treating the scenario of multiple interrelated tasks as an MDP where the system is trying to maximise its competence over all the tasks.
Goal-directed behavior is influenced by environmental cues: in particular, cues associated with a reward can bias action choice toward actions directed to that same reward. This effect is studied experimentally as specific Pavlovian-instrumental transfer (specific PIT). We have investigated the hypothesis that cues associated with an outcome elicit specific PIT by raising the estimates of reward probability of actions associated with that same outcome. In other words, cues reduce the uncertainty about the efficacy of instrumental actions. We used a human PIT experimental paradigm to test the effects of two different instrumental contingencies: one group of participants had a 33% chance of being rewarded for each button press, while another had a 100% chance. The group trained with 33% reward probability showed a stronger PIT effect than the 100% group, in line with the hypothesis that Pavlovian cues linked to an outcome work by reducing the uncertainty of receiving it. The 100% group also showed a significant specific PIT effect, highlighting additional factors that could contribute to specific PIT beyond the instrumental training contingency. We hypothesize that the uncertainty about reward delivery due to testing in extinction might be one of these factors. These results add knowledge on how goal-directed behavior is influenced by the presence of environmental cues associated with a reward: such influence depends on the probability of obtaining the reward, namely when there is less chance of getting a reward we are more influenced by cues associated with it, and vice versa.
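As a purely illustrative sketch of the hypothesis above (not the authors' analysis or model), the following Python snippet assumes that a cue moves the estimated reward probability of an action a fixed fraction of the way toward 1. The function names and the `cue_weight` parameter are hypothetical. Under that assumption, an action trained at 33% has more room to be boosted than one trained at 100%.

```python
def cued_estimate(p_trained, cue_weight=0.5):
    """Estimated reward probability when the cue is present.

    Assumption for illustration only: the cue moves the estimate a
    fraction `cue_weight` of the way from its trained value toward 1.
    """
    return p_trained + cue_weight * (1.0 - p_trained)


def cue_boost(p_trained, cue_weight=0.5):
    """Increase in the estimated reward probability produced by the cue."""
    return cued_estimate(p_trained, cue_weight) - p_trained


# The two instrumental contingencies described in the abstract.
boost_33 = cue_boost(0.33)
boost_100 = cue_boost(1.0)
print(f"boost at 33%: {boost_33:.3f}, boost at 100%: {boost_100:.3f}")
```

In this toy account the 100% group receives no boost at all, whereas the abstract reports a significant effect in that group too, so the sketch covers only the contingency-dependent part of the result.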
Extinction of Pavlovian conditioning is a complex process that involves brain regions such as the medial prefrontal cortex (mPFC), the amygdala and the locus coeruleus. In particular, noradrenaline (NA) coming from the locus coeruleus has been recently shown to play a different role in two subregions of the mPFC, the prelimbic (PL) and the infralimbic (IL) regions. How these regions interact in conditioning and subsequent extinction is an open issue. We studied these processes using two approaches: computational modelling and NA manipulation in a conditioned place preference paradigm (CPP) in mice. In the computational model, NA in PL and IL causes inputs arriving at these regions to be amplified, thus allowing them to modulate learning processes in the amygdala. The model reproduces results from studies involving depletion of NA from PL, IL, or both in CPP. In addition, we simulated new experiments of NA manipulations in mPFC, making predictions on the possible results. We searched the parameters of the model and tested the robustness of the predictions by performing a sensitivity analysis. We also present an empirical experiment where, in accord with the model, a double depletion of NA from both PL and IL in CPP with amphetamine impairs extinction. Overall the proposed model, supported by anatomical, physiological, and behavioural data, explains the differential role of NA in PL and IL and opens up the possibility of understanding extinction mechanisms in more depth and hence of aiding the development of treatments for disorders such as addiction.
Categorical perception identifies a tuning of human perceptual systems that can occur during the execution of a categorisation task. Despite the fact that experimental studies and computational models suggest that this tuning is influenced by task-independent effects (e.g., based on Hebbian and unsupervised learning, UL) and task-dependent effects (e.g., based on reward signals and reinforcement learning, RL), no model studies the UL/RL interaction during the emergence of categorical perception. Here we have investigated the effects of this interaction, proposing a system-level neuro-inspired computational architecture in which a perceptual component integrates UL and RL processes. The model has been tested with a categorisation task and the results show that a balanced mix of unsupervised and reinforcement learning leads to the emergence of a suitable categorical perception and the best performance in the task. Indeed, an excessive unsupervised learning contribution tends to not identify task-relevant features while an excessive reinforcement learning contribution tends to initially learn slowly and then to reach sub-optimal performance. These results are consistent with the experimental evidence regarding categorical activations of extrastriate cortices in healthy conditions. Finally, the results produced by the two extreme cases of our model can explain the existence of several factors that may lead to sensory alterations in autistic people.
In mammals, goal-directed and planning processes support flexible behaviour used to face new situations that cannot be tackled through more efficient but rigid habitual behaviours. Within the Bayesian modelling approach of brain and behaviour, models have been proposed to perform planning as probabilistic inference but this approach encounters a crucial problem: explaining how such inference might be implemented in brain spiking networks. Recently, the literature has proposed some models that face this problem through recurrent spiking neural networks able to internally simulate state trajectories, the core function at the basis of planning. However, the proposed models have relevant limitations that make them biologically implausible, namely their world model is trained ‘off-line’ before solving the target tasks, and they are trained with supervised learning procedures that are biologically and ecologically not plausible. Here we propose two novel hypotheses on how the brain might overcome these problems, and operationalise them in a novel architecture pivoting on a spiking recurrent neural network. The first hypothesis allows the architecture to learn the world model in parallel with its use for planning: to this purpose, a new arbitration mechanism decides when to explore, for learning the world model, or when to exploit it, for planning, based on the entropy of the world model itself. The second hypothesis allows the architecture to use an unsupervised learning process to learn the world model by observing the effects of actions. The architecture is validated by reproducing and accounting for the learning profiles and reaction times of human participants learning to solve a visuomotor learning task that is new for them. Overall, the architecture represents the first instance of a model bridging probabilistic planning and spiking processes that has a degree of autonomy analogous to that of real organisms.
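The entropy-based arbitration described above can be illustrated with a minimal, non-spiking Python sketch. It is not the architecture from the paper: the tabular world model, the averaging of entropies, the `threshold` value, and all function names are assumptions made here for illustration. The idea shown is only that a world model with high-entropy predictions favours exploration, while a confident one favours exploitation through planning.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of a discrete probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def mean_model_entropy(world_model):
    """Average entropy of the predicted next-state distributions.

    `world_model` maps (state, action) pairs to a list of next-state
    probabilities (a hypothetical tabular stand-in for a learned model).
    """
    rows = list(world_model.values())
    return sum(entropy(row) for row in rows) / len(rows)


def arbitrate(world_model, threshold=0.5):
    """Explore while the model is uncertain, otherwise exploit it for planning."""
    if mean_model_entropy(world_model) > threshold:
        return "explore"
    return "exploit"


# Early in learning: predictions are close to uniform.
untrained = {("s0", "a0"): [0.5, 0.5], ("s0", "a1"): [0.5, 0.5]}
# After learning: predictions are sharply peaked.
trained = {("s0", "a0"): [0.99, 0.01], ("s0", "a1"): [0.02, 0.98]}

print(arbitrate(untrained), arbitrate(trained))
```

A fixed threshold is the simplest possible rule; a graded or probabilistic switch would be an equally valid reading of the abstract, which does not specify the mechanism at this level of detail.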