2020
DOI: 10.3389/fnbeh.2020.00141

Non-action Learning: Saving Action-Associated Cost Serves as a Covert Reward

Abstract: “To do or not to do” is a fundamental decision that has to be made in daily life. Behaviors related to multiple “to do” choice tasks have long been explained by reinforcement learning, and “to do or not to do” tasks such as the go/no-go task have also been recently discussed within the framework of reinforcement learning. In this learning framework, alternative actions and/or the non-action to take are determined by evaluating explicitly given (overt) reward and punishment. However, we assume that there are re…

Cited by 3 publications (30 citation statements)
References 52 publications (83 reference statements)
“…We modified a two-tone lever-pull task for head-fixed mice that we previously reported [5] into a two-tone lever-pull task with the risk of two types of punishment: positive punishment (i.e., exposure to an air-puff) and negative punishment (i.e., omission of a reward) (Figures 1A and 1B). Two pure tones (6 and 10 kHz pure tone for 0.8–1.2 s), the reward-seeking action (pulling the lever after the go sound cue [pink noise] that followed the tone presentation), and the reward (a water drop delivered from a spout near the mouth) were common to both tasks.…”
Section: Results (mentioning)
confidence: 99%
“…To better understand how positive punishment (air-puff) and negative punishment (reward omission) affected the choice behavior (pull or non-pull) of the mouse, we constructed Q-learning models with a maximum of five parameters [5,24] to predict the choice behavior during the training sessions in the air-puff and omission tasks (Table S1; see STAR Methods for details). On the basis of our previous study [5], we assumed that these tasks included two choices, pull and non-pull, and there were values of pulling the lever (Q_pull) and non-pulling of the lever (Q_non-pull) for both tone A and B trials in each task. The pull-choice probability in each trial was determined from the sigmoidal function of the difference between Q_pull and Q_non-pull in that trial.…”
Section: Results (mentioning)
confidence: 99%
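
The citation statement above describes the pull-choice probability as a sigmoidal function of the difference between the pull and non-pull values. A minimal sketch of that choice rule in Python follows. Only the sigmoid of the value difference comes from the quoted text; the function names, the inverse-temperature parameter `beta`, the learning rate `alpha`, and the delta-rule update are assumptions added for illustration and are not the cited authors' five-parameter models.

```python
import math


def pull_probability(q_pull: float, q_non_pull: float, beta: float = 1.0) -> float:
    """Sigmoid of the value difference between pulling and not pulling.

    beta (inverse temperature) is an assumed parameter, not taken from the source.
    """
    return 1.0 / (1.0 + math.exp(-beta * (q_pull - q_non_pull)))


def update_value(q: float, outcome: float, alpha: float = 0.1) -> float:
    """Assumed delta-rule update: move q toward the observed outcome by a fraction alpha."""
    return q + alpha * (outcome - q)


# Illustrative run with made-up outcomes: pulling is rewarded (1.0) on three trials.
q_pull, q_non_pull = 0.0, 0.0
for _ in range(3):
    p = pull_probability(q_pull, q_non_pull)    # probability of pulling on this trial
    q_pull = update_value(q_pull, outcome=1.0)  # only the chosen action's value is updated
```

With equal values the rule gives a pull probability of 0.5, and the probability rises as the pull value grows relative to the non-pull value.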
“…This dmFrC activity might be maintained by cortico-basal ganglia-thalamocortical pathways (Haber, 2014; Sesack and Grace, 2010). Recently, we found that, in head-fixed mice trained to pull a lever to obtain a reward in response to different tone cues assigned to different reward probabilities, the mice pulled the lever more frequently in response to the higher-reward-predicting cue than in response to the lower-reward-predicting cue (Tanimoto et al., 2020). If the neuronal activity in the relevant areas can be recorded and manipulated during this operant task, it should clarify how the dmFrC neurons integrate the information related to the reward-predicting cue from other brain areas and use it to select the appropriate action.…”
Section: Discussion (mentioning)
confidence: 95%