2007
DOI: 10.1037/0033-295x.114.3.784

Reconciling reinforcement learning models with behavioral extinction and renewal: Implications for addiction, relapse, and problem gambling.

Abstract: Because learned associations are quickly renewed following extinction, the extinction process must include processes other than unlearning. However, reinforcement learning models, such as the temporal difference reinforcement learning (TDRL) model, treat extinction as an unlearning of associated value and are thus unable to capture renewal. TDRL models are based on the hypothesis that dopamine carries a reward prediction error signal; these models predict reward by driving that reward error to zero. The author…
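To make the abstract's point about "extinction as unlearning" concrete, here is a minimal sketch of a temporal-difference value update for a single cue. It is not the paper's model: the function name `td_value_trace`, the one-step episode, the learning rate `alpha`, and the trial counts are all illustrative assumptions.

```python
def td_value_trace(n_acquisition=50, n_extinction=50, alpha=0.2, reward=1.0):
    """Track the learned value of one cue across acquisition, then extinction.

    Assumption: each trial is a one-step episode (cue, then outcome), so the
    TD error reduces to delta = r - V because the terminal state has value 0.
    """
    value = 0.0
    trace = []
    for trial in range(n_acquisition + n_extinction):
        # Reward is delivered during acquisition and withheld during extinction.
        r = reward if trial < n_acquisition else 0.0
        delta = r - value        # reward prediction error
        value += alpha * delta   # learning drives the error toward zero
        trace.append(value)
    return trace


trace = td_value_trace()
print(f"value after acquisition: {trace[49]:.4f}")
print(f"value after extinction:  {trace[-1]:.4f}")
```

In this sketch the value climbs toward the reward during acquisition and decays back toward zero during extinction, so nothing of the original association is retained. That is the behavior the abstract describes as being unable to capture renewal.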

citations
Cited by 320 publications
(485 citation statements)
references
References 219 publications
“…This account suggests that animals are sensitive to the differences between the fearful or acquisition state and the extinction state (Capaldi 1966; Redish et al 2007). In the context of these experiments, when extinction closely follows acquisition or retrieval (which strongly engages the original fearful CS-US memory), the subjects have trouble distinguishing between whether the nonreinforced context/CS exposure during extinction still predicts the original fear contingency.…”
Section: Discussion (mentioning)
confidence: 99%
“…If one notices that a causal relation behaves consistently for a period of time and later starts behaving in a different way, one might infer that some unobserved factor changed (e.g., Buchanan & Sobel, 2011; Gershman, Blei, & Niv, 2010; Redish, Jensen, Johnson, & Kurth-Nelson, 2007; Rottman & Ahn, 2011). Believing that something about the world has changed but that the new state of the world is relatively stable is one rational reason for recency effects; experiences farther away in time are less informative of the current functioning of the world.…”
Section: Inferring Time Periods or Unobserved Factors (mentioning)
confidence: 99%
“…A central characteristic of such models is that they select actions based purely on a scalar value associated with taking the action in the current situation. This means that such models cannot accommodate more flexible responses such as latent-learning, devaluation, extinction, or reversal [15,52]. In contrast to the involvement of dorsolateral striatum in outcome-independent control, recent evidence indicates that dorsomedial striatum is involved in flexible goal-directed actions, including the map-based components of navigation tasks [53,54] and the learning and performance of goal-directed actions of instrumental conditioning tasks [55-57]. As reviewed above, the expression of flexible goal-directed behavior requires at least two processes: access to the knowledge that a given action leads to a particular outcome, and an evaluation of the outcome that takes the organism's current needs into account. These processes and their striatal underpinnings have been dissociated in instrumental conditioning experiments.…”
mentioning
confidence: 99%