Do learning rates adapt to the distribution of rewards?

Gershman, Samuel J.

doi:10.3758/s13423-014-0790-3

Cited by 120 publications

(170 citation statements)

References 23 publications

Supporting

Mentioning

150

Contrasting

“…A consistent feature in the reinforcement learning literature is that learning rates for negative prediction errors are higher than those for positive prediction errors, regardless of the distribution of rewards in the task (25-27). The results of our Standard conditions are consistent with this pattern: In both the gating and probability models, the learning rate parameter that is operative on miss trials (α_miss, α_prob), where the prediction error is always negative, is markedly higher than the learning rates active solely on hit trials (α_hit, α_payoff), where prediction errors are primarily positive (Fig.…”

Section: Discussion (mentioning)

confidence: 92%

Credit assignment in movement-dependent reinforcement learning

McDougle, Boggess, Crossley, et al. 2016

Proc. Natl. Acad. Sci. U.S.A.

When a person fails to obtain an expected reward from an object in the environment, they face a credit assignment problem: Did the absence of reward reflect an extrinsic property of the environment or an intrinsic error in motor execution? To explore this problem, we modified a popular decision-making task used in studies of reinforcement learning, the two-armed bandit task. We compared a version in which choices were indicated by key presses, the standard response in such tasks, to a version in which the choices were indicated by reaching movements, which affords execution failures. In the key press condition, participants exhibited a strong risk aversion bias; strikingly, this bias reversed in the reaching condition. This result can be explained by a reinforcement model wherein movement errors influence decision-making, either by gating reward prediction errors or by modifying an implicit representation of motor competence. Two further experiments support the gating hypothesis. First, we used a condition in which we provided visual cues indicative of movement errors but informed the participants that trial outcomes were independent of their actual movements. The main result was replicated, indicating that the gating process is independent of participants' explicit sense of control. Second, individuals with cerebellar degeneration failed to modulate their behavior between the key press and reach conditions, providing converging evidence of an implicit influence of movement error signals on reinforcement learning. These results provide a mechanistically tractable solution to the credit assignment problem.

decision-making | reinforcement learning | sensory prediction error | reward prediction error | cerebellum

When a diner reaches across the table and knocks over her coffee, the absence of anticipated reward should be attributed to a failure of coordination rather than diminish her love of coffee. Although this attribution is intuitive, current models of decision-making lack a mechanistic explanation for this seemingly simple computation. We set out to ask if, and how, selection processes in decision-making incorporate information specific to action execution and thus solve the credit assignment problem that arises when an expected reward is not obtained because of a failure in motor execution.

Humans are highly capable of tracking the value of stimuli, varying their behavior on the basis of reinforcement history (1, 2), and exhibiting sensitivity to intrinsic motor noise when reward outcomes depend on movement accuracy (3-5). In real-world behavior, the underlying cause of unrewarded events is often ambiguous: A lost point in tennis could occur because the player made a poor choice about where to hit the ball or failed to properly execute the stroke. However, in laboratory studies of reinforcement learning, the underlying cause of unrewarded events is typically unambiguous, either solely dependent on properties of the stimulus or on motor noise. Thus, it remains unclear how people assign credit to either extrins...
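The citation statement for this entry contrasts learning rates applied to negative versus positive prediction errors. The snippet below is a minimal sketch of that general idea using a simple delta rule. It is not the gating or probability model from McDougle et al.; the function name `update_value` and the parameter values are hypothetical and chosen only for illustration.

```python
def update_value(value, reward, alpha_pos, alpha_neg):
    """Delta-rule update with a separate learning rate per prediction-error sign.

    alpha_pos is applied when the prediction error is non-negative,
    alpha_neg when it is negative. (Hypothetical helper for illustration.)
    """
    prediction_error = reward - value
    alpha = alpha_pos if prediction_error >= 0 else alpha_neg
    return value + alpha * prediction_error


# Illustrative values only: a larger learning rate for negative prediction errors.
v = 0.5
v_after_reward = update_value(v, reward=1.0, alpha_pos=0.1, alpha_neg=0.4)
v_after_omission = update_value(v, reward=0.0, alpha_pos=0.1, alpha_neg=0.4)
```

With these illustrative settings, an unrewarded trial moves the value estimate further than a rewarded trial of the same prediction-error magnitude.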

“…To investigate whether rewarding outcomes engage DA signaling depending on genotype, we used fMRI. Our prior finding from Fto-deficient mice (Hess et al, 2013) suggested that a lack of Fto specifically impairs D2/3R-mediated autoinhibition of dopaminergic midbrain neurons. Furthermore, ANKK1 genotype modulates midbrain response to rewards in humans (Felsted et al, 2010), and reward prediction errors (PEs) are encoded by phasic dopamine release from neurons in the ventral tegmental area/substantia nigra (VTA/SN) (Schultz et al, 1997; Montague et al, 2004).…”

Section: Introduction (mentioning)

confidence: 90%

“…Moreover, our recent analysis of Fto-deficient mice revealed that a lack of Fto specifically impairs dopamine receptor D2/3-mediated control of neuronal activation. Here Fto deficiency led to increased 6-methyl adenosine modification of specific mRNAs of critical components of D2/3R-signaling, including that of D3R and the GIRK2-channel, thus reducing their translation and affecting dopamine-dependent regulation of locomotor activity and reward sensitivity (Hess et al, 2013). Consistently, behavioral alterations associated with FTO variants in humans have also been linked to altered dopaminergic transmission (Kenny, 2011b).…”

Section: Introduction (mentioning)

confidence: 96%

“…Although variations in the fat mass and obesity-associated (FTO) gene are currently the strongest known genetic factor predisposing humans to nonmonogenic obesity (Dina et al, 2007; Frayling et al, 2007), recent experiments have linked the same variants to a broad spectrum of altered behavioral responses (for review, see Hess and Brüning, 2014), including food choice, attention deficiency, impulse control, and substance abuse (Sobczyk-Kopciol et al, 2011; Choudhry et al, 2013; Karra et al, 2013; Chuang et al, 2015); also, the A variant of rs9939609 has been recently associated with a lower risk of depression (Samaan et al, 2013). However, the underlying neurobiological mechanisms by which FTO or obesity-predisposing variants of the human FTO gene affect behavior remain elusive.…”

Section: Introduction (mentioning)

confidence: 99%

“…Healthy individuals who carry the A1 allele, compared with those who do not, show diminished striatal D2R density (Jönsson et al, 1999) and reduced glucose metabolism in dopaminoceptive regions involved in reward processing (Noble et al, 1997). This genetic trait has been shown to moderate (1) increased likelihood of obesity (Noble et al, 1994), (2) food reinforcement and intake, especially in obese individuals (Epstein et al, 2007), and (3) the association between neural responses and weight gain (Stice et al, 2008).…”

Section: Introduction (mentioning)

confidence: 99%

An Obesity-Predisposing Variant of the FTO Gene Regulates D2R-Dependent Reward Learning

Sevgi, Rigoux, Kuhn, et al. 2015

Journal of Neuroscience

Variations in the fat mass and obesity-associated (FTO) gene are linked to obesity. However, the underlying neurobiological mechanisms by which these genetic variants influence obesity, behavior, and brain are unknown. Given that Fto regulates D2/3R signaling in mice, we tested in humans whether variants in FTO would interact with a variant in the ANKK1 gene, which alters D2R signaling and is also associated with obesity. In a behavioral and fMRI study, we demonstrate that gene variants of FTO affect dopamine (D2)-dependent midbrain brain responses to reward learning and behavioral responses associated with learning from negative outcome in humans. Furthermore, dynamic causal modeling confirmed that FTO variants modulate the connectivity in a basic reward circuit of meso-striatoprefrontal regions, suggesting a mechanism by which genetic predisposition alters reward processing not only in obesity, but also in other disorders with altered D2R-dependent impulse control, such as addiction.

The neural network basis of altered decision‐making in patients with amyotrophic lateral sclerosis

Imai, Masuda, Watanabe, et al. 2020

Ann Clin Transl Neurol

Objective: Amyotrophic lateral sclerosis (ALS) is a multisystem disorder associated with motor impairment and behavioral/cognitive involvement. We examined decision-making features and changes in the neural hub network in patients with ALS using a probabilistic reversal learning task and resting-state network analysis, respectively. Methods: Ninety ALS patients and 127 cognitively normal participants performed this task. Data from 62 ALS patients and 63 control participants were fitted to a Q-learning model. Results: ALS patients had anomalous decision-making features with little shift in choice until they thought the value of the two alternatives had become equal. The quantified parameters (Pαβ) calculated by logistic regression analysis with learning rate and inverse temperature well represented the unique choice pattern of ALS patients. Resting-state network analysis demonstrated a strong correlation between Pαβ and decreased degree centrality in the anterior cingulate gyrus and frontal pole. Interpretation: Altered decision-making in ALS patients may be related to the decreased hub function of medial prefrontal areas.
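The abstract refers to a Q-learning model characterized by a learning rate and an inverse temperature. The exact model specification used by Imai et al. is not given here, so the following is only a rough sketch of how such a model is commonly set up for a two-option task: a delta-rule value update (learning rate `alpha`) combined with a softmax choice rule (inverse temperature `beta`). All function names and parameter values are hypothetical.

```python
import math


def softmax_choice_prob(q_a, q_b, beta):
    """Probability of choosing option A under a two-option softmax rule.

    beta is the inverse temperature: larger values make choices more
    deterministic with respect to the value difference.
    """
    return 1.0 / (1.0 + math.exp(-beta * (q_a - q_b)))


def q_update(q, reward, alpha):
    """Move the value estimate toward the observed reward by a fraction alpha."""
    return q + alpha * (reward - q)


# Illustrative run: both options start at equal value, then A is rewarded once.
q_a, q_b = 0.0, 0.0
p_a_initial = softmax_choice_prob(q_a, q_b, beta=3.0)
q_a = q_update(q_a, reward=1.0, alpha=0.3)
p_a_after = softmax_choice_prob(q_a, q_b, beta=3.0)
```

In this sketch, equal values give an even choice probability, and a single rewarded trial on option A shifts the choice probability toward A by an amount that depends on both `alpha` and `beta`.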
