Humans use directed and random exploration to solve the explore–exploit dilemma.

Wilson, Robert; Geana, Andra; White, J.M.; Ludvig, Elliot Andrew; Cohen, Jonathan D.

doi:10.1037/a0038199

Cited by 496 publications

(896 citation statements)

References 30 publications

(52 reference statements)

Supporting

Mentioning

771

Contrasting


“…Our results establish a causal role for the rFPC in regulating both exploration and exploitation, and they underscore that this region is critical for participants to look beyond the current benefits at hand to search for potentially greater rewards (Wilson et al, 2014). Together, findings from the tests of our three hypotheses support that the activation observed in FPC when participants switch to exploratory choices (e.g., Daw et al, 2006; Boorman et al, 2009) indeed relates to behavioral control in those situations.…”

Section: Discussion (supporting)

confidence: 65%

“…Second, we investigated the more novel hypothesis that tDCS-mediated increases or decreases in exploration are related to higher or lower sensitivity to previous unexpected outcomes in payoff magnitudes (i.e., prediction errors), respectively. This hypothesis was motivated by proposals that the rFPC is involved in integrating memories of recent events to guide behavior (Tsujimoto et al, 2011). Our results were consistent with all three hypotheses: Anodal and cathodal rFPC-targeted tDCS indeed caused increased and decreased exploration, respectively.…”

Section: Introduction (supporting)

confidence: 80%

“…Because the FPC is thought to integrate memory of recent experiences to directly inform choice (e.g., Tsujimoto et al, 2011), we hypothesized that anodal rFPC stimulation will increase participants' likelihood to explore either following negative prediction errors (i.e., unexpectedly low payoffs) for exploitative choices in the recent past and/or after positive prediction errors (i.e., unexpectedly high payoffs) for recent exploratory choices. By contrast, cathodal stimulation will result in reduced sensitivity to these prediction errors (for further information motivating this hypothesis, see tDCS-induced exploration relates to increased sensitivity to negative prediction errors).…”

Section: H1: the Right FPC Is Causally Involved In Biasing Choices To (mentioning)

confidence: 99%


Transcranial Stimulation over Frontopolar Cortex Elucidates the Choice Attributes and Neural Mechanisms Used to Resolve Exploration–Exploitation Trade-Offs

et al. 2015


Optimal behavior requires striking a balance between exploiting tried-and-true options or exploring new possibilities. Neuroimaging studies have identified different brain regions in humans where neural activity is correlated with exploratory or exploitative behavior, but it is unclear whether this activity directly implements these choices or simply reflects a byproduct of the behavior. Moreover, it remains unknown whether arbitrating between exploration and exploitation can be influenced with exogenous methods, such as brain stimulation. In our study, we addressed these questions by selectively upregulating and downregulating neuronal excitability with anodal or cathodal transcranial direct current stimulation over right frontopolar cortex during a reward-learning task. This caused participants to make slower, more exploratory or faster, more exploitative decisions, respectively. Bayesian computational modeling revealed that stimulation affected how much participants took both expected and obtained rewards into account when choosing to exploit or explore: Cathodal stimulation resulted in an increased focus on the option expected to yield the highest payout, whereas anodal stimulation led to choices that were less influenced by anticipated payoff magnitudes and were more driven by recent negative reward prediction errors. These findings suggest that exploration is triggered by a neural mechanism that is sensitive to prior less-than-expected choice outcomes and thus pushes people to seek out alternative courses of action. Together, our findings establish a parsimonious neurobiological mechanism that causes exploration and exploitation, and they provide new insights into the choice features used by this mechanism to direct decision-making.


“…The prior mean is close to the generative mean of 50 used in the actual experiment, and the decision parameters are comparable to those found in our previous work (Wilson et al, 2014). The learning rate parameters, α₁ and α∞, were not included in our previous models and are worth discussing in more detail.…”

Section: Model Fitting Results (supporting)

confidence: 60%

A causal role for right frontopolar cortex in directed, but not random, exploration

Zajkowski, Kossut, Wilson 2016

Preprint

Self Cite


The explore-exploit dilemma occurs anytime we must choose between exploring unknown options for information and exploiting known resources for reward. Previous work suggests that people use two different strategies to solve the explore-exploit dilemma: directed exploration, driven by information seeking, and random exploration, driven by decision noise. Here, we show that these two strategies rely on different neural systems. Using transcranial magnetic stimulation to inhibit the right frontopolar cortex, we were able to selectively inhibit directed exploration while leaving random exploration intact. This suggests a causal role for right frontopolar cortex in directed, but not random, exploration and that directed and random exploration rely on (at least partially) dissociable neural systems.


“…This set includes tasks based on the basic problems of foraging theory, including the patch-leaving problem, the diet selection problem, the central place foraging problem, and so forth (Stephens & Krebs, 1986). It also includes stopping problems and other classic optimization problems, such as the k-arm bandit problems, horizon problems, and change point detection problems (Pearson, Hayden, Raghavachari, & Platt, 2009; Wilson, Geana, White, Ludvig, & Cohen, 2014; Wilson, Nassar, & Gold, 2013). Indeed, it may also include variants of the intertemporal choice task in which the postreward delays are clearly cued (Pearson et al, 2010).…”

Section: Suggestions For Future Research (mentioning)

confidence: 99%

Time discounting and time preference in animals: A critical review

2015


Animals are an important model for studies of impulsivity and self-control. Many studies have made use of the intertemporal choice task, which pits small rewards available sooner against larger rewards available later (typically several seconds), repeated over many trials. Preference for the sooner reward is often taken to indicate impulsivity and/or a failure of self-control. This review shows that very little evidence supports this assumption; on the contrary, ostensible discounting behavior may reflect a boundedly rational but not necessarily impulsive reward-maximizing strategy. Specifically, animals may discount weakly, or even adopt a long-term rate-maximizing strategy, but fail to fully incorporate postreward delays into their choices. This failure may reflect learning biases. Consequently, tasks that measure animal discounting may greatly overestimate the true discounting and may be confounded by processes unrelated to time preferences. If so, animals may be much more patient than is widely believed; human and animal intertemporal choices may reflect unrelated mental operations; and the shared hyperbolic shape of the human and animal discount curves, which is used to justify cross-species comparisons, may be coincidental. The discussion concludes with a consideration of alternative ways to measure self-control in animals.

