Zahy Bnaya scite author profile

Sources of suboptimality in a minimalistic explore–exploit task

Song¹, Bnaya², Ji³, 2019


Balancing exploration and exploitation is a fundamental aspect of decision-making. It remains unknown whether people are close to optimal in striking this balance, and if not, how exactly their behavior deviates from optimality. Many existing paradigms are not ideally suited to answer this question, as they contain complexities such as non-stationary environments, stochasticity under exploitation, and reward distributions that are unknown to participants. Here, we introduce a task without such complexities, in which the optimal policy is to start off exploring and to switch to exploitation at most once in each sequence of decisions. The behavior of 49 laboratory and 143 online participants deviated both qualitatively and quantitatively from the optimal policy, even when allowing for bias and decision noise. Instead, people seem to follow a suboptimal rule in which they switch from exploration to exploitation when the highest reward so far exceeds a certain threshold. Moreover, we show that this threshold decreases approximately linearly with the proportion of the sequence that remains, suggesting a novel temporal ratio law. Finally, we find evidence for "sequence-level" variability which is shared across all decisions in the same sequence. Our results provide a new perspective on the explore-exploit dilemma, and emphasize the importance of examining sequence-level strategies and their variability when studying sequential decision-making.


Expertise increases planning depth in human gameplay

Opheusden¹, Kuperwajs², Galbiati³, et al., 2023, Nature


Do People Think Like Computers?

Opheusden¹, Bnaya², Galbiati³, et al., 2016


Bandit Algorithms for Social Network Queries

Bnaya¹, Puzis², Stern³, et al., 2013


Sources of suboptimality in a minimalistic explore-exploit task

Song¹, Bnaya², Ji³, 2018, Preprint



