Uncertainty and Exploration in a Restless Bandit Problem

2015 | DOI: 10.1111/tops.12145

Abstract: Decision-making in noisy and changing environments requires a fine balance between exploiting knowledge about good courses of action and exploring the environment in order to improve upon this knowledge. We present an experiment on a restless bandit task in which participants made repeated choices between options for which the average rewards changed over time. Comparing a number of computational models of participants' behaviour in this task, we find evidence that a substantial number of them balanced explora…
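
The task can be made concrete with a short simulation. The sketch below is a minimal illustration, not the authors' implementation: it assumes each option's average reward follows an independent Gaussian random walk and that observed rewards add Gaussian noise, with all parameter values (drift_sd, reward_sd, the initial spread) chosen purely for illustration.

import numpy as np

def simulate_restless_bandit(n_arms=4, n_trials=200, drift_sd=2.0,
                             reward_sd=4.0, seed=0):
    # Each arm's mean reward drifts as an independent Gaussian random walk;
    # observed rewards are the current mean plus Gaussian observation noise.
    rng = np.random.default_rng(seed)
    means = np.zeros((n_trials, n_arms))
    means[0] = rng.normal(0.0, 10.0, n_arms)  # assumed initial spread
    for t in range(1, n_trials):
        means[t] = means[t - 1] + rng.normal(0.0, drift_sd, n_arms)
    rewards = means + rng.normal(0.0, reward_sd, means.shape)
    return means, rewards

means, rewards = simulate_restless_bandit()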

Cited by 161 publications (230 citation statements) | References 27 publications

“…Their behaviour was more in line with a context-blind learning strategy (Kalman filter) that treats the task as a restless bandit, in which the expected rewards fluctuate over time but these fluctuations are not predictable from changes in context. The combination of a Kalman filter learning model with a “probability of maximum utility” decision strategy, which described these participants best, has also been found to describe participants' behaviour well in an actual restless bandit task (Speekenbrink & Konstantinidis, 2015), and here it might have indicated the limits of participants' learning ability in our task.…”
Section: Results
confidence: 99%
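
To make the model combination named in this excerpt concrete, here is a minimal sketch of Kalman-filter learning paired with a “probability of maximum utility” choice rule, with the latter approximated by Monte Carlo sampling from the per-arm posteriors. The function names and parameter values (drift_var, noise_var, n_samples) are assumptions for illustration, not the cited authors' code.

import numpy as np

def kalman_update(m, v, choice, reward, drift_var=4.0, noise_var=16.0):
    # Prediction step: every arm's uncertainty grows because means may drift.
    v = v + drift_var
    # Update step: only the chosen arm is corrected by its prediction error.
    k = v[choice] / (v[choice] + noise_var)   # Kalman gain
    m[choice] += k * (reward - m[choice])
    v[choice] *= 1.0 - k
    return m, v

def prob_max_utility_choice(m, v, rng, n_samples=1000):
    # Choose each arm with (approximately) the probability that it currently
    # has the highest reward, estimated by sampling from the posteriors.
    samples = rng.normal(m, np.sqrt(v), size=(n_samples, len(m)))
    p_max = np.bincount(samples.argmax(axis=1), minlength=len(m)) / n_samples
    return rng.choice(len(m), p=p_max)

# Illustrative usage with 4 arms:
# m, v = np.zeros(4), np.full(4, 100.0)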
“…Frequently, these rules ignore the uncertainty about the formed expectations, whereas, rationally, uncertainty should guide exploration. Here, we follow Speekenbrink and Konstantinidis (2015) and define a broader set of decision rules that explicitly model how participants trade off between expectations and uncertainty. We will consider four different strategies for making decisions in a CMAB task based on the predictive distributions derived from the above learning models.…”
Section: Decision Strategies
confidence: 99%
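
To illustrate how decision rules can trade expectations off against uncertainty, the sketch below contrasts three generic rules operating on posterior means m and variances v. These are standard textbook rules with assumed parameter values, shown only as examples of the expectation-uncertainty trade-off; they are not necessarily the four strategies this excerpt refers to.

import numpy as np

def softmax_choice(m, rng, temp=1.0):
    # Expectation-only rule: uncertainty plays no role in the choice.
    e = np.exp((m - m.max()) / temp)
    return rng.choice(len(m), p=e / e.sum())

def ucb_choice(m, v, beta=1.0):
    # Upper confidence bound: adds a deterministic uncertainty bonus,
    # so more uncertain options attract directed exploration.
    return int(np.argmax(m + beta * np.sqrt(v)))

def thompson_choice(m, v, rng):
    # Thompson sampling: uncertainty drives stochastic exploration via
    # one random draw per arm from its predictive distribution.
    return int(np.argmax(rng.normal(m, np.sqrt(v))))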
“…Choice in a changing environment when feedback is limited to the chosen options, coined “restless bandits,” has also been studied theoretically (e.g., Whittle, 1988) and experimentally (e.g., Biele, Erev, & Ert, 2009; Speekenbrink & Konstantinidis, 2015), exposing participants to two different sources of uncertainty. These studies attempted to find a model that best fitted behavior while testing assumptions about participants' risk aversion, their sensitivity to transition probabilities, and their exploration behavior.…”
Section: Literature Review
confidence: 99%
“…To that end, future work would benefit from the inclusion of computational modeling approaches, which can provide insight into the interplay between memory biases, diminishing sensitivity, recency effects, and the possible differential weighting of extreme outcomes and extreme probabilities. Erev et al. (2008) incorporated a utility function in their model of experience-based choice (see also Ahn, Busemeyer, Wagenmakers, & Stout, 2008; Speekenbrink & Konstantinidis, 2015; Yechiam & Busemeyer, 2005), and noted that “the addition of diminishing sensitivity assumption to models that assume oversensitivity to small samples can improve the value of these models” (p. 587). Our results corroborate this statement by reinforcing the necessity to include utility functions in any future attempt to model specific effects in decisions from experience.…”
Section: Potential Explanations and Future Directions
confidence: 99%
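
A diminishing-sensitivity utility function of the kind this excerpt argues for can be written as a simple power transform. The sketch below uses prospect-theory-style parameters (alpha for curvature, lam for loss aversion) with illustrative values; it is not the specific function used by Erev et al. (2008).

def utility(x, alpha=0.8, lam=1.0):
    # Power utility: outcomes far from zero are compressed (diminishing
    # sensitivity); losses may additionally be weighted by lam.
    return x ** alpha if x >= 0 else -lam * (-x) ** alpha

# Example: the subjective difference between 100 and 110 is smaller than
# the difference between 0 and 10, although both spans are 10 units wide.
print(utility(10) - utility(0), utility(110) - utility(100))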