Uncertainty and Exploration in a Restless Bandit Problem

2015 | DOI: 10.1111/tops.12145

Abstract: Decision-making in noisy and changing environments requires a fine balance between exploiting knowledge about good courses of action and exploring the environment in order to improve upon this knowledge. We present an experiment on a restless bandit task in which participants made repeated choices between options for which the average rewards changed over time. Comparing a number of computational models of participants' behaviour in this task, we find evidence that a substantial number of them balanced explora…
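
The task can be made concrete with a short simulation. The sketch below is a minimal illustration, not the authors' implementation: it assumes each option's average reward follows an independent Gaussian random walk and that observed rewards add Gaussian noise, with all parameter values (drift_sd, reward_sd, the initial spread) chosen purely for illustration.

import numpy as np

def simulate_restless_bandit(n_arms=4, n_trials=200, drift_sd=2.0,
                             reward_sd=4.0, seed=0):
    # Each arm's mean reward drifts as an independent Gaussian random walk;
    # observed rewards are the current mean plus Gaussian observation noise.
    rng = np.random.default_rng(seed)
    means = np.zeros((n_trials, n_arms))
    means[0] = rng.normal(0.0, 10.0, n_arms)  # assumed initial spread
    for t in range(1, n_trials):
        means[t] = means[t - 1] + rng.normal(0.0, drift_sd, n_arms)
    rewards = means + rng.normal(0.0, reward_sd, means.shape)
    return means, rewards

means, rewards = simulate_restless_bandit()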

Cited by 161 publications (230 citation statements) | References 27 publications

“…Their behaviour was more in line with a context-blind learning strategy (Kalman filter) that treats the task as a restless bandit, in which the expected rewards fluctuate over time but these fluctuations are not predictable from changes in context. The combination of a Kalman filter learning model with a “probability of maximum utility” decision strategy, which described these participants best, has also been found to describe participants' behaviour well in an actual restless bandit task (Speekenbrink & Konstantinidis, 2015), and here it might have indicated the limits of participants' learning ability in our task.…”
Section: Results
confidence: 99%
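
To make the model combination named in this excerpt concrete, here is a minimal sketch of Kalman-filter learning paired with a “probability of maximum utility” choice rule, with the latter approximated by Monte Carlo sampling from the per-arm posteriors. The function names and parameter values (drift_var, noise_var, n_samples) are assumptions for illustration, not the cited authors' code.

import numpy as np

def kalman_update(m, v, choice, reward, drift_var=4.0, noise_var=16.0):
    # Prediction step: every arm's uncertainty grows because means may drift.
    v = v + drift_var
    # Update step: only the chosen arm is corrected by its prediction error.
    k = v[choice] / (v[choice] + noise_var)   # Kalman gain
    m[choice] += k * (reward - m[choice])
    v[choice] *= 1.0 - k
    return m, v

def prob_max_utility_choice(m, v, rng, n_samples=1000):
    # Choose each arm with (approximately) the probability that it currently
    # has the highest reward, estimated by sampling from the posteriors.
    samples = rng.normal(m, np.sqrt(v), size=(n_samples, len(m)))
    p_max = np.bincount(samples.argmax(axis=1), minlength=len(m)) / n_samples
    return rng.choice(len(m), p=p_max)

# Illustrative usage with 4 arms:
# m, v = np.zeros(4), np.full(4, 100.0)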
“…Frequently, these rules ignore the uncertainty about the formed expectations, whereas, rationally, uncertainty should guide exploration. Here, we follow Speekenbrink and Konstantinidis (2015) and define a broader set of decision rules that explicitly model how participants trade off between expectations and uncertainty. We will consider four different strategies for making decisions in a CMAB task based on the predictive distributions derived from the above learning models.…”
Section: Decision Strategies
confidence: 99%
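
To illustrate how decision rules can trade expectations off against uncertainty, the sketch below contrasts three generic rules operating on posterior means m and variances v. These are standard textbook rules with assumed parameter values, shown only as examples of the expectation-uncertainty trade-off; they are not necessarily the four strategies this excerpt refers to.

import numpy as np

def softmax_choice(m, rng, temp=1.0):
    # Expectation-only rule: uncertainty plays no role in the choice.
    e = np.exp((m - m.max()) / temp)
    return rng.choice(len(m), p=e / e.sum())

def ucb_choice(m, v, beta=1.0):
    # Upper confidence bound: adds a deterministic uncertainty bonus,
    # so more uncertain options attract directed exploration.
    return int(np.argmax(m + beta * np.sqrt(v)))

def thompson_choice(m, v, rng):
    # Thompson sampling: uncertainty drives stochastic exploration via
    # one random draw per arm from its predictive distribution.
    return int(np.argmax(rng.normal(m, np.sqrt(v))))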
“…Choice in a changing environment when feedback is limited to the chosen options, coined “restless bandits,” has also been studied theoretically (e.g., Whittle, 1988) and experimentally (e.g., Biele, Erev, & Ert, 2009; Speekenbrink & Konstantinidis, 2015), exposing participants to two different sources of uncertainty. These studies attempted to find a model that best fitted behavior while testing assumptions about participants' risk aversion, their sensitivity to transition probabilities, and their exploration behavior.…”
Section: Literature Review
confidence: 99%
“…To that end, future work would benefit from the inclusion of computational modeling approaches, which can provide insight into the interplay between memory biases, diminishing sensitivity, recency effects, and the possible differential weighting of extreme outcomes and extreme probabilities. Erev et al. (2008) incorporated a utility function in their model of experience-based choice (see also Ahn, Busemeyer, Wagenmakers, & Stout, 2008; Speekenbrink & Konstantinidis, 2015; Yechiam & Busemeyer, 2005), and noted that “the addition of diminishing sensitivity assumption to models that assume oversensitivity to small samples can improve the value of these models” (p. 587). Our results corroborate this statement by reinforcing the necessity to include utility functions in any future attempt to model specific effects in decisions from experience.…”
Section: Potential Explanations and Future Directions
confidence: 99%
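
A diminishing-sensitivity utility function of the kind this excerpt argues for can be written as a simple power transform. The sketch below uses prospect-theory-style parameters (alpha for curvature, lam for loss aversion) with illustrative values; it is not the specific function used by Erev et al. (2008).

def utility(x, alpha=0.8, lam=1.0):
    # Power utility: outcomes far from zero are compressed (diminishing
    # sensitivity); losses may additionally be weighted by lam.
    return x ** alpha if x >= 0 else -lam * (-x) ** alpha

# Example: the subjective difference between 100 and 110 is smaller than
# the difference between 0 and 10, although both spans are 10 units wide.
print(utility(10) - utility(0), utility(110) - utility(100))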