2011
DOI: 10.3758/s13420-011-0025-7

Pigeon and human performance in a multi-armed bandit task in response to changes in variable interval schedules

Abstract: The tension between exploitation of the best options and exploration of alternatives is a ubiquitous problem that all organisms face. To examine this trade-off across species, pigeons and people were trained on an eight-armed bandit task in which the options were rewarded on a variable interval (VI) schedule. At regular intervals, each option's VI changed, thus encouraging dynamic increases in exploration in response to these anticipated changes. Both species showed sensitivity to the payoffs that was often we…
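
The task structure described in the abstract — eight options, each paying off on its own variable interval (VI) schedule, with the schedule values reassigned at regular intervals — can be summarized in a short simulation sketch. The code below is an illustration only, not the authors' procedure; the class name VIArm, the exponential sampling of intervals, and the specific VI values are all assumptions about a generic VI implementation.

```python
import random

class VIArm:
    """One bandit arm paying off on a variable interval (VI) schedule.

    A reward becomes available after a waiting time drawn from an
    exponential distribution with the given mean; the first response
    after that time is reinforced, then a new interval is drawn.
    """
    def __init__(self, mean_interval_s):
        self.mean_interval_s = mean_interval_s
        self.next_armed_at = random.expovariate(1.0 / mean_interval_s)

    def respond(self, t):
        """Peck/click this arm at time t; return 1 if reinforced, else 0."""
        if t >= self.next_armed_at:
            # Reward collected; schedule the next one from now.
            self.next_armed_at = t + random.expovariate(1.0 / self.mean_interval_s)
            return 1
        return 0

def make_environment():
    """Eight arms with distinct VI means (values are illustrative)."""
    return [VIArm(m) for m in (5, 10, 15, 20, 30, 45, 60, 90)]

def reassign_schedules(arms):
    """At regular intervals the VI values are shuffled across arms, so a
    previously rich arm may become lean -- the non-stationarity that
    encourages renewed exploration."""
    means = [arm.mean_interval_s for arm in arms]
    random.shuffle(means)
    for arm, m in zip(arms, means):
        arm.mean_interval_s = m
```

Because a VI schedule holds an armed reward until it is collected, an arm's obtained reward rate saturates quickly with response rate; the learner's real problem is tracking which arms are currently rich, which is exactly the exploration pressure the changing schedules are designed to create.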

Cited by 21 publications (18 citation statements) | References 42 publications

“…In addition, another factor that leads to more exploration is familiarity with the environment [18]. Moreover, in non-stationary versions of the bandit task people do learn the changes in pay-off, but the rate of learning is slower than in stationary environments [5], and usually only increases when the changes in pay-offs are explicitly signaled to the participant [15]. In general, the behavioral work, and more recently the neuropsychological work examining activations in brain regions associated with choice behavior in bandit tasks [13,19], has been successfully modeled by reinforcement learning models [2,20,21].…”
Section: Bandit Tasks
confidence: 99%
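
A minimal sketch of the kind of reinforcement learning model this excerpt refers to: a delta-rule value update with softmax choice. This is a generic textbook formulation, not any cited paper's model; the learning rate alpha and inverse temperature beta are placeholder parameters.

```python
import math
import random

def softmax_choice(values, beta):
    """Sample an arm with probability proportional to exp(beta * Q)."""
    weights = [math.exp(beta * q) for q in values]
    r = random.uniform(0, sum(weights))
    cum = 0.0
    for i, w in enumerate(weights):
        cum += w
        if r <= cum:
            return i
    return len(weights) - 1

def simulate(arm_probs, n_trials=200, alpha=0.1, beta=3.0):
    """Delta-rule learner on a stationary Bernoulli bandit (illustrative)."""
    q = [0.0] * len(arm_probs)
    choices = []
    for _ in range(n_trials):
        a = softmax_choice(q, beta)
        reward = 1.0 if random.random() < arm_probs[a] else 0.0
        q[a] += alpha * (reward - q[a])   # prediction-error update
        choices.append(a)
    return choices, q
```

With a high beta the model exploits its current value estimates almost deterministically; a lower beta yields more exploratory, probabilistic choice, which is why beta is often read as an exploration parameter.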
“…For instance, these tasks have been extended to examine choice behavior with non-human participants (rats/pigeons) (e.g., [15]), and in non-stationary versions as well [16]. For both animals and humans, it appears that the number of trials spent exploring depends on the payoff differentials [17], so that the better arm is found more quickly (i.e., in fewer trials) the larger the differentials are.…”
Section: Bandit Tasks
confidence: 99%
“…Therefore, response variation along the vertical axis of the screen would necessarily be meaningful with respect to reinforcement in a way that response variation along the horizontal axis would not. Second, kinesthetically speaking, it is likely easier for birds to freely vary their pecking in the horizontal dimension than in the vertical; pigeons naturally demonstrate restricted variability in the vertical dimension, and preferentially respond to lower regions (e.g., Racey, Young, Garlick, Pham, & Blaisdell, 2011) when pecking at a vertically oriented target.…”
Section: Methods
confidence: 99%
“…In every round, each individual has to decide whether to exploit (i.e., choosing the option that has the higher estimated reward value as of that round; see Methods section) or to explore (i.e., choosing the other option with the lower estimated reward value). Because the MAB problem embeds the exploration-exploitation trade-off at its heart (Sutton and Barto, 1998), it is a suitable test bed for unambiguously measuring exploration behaviour (Daw et al, 2006; Keasar et al, 2002; Racey et al, 2011; Toyokawa et al, 2014). Fitting a reinforcement learning model to each participant’s decision data (O’Doherty et al, 2003), we quantified each participant’s asocial exploration tendency.…”
Section: Introduction
confidence: 99%
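
This last excerpt describes fitting a reinforcement learning model to each participant's choices and reading an exploration tendency off the fitted parameters, typically the softmax inverse temperature. The sketch below shows one simple way such a fit could work, maximizing the choice log-likelihood over a parameter grid; the grid search, parameter ranges, and two-arm default are placeholders, not the cited papers' estimation procedure.

```python
import math

def negative_log_likelihood(choices, rewards, alpha, beta, n_arms=2):
    """-log p(choices | alpha, beta) for a delta-rule + softmax model."""
    q = [0.0] * n_arms
    nll = 0.0
    for a, r in zip(choices, rewards):
        # Softmax probability of the observed choice (log-sum-exp for stability).
        m = max(beta * x for x in q)
        log_z = m + math.log(sum(math.exp(beta * x - m) for x in q))
        nll -= beta * q[a] - log_z
        q[a] += alpha * (r - q[a])   # update value of the chosen arm
    return nll

def fit_grid(choices, rewards, n_arms=2):
    """Crude grid search over (alpha, beta); a lower fitted beta is
    conventionally read as a stronger exploration tendency."""
    best = None
    for alpha in [i / 20 for i in range(1, 20)]:
        for beta in [i / 2 for i in range(1, 21)]:
            nll = negative_log_likelihood(choices, rewards, alpha, beta, n_arms)
            if best is None or nll < best[0]:
                best = (nll, alpha, beta)
    return best  # (nll, alpha_hat, beta_hat)
```

Running fit_grid on a participant's trial-by-trial choices and outcomes yields per-participant (alpha, beta) estimates, and the fitted beta can then be compared across individuals as the exploration measure.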