2018
DOI: 10.1038/s41562-018-0467-4

Generalization guides human exploration in vast decision spaces

Abstract: From foraging for food to learning complex games, many aspects of human behaviour can be framed as a search problem with a vast space of possible actions. Under finite search horizons, optimal solutions are generally unobtainable. Yet how do humans navigate vast problem spaces, which require intelligent exploration of unobserved actions? Using a variety of bandit tasks with up to 121 arms, we study how humans search for rewards under limited search horizons, where the spatial correlation of rewards (in both ge…

Cited by 200 publications (337 citation statements)
References 54 publications
“…This model does not generalize over unseen arms at all, but rather only learns locally about the distribution of rewards for each option separately (Wu et al., in press). It can also be considered a special case of the function learning model as the assumed correlation between points goes to zero.…”
Section: Models of Learning and Decision Making (mentioning; confidence: 99%)
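
The nesting this citing paper points to can be made concrete. The following is a minimal sketch (assumed toy data and function names, not code from the paper) of Gaussian process function learning with an RBF kernel over a small one-dimensional bandit: with a broad length-scale, observed rewards generalize to neighboring unobserved arms, and as the length-scale shrinks toward zero, the posterior at unobserved arms reverts to the prior, recovering the purely local learning described above.

```python
# Sketch: GP function learning over spatially arranged bandit arms.
# All names and numbers here are illustrative assumptions.
import numpy as np

def rbf_kernel(x1, x2, length_scale):
    """Squared-exponential kernel: correlation decays with squared distance."""
    d = x1[:, None] - x2[None, :]
    return np.exp(-(d ** 2) / (2.0 * length_scale ** 2))

def gp_posterior_mean(x_obs, y_obs, x_query, length_scale, noise=1e-6):
    """Posterior mean of a zero-mean GP at query arms, given observed rewards."""
    K = rbf_kernel(x_obs, x_obs, length_scale) + noise * np.eye(len(x_obs))
    k_star = rbf_kernel(x_query, x_obs, length_scale)
    return k_star @ np.linalg.solve(K, y_obs)

arms = np.arange(11.0)            # a small 1-D bandit: 11 spatially arranged arms
x_obs = np.array([3.0, 7.0])      # two arms sampled so far
y_obs = np.array([1.0, -0.5])     # their (mean-centered) observed rewards

# Broad length-scale: rewards generalize to neighboring, unobserved arms.
print(gp_posterior_mean(x_obs, y_obs, arms, length_scale=2.0))

# Length-scale near zero: correlation between distinct arms vanishes, so
# unobserved arms revert to the prior mean (0) and the model "learns locally"
# about each sampled arm only -- the special case described in the citation.
print(gp_posterior_mean(x_obs, y_obs, arms, length_scale=0.05))
```

In this sense the local learner is nested within the function learning model: driving the kernel's assumed correlation to zero switches generalization off without changing anything else.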
“…Yet, research has mostly focused on search problems where exploration and exploitation happen simultaneously (e.g. MAB: [15]; abstract search: [16]; Lévy processes: [17]; comparison of different paradigms: [18]) and/or where jumps in the solution space are allowed (e.g., correlated MAB: [19]; sampling paradigm: [10]; secretary problem: [20]; random sampling: [21]). Nevertheless, many search problems are characterised by separated exploration and exploitation phases and by gradual exploration; examples include animals deciding where to hunt prey [22,23], algorithms maximising their reward in a reinforcement learning setting [24], and humans visually searching for a lost item [25] or solving a complex problem [14,5].…”
Section: Introduction (mentioning; confidence: 99%)
“…An alternative to backward induction is to assume that the decision maker acts myopically and decides whether or not to pursue the make-or-break task further as if this decision were the last that they could make. Myopic strategies accurately describe human behavior in a wide array of settings, ranging from dynamic investment decisions and sequential hypothesis testing to sequential search and multi-armed bandit tasks (see also Busemeyer & Rapoport; Gabaix, Laibson, Moloche, & Weinberg; Stojic, Analytis, & Speekenbrink; Thaler, Tversky, Kahneman, & Schwartz; Wu, Schulz, Speekenbrink, Nelson, & Meder; Zhang & Yu). In our case, as more time is allocated to the make-or-break task, some of the associated uncertainty is replaced by an actual outcome y = q_m(t′) experienced up until time t′ ∈ [0, T].…”
Section: Results (mentioning; confidence: 99%)
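
The myopic rule this citation describes is simple enough to sketch. The toy model below (hypothetical names and payoff function, not the cited paper's task) decides whether to pursue a continue-or-stop task one step further as if that step were the last available decision, comparing the expected one-step gain against the per-step cost instead of computing a backward-induction solution over the full horizon.

```python
# Hedged sketch of a myopic continue-or-stop policy; the payoff model,
# parameters, and names are illustrative assumptions only.
import numpy as np

rng = np.random.default_rng(0)

def expected_one_step_gain(progress, rate=0.3):
    """Assumed toy payoff: expected immediate gain of one more time unit,
    with diminishing returns as progress accumulates."""
    return rate * np.exp(-progress)

def myopic_policy(cost_per_step=0.05, horizon=20):
    """Continue while the myopic (one-step) expected gain beats the step cost,
    treating each decision as if it were the last one available."""
    progress, total = 0.0, 0.0
    for t in range(horizon):
        if expected_one_step_gain(progress) <= cost_per_step:
            break  # myopic stop: the next step alone no longer pays for itself
        outcome = rng.normal(expected_one_step_gain(progress), 0.1)
        progress += max(outcome, 0.0)
        total += outcome - cost_per_step
    return t, total

steps_taken, net_payoff = myopic_policy()
print(f"stopped after {steps_taken} steps, net payoff {net_payoff:.2f}")
```

A full backward-induction solution would, by contrast, also value the decision opportunities that remain after the current step; the myopic rule ignores them by construction.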