2021
DOI: 10.51628/001c.22322

Predicting Goal-directed Attention Control Using Inverse-Reinforcement Learning

Abstract: Understanding how goal states control behavior is a question ripe for interrogation by new methods from machine learning. These methods require large and labeled datasets to train models. To annotate a large-scale image dataset with observed search fixations, we collected 16,184 fixations from people searching for either microwaves or clocks in a dataset of 4,366 images (MS-COCO). We then used this behaviorally annotated dataset and the machine learning method of Inverse-Reinforcement Learning (IRL) to learn t…
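To make the IRL idea in the abstract concrete, below is a minimal, single-step maximum-entropy IRL sketch in Python. It is an illustrative stand-in, not the paper's model: states are cells of a coarse grid over the image, recorded fixations act as the expert demonstrations, and the reward is linear in per-cell features. The grid size, feature dimension, learning rate, and random stand-in features are all assumptions made for the sketch.

import numpy as np

# Hedged sketch: single-step maximum-entropy IRL over a coarse fixation grid.
# States are cells of an H x W grid over the image; each "expert" trajectory
# is a scanpath of fixated cells; reward is linear in per-cell features,
# r(s) = theta . phi(s). Features here are random stand-ins for image features.
H, W, F = 5, 8, 16                        # grid size and feature dim (assumed)
S = H * W                                 # number of states (grid cells)
rng = np.random.default_rng(0)
phi = rng.normal(size=(S, F))             # per-cell features (stand-in)
scanpaths = [rng.integers(0, S, size=6) for _ in range(100)]  # fake fixations

# Empirical feature expectations under the observed scanpaths.
mu_expert = np.mean([phi[t].mean(axis=0) for t in scanpaths], axis=0)

theta = np.zeros(F)
for _ in range(200):
    r = phi @ theta                       # reward per cell
    p = np.exp(r - r.max()); p /= p.sum() # soft (maxent) visitation distribution
    mu_model = p @ phi                    # model feature expectations
    theta += 0.1 * (mu_expert - mu_model) # maxent log-likelihood gradient step

priority_map = (phi @ theta).reshape(H, W)  # learned fixation-priority map

Matching model feature expectations to the observed ones is the core maxent update; the paper's actual method learns from sequential fixations with far richer state representations, so this sketch only shows the shape of the objective.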

Cited by 8 publications (4 citation statements) · References 21 publications
“…However, it is unlikely that humans uniformly collect information across all features of the representation. Depending on the task goal, some features might be more relevant for target detection than others (36, 37, 58, 59). For instance, paying attention to color may be more useful when searching for a piece of clothing than when trying to locate one’s car keys.…”
Section: Discussion
confidence: 99%
“…Prior work has tested how target features (Malcolm & Henderson, 2009; Navalpakkam & Itti, 2005; Vickery et al., 2005; Wolfe & Horowitz, 2017; Zelinsky, 2008), scene context (Castelhano & Witherspoon, 2016; Henderson et al., 1999; Neider & Zelinsky, 2006; Pereira & Castelhano, 2014, 2019), image salience (Anderson et al., 2015), and various combinations of these sources (Castelhano & Heaven, 2010; Ehinger et al., 2009; Malcolm & Henderson, 2010; Torralba et al., 2006; Wolfe & Horowitz, 2017; Zelinsky et al., 2006, 2020) influence eye movements during object search in scenes. However, because meaning and image salience are correlated (Elazary & Itti, 2008; Henderson, 2003; Henderson et al., 2007; Henderson & Hayes, 2017, 2018; Rehrig, Peacock, et al., 2020; Tatler et al., 2011), and because recent work has shown that attention prioritises task-neutral meaning over image salience during visual search for embedded letters in scenes (Hayes & Henderson, 2019), the current study tested whether this pattern of results would also hold during visual search for objects in scenes.…”
Section: Discussion
confidence: 99%
“…However, the process by which this occurs is unclear. Previous work testing the influence of scene information on attentional prioritisation during visual search has found influences of target features (Malcolm & Henderson, 2009; Navalpakkam & Itti, 2005; Vickery et al., 2005; Wolfe & Horowitz, 2017; Zelinsky, 2008), scene context (Castelhano & Witherspoon, 2016; Neider & Zelinsky, 2006; Pereira & Castelhano, 2014, 2019), image salience (Anderson et al., 2015), and various combinations thereof (Castelhano & Heaven, 2010; Ehinger et al., 2009; Malcolm & Henderson, 2010; Torralba et al., 2006; Wolfe & Horowitz, 2017; Zelinsky et al., 2006, 2020) on eye fixations. Although image salience predicts various behaviours during object search, such as fixation allocation (Henderson et al., 2007) and fast first saccades (Anderson et al., 2015), scene semantics influence search behaviours as well (Cornelissen & Võ, 2017; Hayes & Henderson, 2019; Henderson et al., 2007).…”
confidence: 99%
“…Yang et al. [43] approximated the foveated retina by using the segmentation maps of a full-resolution image and its blurred version, predicted by a pretrained Panoptic-FPN [21], to approximate the high-resolution fovea and low-resolution periphery, respectively. Like other models for predicting human attention [30, 24, 25, 6, 46], both approaches rely on pretrained networks to extract image features and train much smaller networks for the downstream tasks using transfer learning, usually due to the lack of human fixation data for training. Also noteworthy is that these approaches apply networks pretrained on full-resolution images (e.g., ResNets [14] trained on ImageNet [38]), expecting the pretrained networks to approximate how humans perceive blurred images.…”
Section: Introduction
confidence: 99%
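To make the foveated-retina approximation quoted above concrete, here is a hedged numpy sketch of a simplified variant: keep the sharp image inside a foveal radius around the current fixation and substitute a blurred copy in the periphery. Yang et al. combined segmentation maps predicted separately from the sharp and blurred images rather than blending pixels, so the radius, blur width, and pixel blending below are illustrative assumptions.

import numpy as np
from scipy.ndimage import gaussian_filter

def foveate(image, fixation, fovea_radius=32.0, blur_sigma=4.0):
    """Blend a sharp image with its blurred copy: sharp inside a
    circular fovea around `fixation`, blurred in the periphery."""
    h, w = image.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w]
    fy, fx = fixation
    dist = np.hypot(ys - fy, xs - fx)                 # distance to fixation
    mask = (dist <= fovea_radius).astype(image.dtype) # 1 inside the fovea
    blurred = gaussian_filter(image, sigma=(blur_sigma, blur_sigma, 0))
    return mask[..., None] * image + (1.0 - mask[..., None]) * blurred

# Illustrative usage on a random image; a segmentation network (e.g., the
# pretrained Panoptic-FPN mentioned above) would then be run on the sharp
# and blurred copies, not on raw pixels as blended here.
img = np.random.rand(240, 320, 3)
fov = foveate(img, fixation=(120, 160))

Blending at the pixel level is the cheapest stand-in; blending the two predicted segmentation maps by the same foveal mask would be closer to the quoted approach.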