2021
DOI: 10.1038/s41598-021-87715-9
COCO-Search18 fixation dataset for predicting goal-directed attention control

Abstract: Attention control is a basic behavioral process that has been studied for decades. The best current models of attention control are deep networks trained on free-viewing behavior to predict bottom-up attention control (saliency). We introduce COCO-Search18, the first dataset of laboratory-quality goal-directed behavior large enough to train deep-network models. We collected eye-movement behavior from 10 people searching for each of 18 target-object categories in 6202 natural-scene images, yielding ~…


Cited by 17 publications (21 citation statements) · References 61 publications (79 reference statements)
“…In tasks like visual search, we expect there might be, for example, much more complicated interactions between scene content and scanpath history than the simple effects that we report here for free viewing. In the past, extending these models to other tasks was difficult due to the paucity of suitable large-scale datasets; we are hopeful for the future given recent datasets such as COCO-Search-18 (Chen et al., 2021; Yang et al., 2020). We are already planning to utilize these new datasets to extend DeepGaze and model the influence of task on human scanpaths.…”
Section: Discussion
confidence: 99%
“…After filtering out these object categories, which we did by using the corresponding MaskRCNN channels to detect these categories in the images, we were left with 145 images for analysis. Surprisingly few datasets have been developed for visual search behavior, but by far the largest is COCO-Search18 (Chen, Yang, Ahn, Samaras, Hoai, & Zelinsky, 2021; Yang et al., 2020). It consists of roughly 300,000 fixations from 10 people searching for each of 18 target-object categories in 6202 images of natural scenes.…”
Section: Methods
confidence: 99%
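The dataset described above is distributed as per-trial fixation records. As a minimal sketch of working with such records, the snippet below aggregates fixation counts per target category; the field names ("task", "subject", "X", "Y") are assumptions about the released JSON schema, used only for illustration, so consult the dataset's documentation for the actual format.

```python
import json
from collections import Counter

# Synthetic records in an assumed COCO-Search18-style schema:
# one trial per entry, with parallel X/Y lists (one pair per fixation).
records_json = """
[
  {"task": "bottle", "subject": 1, "X": [512.0, 430.5], "Y": [320.0, 290.1]},
  {"task": "bottle", "subject": 2, "X": [600.2], "Y": [410.7]},
  {"task": "car", "subject": 1, "X": [100.0, 150.0, 200.0], "Y": [80.0, 90.0, 95.0]}
]
"""

records = json.loads(records_json)

# Count fixations per target category.
fixations_per_task = Counter()
for trial in records:
    fixations_per_task[trial["task"]] += len(trial["X"])

print(dict(fixations_per_task))  # {'bottle': 3, 'car': 3}
```

The same loop, run over the full dataset, would recover the per-category breakdown of the ~300,000 fixations mentioned in the quote.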
“…Using this bridge, the recent models that predict search fixations are all deep networks (Wei, Adeli, Nguyen, Zelinsky, & Samaras, 2016; Adeli & Zelinsky, 2018; Zhang, Feng, Ma, Lim, Zhao, & Kreiman, 2018), with the current state of the art being a model that predicts search fixations using a prioritization policy learned through imitation of previously observed search behavior (Yang et al., 2020). The dataset of search fixations that we use in the present study, COCO-Search18, was developed to provide the observations of search behavior needed by this model for training (Chen et al., 2020). However, it is not our current goal to set a new benchmark by outperforming these models or even to enter into the arena of search-fixation prediction.…”
Section: Introduction
confidence: 99%
“…However, not only does this method require training multiple detectors for each target, it also requires each detector to be trained at different eccentricities, all using manually created datasets showing the targets at multiple scales against different textured backgrounds. This approach therefore cannot be easily extended to a larger number of target categories, such as the 18 used in COCO-Search18 [7]. In contrast, our model is able to jointly learn the foveation process at the feature level and the networks that predict human scanpaths through back-propagation from human gaze behavior.…”
Section: Related Work
confidence: 99%
“…The mechanism of attention used by humans to prioritize and select visual information [36,35,34] has attracted the interest of computer vision researchers seeking to reproduce this selection efficiency in machines [43,7,44,6,37]. The most commonly used paradigm for studying this efficiency is a visual search task, where efficiency is measured by how many attention shifts (gaze fixations) are needed to detect a target in an image.…”
Section: Introduction
confidence: 99%
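The efficiency measure described in the quote above — the number of fixations needed before gaze first lands on the target — can be sketched in a few lines. The fixation and bounding-box formats below are illustrative assumptions, not the format of any particular dataset.

```python
def fixations_to_target(fixations, bbox):
    """Return the 0-based index of the first fixation inside the
    target bounding box, or None if the target is never fixated.

    fixations: list of (x, y) gaze points, in scanpath order.
    bbox: (x, y, w, h) target bounding box in the same coordinates.
    """
    x0, y0, w, h = bbox
    for i, (fx, fy) in enumerate(fixations):
        if x0 <= fx <= x0 + w and y0 <= fy <= y0 + h:
            return i
    return None


# Hypothetical three-fixation scanpath: the third fixation lands
# inside the target box, so search took 2 fixations before target.
scanpath = [(100, 200), (340, 260), (512, 300)]
target_box = (480, 280, 80, 60)
print(fixations_to_target(scanpath, target_box))  # 2
```

Averaging this index over trials gives the per-category search-efficiency score that models like the ones cited above are evaluated against.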