Understanding how goal states control behavior is a question ripe for interrogation by new methods from machine learning. These methods require large, labeled datasets to train models. To annotate a large-scale image dataset with observed search fixations, we collected 16,184 fixations from people searching for either microwaves or clocks in a dataset of 4,366 images (MS-COCO). We then used this behaviorally-annotated dataset and the machine learning method of Inverse-Reinforcement Learning (IRL) to learn target-specific reward functions and policies for these two target goals. Finally, we used these learned policies to predict the fixations of 60 new behavioral searchers (clock = 30, microwave = 30) in a disjoint test dataset of kitchen scenes depicting both a microwave and a clock (thus controlling for differences in low-level image contrast). We found that the IRL model predicted behavioral search efficiency and fixation-density maps, as evaluated using multiple metrics. Moreover, reward maps from the IRL model revealed target-specific patterns that suggest attention guidance not only by target features but also by scene context (e.g., fixations along walls when searching for clocks). Using machine learning and the psychologically-meaningful principle of reward, it is possible to learn the visual features used in goal-directed attention control.
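To make the fixation-density-map comparison concrete, the sketch below (an illustration, not the authors' code) builds a Gaussian-smoothed density map from (x, y) fixation coordinates and scores the agreement between a model map and a human map with a Pearson correlation, one common map-comparison metric; the image size, smoothing width, and example fixations are placeholder values.

```python
# Illustrative sketch: Gaussian-smoothed fixation-density maps and a
# Pearson-correlation score between two maps.  All numbers are made up.
import numpy as np
from scipy.ndimage import gaussian_filter

def fixation_density_map(fixations, height, width, sigma=25.0):
    """fixations: iterable of (x, y) pixel coordinates."""
    counts = np.zeros((height, width), dtype=float)
    for x, y in fixations:
        xi, yi = int(round(x)), int(round(y))
        if 0 <= yi < height and 0 <= xi < width:
            counts[yi, xi] += 1.0
    density = gaussian_filter(counts, sigma=sigma)
    total = density.sum()
    return density / total if total > 0 else density

def pearson_cc(map_a, map_b):
    a = (map_a - map_a.mean()) / (map_a.std() + 1e-12)
    b = (map_b - map_b.mean()) / (map_b.std() + 1e-12)
    return float((a * b).mean())

# Toy usage with invented fixations for a 320 x 512 image.
human = fixation_density_map([(100, 80), (105, 90), (300, 200)], 320, 512)
model = fixation_density_map([(110, 85), (290, 210)], 320, 512)
print("CC between model and human maps:", pearson_cc(human, model))
```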
Inverse-Reinforcement Learning

IRL is an imitation-learning method from the machine-learning literature that learns, through observations of an expert, a reward function and policy for mimicking expert performance. We extend this framework to goal-directed behavior by assuming that the image locations fixated by searchers constitute the expert performance that the model learns to mimic. The specific IRL algorithm that we use is Generative Adversarial Imitation Learning (GAIL) [10], which makes reward proportional to the model's ability to generate State-Action pairings that imitate observed State-Action pairings. Here, the Action is a shift of fixation location in a search image (the model's saccade), and the State is the search context (all the information available for use in the search task). The State includes, but is not limited to, the visual features extracted from an image and the learned visual
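To give a concrete sense of the GAIL idea, the toy sketch below (a strong simplification for illustration, not the implementation used in this work) discretizes an image into grid cells, treats the State as the current fixation cell and the Action as the next fixation cell, and alternates between training a logistic discriminator to separate human from model-generated State-Action pairs and updating a softmax policy with the discriminator-derived reward; the grid size, learning rates, and "expert" data are all invented for the example.

```python
# Toy GAIL-style loop: reward is high where model-generated State-Action
# pairs imitate the expert well enough to fool a discriminator.
import numpy as np

rng = np.random.default_rng(0)
N_CELLS = 16                                   # 4x4 grid of candidate fixation cells

# Invented "expert" data: these hypothetical searchers always saccade to cell 5.
expert_pairs = [(int(s), 5) for s in rng.integers(0, N_CELLS, size=500)]

policy_logits = np.zeros((N_CELLS, N_CELLS))   # softmax policy pi(Action | State)
disc_w = np.zeros((N_CELLS, N_CELLS))          # D(State, Action) = sigmoid(disc_w[s, a])

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

for _ in range(200):
    # 1. Generate State-Action pairs from the current policy
    #    (states are sampled i.i.d. here; there are no real episode dynamics).
    gen_pairs = []
    for _ in range(100):
        s = int(rng.integers(0, N_CELLS))
        a = int(rng.choice(N_CELLS, p=softmax(policy_logits[s])))
        gen_pairs.append((s, a))

    # 2. Discriminator step: raise D on expert pairs, lower it on generated pairs.
    lr_d = 0.05
    for i in rng.integers(0, len(expert_pairs), size=100):
        s, a = expert_pairs[i]
        disc_w[s, a] += lr_d * (1.0 - sigmoid(disc_w[s, a]))
    for s, a in gen_pairs:
        disc_w[s, a] -= lr_d * sigmoid(disc_w[s, a])

    # 3. Policy step (REINFORCE): reward grows where the model's pairs
    #    look like the expert's to the discriminator.
    lr_p = 0.1
    for s, a in gen_pairs:
        reward = -np.log(1.0 - sigmoid(disc_w[s, a]) + 1e-8)
        grad = -softmax(policy_logits[s])
        grad[a] += 1.0                          # gradient of log pi(a | s) w.r.t. logits
        policy_logits[s] += lr_p * reward * grad

# After training, the policy should prefer the expert's saccade target (cell 5).
print("Most likely action from state 0:", int(np.argmax(policy_logits[0])))
```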