2021
DOI: 10.1016/j.cognition.2020.104465

Meaning maps and saliency models based on deep convolutional neural networks are insensitive to image meaning when predicting human fixations

Abstract: Eye movements are vital for human vision, and it is therefore important to understand how observers decide where to look. Meaning maps (MMs), a technique to capture the distribution of semantic importance across an image, have recently been proposed to support the hypothesis that meaning rather than image features guide human gaze. MMs have the potential to be an important tool far beyond eye-movements research. Here, we examine central assumptions underlying MMs. First, we compared the performance of MMs in p…

Cited by 24 publications (39 citation statements).
References 37 publications (49 reference statements).
“…The correlation between such context-free meaning and visual salience is high [64]. Challenging the meaning map approach in its current form, results from a recent study suggest that meaning maps index the distribution of high-level visual features rather than meaning [66]. The larger problem is that meaning can be defined in many ways [60].…”
Section: Discussion
Mentioning confidence: 95%
“…Note that the exact parameter values determining the grids used to segment images into patches differed slightly between the two types of meaning maps from our two studies. The reason for this difference is that the reports introducing the original (Henderson & Hayes, 2017) and contextualized (Peacock et al., 2019) meaning maps - on which we based our previous (Pedziwiatr et al., 2021a) and present studies, respectively - differ with respect to the reported sizes of images viewed by observers in the eye-tracking experiments (33 × 25 vs. 26.5 × 20 degrees of visual angle), yet use identical numbers of coarse and fine patches per image.…”
Mentioning confidence: 74%
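A minimal sketch of the arithmetic this point turns on: with identical patch counts per image, the size of each patch in degrees of visual angle depends on the viewed image extent. The grid counts below are illustrative placeholders, not the values reported by Henderson & Hayes (2017) or Peacock et al. (2019); only the two image extents are taken from the quoted statement.

```python
# Sketch: how identical grid counts yield different patch sizes for
# differently sized images (extents in degrees of visual angle).

def patch_size_deg(image_extent_deg, grid_cells):
    """Return (width, height) of one patch in degrees of visual angle."""
    width_deg, height_deg = image_extent_deg
    cells_x, cells_y = grid_cells
    return width_deg / cells_x, height_deg / cells_y

# Image extents reported for the two studies.
original_extent = (33.0, 25.0)        # Henderson & Hayes (2017)
contextualized_extent = (26.5, 20.0)  # Peacock et al. (2019)

# Hypothetical "coarse" grid of 10 x 8 cells, applied to both extents.
coarse_grid = (10, 8)

print(patch_size_deg(original_extent, coarse_grid))        # (3.3, 3.125)
print(patch_size_deg(contextualized_extent, coarse_grid))  # (2.65, 2.5)
```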
“…A recent study evaluating the meaning map approach and comparing it to a wider range of saliency models highlights some limitations of the method (Pedziwiatr et al., 2021a; see Henderson et al., 2021 and Pedziwiatr et al., 2021b for ongoing debate). First, the findings demonstrate that meaning maps are outperformed in predicting fixations by DeepGaze II (Kümmerer et al., 2016, 2017), a saliency model based on a deep neural network that indexes high-level features rather than meaning.…”
Section: Introduction
Mentioning confidence: 99%
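For readers unfamiliar with how such fixation-prediction comparisons are scored, the sketch below shows one common metric: the correlation coefficient (CC) between a model's map and a smoothed fixation density map. It is a generic illustration on toy data, not the evaluation pipeline used by Pedziwiatr et al. (2021a).

```python
# Sketch: score a predicted saliency/meaning map against human fixations
# via Pearson correlation with a smoothed fixation density map.
import numpy as np
from scipy.ndimage import gaussian_filter

def fixation_density(fixations_xy, shape, sigma=25):
    """Turn discrete fixation coordinates into a smoothed density map."""
    density = np.zeros(shape)
    for x, y in fixations_xy:
        density[int(y), int(x)] += 1
    return gaussian_filter(density, sigma)

def correlation_coefficient(pred_map, fix_map):
    """Pearson correlation between a predicted map and a fixation density map."""
    p = (pred_map - pred_map.mean()) / pred_map.std()
    f = (fix_map - fix_map.mean()) / fix_map.std()
    return float((p * f).mean())

# Toy usage: a random "model" map scored against ten random fixations.
rng = np.random.default_rng(0)
pred = rng.random((480, 640))
fixations = rng.integers(0, [640, 480], size=(10, 2))
print(correlation_coefficient(pred, fixation_density(fixations, (480, 640))))
```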
“…Yarbus' original study demonstrates that participants will have different scan-paths for the same image, even while performing the same task, suggesting that low-level information is not sufficient to predict human gaze [4]. Recently, deep learning models of gaze-guidance have trained convolutional neural networks on the gaze patterns of human subjects (REFs), and have demonstrated greater performance than salience or meaning models alone [19]. These approaches therefore indirectly combine feed-forward scene statistics with the high-level image meaning that guided the fixations of observers who supplied the training eye movements.…”
Section: Introduction
Mentioning confidence: 99%
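A minimal sketch of that general recipe: a pretrained CNN is frozen so it supplies high-level features, and a small readout is trained on human fixation data to predict a per-pixel fixation distribution. The backbone, readout shape, and toy training data below are assumptions for illustration, not the architecture of DeepGaze II or any specific published model.

```python
# Sketch: frozen pretrained features + trainable readout for fixation prediction.
import torch
import torch.nn as nn
from torchvision.models import vgg16, VGG16_Weights

backbone = vgg16(weights=VGG16_Weights.DEFAULT).features.eval()
for p in backbone.parameters():
    p.requires_grad = False          # features stay fixed; only the readout learns

readout = nn.Sequential(             # 1x1 convolutions map features to one channel
    nn.Conv2d(512, 16, kernel_size=1), nn.ReLU(),
    nn.Conv2d(16, 1, kernel_size=1),
)
optimizer = torch.optim.Adam(readout.parameters(), lr=1e-3)

def predict_log_density(images):
    """Return a per-pixel log-probability map over fixation locations."""
    feats = backbone(images)          # (B, 512, H/32, W/32) high-level features
    logits = readout(feats)
    logits = nn.functional.interpolate(logits, size=images.shape[-2:],
                                       mode="bilinear", align_corners=False)
    return nn.functional.log_softmax(logits.flatten(1), dim=1)

# Toy training step: maximise log-likelihood of (hypothetical) fixated pixels.
images = torch.rand(2, 3, 224, 224)
fix_idx = torch.randint(0, 224 * 224, (2,))   # one fixated pixel per image
loss = -predict_log_density(images)[torch.arange(2), fix_idx].mean()
loss.backward()
optimizer.step()
```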