2021
DOI: 10.1016/j.cognition.2020.104465

Meaning maps and saliency models based on deep convolutional neural networks are insensitive to image meaning when predicting human fixations

Abstract: Eye movements are vital for human vision, and it is therefore important to understand how observers decide where to look. Meaning maps (MMs), a technique to capture the distribution of semantic importance across an image, have recently been proposed to support the hypothesis that meaning rather than image features guide human gaze. MMs have the potential to be an important tool far beyond eye-movements research. Here, we examine central assumptions underlying MMs. First, we compared the performance of MMs in p…

Cited by 24 publications (39 citation statements).
References 37 publications (49 reference statements).
“…The correlation between such context-free meaning and visual salience is high [64]. Challenging the meaning map approach in its current form, results from a recent study suggest that meaning maps index the distribution of high-level visual features rather than meaning [66]. The larger problem is that meaning can be defined in many ways [60].…”
Section: Discussion
Mentioning confidence: 95%
“…Note that the exact parameter values determining the grids used to segment images into patches differed slightly between the two types of meaning maps from our two studies. The reason for this difference is that the reports introducing the original (Henderson & Hayes, 2017) and contextualized (Peacock et al., 2019) meaning maps - on which we based our previous (Pedziwiatr et al., 2021a) and present studies, respectively - differ with respect to the reported sizes of images viewed by observers in the eye-tracking experiments (33 × 25 vs. 26.5 × 20 degrees of visual angle), yet use identical numbers of coarse and fine patches per image.…”
Mentioning confidence: 74%
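A minimal sketch of the arithmetic this point turns on: with identical patch counts per image, the size of each patch in degrees of visual angle depends on the viewed image extent. The grid counts below are illustrative placeholders, not the values reported by Henderson & Hayes (2017) or Peacock et al. (2019); only the two image extents are taken from the quoted statement.

```python
# Sketch: how identical grid counts yield different patch sizes for
# differently sized images (extents in degrees of visual angle).

def patch_size_deg(image_extent_deg, grid_cells):
    """Return (width, height) of one patch in degrees of visual angle."""
    width_deg, height_deg = image_extent_deg
    cells_x, cells_y = grid_cells
    return width_deg / cells_x, height_deg / cells_y

# Image extents reported for the two studies.
original_extent = (33.0, 25.0)        # Henderson & Hayes (2017)
contextualized_extent = (26.5, 20.0)  # Peacock et al. (2019)

# Hypothetical "coarse" grid of 10 x 8 cells, applied to both extents.
coarse_grid = (10, 8)

print(patch_size_deg(original_extent, coarse_grid))        # (3.3, 3.125)
print(patch_size_deg(contextualized_extent, coarse_grid))  # (2.65, 2.5)
```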
“…A recent study evaluating the meaning map approach and comparing it to a wider range of saliency models highlights some limitations of the method (Pedziwiatr et al., 2021a; see Henderson et al., 2021 and Pedziwiatr et al., 2021b for ongoing debate). First, the findings demonstrate that meaning maps are outperformed in predicting fixations by DeepGaze II (Kümmerer et al., 2016, 2017), a saliency model based on a deep neural network that indexes high-level features rather than meaning.…”
Section: Introduction
Mentioning confidence: 99%
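For readers unfamiliar with how such fixation-prediction comparisons are scored, the sketch below shows one common metric: the correlation coefficient (CC) between a model's map and a smoothed fixation density map. It is a generic illustration on toy data, not the evaluation pipeline used by Pedziwiatr et al. (2021a).

```python
# Sketch: score a predicted saliency/meaning map against human fixations
# via Pearson correlation with a smoothed fixation density map.
import numpy as np
from scipy.ndimage import gaussian_filter

def fixation_density(fixations_xy, shape, sigma=25):
    """Turn discrete fixation coordinates into a smoothed density map."""
    density = np.zeros(shape)
    for x, y in fixations_xy:
        density[int(y), int(x)] += 1
    return gaussian_filter(density, sigma)

def correlation_coefficient(pred_map, fix_map):
    """Pearson correlation between a predicted map and a fixation density map."""
    p = (pred_map - pred_map.mean()) / pred_map.std()
    f = (fix_map - fix_map.mean()) / fix_map.std()
    return float((p * f).mean())

# Toy usage: a random "model" map scored against ten random fixations.
rng = np.random.default_rng(0)
pred = rng.random((480, 640))
fixations = rng.integers(0, [640, 480], size=(10, 2))
print(correlation_coefficient(pred, fixation_density(fixations, (480, 640))))
```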
“…Yarbus' original study demonstrates that participants will have different scan-paths for the same image, even while performing the same task, suggesting that low-level information is not sufficient to predict human gaze [4]. Recently, deep learning models of gaze-guidance have trained convolutional neural networks on the gaze patterns of human subjects (REFs), and have demonstrated greater performance than salience or meaning models alone [19]. These approaches therefore indirectly combine feed-forward scene statistics with the high-level image meaning that guided the fixations of observers who supplied the training eye movements.…”
Section: Introduction
Mentioning confidence: 99%
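A minimal sketch of that general recipe: a pretrained CNN is frozen so it supplies high-level features, and a small readout is trained on human fixation data to predict a per-pixel fixation distribution. The backbone, readout shape, and toy training data below are assumptions for illustration, not the architecture of DeepGaze II or any specific published model.

```python
# Sketch: frozen pretrained features + trainable readout for fixation prediction.
import torch
import torch.nn as nn
from torchvision.models import vgg16, VGG16_Weights

backbone = vgg16(weights=VGG16_Weights.DEFAULT).features.eval()
for p in backbone.parameters():
    p.requires_grad = False          # features stay fixed; only the readout learns

readout = nn.Sequential(             # 1x1 convolutions map features to one channel
    nn.Conv2d(512, 16, kernel_size=1), nn.ReLU(),
    nn.Conv2d(16, 1, kernel_size=1),
)
optimizer = torch.optim.Adam(readout.parameters(), lr=1e-3)

def predict_log_density(images):
    """Return a per-pixel log-probability map over fixation locations."""
    feats = backbone(images)          # (B, 512, H/32, W/32) high-level features
    logits = readout(feats)
    logits = nn.functional.interpolate(logits, size=images.shape[-2:],
                                       mode="bilinear", align_corners=False)
    return nn.functional.log_softmax(logits.flatten(1), dim=1)

# Toy training step: maximise log-likelihood of (hypothetical) fixated pixels.
images = torch.rand(2, 3, 224, 224)
fix_idx = torch.randint(0, 224 * 224, (2,))   # one fixated pixel per image
loss = -predict_log_density(images)[torch.arange(2), fix_idx].mean()
loss.backward()
optimizer.step()
```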