2020
DOI: 10.1007/978-3-030-58558-7_25

Unified Image and Video Saliency Modeling

Abstract: Visual saliency modeling for images and videos is treated as two independent tasks in recent computer vision literature. On the one hand, image saliency modeling is a well-studied problem and progress on benchmarks like SALICON and MIT300 is slowing. For video saliency prediction, on the other hand, rapid gains have been achieved on the recent DHF1K benchmark through network architectures that are optimized for this task. Here, we take a step back and ask: Can image and video saliency modeling be approached via…

Cited by 106 publications (92 citation statements) · References 48 publications
“…While unsupervised domain adaptation has been applied to image classification (Ganin 2016; Tzeng et al. 2017), face recognition (Kan et al. 2015), object detection (Tang 2016), semantic segmentation (Zhang et al. 2020) and video action recognition (Li et al. 2018), among others, our work is, to our knowledge, the first to deal with unsupervised domain adaptation for video saliency prediction. It is worth noting that this is technically and fundamentally different from the form of domain adaptation proposed in UNISAL (Droste et al. 2020), which instead learns domain-specific parameters. This means that, at inference time, UNISAL requires knowing the source dataset of a given input in order to select the learned domain-specific parameters.…”
Section: Related Work
confidence: 99%
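
To make the contrast concrete, the sketch below shows one common way domain-specific parameters are realized in practice, namely per-domain batch normalization. This is a minimal PyTorch-style illustration under assumed names (DomainSpecificBatchNorm, the domain argument are hypothetical); it is not UNISAL's actual implementation, only the general pattern the statement refers to.

```python
import torch
import torch.nn as nn

class DomainSpecificBatchNorm(nn.Module):
    """Keeps one BatchNorm2d per source dataset; the caller must pass the
    domain index at inference time to pick the matching statistics."""
    def __init__(self, num_features: int, num_domains: int):
        super().__init__()
        self.bns = nn.ModuleList(
            nn.BatchNorm2d(num_features) for _ in range(num_domains)
        )

    def forward(self, x: torch.Tensor, domain: int) -> torch.Tensor:
        # Selecting the right normalization statistics is only possible if
        # the input's source dataset (e.g. DHF1K vs. SALICON) is known --
        # exactly the limitation the citing paper points out.
        return self.bns[domain](x)

# Usage: features from dataset 0 go through domain 0's statistics.
dsbn = DomainSpecificBatchNorm(num_features=64, num_domains=4)
feats = torch.randn(2, 64, 32, 32)
out = dsbn(feats, domain=0)
```

The key point is visible in the forward signature: the module cannot run without a domain index, which is why such models need to know each input's source dataset at inference time, whereas unsupervised domain adaptation makes no such assumption.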
“…It is also different from unsupervised salient object detection (Zhang et al. 2018), which instead attempts to predict saliency by exploiting large sets of unlabelled or weakly-labelled samples. However, we also provide HD²S with domain-specific learning capabilities as in (Droste et al. 2020), showing that this mechanism improves performance but cannot be applied in unsupervised domain adaptation scenarios.…”
Section: Related Work
confidence: 99%
“…Fourthly, by employing an element-wise multiplication between S^m_{V,C} and S[…], the result is further processed with a 1 × 1 convolution and a resize-convolution operation to generate a high-level semantic-aware attention map F^m_{semantic}. 2) Center-Bias Prior: According to previous studies [28], [66], [67], human attention tends to concentrate on the center of a scene, which is termed the center-bias phenomenon. To this end, a learnable center-bias prior function is adopted according to our preceding work [40].…”
Section: Multi-cues Integration
confidence: 99%
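
The fusion-and-prior pipeline described in this statement can be sketched generically as follows. This is a reconstruction under assumptions: the class name, the tensor arguments s_a and s_b, and the Gaussian parameterization of the center-bias prior are all illustrative, not the cited paper's code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SemanticAttentionWithCenterBias(nn.Module):
    """Illustrative head: fuse two maps by element-wise multiplication,
    refine with a 1x1 convolution plus a resize-convolution (upsample,
    then convolve), and apply a learnable Gaussian center-bias prior."""
    def __init__(self, channels: int):
        super().__init__()
        self.reduce = nn.Conv2d(channels, channels, kernel_size=1)
        self.resize_conv = nn.Conv2d(channels, 1, kernel_size=3, padding=1)
        # Learnable center-bias parameters: spatial offset and spread.
        self.mu = nn.Parameter(torch.zeros(2))       # (dy, dx)
        self.log_sigma = nn.Parameter(torch.zeros(1))

    def center_bias(self, h: int, w: int, device) -> torch.Tensor:
        # Gaussian bump centered near the image middle, in [-1, 1] coords.
        ys = torch.linspace(-1.0, 1.0, h, device=device)
        xs = torch.linspace(-1.0, 1.0, w, device=device)
        yy, xx = torch.meshgrid(ys, xs, indexing="ij")
        sigma = self.log_sigma.exp()
        d2 = (yy - self.mu[0]) ** 2 + (xx - self.mu[1]) ** 2
        return torch.exp(-d2 / (2.0 * sigma ** 2))   # shape (h, w)

    def forward(self, s_a: torch.Tensor, s_b: torch.Tensor) -> torch.Tensor:
        fused = s_a * s_b                  # element-wise multiplication
        fused = self.reduce(fused)         # 1x1 convolution
        fused = F.interpolate(fused, scale_factor=2, mode="bilinear",
                              align_corners=False)
        attn = self.resize_conv(fused)     # resize-convolution
        bias = self.center_bias(attn.shape[-2], attn.shape[-1], attn.device)
        return torch.sigmoid(attn) * bias  # center-biased attention map
```

Modeling the center bias as a learnable Gaussian (offset mu, spread sigma) is one common choice; the cited work adopts its own learnable prior function from earlier work [40].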
“…Salience in dynamic scenes is related to, but conceptually different from, salience in static images [27]. Specific methods for the dynamic case have been studied [28,29,30,31,32,33] and, very recently, unified image-video approaches [34] have been proposed, but only in the context of spatial salience. For gaze prediction, temporal features are found to be of key importance only in rare events, while spatial static features can explain gaze in most cases [35].…”
Section: Related Work
confidence: 99%