2020
DOI: 10.1016/j.neunet.2020.05.004

Contextual encoder–decoder network for visual saliency prediction


Cited by 155 publications (92 citation statements)
References 37 publications
“…All three models did comparable jobs of predicting whether a scene region would be fixated or not (MSI-Net=0.82, DeepGaze II=0.83, SAM-ResNet=0.81). Taken together our results replicate previous findings [36][37][38] and establish that MSI-Net, DeepGaze II, and SAM-ResNet also predict scene attention well in active viewing tasks.…”
Section: Results (supporting)
confidence: 90%
“…We compared 3 of the best performing deep saliency models on the MIT saliency benchmark [1]: the multi-scale information network (MSI-Net) [36], DeepGaze II [37], and the saliency attentive model (SAM-ResNet) [38]. Each deep saliency model takes an image as input and produces a predicted saliency map as output.…”
Section: Deep Saliency Models (mentioning)
confidence: 99%
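The excerpts above evaluate each model by treating its predicted saliency map as a per-pixel score for whether observers fixated a scene region. A minimal sketch of that kind of ROC-AUC evaluation, assuming generic NumPy arrays and scikit-learn (illustrative only, not the cited studies' code):

import numpy as np
from sklearn.metrics import roc_auc_score

def fixation_auc(saliency_map, fixation_map):
    # Treat predicted saliency as a classifier score for "fixated vs. not fixated"
    # and summarize its discrimination ability with the area under the ROC curve.
    labels = fixation_map.astype(int).ravel()   # 1 where observers fixated
    scores = saliency_map.ravel()               # higher value = more salient
    return roc_auc_score(labels, scores)

# Toy example with random data, shown only to illustrate the call signature.
rng = np.random.default_rng(0)
predicted = rng.random((240, 320))
fixated = rng.random((240, 320)) > 0.98
print(round(fixation_auc(predicted, fixated), 2))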
“…Along the same lines, a recently developed network has accurately predicted visual saliency using a similar encoder-decoder network (Kroner et al., 2020). Since we know the human visual system's coding of saliency must be robust to occlusion and clutter, it would be interesting to compare network objectives of predicting occluded visual features and predicting saliency.…”
Section: Future Directions (mentioning)
confidence: 95%
“…Then, we describe how the models interact with each other. The visual system is composed of the segmentation model of Francis et al (2017), a retina model inspired by Ambrosano et al (2016), and a saliency model, which is a simplified version of the model introduced by Kroner et al (2019). These specific parts of human vision were chosen because the segmentation model already explains many features of visual crowding (Figure 4) and because retinal processing, as well as saliency computation, are potential sources of anisotropy for the segmentation output.…”
Section: Methods (mentioning)
confidence: 99%
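The passage above names the three components but not their exact interaction; below is one purely hypothetical way to wire such stages together in Python, where every function and argument is a placeholder rather than the cited implementations:

import numpy as np
from typing import Callable

Stage = Callable[[np.ndarray], np.ndarray]

def visual_system(image: np.ndarray, retina: Stage, saliency: Stage, segmentation: Stage):
    # Hypothetical wiring only: the retina model preprocesses the input, and its
    # output is passed to both the saliency and the segmentation models. The
    # actual interaction is described in the cited Methods section, not here.
    retinal_image = retina(image)            # retina model (preprocessing)
    saliency_map = saliency(retinal_image)   # encoder-decoder saliency model
    segments = segmentation(retinal_image)   # segmentation model (crowding)
    return segments, saliency_map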
“…The model is an encoder-decoder network that learned a non-linear mapping from raw images to topographic fixation maps. It constitutes a simplified version of the model introduced by Kroner et al (2019), pruning the contextual layers to achieve computationally more efficient image processing. The VGG16 architecture (Simonyan and Zisserman, 2014), pre-trained on a visual classification task, serves as the model backbone to detect high-level features in the input space.…”
Section: Methods (mentioning)
confidence: 99%
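As a rough illustration of the encoder-decoder layout described in the last two excerpts, here is a minimal PyTorch sketch with VGG16 convolutional features as the backbone; the decoder, layer sizes, and output normalization are assumptions for demonstration, not the published architecture:

import torch
import torch.nn as nn
from torchvision.models import vgg16

class EncoderDecoderSaliency(nn.Module):
    # Illustrative encoder-decoder: VGG16 convolutional layers act as the
    # feature encoder, and a small upsampling decoder produces a single-channel
    # topographic saliency map at the input resolution.
    def __init__(self):
        super().__init__()
        self.encoder = vgg16().features  # ImageNet weights can be loaded here if desired
        self.decoder = nn.Sequential(
            nn.Conv2d(512, 256, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Upsample(scale_factor=4, mode="bilinear", align_corners=False),
            nn.Conv2d(256, 64, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Upsample(scale_factor=8, mode="bilinear", align_corners=False),
            nn.Conv2d(64, 1, kernel_size=1),
        )

    def forward(self, image: torch.Tensor) -> torch.Tensor:
        features = self.encoder(image)   # (N, 512, H/32, W/32) for H, W divisible by 32
        logits = self.decoder(features)  # (N, 1, H, W)
        n, _, h, w = logits.shape
        # Normalize over spatial positions so the output behaves like a
        # probability map over fixation locations.
        return torch.softmax(logits.view(n, -1), dim=1).view(n, 1, h, w)

# Usage example with a random image, to show input/output shapes.
model = EncoderDecoderSaliency()
saliency = model(torch.randn(1, 3, 224, 224))
print(saliency.shape)  # torch.Size([1, 1, 224, 224])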