Tidying Deep Saliency Prediction Architectures

Reddy, Navyasri; Jain, Samyak; Yarlagadda, Pradeep; Gandhi, Vineet

doi:10.1109/iros45743.2020.9341574

Cited by 34 publications

(12 citation statements)

References 24 publications


“…With the development of deep learning, different model structures have been proposed to improve the capabilities of feature representation. For example, (Wang and Shen 2018;Kümmerer et al 2017;Reddy et al 2020;Kroner et al 2020) explored the combination of multi-resolution features, (Cornia et al 2018) applied recurrent architecture and (Lou et al 2022) integrated transformer to refine the learnt features.…”

Section: Human Attention Prediction (mentioning)

confidence: 99%

“…Inspired by Reddy et al (2020); Jia and Bruce (2020), we adopt three most popular metrics, i.e., Kullback-Leibler Divergence (KLdiv), Linear Correlation Coefficient (CC) and Normalized Scanpath Saliency (NSS) to construct our loss function. Denote a predicted saliency map as P , the ground truth of human saliency map as Q, and the ground truth fixation map only contains binary values as F .…”

Section: P (Obj… (mentioning)

confidence: 99%
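The statement above names the three metrics and the symbols P, Q and F but does not say how they are combined. The NumPy sketch below shows one common way to compute KLdiv, CC and NSS on single-channel maps. The function names, the `EPS` constant, the equal default weights and the sign convention in `saliency_loss` (subtracting CC and NSS because higher values mean better agreement) are all assumptions for illustration, not the formulation used by Reddy et al. (2020) or the citing authors.

```python
import numpy as np

EPS = 1e-7  # assumed small constant guarding against division by zero and log(0)


def kl_div(p: np.ndarray, q: np.ndarray) -> float:
    """KL divergence of prediction P from ground-truth saliency Q (lower is better).

    Both maps are first normalized to sum to 1 so they can be read as distributions.
    """
    p = p / (p.sum() + EPS)
    q = q / (q.sum() + EPS)
    return float(np.sum(q * np.log(EPS + q / (p + EPS))))


def cc(p: np.ndarray, q: np.ndarray) -> float:
    """Pearson linear correlation coefficient between P and Q (higher is better)."""
    p = (p - p.mean()) / (p.std() + EPS)
    q = (q - q.mean()) / (q.std() + EPS)
    # The mean of the product of z-scores equals Pearson's r (population std).
    return float(np.mean(p * q))


def nss(p: np.ndarray, f: np.ndarray) -> float:
    """Normalized Scanpath Saliency: mean standardized P at fixated pixels (F == 1).

    Assumes F contains at least one fixation; an empty F yields NaN.
    """
    p = (p - p.mean()) / (p.std() + EPS)
    return float(p[f.astype(bool)].mean())


def saliency_loss(p: np.ndarray, q: np.ndarray, f: np.ndarray,
                  w_kl: float = 1.0, w_cc: float = 1.0, w_nss: float = 1.0) -> float:
    """Hypothetical combined loss: KLdiv is minimized, CC and NSS are maximized."""
    return w_kl * kl_div(p, q) - w_cc * cc(p, q) - w_nss * nss(p, f)
```

In a training loop the same formulas would be written with a differentiable tensor library instead of NumPy; the NumPy version only makes the arithmetic of each metric explicit.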


HSI: Human Saliency Imitator for Benchmarking Saliency-Based Model Explanations

Yang, Zheng, Deng, et al. (2022), HCOMP


Model explanations are generated by XAI (explainable AI) methods to help people understand and interpret machine learning models. To study XAI methods from the human perspective, we propose a human-based benchmark dataset, the human saliency benchmark (HSB), for evaluating saliency-based XAI methods. Unlike existing human saliency annotations, in which class-related features are labeled manually and subjectively, this benchmark collects more objective human attention on visual information using a precise eye-tracking device and a novel crowdsourcing experiment. Given the labor cost of human experiments, we further explore using a prediction model trained on HSB to mimic human saliency annotation. We therefore formulate a dense prediction problem and propose an encoder-decoder architecture that combines multi-modal and multi-scale features to produce human saliency maps. Accordingly, we design a pretraining-finetuning method to address the model training problem. Finally, we arrive at a model trained on HSB, named the human saliency imitator (HSI). Through an extensive evaluation, we show that HSI can successfully predict human saliency on our HSB dataset, and that the HSI-generated human saliency dataset on ImageNet demonstrates its ability to benchmark XAI methods both qualitatively and quantitatively.


“…For CXR image saliency prediction, comparison was conducted with 3 state-of-the-art saliency prediction models, which are SimpleNet (Reddy et al, 2020), MSINet (Kroner et al, 2020) and VGGSSM (Cao et al, 2020). Saliency prediction using standard UNet (denoted as UNetS) is also included for reference.…”

Section: Benchmark Comparison (mentioning)

confidence: 99%

Multi-task UNet: Jointly Boosting Saliency Prediction and Disease Classification on Chest X-ray Images

Zhu, Rohling, Salcudean (2022), Preprint


Human visual attention has recently shown a distinct capability to boost machine learning models. However, studies that use human visual attention to facilitate medical tasks are still scarce. To support the use of visual attention, this paper describes a novel deep learning model for visual saliency prediction on chest X-ray (CXR) images. To cope with data scarcity, we exploit multi-task learning and tackle disease classification on CXR simultaneously. For a more robust training process, we propose a further optimized multi-task learning scheme that better handles model overfitting. Experiments show that our proposed deep learning model with the new learning scheme can outperform existing methods dedicated to either saliency prediction or image classification. The code used in this paper is available at https://github.com/hz-zhu/MT-UNet.


“…Fig. 11: Visualization of predictions of the re-constructed benchmark COL models and our COL base model ("LSR+"). [Figure panels: GT, GazeGAN [93], SalGAN [94], UAVDVSM [95], SimpleNet [96], LSR+]…”

Section: Image (mentioning)

confidence: 99%

“…Discriminative region localization: We introduce the first camouflaged object discriminative region localization task. Considering the same ground truth acquisition process of our task and the widely studied eye fixation prediction task [50] (where both ground truth maps are obtained with eye trackers), we re-train existing eye fixation prediction models (GazeGAN [93], SalGAN [94], UAVDVSM [95] and SimpleNet [96]) with our camouflaged object localization training dataset and construct the first camouflaged object localization benchmark models in Table 2. The better performance of our COL model ("LSR+") compared with the benchmark models validate superiority of our solution.…”

Section: Performance Comparison (mentioning)

confidence: 99%

Towards Deeper Understanding of Camouflaged Object Detection

Lv, Zhang, Dai, et al. (2022), Preprint


Prey in the wild evolve to be camouflaged so as to avoid being recognized by predators. In this way, camouflage acts as a key defence mechanism across species that is critical to survival. To detect and segment the whole extent of a camouflaged object, camouflaged object detection (COD) was introduced as a binary segmentation task, with the binary ground-truth camouflage map indicating the exact regions of the camouflaged objects. In this paper, we revisit this task and argue that the binary segmentation setting does not fully capture the concept of camouflage. We find that explicitly modeling the conspicuousness of camouflaged objects against their particular backgrounds can not only lead to a better understanding of camouflage but also provide guidance for designing more sophisticated camouflage techniques. Furthermore, we observe that it is specific parts of camouflaged objects that make them detectable by predators. Building on this understanding, we present the first triple-task learning framework to simultaneously localize, segment and rank camouflaged objects, where the ranking indicates the conspicuousness level of the camouflage. As no corresponding datasets exist for either the localization model or the ranking model, we generate localization maps with an eye tracker and then process them according to the instance-level labels to generate our ranking-based training and testing dataset. We also contribute the largest COD testing set to comprehensively analyse the performance of camouflaged object detection models. Experimental results show that our triple-task learning framework achieves a new state of the art, leading to a more explainable camouflaged object detection network. Our code, data and results are available at: https://github.com/JingZhang617/COD-Rank-Localize-and-Segment.
