Swin Transformer-Based Edge Guidance Network for RGB-D Salient Object Detection

Wang, Shuaihui; Jiang, Fengyi; Xu, Boqian

doi:10.3390/s23218802

Cited by 1 publication

(2 citation statements)

References 53 publications


“…Inspired by FCNs [18], CNN-based methods [19][20][21][22][23][24][25] formulate the salient object detection as a pixel-level prediction task and generate the saliency map in an end-to-end manner. Recently, Transformer has also been applied to SOD [26][27][28][29][30]. They combine both Transformer and CNN to process multi-level features.…”

Section: Network Designs For SOD (mentioning)

confidence: 99%


Self-Improved Learning for Salient Object Detection

Li, Zeng, Wang et al. 2023

Applied Sciences


Salient Object Detection (SOD) aims at identifying the most visually distinctive objects in a scene. However, learning a mapping directly from a raw image to its corresponding saliency map is still challenging. First, the binary annotations of SOD impede the model from learning the mapping smoothly. Second, the annotator’s preference introduces noisy labeling in the SOD datasets. Motivated by these, we propose a novel learning framework which consists of the Self-Improvement Training (SIT) strategy and the Augmentation-based Consistent Learning (ACL) scheme. SIT aims at reducing the learning difficulty, which provides smooth labels and improves the SOD model in a momentum-updating manner. Meanwhile, ACL focuses on improving the robustness of models by regularizing the consistency between raw images and their corresponding augmented images. Extensive experiments on five challenging benchmark datasets demonstrate that the proposed framework can play a plug-and-play role in various existing state-of-the-art SOD methods and improve their performances on multiple benchmarks without any architecture modification.
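The two ideas named in the abstract, a momentum-updated model that supplies smooth labels and a consistency term between raw and augmented inputs, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the abstract does not give the update rule, the label-blending formula, or the loss, so the function names (`ema_update`, `smooth_labels`, `consistency_loss`), the parameters `momentum` and `alpha`, and the use of a mean squared difference are hypothetical and not the authors' method. Weights and saliency maps are flattened to plain lists to keep the example self-contained.

```python
from typing import List


def ema_update(teacher: List[float], student: List[float],
               momentum: float = 0.99) -> List[float]:
    """Assumed momentum update: move each teacher weight toward the student weight."""
    return [momentum * t + (1.0 - momentum) * s for t, s in zip(teacher, student)]


def smooth_labels(binary_mask: List[float], teacher_pred: List[float],
                  alpha: float = 0.5) -> List[float]:
    """Assumed label smoothing: blend hard 0/1 annotations with teacher predictions."""
    return [(1.0 - alpha) * y + alpha * p for y, p in zip(binary_mask, teacher_pred)]


def consistency_loss(pred_raw: List[float], pred_aug: List[float]) -> float:
    """Assumed consistency term: mean squared difference between the two predictions.

    The predictions are taken to be spatially aligned already; a geometric
    augmentation would need to be undone before comparing.
    """
    return sum((a - b) ** 2 for a, b in zip(pred_raw, pred_aug)) / len(pred_raw)


if __name__ == "__main__":
    teacher = ema_update([1.0, 0.0], [0.0, 1.0], momentum=0.9)
    targets = smooth_labels([1.0, 0.0], [0.8, 0.2], alpha=0.5)
    print(teacher, targets, consistency_loss([1.0, 0.0], [0.0, 0.0]))
```

In this sketch the teacher changes slowly because `momentum` is close to 1, which is what makes its predictions usable as soft targets in place of the binary masks.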


“…For instance, VST [30] considers the SOD task from the sequence-to-sequence perspective and designs a pure transformer model to handle the tokenized input. In addition, Transformer is also utilized to model the cross-modal information from the RGB and depth data [26,27].…”

Section: Network Designs For SOD (mentioning)

confidence: 99%