2023
DOI: 10.3390/s23218802

Swin Transformer-Based Edge Guidance Network for RGB-D Salient Object Detection

Shuaihui Wang,
Fengyi Jiang,
Boqian Xu

Abstract: Salient object detection (SOD), which is used to identify the most distinctive object in a given scene, plays an important role in computer vision tasks. Most existing RGB-D SOD methods employ a CNN-based network as the backbone to extract features from RGB and depth images; however, the inherent locality of a CNN-based network limits the performance of CNN-based methods. To tackle this issue, we propose a novel Swin Transformer-based edge guidance network (SwinEGNet) for RGB-D SOD in which the Swin Transforme…

Cited by 1 publication (2 citation statements)
References 53 publications
“…Inspired by FCNs [18], CNN-based methods [19][20][21][22][23][24][25] formulate the salient object detection as a pixel-level prediction task and generate the saliency map in an end-to-end manner. Recently, Transformer has also been applied to SOD [26][27][28][29][30]. They combine both Transformer and CNN to process multi-level features.…”
Section: Network Designs For SOD (mentioning)
confidence: 99%
“…For instance, VST [30] considers the SOD task from the sequence-to-sequence perspective and designs a pure transformer model to handle the tokenized input. In addition, Transformer is also utilized to model the cross-modal information from the RGB and depth data [26,27].…”
Section: Network Designs For Sodmentioning
confidence: 99%