2022
DOI: 10.1007/s00521-022-07069-9

TF-SOD: a novel transformer framework for salient object detection

Abstract: Most existing salient object detection models are based on fully convolutional networks (FCNs), which learn multi-scale/level semantic information through convolutional layers to obtain high-quality predicted saliency maps. However, convolution is locally interactive, and thus it is difficult to capture remote dependencies. Additionally, FCN-based methods suffer from coarse object boundaries. In this paper, to solve these problems, we propose a novel transformer framework for salient o…

Cited by 16 publications (3 citation statements)
References 72 publications
“…For instance, a pioneering study by Liu et al [50] presented a unified RGB and RGB-D SOD model based on a vision transformer achieving saliency and boundary detection by introducing task-specific labels. Wang et al [52] proposed a transformer architecture consisting of an FCN decoder and three additional modules to capture salient local and global information in RGB images. The interplay of information from different modalities facilitates the learning of deeper information representations by the network model.…”
Section: B Vision Transformer (mentioning)
confidence: 99%
“…These works have demonstrated the necessity of replacing CNN with transformer architectures to explore global information in the ORSI-SOD. Moreover, various transformer variants have been developed by researchers for other domains of SOD, resulting in substantial advances in RGB SOD [52], RGB-D/T SOD [53], [54], and Video SOD (VSOD) [55]. However, transformers cannot extract local information as effectively as CNN in Fig.…”
Section: Introduction (mentioning)
confidence: 99%
“…In the field of computer vision, RGB-D Salient Object Detection (SOD) has progressively evolved into a significant research direction, playing a crucial role in numerous application domains. For instance, in robot navigation [1,2], salient object detection aids robots in gaining a more profound understanding of their environment, thereby informing their decision-making. In the realm of object tracking [3,4], salient object detection can effectively assist the system in accurately locating and tracing objects of interest.…”
Section: Introduction (mentioning)
confidence: 99%