TF-SOD: a novel transformer framework for salient object detection

Wang, Zhenyu; Zhang, Yunzhou; Liu, Yan; Wang, Zhuo; Coleman, Sonya; Kerr, Dermot

doi:10.1007/s00521-022-07069-9

Cited by 16 publications

(3 citation statements)

References 72 publications

“…For instance, a pioneering study by Liu et al [50] presented a unified RGB and RGB-D SOD model based on a vision transformer achieving saliency and boundary detection by introducing task-specific labels. Wang et al [52] proposed a transformer architecture consisting of an FCN decoder and three additional modules to capture salient local and global information in RGB images. The interplay of information from different modalities facilitates the learning of deeper information representations by the network model.…”

Section: B Vision Transformer (mentioning)

confidence: 99%

“…These works have demonstrated the necessity of replacing CNN with transformer architectures to explore global information in the ORSI-SOD. Moreover, various transformer variants have been developed by researchers for other domains of SOD, resulting in substantial advances in RGB SOD [52], RGB-D/T SOD [53], [54], and Video SOD (VSOD) [55]. However, transformers cannot extract local information as effectively as CNN in Fig.…”

Section: Introduction (mentioning)

confidence: 99%

Adaptive Dual-Stream Sparse Transformer Network for Salient Object Detection in Optical Remote Sensing Images

Zhao, Jia, et al. 2024

IEEE J. Sel. Top. Appl. Earth Observations Remote Sensing

Excellent performance has been demonstrated by convolutional neural network (CNN) in salient object detection for optical remote sensing images (ORSI-SOD). However, the limitations of CNN's feature extraction using a sliding-window approach hinder the capture of global representations. Therefore, an end-to-end detection model, known as adaptive dual-stream sparse transformer network (ADSTNet), has been proposed for ORSI-SOD and is assisted by the vision transformer. It effectively addresses the compensation issue of global and local information in ORSI-SOD. In particular, an adaptive interaction encoder has been devised, amalgamating the multi-scale sparse transformer (MST) and the pyramid atrous attention (PAA) to constitute the adaptive dual-stream sparse encoder (ADSE). This encoder collaborates with the CNN to enhance long-range dependency modeling and preserve global information more effectively based on local features. Additionally, a directional feature reconfiguration (DFR) is constructed to extract texture details from multiple directional dimensions. Finally, we propose the adaptive feature cascade decoder (AFCD) that synthesizes content information from the foreground, edges, and background to enhance the representational capacity of the image. Furthermore, a structural loss function, known as the weight compensation mechanism, is introduced to balance the performance of boundary and salmap segmentation losses. The proposed model has been demonstrated to outperform 26 state-of-the-art (SOTA) ORSI-SOD methods across eight evaluation metrics on two standard datasets, as evidenced by extensive experiments. Furthermore, to verify its robustness, the generalization performance of the model on the latest challenging ORSI-4199 dataset is reported. The code and results for this work can be found at https://github.com/JieZzzoo/ADSTNet.

“…In the field of computer vision, RGB-D Salient Object Detection (SOD) has progressively evolved into a significant research direction, playing a crucial role in numerous application domains. For instance, in robot navigation [1,2], salient object detection aids robots in gaining a more profound understanding of their environment, thereby informing their decision-making. In the realm of object tracking [3,4], salient object detection can effectively assist the system in accurately locating and tracing objects of interest.…”

Section: Introduction (mentioning)

confidence: 99%

RGB-D Salient Object Detection Based on Cross-Modal and Cross-Level Feature Fusion

Peng, Zhai, Feng 2024

IEEE Access

Existing RGB-D saliency detection models have not fully considered the differences between features at various levels, and lack an effective mechanism for cross-level feature fusion. This article proposes a novel cross-modality cross-level fusion learning framework. The framework mainly contains three modules: Attention Enhancement Module (AEM), Modality Feature Fusion Module (MFM), and Graph Reasoning Module (GRM). AEM is used to enhance the features of the two modalities. MFM is used to integrate the features of the two modalities to achieve cross-modality feature fusion. Subsequently, the modality fusion features are divided into high-level features and low-level features. The high-level features contain the semantic localization information of salient objects, and the low-level features contain the detailed information of salient objects. GRM extends the semantic localization information of salient objects in the high-level features from pixel features to the entire salient object area, thereby achieving cross-level feature fusion. This framework can effectively eliminate background noise and enhance the model's expressiveness. Extensive experiments were conducted on seven widely used datasets, and the results show that the new method outperforms nine current state-of-the-art RGB-D SOD methods.

TransMCGC: a recast vision transformer for small-scale image classification tasks

Xiang, Chen, et al. 2023

Neural Comput & Applic
