SiaTrans: Siamese transformer network for RGB-D salient object detection with depth image classification

Jia, XingZhao; Dongye, Changlei; Peng, Yanjun

doi:10.1016/j.imavis.2022.104549

Cited by 12 publications

(4 citation statements)

References 15 publications

“…The CNN model possesses translation invariance and locality, which have been proven beneficial for extracting local spatial information. In addition, RGB data are generally more informative than depth data [48,49]. Therefore, we argue that it is unnecessary to use a large Transformer-based complex network like PVTv2 to process depth data.…”

Section: Encoder Of Depth Channel (mentioning)

confidence: 99%

TANet: Transformer‐based asymmetric network for RGB‐D salient object detection

Liu

Yang

Wang

et al. 2023

IET Computer Vision

Existing RGB‐D salient object detection methods mainly rely on a symmetric two‐stream Convolutional Neural Network (CNN)‐based network to extract RGB and depth channel features separately. However, there are two problems with the symmetric conventional network structure: first, the ability of CNN in learning global contexts is limited; second, the symmetric two‐stream structure ignores the inherent differences between modalities. In this study, a Transformer‐based asymmetric network is proposed to tackle the issues mentioned above. The authors employ the powerful feature extraction capability of Transformer to extract global semantic information from RGB data and design a lightweight CNN backbone to extract spatial structure information from depth data without pre‐training. The asymmetric hybrid encoder effectively reduces the number of parameters in the model while increasing speed without sacrificing performance. Then, a cross‐modal feature fusion module which enhances and fuses RGB and depth features with each other is designed. Finally, the authors add edge prediction as an auxiliary task and propose an edge enhancement module to generate sharper contours. Extensive experiments demonstrate that our method achieves superior performance over 14 state‐of‐the‐art RGB‐D methods on six public datasets. The code of the authors will be released at https://github.com/lc012463/TANet.

“…Recently, Jia et al [43] presented an all-in-one salient object detection model, which can process RGB SOD tasks, RGB-T SOD tasks, and RGB-D SOD tasks by using one model. Besides, in some other computer vision fields, modality unified frameworks have also aroused the interest of researchers and made impressive progress [44], [45].…”

Section: Modality Unified Sod (mentioning)

confidence: 99%

“…Recently, Jia et al [27] proposed an all-in-one SOD model, namely AiOSOD. This model can detect salient objects from three types of data (RGB, RGB-D, and RGB-T) by using one model with the same weight parameters.…”

Section: B Am Sod (mentioning)

confidence: 99%

Employing Bilinear Fusion and Saliency Prior Information for RGB-D Salient Object Detection

Huang

Yang

Zhang

et al. 2022

IEEE Trans. Multimedia

Multi-modal feature fusion and saliency reasoning are two core sub-tasks of RGB-D salient object detection. However, most existing models employ linear fusion strategies (e.g., concatenation) for multi-modal feature fusion and use a simple coarse-to-fine structure for saliency reasoning. Despite their simpleness, they can neither fully capture the cross-modal complementary information nor exploit the multi-level complementary information among the cross-modal features at different levels. To address these issues, a novel RGB-D salient object detection model is presented, where we pay special attention to the aforementioned two sub-tasks. Concretely, a multi-modal feature interaction module is first presented to explore more interactions between the unimodal RGB and depth features. It helps to capture their cross-modal complementary information by jointly using some simple linear fusion strategies and bilinear fusion ones. Then, a saliency prior information guided fusion module is presented to exploit the multi-level complementary information among the fused cross-modal features at different levels. Instead of employing a simple convolutional layer for the final saliency prediction, a saliency refinement and prediction module is designed to better exploit those extracted multilevel cross-modal information for RGB-D saliency detection. Experimental results on several benchmark datasets verify the effectiveness and superiority of the proposed framework over some state-of-the-art methods.

Index Terms: RGB-D salient object detection, bilinear fusion strategy, saliency prior information guided fusion, saliency refinement and prediction.

[9] and segmentation [10]. Benefiting from the progress of Convolutional Neural Networks (CNNs), CNNs based RGB SOD models [2], [11], [12], [13] have significantly improved the performance of conventional hand-crafted feature based approaches [14], [15], [16], [17]. However, such algorithms are found vulnerable to complex environments, varying illuminations or cluttered backgrounds. After paying a lot of efforts, researchers realize that using RGB images only cannot solve those challenges. Meanwhile,

“…Identifying salient areas in an image can facilitate subsequent advanced visual tasks, enhancing efficiency and resource management and improving performance (Gupta et al, 2020). Thus, SOD can help filter irrelevant backgrounds, and SOD plays a significant pre-processing role in computer vision applications, providing important basic processing for these applications, e.g., segmentation (Donoser et al, 2009; Qin et al, 2014; Noh et al, 2015; Fu et al, 2017; Shelhamer et al, 2017), classification (Borji and Itti, 2011; Joseph et al, 2019; Akila et al, 2021; Liu et al, 2021; Jia et al, 2022; Ma and Yang, 2023), tracking (Frintrop and Kessel, 2009; Su et al, 2014; Ma et al, 2017; Lee and Kim, 2018; Chen et al, 2019), etc.…”

Section: Introduction (mentioning)

confidence: 99%

Salient object detection: a mini review

Wang,

Yu,

Lim

et al. 2024

Front. Signal Process.

This paper presents a mini-review of recent works in Salient Object Detection (SOD). First, we introduce SOD and its application in image processing tasks and applications. Following this, we discuss the conventional methods for SOD and present several recent works in this category. With the start of deep learning AI algorithms, SOD has also benefited from deep learning. Here, we present and discuss deep learning-based SOD according to its training mechanism, i.e., fully supervised and weakly supervised. For the benefit of the readers, we have also included some standard data sets assembled for SOD research.
