HiDAnet: RGB-D Salient Object Detection via Hierarchical Depth Awareness

Wu, Zongwei; Allibert, Guillaume; Mériaudeau, Fabrice; Ma, Chao; Demonceaux, Cédric

doi:10.1109/tip.2023.3263111

Cited by 40 publications

(11 citation statements)

References 77 publications

Supporting

Mentioning

Contrasting

Order By: Relevance

“…Most RGB-D SOD models adopt CNN-based networks to extract features and focus on cross-modal fusion strategies to improve salient object detection performance. Various frameworks and fusion strategies have been proposed to effectively merge crossmodal cross-scale features [14,17,[21][22][23]30,31]. Zhang et al [30] designed an asymmetric two-stream network, where a flow ladder module is introduced to the RGB stream to capture global context information and DepthNet for the depth stream.…”

Section: Related Workmentioning

confidence: 99%

“…Wu et al [21] proposed layer-wise, trident spatial, and attention mechanisms to fuse robust RGB and depth features against low-quality depths. Wu et al [23] employed a granularitybased attention module to leverage the details of salient objects and introduced a dualattention module to fuse the cross-modal cross-scale features in a coarse-to-fine manner.…”

Section: Related Workmentioning

confidence: 99%

“…Quantitative comparison: We compared the proposed network with 14 SOTA CNNbased methods and Transformer-based methods, which were CMW [13], JLDCF [50], HINet [51], DSA2F [20], CFIDNet [52], C 2 DFNet [53], SPSNet [19], AFNet [22], HiDANet [23], MTFormer [54], VST [43], TANet [35], and SwinNet [37]. The compared saliency maps were directly provided by the authors or generated via their released codes.…”

Section: Comparison With Sotasmentioning

confidence: 99%

“…However, it is still difficult to localize the salient object accurately in scenes such as those with low contrast or objects with a cluttered background. CNN-based RGB-D SOD models, which employ features from RGB images and depth maps, have attracted growing interest and presented promising performance [9][10][11][12][13][14][15][16][17][18][19][20][21][22][23]. However, some issues still limit the performance of existing CNN-based RGB-D SOD models.…”

Section: Introductionmentioning

confidence: 99%

See 3 more Smart Citations

Swin Transformer-Based Edge Guidance Network for RGB-D Salient Object Detection

Wang,

Jiang,

2023

Sensors

View full text Add to dashboard Cite

Salient object detection (SOD), which is used to identify the most distinctive object in a given scene, plays an important role in computer vision tasks. Most existing RGB-D SOD methods employ a CNN-based network as the backbone to extract features from RGB and depth images; however, the inherent locality of a CNN-based network limits the performance of CNN-based methods. To tackle this issue, we propose a novel Swin Transformer-based edge guidance network (SwinEGNet) for RGB-D SOD in which the Swin Transformer is employed as a powerful feature extractor to capture the global context. An edge-guided cross-modal interaction module is proposed to effectively enhance and fuse features. In particular, we employed the Swin Transformer as the backbone to extract features from RGB images and depth maps. Then, we introduced the edge extraction module (EEM) to extract edge features and the depth enhancement module (DEM) to enhance depth features. Additionally, a cross-modal interaction module (CIM) was used to integrate cross-modal features from global and local contexts. Finally, we employed a cascaded decoder to refine the prediction map in a coarse-to-fine manner. Extensive experiments demonstrated that our SwinEGNet achieved the best performance on the LFSD, NLPR, DES, and NJU2K datasets and achieved comparable performance on the STEREO dataset compared to 14 state-of-the-art methods. Our model achieved better performance compared to SwinNet, with 88.4% parameters and 77.2% FLOPs. Our code will be publicly available.

show abstract

Section: Related Workmentioning

confidence: 99%

Section: Related Workmentioning

confidence: 99%

Section: Comparison With Sotasmentioning

confidence: 99%

Section: Introductionmentioning

confidence: 99%

See 2 more Smart Citations

Swin Transformer-Based Edge Guidance Network for RGB-D Salient Object Detection

Wang,

Jiang,

2023

Sensors

View full text Add to dashboard Cite

show abstract

“…By incorporating depth information, RGB-D salient object detection can more effectively discern and highlight the salient regions amidst the challenging backdrop. Despite the important progress that RGB-D salient object detection has made [9][10][11][12], there are still some challenges in this field that need to be overcome.…”

Section: Introductionmentioning

confidence: 99%

RGB-D Salient Object Detection Based on Cross-Modal and Cross-Level Feature Fusion

Peng,

Zhai,

Feng

2024

IEEE Access

View full text Add to dashboard Cite

Existing RGB-D saliency detection models have not fully considered the differences between features at various levels, and lack an effective mechanism for cross-level feature fusion. This article proposes a novel cross-modality cross-level fusion learning framework. The framework mainly contains three modules: Attention Enhancement Module (AEM), Modality Feature Fusion Module (MFM), and Graph Reasoning Module (GRM). AEM is used to enhance the features of the two modalities. MFM is used to integrate the features of the two modalities to achieve cross-modality feature fusion. Subsequently, the modality fusion features are divided into high-level features and low-level features. The high-level features contain the semantic localization information of salient objects, and the low-level features contain the detailed information of salient objects. GRM extends the semantic localization information of salient objects in the high-level features from pixel features to the entire salient object area, thereby achieving cross-level feature fusion. This framework can effectively eliminate background noise and enhance the model's expressiveness. Extensive experiments were conducted on seven widely used datasets, and the results show that the new method outperforms nine current state-of-the-art RGB-D SOD methods.

show abstract