Context-aware network for RGB-D salient object detection

Liang, Fangfang; Duan, Lijuan; Ma, Wei; Qiao, Yuanhua; Miao, Jun; Ye, Qixiang

doi:10.1016/j.patcog.2020.107630

Cited by 15 publications

(3 citation statements)

References 69 publications

(96 reference statements)


“…With the flourish in this research direction, other inspiring techniques are also recently employed into the RGB-D SOD task, such as discrepant cross-modality interaction [65], triplet transformer embedding network [66], pure transformer network [67], neural architecture search [68], mutual information minimization [69], specificity-preserving architecture [70], hierarchical cross-modal distillation [21], cross-modal edge-guidance [71], LSTM-based context-aware modules [72]. A relatively complete survey on RGB-D SOD can be found in [22].…”

Section: General RGB-D SOD Methods (mentioning)

confidence: 99%

Depth Quality-Inspired Feature Manipulation for Efficient RGB-D and Video Salient Object Detection

Zhang, Fu, Wang et al. 2022 (preprint)


Recently CNN-based RGB-D salient object detection (SOD) has obtained significant improvement on detection accuracy. However, existing models often fail to perform well in terms of efficiency and accuracy simultaneously. This hinders their potential applications on mobile devices as well as many real-world problems. To bridge the accuracy gap between lightweight and large models for RGB-D SOD, in this paper, an efficient module that can greatly improve the accuracy but adds little computation is proposed. Inspired by the fact that depth quality is a key factor influencing the accuracy, we propose an efficient depth quality-inspired feature manipulation (DQFM) process, which can dynamically filter depth features according to depth quality. The proposed DQFM resorts to the alignment of low-level RGB and depth features, as well as holistic attention of the depth stream to explicitly control and enhance cross-modal fusion. We embed DQFM to obtain an efficient lightweight RGB-D SOD model called DFM-Net, where we in addition design a tailored depth backbone and a two-stage decoder as basic parts. Extensive experimental results on nine RGB-D datasets demonstrate that our DFM-Net outperforms recent efficient models, running at 20 FPS on CPU with only ∼8.5Mb model size, and meanwhile being 2.9/2.4 times faster and 6.7/3.1 times smaller than the latest best models A2dele and MobileSal. It also maintains state-of-the-art accuracy when even compared to non-efficient models. Interestingly, further statistics and analyses verify the ability of DQFM in distinguishing depth maps of various qualities without any quality labels. Last but not least, we further apply DFM-Net to deal with video SOD (VSOD), achieving comparable performance against recent efficient models while being 3/2.3 times faster/smaller than the prior best in this field. Our code is available at https://github.com/zwbx/DFM-Net.


“…In [34], a saliency detection model is proposed based on the spatial position prior of attractive objects and sparse background features. Some approaches use neural networks and deep learning techniques such as convolution neural networks [35,36,37], sparse deep learning networks [38], Boltzmann machine [39], and ensemble deep neural network [40] for modeling bottom-up attention. In [41], a multiple convolution layers model is proposed to predict eye fixation which uses the end-to-end encoder-decoder network.…”

Section: Related Work (mentioning)

confidence: 99%

A Dynamic Bottom-Up Saliency Detection Method for Still Images

Sadeghi, Kamkar, Moghaddam 2022 (preprint)


Introduction: Existing saliency detection algorithms in the literature have ignored the importance of time. They create a static saliency map for the whole recording time. However, bottom-up and top-down attention continuously compete and the salient regions change through time. In this paper, we propose an unsupervised algorithm to predict the dynamic evolution of bottom-up saliency in images. Method: We compute the variation of low-level features within non-overlapping patches of the input image. A patch with higher variation is considered more salient. We use a threshold to ignore less salient parts and create a map. A weighted sum of this map and its center of mass is calculated to provide the saliency map. The threshold and weights are set dynamically. We use the MIT1003 and DOVES datasets for evaluation and break the recording to multiple 100ms or 500ms-time intervals. A separate ground-truth is created for each interval. Then, the predicted dynamic saliency map is compared to the ground-truth using Normalized Scanpath Saliency, Kullback-Leibler divergence, Similarity, and Linear Correlation Coefficient metrics. Results: The proposed method outperformed the competitors on DOVES dataset. It also had an acceptable performance on MIT1003 especially within 0-400ms after stimulus onset. Conclusion: This dynamic algorithm can predict an image's salient regions better than the static methods as saliency detection is inherently a dynamic process. This method is biologically-plausible and in-line with the recent findings of the creation of a bottom-up saliency map in the primary visual cortex or superior colliculus.
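The Method portion of this abstract (patch-wise feature variation, a threshold, then a weighted sum with the center of mass) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a grayscale image given as a list of rows, uses intensity variance as the only low-level feature, and represents the center of mass as a Gaussian bump around it. The function names, the fixed `fraction`, `w_map`, `w_com`, and `sigma` values are all hypothetical; the abstract states that the threshold and weights are set dynamically, which this sketch does not attempt.

```python
import math
from statistics import pvariance


def patch_variance_map(image, patch):
    """Variance of pixel intensity inside each non-overlapping patch."""
    rows, cols = len(image), len(image[0])
    grid = []
    for r in range(0, rows - patch + 1, patch):
        grid_row = []
        for c in range(0, cols - patch + 1, patch):
            pixels = [image[y][x]
                      for y in range(r, r + patch)
                      for x in range(c, c + patch)]
            grid_row.append(pvariance(pixels))
        grid.append(grid_row)
    return grid


def threshold_map(grid, fraction):
    """Zero out patches below fraction * max variance (fixed cutoff: an assumption)."""
    cutoff = fraction * max(max(row) for row in grid)
    return [[v if v >= cutoff else 0.0 for v in row] for row in grid]


def center_of_mass(grid):
    """Variance-weighted mean (row, col) position of the thresholded map."""
    total = sum(sum(row) for row in grid)
    cy = sum(r * v for r, row in enumerate(grid) for v in row) / total
    cx = sum(c * v for row in grid for c, v in enumerate(row)) / total
    return cy, cx


def saliency(image, patch=2, fraction=0.5, w_map=0.7, w_com=0.3, sigma=1.0):
    """Weighted sum of the normalized thresholded map and a bump at its center of mass."""
    grid = threshold_map(patch_variance_map(image, patch), fraction)
    peak = max(max(row) for row in grid)
    if peak == 0:
        # A flat image has no variation, so nothing is marked salient.
        return [[0.0 for _ in row] for row in grid]
    cy, cx = center_of_mass(grid)
    return [[w_map * v / peak
             + w_com * math.exp(-((r - cy) ** 2 + (c - cx) ** 2) / (2 * sigma ** 2))
             for c, v in enumerate(row)]
            for r, row in enumerate(grid)]


# Toy 4x4 image: only the top-left 2x2 patch has any intensity variation.
image = [
    [0, 10, 5, 5],
    [0, 10, 5, 5],
    [5, 5, 5, 5],
    [5, 5, 5, 5],
]
result = saliency(image)
```

In this toy input the top-left patch receives the highest score, because it is the only patch with variation and the center of mass therefore sits on it.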


“…This overcame the interference of occlusion and dense crowds partly. Some researchers [24, 25] used the symmetric dual-stream network to extract the RGB feature and the depth feature of the image simultaneously. However, it is difficult to acquire the high-quality RGB image feature and depth image feature simultaneously with the symmetric dual-stream network.…”

Section: Introduction (mentioning)

confidence: 99%

Multi-Object Tracking Algorithm for RGB-D Images Based on Asymmetric Dual Siamese Networks

Zhang, Yang, Xin et al. 2020, Sensors


Currently, intelligent security systems are widely deployed in indoor buildings to ensure the safety of people in shopping malls, banks, train stations, and other indoor buildings. Multi-Object Tracking (MOT), as an important component of intelligent security systems, has received much attention from many researchers in recent years. However, existing multi-object tracking algorithms still suffer from trajectory drift and interruption problems in crowded scenes, and so cannot provide valuable data for managers. In order to solve the above problems, this paper proposes a Multi-Object Tracking algorithm for RGB-D images based on Asymmetric Dual Siamese networks (ADSiamMOT-RGBD). This algorithm combines appearance information from RGB images and target contour information from depth images. Furthermore, the attention module is applied to repress the redundant information in the combined features to overcome the trajectory drift problem. We also propose a trajectory analysis module, which analyzes whether the head movement trajectory is correct in combination with time-context information. It reduces the number of human error trajectories. The experimental results show that the proposed method in this paper has better tracking quality on the MICC, EPFL, and UM datasets than the previous work.
