2018
DOI: 10.1109/tcyb.2017.2761775
CNNs-Based RGB-D Saliency Detection via Cross-View Transfer and Multiview Fusion

Abstract: Salient object detection from RGB-D images aims to utilize both the depth view and the RGB view to automatically localize objects of human interest in the scene. Although a few earlier efforts have been devoted to this problem in recent years, two major challenges still remain: 1) how to leverage the depth view effectively to model depth-induced saliency and 2) how to implement an optimal combination of the RGB view and the depth view that makes full use of the complementary information between them. To …
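The abstract outlines a two-stream design: separate CNN streams for the RGB and depth views, a cross-view transfer step that lets the depth stream benefit from what was learned on RGB, and a multiview fusion stage that combines the two. The sketch below, assuming PyTorch, illustrates that shape; the layer sizes, the module names (make_stream, RGBDSaliencyNet), and the warm-start reading of cross-view transfer are assumptions for illustration, not the authors' released architecture.

```python
import torch
import torch.nn as nn

def make_stream(in_ch):
    # A small convolutional encoder; one stream per view.
    return nn.Sequential(
        nn.Conv2d(in_ch, 32, 3, padding=1), nn.ReLU(inplace=True),
        nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(inplace=True),
    )

class RGBDSaliencyNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.rgb_stream = make_stream(3)
        # Depth is replicated to 3 channels so the depth stream shares
        # the RGB stream's architecture (enabling cross-view transfer).
        self.depth_stream = make_stream(3)
        # Multiview fusion head: combine both views' feature maps and
        # predict a per-pixel saliency probability.
        self.fusion = nn.Sequential(
            nn.Conv2d(128, 64, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, 1, 1), nn.Sigmoid(),
        )

    def forward(self, rgb, depth):
        f_rgb = self.rgb_stream(rgb)
        f_depth = self.depth_stream(depth.expand(-1, 3, -1, -1))
        return self.fusion(torch.cat([f_rgb, f_depth], dim=1))

net = RGBDSaliencyNet()
# Cross-view transfer, sketched here as a warm start: initialize the
# depth stream from the RGB stream's weights before fine-tuning on depth.
net.depth_stream.load_state_dict(net.rgb_stream.state_dict())
out = net(torch.randn(1, 3, 224, 224), torch.randn(1, 1, 224, 224))
print(out.shape)  # torch.Size([1, 1, 224, 224])
```

Replicating the single depth channel to three channels is one common way to keep the two streams weight-compatible, so that weights learned on the RGB view can seed the depth view.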

Cited by 354 publications (257 citation statements)
References 41 publications
“…Table II and Fig. 4 show that all deep learning based approaches outperform traditional methods by a great margin, and end-to-end frameworks, including PCA [19] and our approach, are superior to multi-stage methods such as CTMF [17] and MPCI [18]. Moreover, benefiting from our fusion scheme and edge-preserving loss, the proposed method consistently improves the F-measure and MAE achieved by PCA on all three datasets, especially on NLPR, where accurate depth data are collected by Kinect.…”
Section: Comparison With the State-of-the-Arts
Confidence: 90%
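The quote evaluates with F-measure and MAE, the two standard saliency metrics. Below is a minimal sketch of both, assuming NumPy, a predicted saliency map S in [0, 1], and a binary ground-truth mask G; the adaptive threshold (twice the mean saliency) and beta² = 0.3 are common conventions in this literature, not details taken from the cited paper.

```python
import numpy as np

def mae(S, G):
    # Mean absolute error between the saliency map and ground truth.
    return np.mean(np.abs(S - G))

def f_measure(S, G, beta2=0.3):
    # Binarize S with an adaptive threshold (twice the mean value,
    # a common choice), then compute precision and recall.
    thresh = min(2 * S.mean(), 1.0)
    B = (S >= thresh).astype(float)
    tp = (B * G).sum()
    precision = tp / (B.sum() + 1e-8)
    recall = tp / (G.sum() + 1e-8)
    # Weighted harmonic mean of precision and recall.
    return (1 + beta2) * precision * recall / (beta2 * precision + recall + 1e-8)

S = np.random.rand(224, 224)            # toy prediction
G = (np.random.rand(224, 224) > 0.5).astype(float)  # toy mask
print(mae(S, G), f_measure(S, G))
```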
“…For a fair comparison with state-of-the-art methods, we utilize the same data split as in [17]. The training set contains 1400 samples from the NJUD dataset and 650 samples from NLPR.…”
Section: A. Datasets
Confidence: 99%
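A minimal sketch of assembling that training split, assuming a hypothetical on-disk layout and the helper list_pairs (not from the cited work); only the sample counts, 1400 from NJUD and 650 from NLPR, come from the quoted text.

```python
from pathlib import Path

def list_pairs(root):
    # Hypothetical layout: one RGB image per sample under <root>/;
    # depth maps and masks are assumed to sit alongside.
    return sorted(Path(root).glob("*.jpg"))

# The split described in the quote: 1400 NJUD + 650 NLPR samples.
train_set = list_pairs("NJUD")[:1400] + list_pairs("NLPR")[:650]
print(len(train_set))  # 2050 once both datasets are in place
```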
“…For the experiments on the Caltech-101 dataset, the ℓ2-normalization procedures (step (5) in the training phase and steps (3) and (6) in the testing phase of Algorithm 1) are adopted. For the experiments on the ILSVRC2012 dataset, these procedures are skipped, since the original softmax classifier of the base CNN is not trained on the ℓ2-normalized DFVs.…”
Section: Methods
Confidence: 99%
“…The critical goal of video synchronization is to establish temporal correspondences among the frames of two input videos, i.e., a reference video and a video to be synchronized. Applications of video synchronization cover a wide range of video analysis tasks [2][3][4][5][6][7][8], such as video surveillance, target identification, human action recognition, and saliency detection and fusion.…”
Section: Introduction
Confidence: 99%
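As a toy illustration of "temporal correspondences", the sketch below matches each reference frame to its nearest target frame by per-frame feature distance, assuming NumPy; real synchronization methods add temporal ordering constraints (e.g., dynamic time warping), which are omitted here.

```python
import numpy as np

def match_frames(ref_feats, tgt_feats):
    # ref_feats: (N, D), tgt_feats: (M, D) per-frame descriptors.
    # For each reference frame, pick the closest target frame.
    d = np.linalg.norm(ref_feats[:, None, :] - tgt_feats[None, :, :], axis=2)
    return d.argmin(axis=1)  # matched target index per reference frame

ref = np.random.rand(10, 128)   # toy per-frame features
tgt = np.random.rand(12, 128)
print(match_frames(ref, tgt))   # 10 correspondence indices
```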