Weakly Supervised Salient Object Detection With Spatiotemporal Cascade Neural Networks

Tang, Yi; Zou, Wenbin; Jin, Zhi; Chen, Yuhuan; Yang, Hua; Li, Xia

doi:10.1109/tcsvt.2018.2859773

Cited by 87 publications

(40 citation statements)

References 56 publications

(80 reference statements)

Supporting

Mentioning

Contrasting

Order By: Relevance

“…According to the common concepts used in deep learning methods, firstly, the global framework for each method is described in 2.1, then the deep network in each method is analyzed in 2.2, and finally an overview of the categorization of methods is shown at a functional level in 2.3. As a matter of convenience, the describled methods, are denoted as SCOMd [12], NRF [13], DHSNet [14], OSVOS [15], NLDF [16], LMP [17], SFCN [18], SegFlow [19], LVO [20], WSS [21], SCNN [22], DSS [23], SPD [24], AFNet [25] and CPD [26].…”

Section: Classification Of the State-of-the-art Methodsmentioning

confidence: 99%

“…Moving objects attract large attention and thus can be regarded as salient objects in videos. As in the methods[12,22,18], we also use the 30 test videos with the provided ground truth. The DAVIS 2016 dataset[27] is a popular video dataset for video foreground segmentation.It is divided into two splits: the training (30 sequences) part used for training only and the validation (20 sequences) part for the inference.…”

mentioning

confidence: 99%

See 1 more Smart Citation

Overview of deep-learning based methods for salient object detection in videos

Wang

Zhang

et al. 2020

Pattern Recognition

View full text Add to dashboard Cite

Video salient object detection is a challenging and important problem in computer vision domain. In recent years, deep-learning based methods have contributed to significant improvements in this domain. This paper provides an overview of recent developments in this domain and compares the corresponding methods up to date, including 1) classification of the state-of-the-art methods and their frameworks; 2) summary of the benchmark datasets and commonly used evaluation metrics; 3) experimental comparison of the performances of the state-of-the-art methods; 4) suggestions of some promising future works for unsolved challenges.

show abstract

Section: Classification Of the State-of-the-art Methodsmentioning

confidence: 99%

mentioning

confidence: 99%

Overview of deep-learning based methods for salient object detection in videos

Wang

Zhang

et al. 2020

Pattern Recognition

View full text Add to dashboard Cite

show abstract

“…Different from the cascade structure in [48], we propose a two-stream deep network to estimate salient regions. The specific framework are shown in Figure 2, which consists of four components, two network streams without shared parameters, refinement of motion information and attentive module.…”

Section: The Proposed Approachmentioning

confidence: 99%

“…In the temporal stream, we treat the RGB frames and corresponding motion priors as the network input. The temporal prior is generated by [48]. Specifically, the superpixels of the optical flow map are firstly obtained by using [10].…”

Section: The Proposed Approachmentioning

confidence: 99%

“…Among them, there are ten state-of-the-arts in video scenes and nine approaches in still images. The video salient object detection includes cluster-based co-saliency method (CS) [11], space-time saliency detection (ST) [75], segmenting saliency detection (SS) [42], saliency-aware method (SA) [59], consistent gradient based saliency (CG) [61], video salient object detection via fully convolutional networks (SFCN) [62], multi-scale spatiotemporal network (MSST) for video saliency [49], spatiotemporal cascade network (SCNN) [48], flow guided recurrent neural encoder (FGRNE) [26] and pyramid dilated deeper convlstm (PDB) [45]. The image saliency models include recurrent fully convolutional network (RFCN) [55], deep contrast learning (DCL) [27], visual saliency on multi-scale deep features (MDF) [28], deep saliency multi-task (DSMT) [30], deeply supervised salient object detection (DSS) [16], deep hierarchical saliency network (DHSN) [32], aggregating multilevel convolutional features (Amulet) [71], saliency detection with image-level supervision (WSS) [54] and learning uncertain convolutional features (UCF) [72].…”

Section: Comparison To the State-of-the-art Saliency Modelsmentioning

confidence: 99%

See 1 more Smart Citation

Video salient object detection via spatiotemporal attention neural networks

et al. 2020

Self Cite

View full text Add to dashboard Cite

Recently, deep convolutional neural networks have been widely introduced into image salient object detection and achieve good performance in this community. However, as the complexity of video scenes, video salient object detection with deep learning models is still a challenge. The specific difficulties come from two aspects. First of all, the deep networks on image saliency detection cannot capture robust motion cues in video sequences. Secondly, as for the spatiotemporal fusing features, the existing methods simply exploit element-wise addition or concatenation, which not fully explores the contextual information and complementary correlation, thus they cannot produce more robust spatiotemporal features. To address these issues, we propose a two-stream based spatiotemporal attention neural network (STAN) for video salient object detection. We amply extract the motion information in terms of long short term memory (LSTM) network and 3D convolutional operation from optical flow-based prior and video sequences. Moreover, an attentive module is designed to integrate the different types of spatiotemporal feature maps by learning the corresponding weights. Meanwhile, in order to generate sufficient pixel-wise annotated video frames, we manually generate lots of coarse labels, which are well utilized to train a robust saliency prediction network. Experiments on the widely used challenging datasets (e.g., FBMS and DAVIS) prove that the proposed STAN has competitive performances among salient object detection methods.

show abstract

A novel FCNs‐ConvLSTM network for video salient object detection

Huang

Liu

Lan

et al. 2021

Circuit Theory & Apps

View full text Add to dashboard Cite

Summary A video saliency detection model is proposed based on deep learning, which improves the existing fully convolutional network (FCN)‐based model by introducing a convolutional long short‐term memory (ConvLSTM) module. The ConvLSTM splits the input into two flows with two layers in each one. The two flows have different dilation rates that make them have different receptive fields, which enables the proposed model to perform better in depicting the contour of objects. The ConvLSTM module receive frames in order as input rather than unordered frames that FCN modules do, so the proposed model can learn both spatial and temporal information of video data. Considering the lack of manually labeled annotations in the dataset, augmentation technologies are used in training the model to expand the dataset, such as performing mirror transformation, introducing Gaussian noise and abandoning every other frame to simulate fast movement situation. The proposed FCNs‐ConvLSTM model is trained and evaluated on extensively used dataset, and the results demonstrate that it performs better on recall rate (0.52 to 0.64) with a similar level on precision rate (0.72) when threshold is 125 and it also gets an increase on maximum F‐measure (0.66 to 0.70), which indicates that the proposed model has better capacity in detecting moving object.

show abstract

Weakly Supervised Salient Object Detection With Spatiotemporal Cascade Neural Networks

Cited by 87 publications

References 56 publications

Overview of deep-learning based methods for salient object detection in videos

Overview of deep-learning based methods for salient object detection in videos

Video salient object detection via spatiotemporal attention neural networks

A novel FCNs‐ConvLSTM network for video salient object detection

Contact Info

Product

Resources

About