Primary object segmentation plays an important role in understanding videos captured by unmanned aerial vehicles. In this paper, we propose a large-scale dataset of 500 aerial videos with manually annotated primary objects. To the best of our knowledge, it is the largest dataset to date for primary object segmentation in aerial videos. From this dataset, we find that most aerial videos contain large-scale scenes, small primary objects, and consistently varying scales and viewpoints. Inspired by these observations, we propose a hierarchical deep co-segmentation approach that repeatedly divides a video into two sub-videos formed by its odd and even frames, respectively. In this manner, the primary objects shared by the sub-videos can be co-segmented by training two-stream CNNs and finally refined within neighborhood reversible flows. Experimental results show that our approach substantially outperforms 17 state-of-the-art methods in segmenting primary objects in various types of aerial videos.

Recently, unmanned aerial vehicles (drones) have become very popular since they provide a new way to observe and explore the world. As a result, aerial videos generated by drones have been growing explosively. For these videos, one of the key tasks is to segment the primary objects, which can facilitate subsequent tasks such as event understanding, scene reconstruction, drone navigation, and visual tracking.

Hundreds of models have been proposed in the past decade to segment primary objects [15], and they can be roughly divided into two categories. The first category contains image-based models that focus on detecting salient (primary) objects in images. In this category, classic models [1-4] design rules to pop out salient targets and suppress distractors, while recent models [5-8] usually adopt the deep learning framework due to the availability of large-scale image datasets (e.g., the XPIE dataset [4]). The second category contains video-based models [16] that aim to segment a sequence of primary/foreground objects that consistently pop out throughout a video. Similar to the image-based category, classic video-based models design rules to segment primary objects by jointly considering per-frame accuracy and inter-frame consistency [9]. More recently, with the emergence of large-scale video datasets [17], several deep learning models [10, 11] have been proposed as well. In addition, video object co-segmentation approaches [12, 13] have been proposed to simultaneously segment a common category of objects across two or more videos.
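
To make the hierarchical odd/even division described above concrete, the sketch below recursively splits a video into interleaved sub-videos. This is an illustrative reconstruction, not the authors' implementation; the stopping length `min_len` is a hypothetical parameter, and the co-segmentation of sibling sub-videos by two-stream CNNs is indicated only by a comment.

```python
# Minimal sketch of the hierarchical odd/even video division described
# above. Illustrative only, not the authors' implementation; `min_len`
# is a hypothetical stopping parameter.

def hierarchical_split(frames, min_len=8):
    """Recursively divide a video (a list of frames) into sub-videos.

    Each split interleaves the video into its even- and odd-indexed
    frames, so sibling sub-videos cover the same scene and share the
    same primary objects, which makes them suitable for co-segmentation.
    """
    if len(frames) <= min_len:
        return [frames]
    even, odd = frames[0::2], frames[1::2]
    # Sibling sub-videos (even, odd) would be co-segmented against each
    # other, e.g., by the two-stream CNNs mentioned in the abstract.
    return hierarchical_split(even, min_len) + hierarchical_split(odd, min_len)


# Example: a 32-frame video is divided into 4 interleaved 8-frame sub-videos.
video = list(range(32))
sub_videos = hierarchical_split(video)
assert len(sub_videos) == 4
assert all(len(s) == 8 for s in sub_videos)
```

Because each split interleaves rather than bisects the timeline, every sub-video still spans the whole sequence, so the shared primary objects remain present in both halves at every level of the hierarchy.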