Video Object Segmentation Without Temporal Information

Maninis, Kevis-Kokitsi; Caelles, Sergi; Chen, Yuhua; Pont-Tuset, Jordi; Leal-Taixé, Laura; Cremers, Daniel; Gool, Luc Van

doi:10.48550/arxiv.1709.06031

Cited by 9 publications

(15 citation statements)

References 0 publications

“…Without these enhancements, our performance is still higher. This result demonstrates the robustness and generalization of our approach on a complex dataset.…”

Table fragment embedded in the excerpt (Validation Set):

Method            J Mean   F Mean   Overall
OSVOS [3]         56.6     63.9     60.3
PReMVOS [27]      73.9     81.7     77.8
OSVOS-S [28]      64.7     71.3     68.0
OSMN [48]         52.5     57.1     54.8
VideoMatch [17]   56.5     68.2     62.4
RGMP [45]         64.8     68.6     66.7
A-Game [20]       67.2     72.7     70.0
FAVOS [7]         54 (row truncated in the excerpt)

Section: Compare With the State-of-the-art Methods (mentioning)

confidence: 63%

Learning Position and Target Consistency for Memory-based Video Object Segmentation

Li, Zhang, Zhang et al. 2021

Preprint

This paper studies the problem of semi-supervised video object segmentation (VOS). Multiple works have shown that memory-based approaches can be effective for video object segmentation. They are mostly based on pixel-level matching, both spatially and temporally. The main shortcoming of memory-based approaches is that they do not take into account the sequential order among frames and do not exploit object-level knowledge from the target. To address this limitation, we propose a framework that Learns position and target Consistency for Memory-based video object segmentation, termed LCM. It applies the memory mechanism to retrieve pixels globally, and meanwhile learns position consistency for more reliable segmentation. The learned location response promotes better discrimination between target and distractors. In addition, LCM introduces an object-level relationship from the target to maintain target consistency, making LCM more robust to error drifting. Experiments show that LCM achieves state-of-the-art performance on both the DAVIS and YouTube-VOS benchmarks. We also rank 1st in the DAVIS 2020 challenge semi-supervised VOS task.

“…Video object segmentation is gaining rapid development in computer vision. Most solutions are fundamentally supervised, as they rely on heavily pretrained models with human-labeled annotations [4], [5], [6], [7], [8], [9], [10], [11], [12], [13], [14], [15], [16]. Although manual annotation is extremely costly, there are very few genuine unsupervised methods [17], [18], [19], [20], [21].…”

Section: Scientific Context (mentioning)

confidence: 99%

Iterative Knowledge Exchange Between Deep Learning and Space-Time Spectral Clustering for Unsupervised Segmentation in Videos

Haller, Florea, Leordeanu 2020

Preprint

We propose a dual system for unsupervised object segmentation in video, which brings together two modules with complementary properties: a space-time graph that discovers objects in videos and a deep network that learns powerful object features. The system uses an iterative knowledge exchange policy. A novel spectral space-time clustering process on the graph produces unsupervised segmentation masks passed to the network as pseudo-labels. The net learns to segment in single frames what the graph discovers in video and passes back to the graph strong image-level features that improve its node-level features in the next iteration. Knowledge is exchanged for several cycles until convergence. The graph has one node per video pixel, but the object discovery is fast. It uses a novel power iteration algorithm that computes the main space-time cluster as the principal eigenvector of a special Feature-Motion matrix without actually computing the matrix. The thorough experimental analysis validates our theoretical claims and proves the effectiveness of the cyclical knowledge exchange. We also perform experiments on the supervised scenario, incorporating features pretrained with human supervision. We achieve state-of-the-art level on unsupervised and supervised scenarios on four challenging datasets: DAVIS, SegTrack, YouTube-Objects, and DAVSOD.
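The abstract's idea of finding a principal eigenvector without ever building the matrix can be illustrated with a generic matrix-free power iteration. The sketch below is not the authors' algorithm: the Feature-Motion matrix is not defined in this excerpt, so a stand-in matrix M = F Fᵀ is assumed, where F is a hypothetical per-node feature array, and the names `power_iteration`, `matvec`, and `F` are illustrative only. The product M v is evaluated as F (Fᵀ v), so the n × n matrix is never stored.

```python
import numpy as np

def power_iteration(matvec, n, num_iters=200, tol=1e-10, seed=0):
    """Estimate the dominant eigenpair of a symmetric PSD operator.

    `matvec` maps a length-n vector v to M @ v; M itself is never formed.
    """
    rng = np.random.default_rng(seed)
    v = rng.random(n)
    v /= np.linalg.norm(v)
    for _ in range(num_iters):
        w = matvec(v)
        norm = np.linalg.norm(w)
        if norm == 0.0:
            raise ValueError("matvec returned the zero vector")
        w /= norm
        converged = np.linalg.norm(w - v) < tol
        v = w
        if converged:
            break
    # Rayleigh quotient gives the eigenvalue estimate for the unit vector v.
    eigenvalue = v @ matvec(v)
    return eigenvalue, v

# Assumption: stand-in for the paper's matrix, M = F @ F.T with
# hypothetical features F (500 nodes, 8 feature channels).
rng = np.random.default_rng(1)
F = rng.random((500, 8))

def matvec(v):
    # Two thin products instead of one 500 x 500 matrix.
    return F @ (F.T @ v)

lam, v = power_iteration(matvec, n=F.shape[0])
```

Each iteration here costs O(n d) for n nodes and d feature channels rather than O(n²), which is the kind of saving that makes a graph with one node per video pixel tractable.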

“…Given the manual foreground/background annotations for the first frame in a video clip, semi-supervised VOS methods segment the foreground object along the remaining frames. Deep learning based methods have achieved excellent performance [53,8,25,61,58,60], and static image segmentation [5,44,38,22,23] is utilized to perform video object segmentation without any temporal information. MaskTrack [44] considers the output of the previous frame as a guidance in the next frame to refine the mask.…”

Section: Semi-supervised Video Object Segmentation (mentioning)

confidence: 99%

“…MaskTrack [44] considers the output of the previous frame as a guidance in the next frame to refine the mask. OSVOS [5] processes each frame independently by finetuning on the first frame, and OSVOS-S [38] further transfers instance-level semantic information learned on ImageNet [12] to produce more accurate results. OnAVOS [53] proposes online finetuning with the predicted frames to further optimize the inference network.…”

Section: Semi-supervised Video Object Segmentation (mentioning)

confidence: 99%

Unsupervised Video Object Segmentation with Distractor-Aware Online Adaptation

Wang, Choi, Chen et al. 2018

Preprint

Unsupervised video object segmentation is a crucial application in video analysis that requires no prior information about the objects. It becomes tremendously challenging when multiple objects occur and interact in a given video clip. In this paper, a novel unsupervised video object segmentation approach via distractor-aware online adaptation (DOA) is proposed. DOA models spatial-temporal consistency in video sequences by capturing background dependencies from adjacent frames. Instance proposals are generated by an instance segmentation network for each frame and then selected by motion information as positives and, if any exist, hard negatives. To adopt high-quality hard negatives, the block matching algorithm is then applied to preceding frames to track the associated hard negatives. General negatives are also introduced for sequences that contain no hard negatives, and experiments demonstrate that both kinds of negatives (distractors) are complementary. Finally, we conduct DOA using the positive, negative, and hard negative masks to update the foreground/background segmentation. The proposed approach achieves state-of-the-art results on two benchmark datasets, DAVIS 2016 and FBMS-59.
