Weakly Supervised Instance Segmentation for Videos with Temporal Mask Consistency

Liu, Qing; Ramanathan, Vignesh; Mahajan, Dhruv; Yuille, Alan; Yang, Zhenheng

doi:10.1109/cvpr46437.2021.01375

Cited by 11 publications

(10 citation statements)

References 38 publications

“…Earlier works have investigated the use of videos for weakly-, semi-, or un-supervised segmentation by leveraging motion or temporal consistency [22,52,53]. Most aforementioned approaches do not address the VIS problem, and use optical flow for frame-to-frame matching [25,33,44]. In particular, FlowIRN [33] explores VIS using only classification labels and incorporates optical flow to leverage mask consistency.…”

Section: Related Work · mentioning

confidence: 99%

“…Specifically, our surrogate objective function not only promotes the one-to-k matched regions to reach the same mask probabilities, but also commits their mask prediction to a confident foreground or background prediction by entropy minimization. Unlike flow-based models [33,46], which assume one-to-one matching, our approach builds robust and flexible one-to-k correspondences to cope with e.g. occlusions and homogeneous regions, without introducing additional model parameters or inference cost.…”

Section: Introduction · mentioning

confidence: 99%
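
The objective described in the statement above (matched regions pushed toward the same mask probability, and toward a confident foreground or background value) can be illustrated with a small sketch. This is not the citing paper's implementation: the function names, the squared-deviation agreement term, the `entropy_weight` parameter, and the assumption that the one-to-k matches are already given as a dictionary are all hypothetical choices made for illustration.

```python
import math


def binary_entropy(p, eps=1e-7):
    """Entropy (in nats) of a foreground probability p, clipped for stability."""
    p = min(max(p, eps), 1.0 - eps)
    return -(p * math.log(p) + (1.0 - p) * math.log(1.0 - p))


def one_to_k_loss(probs_t, probs_next, matches, entropy_weight=1.0):
    """Hypothetical surrogate loss over one-to-k region matches.

    probs_t:    foreground probabilities of regions in frame t
    probs_next: foreground probabilities of regions in the next frame
    matches:    dict mapping a region index in frame t to a list of k
                candidate region indices in the next frame (assumed given)
    """
    total = 0.0
    for i, candidates in matches.items():
        group = [probs_t[i]] + [probs_next[j] for j in candidates]
        mean_p = sum(group) / len(group)
        # Agreement term: matched regions should share one mask probability.
        agreement = sum((p - mean_p) ** 2 for p in group) / len(group)
        # Confidence term: entropy minimization pushes each value toward 0 or 1.
        confidence = sum(binary_entropy(p) for p in group) / len(group)
        total += agreement + entropy_weight * confidence
    return total / len(matches) if matches else 0.0
```

Under these assumptions, a group of matched regions that agree on a confident prediction yields a loss close to zero, while a group with conflicting or uncertain predictions is penalized by both terms.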

Video Mask Transfiner for High-Quality Video Instance Segmentation

Ding, Danelljan et al. 2022

Lecture Notes in Computer Science

“…Segmentation using motion. Existing works utilize optical flow to approximate the motion of objects. For example, FlowIRN [37] uses class activation maps [2] and dense optical flow [36] to generate pseudo supervision. In Motion Grouping [69], dense optical flow [52,50] is used to cluster pixels into foreground and background.…”

Section: Related Work · mentioning

confidence: 99%

Box Supervised Video Segmentation Proposal Network

Tanveer, Koner, Kobold et al. 2022

Preprint

Video Object Segmentation (VOS) has been targeted by various fully-supervised and self-supervised approaches. While fully-supervised methods demonstrate excellent results, self-supervised ones, which do not use pixel-level ground truth, attract much attention. However, self-supervised approaches pose a significant performance gap. Box-level annotations provide a balanced compromise between labeling effort and result quality for image segmentation but have not been exploited for the video domain. In this work, we propose a box-supervised video object segmentation proposal network, which takes advantage of intrinsic video properties. Our method incorporates object motion in the following way: first, motion is computed using a bidirectional temporal difference and a novel bounding box-guided motion compensation. Second, we introduce a novel motion-aware affinity loss that encourages the network to predict positive pixel pairs if they share similar motion and color. The proposed method outperforms the state-of-the-art self-supervised benchmark by 16.4% and 6.9% J&F score and the majority of fully supervised methods on the DAVIS and Youtube-VOS dataset without imposing network architectural specifications. We provide extensive tests and ablations on the datasets, demonstrating the robustness of our method. Code is available at https://github.com/Tanveer81/BoxVOS.git

“…Several recent approaches for the instance segmentation have also explored the use of temporal information by aggregating features across frames [10,11] or using 3D convolutions [12,13]. Previous work [14] attempted to use motion information for instance segmentation in a weakly supervised manner. However, it simply amplifies the foreground scores of the regions with large motion.…”

Section: Lower Left) · mentioning

confidence: 99%

“…We conducted our experiments using the YouTube-VIS 2019 [15] benchmark dataset. Because the annotations of the original val split are not publicly available, we randomly selected ten videos per class from the original train split, following [14], to create train val split, and named the remaining data train train split. Finally, the train train split contains 1847 videos and 51049 images, while the train val split contains 391 videos and 10796 images.…”

Section: Dataset and Implementation Details · mentioning

confidence: 99%
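
The per-class hold-out described in the statement above (ten randomly chosen videos per class moved from the training split into a validation split) might look roughly like the sketch below. The function name, the `video_classes` mapping, and the fixed seed are hypothetical, and the sketch assumes exactly one class label per video, which the quoted statement does not confirm; videos that contain several categories would need extra handling.

```python
import random
from collections import defaultdict


def split_videos_per_class(video_classes, per_class=10, seed=0):
    """Hold out `per_class` randomly chosen videos of each class.

    video_classes: dict mapping a video id to its class label
                   (hypothetical input; one label per video is assumed)
    Returns (train_ids, val_ids) as sorted lists.
    """
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    by_class = defaultdict(list)
    for video_id, label in sorted(video_classes.items()):
        by_class[label].append(video_id)

    held_out = set()
    for label in sorted(by_class):
        videos = by_class[label]
        # Classes with fewer than `per_class` videos are held out entirely.
        held_out.update(rng.sample(videos, min(per_class, len(videos))))

    train_ids = sorted(v for v in video_classes if v not in held_out)
    return train_ids, sorted(held_out)
```

With a single label per video the validation split contains at most `per_class` videos for each class, and the two returned lists never overlap.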

Weakly Supervised Instance Segmentation using Motion Information via Optical Flow

Ikeda, Mori 2022

Preprint

Weakly supervised instance segmentation has gained popularity because it reduces high annotation cost of pixel-level masks required for model training. Recent approaches for weakly supervised instance segmentation detect and segment objects using appearance information obtained from a static image. However, it poses the challenge of identifying objects with a non-discriminatory appearance. In this study, we address this problem by using motion information from image sequences. We propose a two-stream encoder that leverages appearance and motion features extracted from images and optical flows. Additionally, we propose a novel pairwise loss that considers both appearance and motion information to supervise segmentation. We conducted extensive evaluations on the YouTube-VIS 2019 benchmark dataset. Our results demonstrate that the proposed method improves the Average Precision of the state-of-the-art method by 3.1.
