2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2021
DOI: 10.1109/cvpr46437.2021.01375

Weakly Supervised Instance Segmentation for Videos with Temporal Mask Consistency

Cited by 11 publications (10 citation statements)
References 38 publications

“…Earlier works have investigated the use of videos for weakly-, semi-, or un-supervised segmentation by leveraging motion or temporal consistency [22,52,53]. Most aforementioned approaches do not address the VIS problem, and use optical flow for frame-to-frame matching [25,33,44]. In particular, FlowIRN [33] explores VIS using only classification labels and incorporates optical flow to leverage mask consistency.…”
Section: Related Work | mentioning
confidence: 99%
“…Specifically, our surrogate objective function not only promotes the one-to-k matched regions to reach the same mask probabilities, but also commits their mask prediction to a confident foreground or background prediction by entropy minimization. Unlike flow-based models [33,46], which assume one-to-one matching, our approach builds robust and flexible one-to-k correspondences to cope with e.g. occlusions and homogeneous regions, without introducing additional model parameters or inference cost.…”
Section: Introduction | mentioning
confidence: 99%
“…Segmentation using motion Existing works utilize optical flow to approximate the motion of objects. For example, FlowIRN [37] uses class activation maps [2] and dense optical flow [36] to generate pseudo supervision. In Motion Grouping [69], dense optical flow [52,50] is used to cluster pixels into foreground and background.…”
Section: Related Work | mentioning
confidence: 99%
“…Several recent approaches for the instance segmentation have also explored the use of temporal information by aggregating features across frames [10,11] or using 3D convolutions [12,13]. Previous work [14] attempted to use motion information for instance segmentation in a weakly supervised manner. However, it simply amplifies the foreground scores of the regions with large motion.…”
Section: Lower Left) | mentioning
confidence: 99%
“…We conducted our experiments using the YouTube-VIS 2019 [15] benchmark dataset. Because the annotations of the original val split are not publicly available, we randomly selected ten videos per class from the original train split, following [14], to create train val split, and named the remaining data train train split. Finally, the train train split contains 1847 videos and 51049 images, while the train val split contains 391 videos and 10796 images.…”
Section: Dataset and Implementation Details | mentioning
confidence: 99%