2022 IEEE International Conference on Multimedia and Expo (ICME)
DOI: 10.1109/icme52920.2022.9859966
Self-Supervised Video Object Segmentation by Motion-Aware Mask Propagation

Abstract: We propose a self-supervised spatio-temporal matching method, coined Motion-Aware Mask Propagation (MAMP), for video object segmentation. MAMP leverages the frame reconstruction task for training without the need for annotations. During inference, MAMP extracts high-resolution features from each frame to build a memory bank from the features as well as the predicted masks of selected past frames. MAMP then propagates the masks from the memory bank to subsequent frames according to our proposed motion-aware spa…
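The memory-based propagation the abstract describes can be illustrated with a minimal sketch: query-frame features are matched against memory-bank features by similarity, and the memory masks are transferred with softmax-weighted affinities. This is a generic, hypothetical illustration of memory-based label propagation, not the authors' exact MAMP formulation; all names, the top-k filtering, and the temperature value are assumptions.

```python
import numpy as np

def propagate_mask(query_feats, memory_feats, memory_masks,
                   temperature=0.07, top_k=10):
    """Propagate soft object masks from a memory bank to the query frame
    via feature affinity.

    query_feats:  (N, C) per-pixel features of the current frame
    memory_feats: (M, C) features of selected past frames
    memory_masks: (M, K) soft one-hot masks (K objects) for memory pixels
    Returns:      (N, K) predicted soft masks for the query pixels
    """
    # Cosine-similarity affinity between query and memory pixels
    q = query_feats / (np.linalg.norm(query_feats, axis=1, keepdims=True) + 1e-8)
    m = memory_feats / (np.linalg.norm(memory_feats, axis=1, keepdims=True) + 1e-8)
    affinity = (q @ m.T) / temperature                    # (N, M)

    # Keep only the top-k most similar memory pixels per query pixel
    idx = np.argpartition(-affinity, top_k - 1, axis=1)[:, :top_k]
    rows = np.arange(affinity.shape[0])[:, None]
    topk_aff = affinity[rows, idx]                        # (N, top_k)

    # Softmax over retained matches, then weighted label transfer
    w = np.exp(topk_aff - topk_aff.max(axis=1, keepdims=True))
    w /= w.sum(axis=1, keepdims=True)
    return np.einsum('nk,nkc->nc', w, memory_masks[idx])  # (N, K)
```

Because both the memory masks and the match weights are normalized, each query pixel's output is a valid soft label distribution over the K objects.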

Cited by 11 publications (3 citation statements). References: 85 publications.
“…Tan et al [30] propose a stepwise attention emphasis transformer for polyp segmentation, which combines convolutional layers with a transformer encoder to enhance both global and local feature extraction. Miao et al [31] utilize a fast object motion tracker to predict regions of interest (ROIs) for the next frame, and propose a motion path memory that filters out redundant context by memorizing features within the motion path of objects between two frames. GFA [46] significantly improves the generalization capability of video object segmentation models by addressing both scene and semantic shifts, leveraging frequency domain transformations and online feature updates.…”
Section: Video Object Segmentation (mentioning, confidence: 99%)
“…While image-based segmentation techniques for glass and mirrors exist [12,24,25], their direct application to video sequences yields unsatisfactory results, making them impractical for real-world applications. Furthermore, conventional video object segmentation (VOS) methods, lacking a suitable video mirror-and-glass dataset, fail to consider the unique motion characteristics of high-reflectivity materials such as glass and mirrors [26-31]. Due to the lack of such a dataset and the absence of modeling for reflective features, general deep-learning-based VOS methods tend to segment the reflections in mirrors or glass as real objects, resulting in inaccurate segmentation [32-36].…”
Section: Introduction (mentioning, confidence: 99%)
“…Incomplete shapes in 3D scans resulting from occlusion (both self-occlusion and occlusion by other objects) and low resolution of sensors often make them unsuitable for direct use in practical applications such as object grasping and Virtual Reality (VR) [16,9,25,10]. To remedy this, shape completion aims to recover complete 3D shapes from partial 3D scans.…”
Section: Introduction (mentioning, confidence: 99%)