RPM-Net: Robust Pixel-Level Matching Networks for Self-Supervised Video Object Segmentation

Kim, Youngeun; Choi, Seokeon; Lee, Hankyeol; Kim, Tae‐Kyung; Kim, Changick

doi:10.1109/wacv45572.2020.9093294

Cited by 8 publications

(7 citation statements)

References 31 publications

“…MAMP outperforms existing self-supervised methods, and is on par with some supervised methods trained with large amounts of annotated data. Notation: Video Colorization [34], RPM-Net [11], CycleTime [38], CorrFlow [13], MuG [19], UVC [14], MAST [12], OSVOS [1], RANet [39], OSVOS-S [21], GC [16], OSMN [42], SiamMask [37], OnAVOS [33], FEELVOS [32], AFB-URR [17], PReMVOS [20], STM [24], KMN [28], CFBI [43]. Semi-supervised video object segmentation techniques fall into two categories: supervised and self-supervised. Supervised approaches [24,43] use the rich annotation information in training data to learn the model, achieving great success in video object segmentation.…”

Section: Introduction (mentioning)

confidence: 99%

“…Comparison on DAVIS-2017 with other methods. MAMP outperforms existing self-supervised methods, and is on par with some supervised methods trained with large amounts of annotated data. Notation: Video Colorization [34], RPM-Net [11], CycleTime [38], CorrFlow [13], MuG [19], UVC [14], MAST [12], OSVOS [1], RANet [39], OSVOS-S [21], GC [16], OSMN [42], SiamMask [37], OnAVOS [33], FEELVOS [32], AFB-URR [17], PReMVOS [20], STM [24], KMN [28], CFBI [43]…”

mentioning

confidence: 99%

Self-Supervised Video Object Segmentation by Motion-Aware Mask Propagation

Miao, Bennamoun, Gao et al. (2021)

Preprint

We propose a self-supervised spatio-temporal matching method, coined Motion-Aware Mask Propagation (MAMP), for semi-supervised video object segmentation. During training, MAMP leverages the frame reconstruction task to train the model without the need for annotations. During inference, MAMP extracts high-resolution features from each frame to build a memory bank from the features as well as the predicted masks of selected past frames. MAMP then propagates the masks from the memory bank to subsequent frames according to our motion-aware spatio-temporal matching module, also proposed in this paper. Evaluations on the DAVIS-2017 and YouTube-VOS datasets show that MAMP achieves state-of-the-art performance with stronger generalization ability than existing self-supervised methods, i.e., 4.9% higher mean J&F on DAVIS-2017 and 4.85% higher mean J&F on the unseen categories of YouTube-VOS than the nearest competitor. Moreover, MAMP performs on par with many supervised video object segmentation methods. Our code is available at: https://github.com/bo-miao/MAMP.
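The inference-time propagation step described in this abstract can be illustrated with a generic affinity-based label propagation. The sketch below is a simplification under stated assumptions and is not MAMP's motion-aware matching module: each query pixel is compared with every memory pixel by dot product, only the top-k most similar memory pixels are kept, their similarities are turned into weights with a temperature softmax, and the query pixel's label distribution is the weighted average of the memory masks. The function name `propagate_masks` and the parameters `top_k` and `temperature` are hypothetical.

```python
import math
from typing import List, Sequence

Vec = Sequence[float]


def _softmax(scores: Sequence[float], temperature: float) -> List[float]:
    # Numerically stable softmax with a temperature.
    m = max(scores)
    exps = [math.exp((s - m) / temperature) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def propagate_masks(
    query_feats: Sequence[Vec],
    memory_feats: Sequence[Vec],
    memory_masks: Sequence[Vec],
    top_k: int = 2,
    temperature: float = 0.1,
) -> List[List[float]]:
    """Hypothetical affinity-based label propagation (not MAMP's actual module).

    query_feats:  one D-dim feature per query-frame pixel.
    memory_feats: one D-dim feature per memory pixel (all memory frames flattened).
    memory_masks: one probability vector over C labels per memory pixel.
    Returns one probability vector over C labels per query pixel.
    """
    num_labels = len(memory_masks[0])
    result = []
    for q in query_feats:
        # Dot-product similarity between this query pixel and every memory pixel.
        sims = [sum(a * b for a, b in zip(q, m)) for m in memory_feats]
        # Keep only the top-k most similar memory pixels (assumed design choice).
        idx = sorted(range(len(sims)), key=lambda i: sims[i], reverse=True)[:top_k]
        weights = _softmax([sims[i] for i in idx], temperature)
        # Label distribution = affinity-weighted average of the memory masks.
        probs = [0.0] * num_labels
        for w, i in zip(weights, idx):
            for c in range(num_labels):
                probs[c] += w * memory_masks[i][c]
        result.append(probs)
    return result


# Usage: two memory pixels, one background ([1, 0]) and one object ([0, 1]).
mem_feats = [[1.0, 0.0], [0.0, 1.0]]
mem_masks = [[1.0, 0.0], [0.0, 1.0]]
pred = propagate_masks([[0.9, 0.1], [0.1, 0.9]], mem_feats, mem_masks)
labels = [max(range(len(p)), key=p.__getitem__) for p in pred]
print(labels)  # [0, 1]
```

Restricting the affinity to the top-k memory pixels keeps a few weakly similar but numerous background pixels from diluting the label of a query pixel; in a real system the features would come from a learned encoder and would usually be normalized before the dot product.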

“…IDAM [32] integrates the iterative distance-aware similarity convolution module into the matching process, which can overcome the shortcoming of using the inner product to obtain pointwise similarity. RPM-Net [33] combines Sinkhorn's method with deep learning to build soft correspondence from mixed features, thereby enhancing robustness to noise. Soft correspondence can improve robustness, but it can also lead to a decrease in registration accuracy.…”

Section: Learning-based Registration Methods (mentioning)

confidence: 99%
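Sinkhorn's method, mentioned in the statement above as the basis of RPM-Net's soft correspondence, alternately normalizes the rows and columns of an exponentiated score matrix until it is approximately doubly stochastic. The following is a minimal sketch under stated assumptions: a square score matrix, no slack row or column for outliers (which robust point matching formulations typically add), and no log-domain stabilization. The names `sinkhorn`, `num_iters` and `temperature` are illustrative, not taken from RPM-Net's code.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def sinkhorn(
    scores: Sequence[Sequence[float]], num_iters: int = 20, temperature: float = 1.0
) -> Matrix:
    """Sketch of Sinkhorn normalization into a soft correspondence matrix.

    Simplifications (assumptions): no slack row/column for outliers and no
    log-domain computation, so very large scores / temperature may overflow.
    """
    # Exponentiate scores so every entry is strictly positive.
    p = [[math.exp(s / temperature) for s in row] for row in scores]
    for _ in range(num_iters):
        # Row normalization: each source point's weights sum to 1.
        row_sums = [sum(row) for row in p]
        p = [[v / s for v in row] for row, s in zip(p, row_sums)]
        # Column normalization: each target point's weights sum to 1.
        col_sums = [sum(row[j] for row in p) for j in range(len(p[0]))]
        p = [[v / col_sums[j] for j, v in enumerate(row)] for row in p]
    return p


# Usage: a 3x3 similarity matrix whose diagonal holds the "true" matches.
scores = [[2.0, 0.1, 0.0], [0.2, 1.5, 0.3], [0.0, 0.4, 1.8]]
P = sinkhorn(scores, num_iters=100)
matches = [row.index(max(row)) for row in P]
print(matches)  # [0, 1, 2]
```

Every entry of the result stays positive, so each source point keeps a soft weight on every target point; this is the soft correspondence the quoted passage credits with robustness, and also why it can blur the final alignment compared with a hard one-to-one assignment.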

GCMTN: Low-Overlap Point Cloud Registration Network Combining Dense Graph Convolution and Multilevel Interactive Transformer

Wang, Yang (2023)

Remote Sensing

A single receptive field limits the expression of multilevel receptive field features in point cloud registration, leading to the pseudo-matching of objects with similar geometric structures in low-overlap scenes, which causes a significant degradation in registration performance. To handle this problem, this paper proposes a point cloud registration network that incorporates dense graph convolution and a multilevel interaction Transformer (GCMTN) to improve registration performance in low-overlap scenes. In GCMTN, a dense graph feature aggregation module is designed to expand the receptive field of points and fuse graph features at multiple scales. To make pointwise features more discriminative, a multilevel interaction Transformer module combining Multihead Offset Attention and Multihead Cross Attention is proposed to refine the internal features of the point cloud and perform feature interaction. To filter out the undesirable effects of outliers, an overlap prediction module containing an overlap factor and a matching factor is also proposed for determining the matchability of points and predicting the overlap region. The final rigid transformation parameters are generated based on the distribution of the overlap region. The proposed GCMTN was extensively evaluated on the publicly available ModelNet and ModelLoNet, 3DMatch and 3DLoMatch, and KITTI odometry datasets and compared with recent methods. The experimental results demonstrate that GCMTN significantly improves the capability of feature extraction and achieves competitive registration performance in low-overlap scenes. Meanwhile, GCMTN shows value and potential for application in practical remote sensing tasks.
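To make the cross-attention idea in this abstract concrete, here is a simplified multi-head cross-attention in which each source point feature attends to all target point features. This is a hypothetical sketch, not GCMTN's module: it omits the learned query/key/value projections, the offset attention, and any residual or normalization layers, and simply splits the feature channels evenly across heads. The function name `multihead_cross_attention` and the `num_heads` parameter are illustrative.

```python
import math
from typing import List, Sequence

Vec = Sequence[float]


def multihead_cross_attention(
    src: Sequence[Vec], tgt: Sequence[Vec], num_heads: int = 2
) -> List[List[float]]:
    """Hypothetical multi-head cross-attention from source to target point features.

    Simplified: identity query/key/value projections (no learned weights),
    no offset attention, no residual connection or normalization.
    """
    dim = len(src[0])
    if dim % num_heads != 0:
        raise ValueError("feature dimension must be divisible by num_heads")
    head_dim = dim // num_heads
    out = []
    for q in src:
        fused: List[float] = []
        for h in range(num_heads):
            lo, hi = h * head_dim, (h + 1) * head_dim
            # Scaled dot-product scores of this query (head slice) against every target point.
            scores = [
                sum(q[c] * k[c] for c in range(lo, hi)) / math.sqrt(head_dim)
                for k in tgt
            ]
            # Stable softmax over target points.
            m = max(scores)
            weights = [math.exp(s - m) for s in scores]
            total = sum(weights)
            # Each head's output is a convex combination of the target channels lo:hi.
            for c in range(lo, hi):
                fused.append(sum(w / total * v[c] for w, v in zip(weights, tgt)))
        out.append(fused)
    return out


# Usage: two source points attending over three target points with 4-dim features.
src = [[1.0, 0.0, 0.0, 1.0], [0.0, 1.0, 1.0, 0.0]]
tgt = [[1.0, 0.0, 0.0, 1.0], [0.0, 1.0, 1.0, 0.0], [0.5, 0.5, 0.5, 0.5]]
attended = multihead_cross_attention(src, tgt, num_heads=2)
```

Because each source feature is rebuilt from the other cloud's features, points in the two clouds exchange information before matching; a trained module would learn the projections so that this exchange emphasizes geometrically consistent partners.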

“…It has shown promising capacity on various downstream tasks as it does not require annotations and can better generalize (Vondrick et al. 2018; Han, Xie, and Zisserman 2019; Li et al. 2019; Kim, Cho, and Kweon 2019; Wang, Jiao, and Liu 2020; Tao, Wang, and Yamasaki 2020; Pan et al. 2021). Many pretext tasks have been explored for self-supervised learning such as future frame prediction (Liu et al. 2018), query frame reconstruction (Lai and Xie 2019; Kim et al. 2020; Lai, Lu, and Xie 2020), patch re-localization (Wang, Jabri, and Efros 2019; Lu et al. 2020), and motion statistics prediction (Wang et al. 2019a).…”

Section: Related Work (mentioning)

confidence: 99%
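Query frame reconstruction, listed in the statement above among the self-supervised pretext tasks, is commonly set up by copying colors from a reference frame into the query frame through a feature affinity and penalizing the reconstruction error. The sketch below assumes dot-product affinity with a softmax temperature and an L1 loss; the function names `reconstruct_query` and `l1_reconstruction_loss` are hypothetical, and the exact loss, color space, and affinity used by the cited works may differ.

```python
import math
from typing import List, Sequence

Vec = Sequence[float]


def reconstruct_query(
    query_feats: Sequence[Vec],
    ref_feats: Sequence[Vec],
    ref_colors: Sequence[Vec],
    temperature: float = 0.1,
) -> List[List[float]]:
    """Hypothetical color copying: each query pixel becomes a softmax-weighted
    average of reference-frame colors, weighted by feature similarity."""
    channels = len(ref_colors[0])
    recon = []
    for q in query_feats:
        sims = [sum(a * b for a, b in zip(q, r)) / temperature for r in ref_feats]
        m = max(sims)
        weights = [math.exp(s - m) for s in sims]
        total = sum(weights)
        recon.append(
            [sum(w / total * col[c] for w, col in zip(weights, ref_colors)) for c in range(channels)]
        )
    return recon


def l1_reconstruction_loss(recon: Sequence[Vec], target: Sequence[Vec]) -> float:
    """Mean absolute error between reconstructed and true query colors (assumed loss)."""
    total = sum(abs(r - t) for rp, tp in zip(recon, target) for r, t in zip(rp, tp))
    count = sum(len(tp) for tp in target)
    return total / count


# Usage: features that match the reference give a low loss; a wrong target gives a high one.
ref_feats = [[1.0, 0.0], [0.0, 1.0]]
ref_colors = [[0.2, 0.4, 0.6], [0.8, 0.1, 0.3]]
recon = reconstruct_query([[1.0, 0.0], [0.0, 1.0]], ref_feats, ref_colors)
good = l1_reconstruction_loss(recon, ref_colors)
bad = l1_reconstruction_loss(recon, ref_colors[::-1])
```

The supervision signal is the query frame itself, so no annotations are needed; during training the features would come from a learned encoder and the loss would be backpropagated through the affinity, which is what teaches the encoder to produce features that match corresponding pixels.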

Self-Supervised Video Object Segmentation by Motion-Aware Mask Propagation

Miao, Bennamoun, Gao et al. (2022)

2022 IEEE International Conference on Multimedia and Expo (ICME)

We propose a self-supervised spatio-temporal matching method, coined Motion-Aware Mask Propagation (MAMP), for video object segmentation. MAMP leverages the frame reconstruction task for training without the need for annotations. During inference, MAMP extracts high-resolution features from each frame to build a memory bank from the features as well as the predicted masks of selected past frames. MAMP then propagates the masks from the memory bank to subsequent frames according to our proposed motion-aware spatio-temporal matching module, which is designed to handle fast motion and long-term matching scenarios. Evaluations on the DAVIS-2017 and YouTube-VOS datasets show that MAMP achieves state-of-the-art performance with stronger generalization ability than existing self-supervised methods, i.e., 4.2% higher mean J&F on DAVIS-2017 and 4.85% higher mean J&F on the unseen categories of YouTube-VOS than the nearest competitor. Moreover, MAMP performs on par with many supervised video object segmentation methods.
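The mean J&F figures quoted in these abstracts combine region similarity J (the Jaccard index, i.e. intersection over union of the predicted and ground-truth masks) and contour accuracy F (a boundary F-measure); J&F is the mean of the two. The sketch below computes J for binary masks and the J&F average. F is omitted because it requires boundary extraction and a distance tolerance, and the empty-mask convention, the simple per-frame averaging, and the function names are assumptions rather than the official DAVIS evaluation code.

```python
from typing import Sequence

Mask = Sequence[Sequence[int]]


def jaccard(pred: Mask, gt: Mask) -> float:
    """Region similarity J: intersection over union of two binary masks."""
    inter = union = 0
    for prow, grow in zip(pred, gt):
        for p, g in zip(prow, grow):
            inter += int(bool(p) and bool(g))
            union += int(bool(p) or bool(g))
    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if union == 0 else inter / union


def mean_j(preds: Sequence[Mask], gts: Sequence[Mask]) -> float:
    """Average J over frames (per-object averaging is omitted in this sketch)."""
    scores = [jaccard(p, g) for p, g in zip(preds, gts)]
    return sum(scores) / len(scores)


def j_and_f(mean_j_value: float, mean_f_value: float) -> float:
    """J&F is the arithmetic mean of mean J and mean F (F computed elsewhere)."""
    return (mean_j_value + mean_f_value) / 2


# Usage: 2 overlapping pixels out of 4 covered by either mask gives J = 0.5.
pred = [[1, 1, 0], [0, 1, 0]]
gt = [[1, 0, 0], [0, 1, 1]]
print(jaccard(pred, gt))  # 0.5
```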
