2019 IEEE/CVF International Conference on Computer Vision (ICCV) 2019
DOI: 10.1109/iccv.2019.00403

DMM-Net: Differentiable Mask-Matching Network for Video Object Segmentation

Abstract: In this paper, we propose the differentiable mask-matching network (DMM-Net) for solving the video object segmentation problem where the initial object masks are provided. Relying on the Mask R-CNN backbone, we extract mask proposals per frame and formulate the matching between object templates and proposals at one time step as a linear assignment problem where the cost matrix is predicted by a CNN. We propose a differentiable matching layer by unrolling a projected gradient descent algorithm in which the proje…
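The matching step described in the abstract can be illustrated with a small sketch. The abstract is cut off before it specifies the projection, so the code below does not reproduce the paper's layer: it assumes a simplified relaxation in which each template's row of the assignment matrix is only constrained to the probability simplex, and it uses the standard sort-based simplex projection. The function names, step size, and iteration count are hypothetical choices made for illustration.

```python
import numpy as np


def project_rows_to_simplex(v):
    """Euclidean projection of each row of v onto the probability simplex."""
    n, m = v.shape
    u = np.sort(v, axis=1)[:, ::-1]           # each row sorted in descending order
    css = np.cumsum(u, axis=1) - 1.0          # shifted cumulative sums
    idx = np.arange(1, m + 1)
    cond = u - css / idx > 0                  # true for the first rho entries of a row
    rho = cond.sum(axis=1)                    # size of the support, always >= 1
    tau = css[np.arange(n), rho - 1] / rho    # per-row threshold
    return np.maximum(v - tau[:, None], 0.0)


def unrolled_pgd_matching(cost, num_steps=50, step_size=0.5):
    """Run a fixed number of projected gradient steps on <cost, X>.

    Assumption: X only has to be row-stochastic (one soft assignment per
    template). This is a simplification chosen for the sketch, not the
    constraint set used in the paper.
    """
    n, m = cost.shape
    x = np.full((n, m), 1.0 / m)              # uniform initial assignment
    for _ in range(num_steps):
        # The gradient of the linear objective <cost, X> with respect to X is cost.
        x = project_rows_to_simplex(x - step_size * cost)
    return x


# Hypothetical cost matrix: 2 templates (rows) x 3 proposals (columns).
cost = np.array([[0.9, 0.1, 0.5],
                 [0.2, 0.8, 0.7]])
assignment = unrolled_pgd_matching(cost)
matched = assignment.argmax(axis=1)           # proposal index chosen per template
```

Because the loop has a fixed length and each step is a composition of simple operations, the same computation written in an automatic-differentiation framework would let gradients flow from the assignment back to the predicted cost matrix, which is the property an unrolled matching layer relies on.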

citations
Cited by 74 publications
(65 citation statements)
references
References 46 publications
“…Since the propagation is conducted in a short-time interval, the methods often exploit the temporal smoothness constraint but are not robust to occlusion. The matching-based methods [41,13,52,43,14,50] predict a foreground mask in the current frame based on matching with a previously predicted or given mask. Recently, STM [33] introduced a memory-based method for offline-learning VOS and demonstrated significantly improved performance while achieving a fast run-time.…”
Section: Related Work (mentioning)
confidence: 99%
“…This flexible architecture can be seamlessly integrated with any bottom-up pose estimators in principle, and explicitly encodes human structural constraints. Hence, we formulate joint association as a differentiable matching problem [61], rather than relying on sophisticated post-processing [4,27] like conventional bottom-up methods. Though [25] also addresses joint association in an end-to-end manner, it needs to learn a complicated and heavy graph network and cannot guarantee optimality.…”
Section: Related Work (mentioning)
confidence: 99%
“…Existing pose estimators [4,39,27] instead employ heuristic greedy algorithms, but break the end-to-end pipeline. Inspired by [61], we propose a differentiable solution which facilitates model learning with direct matching based supervision.…”
Section: Fully Differentiable Human Keypoint Detection and Association (mentioning)
confidence: 99%
“…OL & ED methods | ED | OL | J% | F% | J&F% | FPS
STCNN [28]: 58.7, 64.6, 61.7, 0.26†
OnAVOS [25]: 64.5, 71.2, 67.9, 0.1
BoLTVOS [27]: 72.0, 80.6, 76.3, 0.69
TANDTM [9]: 72.3, 79.4, 75.9, 7.1
PReMVOS^F [15]: 73.9, 81.
OL & ED methods | ED | OL | G% | J_S% | J_U% | F_S% | F_U% | FPS
MaskTrack [20]: 53.1, 59.9, 45.0, 59.5, 47.9, 0.05
OnAVOS [25]: 55.2, 60.1, 46.6, 62.7, 51.4, 0.05
DMM-Net [31]: 58.0, 60.3, 50.6, 63.5, 57.4, -
PReMVOS^F [15]: 66.9, 71.4, 56.5, 75.9, 63.7, 0.17
BoLTVOS [27]: 71
'OL' denotes online learning. Superscript 'F' denotes usage of optical flow.…”
Section: Datasets and Evaluation Metrics (mentioning)
confidence: 99%