2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
DOI: 10.1109/cvpr.2019.00542

RVOS: End-To-End Recurrent Network for Video Object Segmentation

Abstract: Multiple object video object segmentation is a challenging task, especially for the zero-shot case, when no object mask is given at the initial frame and the model has to find the objects to be segmented along the sequence. In our work, we propose a Recurrent network for multiple object Video Object Segmentation (RVOS) that is fully end-to-end trainable. Our model incorporates recurrence on two different domains: (i) the spatial, which allows the model to discover the different object instances within a frame, and (ii) the temporal, …
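The architecture outlined in the abstract alternates two recurrences: a spatial one that enumerates object instances within a frame and a temporal one that carries each instance's state across frames. The sketch below illustrates that idea in PyTorch; the ConvLSTMCell, the SpatioTemporalDecoder, the channel sizes, and the fixed number of object slots are illustrative assumptions and not the released RVOS implementation.

```python
# Hedged sketch of spatio-temporal recurrence for zero-shot multi-object VOS.
# Module names and sizes are assumptions, not the authors' released code.
import torch
import torch.nn as nn


class ConvLSTMCell(nn.Module):
    """Convolutional LSTM cell: keeps a spatial hidden state per step."""

    def __init__(self, in_ch, hid_ch, k=3):
        super().__init__()
        self.hid_ch = hid_ch
        self.gates = nn.Conv2d(in_ch + hid_ch, 4 * hid_ch, k, padding=k // 2)

    def forward(self, x, state):
        h, c = state
        i, f, o, g = self.gates(torch.cat([x, h], dim=1)).chunk(4, dim=1)
        c = torch.sigmoid(f) * c + torch.sigmoid(i) * torch.tanh(g)
        h = torch.sigmoid(o) * torch.tanh(c)
        return h, c


class SpatioTemporalDecoder(nn.Module):
    """Spatial recurrence enumerates instances within a frame; temporal
    recurrence carries each instance's state from the previous frame."""

    def __init__(self, feat_ch=256, hid_ch=128, max_objects=4):
        super().__init__()
        self.max_objects = max_objects
        self.cell = ConvLSTMCell(feat_ch + hid_ch, hid_ch)
        self.mask_head = nn.Conv2d(hid_ch, 1, 1)

    def forward(self, feats, prev_states=None):
        # feats: (B, C, H, W) backbone features for the current frame.
        B, _, H, W = feats.shape
        zeros = feats.new_zeros(B, self.cell.hid_ch, H, W)
        if prev_states is None:  # first frame: no temporal state yet
            prev_states = [(zeros, zeros)] * self.max_objects
        h_spatial = zeros  # spatial recurrence over object "slots"
        masks, new_states = [], []
        for i in range(self.max_objects):
            x = torch.cat([feats, h_spatial], dim=1)
            h, c = self.cell(x, prev_states[i])  # temporal state of object i
            masks.append(torch.sigmoid(self.mask_head(h)))
            new_states.append((h, c))
            h_spatial = h  # the next object slot sees what is already segmented
        return torch.stack(masks, dim=1), new_states  # masks: (B, N, 1, H, W)
```

At inference such a decoder would run frame by frame, feeding the new_states of frame t as prev_states of frame t+1; this is how a temporal recurrence can propagate each instance through the sequence without any initial mask being given.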

Cited by 221 publications (127 citation statements)
References 33 publications
“…Test-dev set of DAVIS 17. In Table 3 we report the performance comparison with the recent instance-level ZVOS method, RVOS [63], on the DAVIS 17 test-dev set. We can find that AGNN significantly outperforms RVOS over most evaluation criteria.…”
Section: Quantitative Performance (mentioning, confidence: 99%)
“…Differentiable Mask Matching For the feature extractor f θ , we use a COCO-pretrained Mask R-CNN with a ResNet-50-FPN backbone. We also try a ResNet-101 backbone of which the weights are initialized from the released model of RVOS [37]. We denote this model as DMM-Net+.…”
Section: Mask Proposal Generation (mentioning, confidence: 99%)
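The DMM-Net+ statement above uses a COCO-pretrained Mask R-CNN with a ResNet-50-FPN backbone as the mask-proposal generator. A minimal way to obtain such a model is sketched below with torchvision; the authors' exact loading code, and their ResNet-101 variant initialized from the RVOS release, may differ, and the pretrained=True flag reflects older torchvision releases.

```python
# Hedged sketch: load a COCO-pretrained Mask R-CNN (ResNet-50-FPN backbone)
# and produce mask proposals for one frame. Illustrative only.
import torch
import torchvision

model = torchvision.models.detection.maskrcnn_resnet50_fpn(pretrained=True)
model.eval()

frame = torch.rand(3, 480, 854)  # one RGB frame, values in [0, 1]
with torch.no_grad():
    proposals = model([frame])[0]  # dict with 'boxes', 'labels', 'scores', 'masks'
print(proposals["masks"].shape)   # (num_proposals, 1, 480, 854)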
“…As PointNet is an effective architecture to extract features from unordered points [37], a PointNet layer is introduced to SFN for feature extraction. In addition, impelled by the success of RNN in video recognition [43][44][45][46], we utilize RNNs to fuse features among different scan lines to acquire 3D contexts. Specifically, RNNs in our SFN are implemented as two single-layer Long Short-Term Memory (LSTM) networks [47].…”
Section: Spatial Fusion Network (mentioning, confidence: 99%)
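The SFN passage above combines a PointNet layer for extracting features from the unordered points of each scan line with two single-layer LSTMs that fuse features across scan lines. The sketch below is one way to realise that combination; ScanLineFusion, the feature sizes, and the forward/backward pairing of the two LSTMs are assumptions for illustration, not the authors' released code.

```python
# Hedged sketch: PointNet-style per-point MLP + max-pool per scan line,
# followed by two single-layer LSTMs fusing features across scan lines.
import torch
import torch.nn as nn


class ScanLineFusion(nn.Module):
    def __init__(self, point_dim=3, feat_dim=128):
        super().__init__()
        # Shared per-point MLP with a symmetric max-pool, so the line feature
        # is invariant to the ordering of points on the scan line.
        self.point_mlp = nn.Sequential(
            nn.Linear(point_dim, 64), nn.ReLU(),
            nn.Linear(64, feat_dim), nn.ReLU(),
        )
        # Two single-layer LSTMs, one per scan direction, to gather 3D context.
        self.fwd_lstm = nn.LSTM(feat_dim, feat_dim, num_layers=1, batch_first=True)
        self.bwd_lstm = nn.LSTM(feat_dim, feat_dim, num_layers=1, batch_first=True)

    def forward(self, points):
        # points: (B, L, P, point_dim) = batch, scan lines, points per line, xyz.
        per_point = self.point_mlp(points)        # (B, L, P, feat_dim)
        line_feat = per_point.max(dim=2).values   # (B, L, feat_dim)
        fused_fwd, _ = self.fwd_lstm(line_feat)           # lines in scan order
        fused_bwd, _ = self.bwd_lstm(line_feat.flip(1))   # reverse scan order
        return fused_fwd + fused_bwd.flip(1)              # (B, L, feat_dim)


lines = torch.rand(2, 16, 256, 3)  # 2 samples, 16 scan lines, 256 points each
print(ScanLineFusion()(lines).shape)  # torch.Size([2, 16, 128])
```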