2019
DOI: 10.1007/978-3-030-20870-7_35

PReMVOS: Proposal-Generation, Refinement and Merging for Video Object Segmentation

Abstract: We address semi-supervised video object segmentation, the task of automatically generating accurate and consistent pixel masks for objects in a video sequence, given the first-frame ground truth annotations. Towards this goal, we present the PReMVOS algorithm (Proposal-generation, Refinement and Merging for Video Object Segmentation). Our method separates this problem into two steps, first generating a set of accurate object segmentation mask proposals for each video frame and then selecting and merging these p…
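The second step named in the abstract (selecting and merging proposals) can be illustrated for a single frame with a minimal Python/NumPy sketch. It assumes every candidate proposal already has a score for every target object, passed in as a hypothetical `scores` matrix, because the abstract does not say how PReMVOS scores proposals. The sketch picks the best proposal per object and merges the picks into one label map, letting the higher-scoring object keep any pixels where two picks overlap. It illustrates the idea only and is not the paper's selection or merging procedure.

```python
import numpy as np

def select_and_merge(proposals: np.ndarray, scores: np.ndarray) -> np.ndarray:
    """Select one proposal per object and merge the selections into a label map.

    proposals: (P, H, W) boolean candidate masks for one frame.
    scores: (K, P) hypothetical score of each proposal for each of K objects.
    Returns an (H, W) int32 map: 0 = background, k + 1 = object k.
    """
    best = scores.argmax(axis=1)                      # chosen proposal per object
    best_scores = scores[np.arange(len(best)), best]  # score of each choice
    labels = np.zeros(proposals.shape[1:], dtype=np.int32)
    # Paint the lowest-scoring objects first so that, where two selected
    # masks overlap, the higher-scoring object overwrites and keeps the pixel.
    for k in np.argsort(best_scores):
        labels[proposals[best[k]]] = k + 1
    return labels
```

Painting in ascending score order is one simple way to resolve overlaps; other conflict-resolution rules would fit the same two-step structure.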

Cited by 230 publications (281 citation statements)
References 31 publications
“…Using better detections and better segmentations does benefit our method. Using [20] segmentations instead of [38], sMOTSA increases from 78.2 to 82.8 for cars and from 50.1 to 59.4 for pedestrians (4.6 and 9.3 percentage points). Using [29] detections instead of [38], sMOTSA increases another 2.9 percentage points from 82.8 to 85.7 for cars.…”
Section: Methods (mentioning)
confidence: 99%
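sMOTSA, the metric quoted above, is the soft variant of MOTSA from the MOTS benchmark. With TP the true-positive mask hypotheses (each matched to a ground-truth mask c(h) with IoU above 0.5), FP the false positives, IDS the identity switches and M the set of ground-truth masks, sMOTSA = (sum over h in TP of IoU(h, c(h)) - |FP| - |IDS|) / |M|, while MOTSA counts each true positive as 1. The sketch below assumes the matching has already been done and only evaluates the formulas; `pp_gain` is a hypothetical helper for the percentage-point differences in the quotation.

```python
def smotsa(tp_ious, num_fp, num_ids, num_gt):
    """Soft MOTSA: each true positive contributes its mask IoU instead of 1."""
    return (sum(tp_ious) - num_fp - num_ids) / num_gt

def motsa(tp_ious, num_fp, num_ids, num_gt):
    """Hard MOTSA: each true positive contributes exactly 1."""
    return (len(tp_ious) - num_fp - num_ids) / num_gt

def pp_gain(before, after):
    """Difference between two scores reported in percent, in percentage points."""
    return round(after - before, 1)

# Gains quoted above: cars 78.2 -> 82.8 -> 85.7, pedestrians 50.1 -> 59.4.
print(pp_gain(78.2, 82.8), pp_gain(50.1, 59.4), pp_gain(82.8, 85.7))
```

Benchmarks usually report these scores multiplied by 100, which is why the quoted values are percentages.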
“…We initially estimate a segmentation mask for each bounding box detection. We use a fully convolutional neural network from [20] which we call BB2SegNet. This crops and resizes an image region given by a bounding box to a 385 × 385 patch and outputs a segmentation mask for each box.…”
Section: Our Approach (mentioning)
confidence: 99%
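The crop-and-resize step described for BB2SegNet can be sketched in NumPy as follows. The network itself is replaced by a hypothetical callable `seg_net` that maps a 385 × 385 patch to per-pixel foreground probabilities. Nearest-neighbour resizing, the 0.5 threshold and the (x0, y0, x1, y1) box convention are assumptions made to keep the example dependency-free.

```python
import numpy as np

PATCH = 385  # patch side length quoted for BB2SegNet

def resize_nearest(arr, out_h, out_w):
    """Nearest-neighbour resize of an (H, W) or (H, W, C) array."""
    in_h, in_w = arr.shape[:2]
    rows = np.arange(out_h) * in_h // out_h
    cols = np.arange(out_w) * in_w // out_w
    return arr[rows[:, None], cols[None, :]]

def box_to_mask(image, box, seg_net, threshold=0.5):
    """Crop a box, resize it to PATCH x PATCH, segment it, and paste the mask back.

    box: (x0, y0, x1, y1) integer pixels, x1 and y1 exclusive (assumed convention).
    seg_net: hypothetical callable, (PATCH, PATCH, C) patch -> (PATCH, PATCH) probabilities.
    """
    x0, y0, x1, y1 = box
    crop = image[y0:y1, x0:x1]
    patch = resize_nearest(crop, PATCH, PATCH)
    prob = seg_net(patch)
    # Map the patch-level prediction back to the original box size.
    prob_box = resize_nearest(prob, y1 - y0, x1 - x0)
    mask = np.zeros(image.shape[:2], dtype=bool)
    mask[y0:y1, x0:x1] = prob_box > threshold
    return mask
```

A full implementation would more likely use bilinear interpolation for the image crop; the paste-back logic stays the same.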
“…By combining instance bounding-box proposals with coarse masks, we obtain the instance-level mask for each primary object. Finally, to connect multiple instances across different frames, we use overlap ratio and optical flow as an association metric [38] to match different instance-level masks.…”
Section: Experimental Setup Datasets and Metrics (mentioning)
confidence: 99%
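The association step can be sketched as follows, assuming a dense optical-flow field between consecutive frames is already available (the quotation does not name a flow estimator). Each previous-frame mask is forward-warped along the flow, and warped masks are matched to current masks by their overlap ratio (IoU). The greedy matching and the 0.5 threshold are illustrative assumptions; [38] may match differently, for example with the Hungarian algorithm.

```python
import numpy as np

def warp_mask(mask, flow):
    """Forward-warp a boolean mask from frame t-1 to frame t.

    flow[y, x] = (dx, dy): displacement of pixel (x, y) from t-1 to t (assumed layout).
    Pixels that move outside the image are dropped.
    """
    h, w = mask.shape
    ys, xs = np.nonzero(mask)
    nx = np.round(xs + flow[ys, xs, 0]).astype(int)
    ny = np.round(ys + flow[ys, xs, 1]).astype(int)
    keep = (nx >= 0) & (nx < w) & (ny >= 0) & (ny < h)
    warped = np.zeros_like(mask)
    warped[ny[keep], nx[keep]] = True
    return warped

def iou(a, b):
    """Overlap ratio (intersection over union) of two boolean masks."""
    union = np.logical_or(a, b).sum()
    return float(np.logical_and(a, b).sum() / union) if union else 0.0

def associate(prev_masks, curr_masks, flow, min_iou=0.5):
    """Greedily match previous-frame instances to current-frame instances.

    Returns {index in prev_masks: index in curr_masks}; unmatched masks are omitted.
    """
    warped = [warp_mask(m, flow) for m in prev_masks]
    candidates = sorted(
        ((iou(wm, cm), i, j) for i, wm in enumerate(warped) for j, cm in enumerate(curr_masks)),
        reverse=True,
    )
    matches, used_prev, used_curr = {}, set(), set()
    for score, i, j in candidates:
        if score < min_iou:
            break  # all remaining pairs overlap too little
        if i in used_prev or j in used_curr:
            continue
        matches[i] = j
        used_prev.add(i)
        used_curr.add(j)
    return matches
```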
“…Semi-supervised VOS methods are provided with a pixel-wise mask identifying the target object in the first frame of a video. When aiming at very high segmentation accuracy, methods generally perform online fine-tuning on the basis of this supervision [3,25,35,40,43,50,62], sometimes exploiting data-augmentation techniques [3,25] or self-supervision [62]. As online fine-tuning can take up to several minutes per video, many recently proposed methods forgo it and instead aim at a faster online speed (e.g., [7,8,64]).…”
Section: Related Work (mentioning)
confidence: 99%
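Online fine-tuning on the first-frame mask can be illustrated with a deliberately tiny stand-in. Below, a per-pixel logistic-regression classifier on RGB values plays the role of the segmentation network (a hypothetical simplification; the cited methods fine-tune deep CNNs), a darkened copy of the first frame plays the role of data augmentation, and the adapted model is then applied to later frames. The step count, learning rate and darkening factor are arbitrary choices.

```python
import numpy as np

def finetune_first_frame(frame, mask, w, b, steps=200, lr=0.5):
    """Adapt (w, b) to one annotated frame by gradient descent on binary cross-entropy.

    frame: (H, W, 3) floats in [0, 1]; mask: (H, W) bool first-frame annotation.
    w: (3,) weights and b: bias, standing in for hypothetical pretrained parameters.
    """
    # Photometric augmentation: append a darkened copy with the same labels.
    xs = np.concatenate([frame, 0.8 * frame]).reshape(-1, 3)
    ys = np.concatenate([mask, mask]).reshape(-1).astype(float)
    w = np.asarray(w, dtype=float).copy()
    b = float(b)
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(xs @ w + b)))  # per-pixel foreground probability
        grad = p - ys                             # gradient of BCE w.r.t. the logit
        w -= lr * (xs.T @ grad) / len(ys)
        b -= lr * float(grad.mean())
    return w, b

def segment(frame, w, b):
    """Apply the fine-tuned classifier to any later frame."""
    return frame @ w + b > 0.0
```

Every additional gradient step is paid again for each new video, which is the cost that leads faster methods to skip fine-tuning altogether.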
“…
PReMVOS [40]   84.9   88.6   -
OSVOS [3]      79.8   80.6   -
MSK [50]       79.7   75.4   -
PML [7]        75.5   79.3   -
SFL [9]        76.1   76.0   -
VPN [52]       70. 2 pruning.…”
Section: Comparison With the State Of The Art (mentioning)
confidence: 99%