2019 IEEE/CVF International Conference on Computer Vision Workshop (ICCVW)
DOI: 10.1109/iccvw.2019.00335
LIP: Learning Instance Propagation for Video Object Segmentation

Abstract: In recent years, the task of segmenting foreground objects from background in a video, i.e. video object segmentation (VOS), has received considerable attention. In this paper, we propose a single end-to-end trainable deep neural network, convolutional gated recurrent Mask-RCNN, for tackling the semi-supervised VOS task. We take advantage of both the instance segmentation network (Mask-RCNN) and the visual memory module (Conv-GRU) to tackle the VOS task. The instance segmentation network predicts masks for ins…
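The abstract pairs a per-frame instance segmentation network (Mask-RCNN) with a convolutional GRU acting as a visual memory across frames. Below is a minimal sketch of such a Conv-GRU memory cell; it is not the authors' implementation, and the channel sizes, kernel size, and the way the recurrent state would feed the mask head are assumptions made purely for illustration.

```python
import torch
import torch.nn as nn

class ConvGRUCell(nn.Module):
    """Illustrative convolutional GRU cell used as a spatial visual memory."""

    def __init__(self, in_channels, hidden_channels, kernel_size=3):
        super().__init__()
        padding = kernel_size // 2
        # Update (z) and reset (r) gates, computed from the current feature map
        # concatenated with the previous memory state.
        self.conv_gates = nn.Conv2d(in_channels + hidden_channels,
                                    2 * hidden_channels, kernel_size, padding=padding)
        # Candidate memory content.
        self.conv_cand = nn.Conv2d(in_channels + hidden_channels,
                                   hidden_channels, kernel_size, padding=padding)
        self.hidden_channels = hidden_channels

    def forward(self, x, h=None):
        # x: (B, C_in, H, W) backbone features of the current frame; h: previous memory.
        if h is None:
            h = x.new_zeros(x.size(0), self.hidden_channels, x.size(2), x.size(3))
        z, r = torch.sigmoid(self.conv_gates(torch.cat([x, h], dim=1))).chunk(2, dim=1)
        h_cand = torch.tanh(self.conv_cand(torch.cat([x, r * h], dim=1)))
        # Updated memory, carried forward to the next frame.
        return (1 - z) * h + z * h_cand
```

In a semi-supervised VOS pipeline of this kind, the cell would be applied once per frame to the backbone features, with the returned memory passed both to the next frame and to the detection and mask heads, so that instance masks can be propagated through the video.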

Cited by 3 publications (2 citation statements)
References 51 publications
“…However, the aforementioned methods with either 2D or 3D convolutions have limited temporal receptive fields and therefore cannot adequately capture variable temporal dependencies. On the other hand, a few recent works attempt to explicitly model temporal relationships and demonstrate promising results in several tasks, to name a few, temporal relational reasoning [40]-[44], object detection and tracking [6]-[9], event recognition [45]-[47], video segmentation [48]-[50], dynamic texture recognition [51], and spatiotemporal learning [52], [53].…”
Section: Introduction
confidence: 99%