2021
DOI: 10.1609/aaai.v35i3.16300

Query-Memory Re-Aggregation for Weakly-supervised Video Object Segmentation

Abstract: Weakly-supervised video object segmentation (WVOS) is an emerging video task that tracks and segments a target given only a simple bounding-box label. However, existing WVOS methods remain unsatisfactory in either speed or accuracy, because they use only the exemplar frame to guide prediction and neglect reference information from other frames. To solve this problem, we propose a novel Re-Aggregation based framework, which uses feature matching to efficiently find the target and capture the temporal dependencie…

Cited by 16 publications (2 citation statements)
References 34 publications (50 reference statements)
“…Since pixel-wise masks are hard to be obtained, some semi-supervised VOS works propose utilize bounding box as the first-frame clue to indicate the target object [35,36,37]. For example, SiamMask [35] applies a mask prediction branch on fully-convolutional Siamese object tracker to generate binary segmentation masks.…”
Section: Video Object Segmentation (VOS)
Citation type: mentioning
confidence: 99%
“…Zhao et al [67] proposed the first weakly supervised video salient object detection model based on "fixation guided scribble annotations". And some methods used weakly-supervised approaches to video object segmentation by fusing information between different frames [68]- [70]. In contrast, Zhou et al [71] relied only on the current frame image and the corresponding optical flow data to achieve the zero-shot video object segmentation.…”
Section: B. Weakly Supervised Salient Object Detection
Citation type: mentioning
confidence: 99%