Query-Memory Re-Aggregation for Weakly-supervised Video Object Segmentation

Lin, Fanchao; Xie, Hongtao; Li, Yan; Zhang, Yongdong

doi:10.1609/aaai.v35i3.16300

Cited by 16 publications (2 citation statements)

References 34 publications (50 reference statements)


“…Since pixel-wise masks are hard to obtain, some semi-supervised VOS works propose to utilize a bounding box as the first-frame clue to indicate the target object [35,36,37]. For example, SiamMask [35] adds a mask prediction branch to a fully-convolutional Siamese object tracker to generate binary segmentation masks.…”

Section: Video Object Segmentation (VOS); classified as mentioning (confidence: 99%)

MOSE: A New Dataset for Video Object Segmentation in Complex Scenes

Ding¹, Liu², He³, et al. (2023). Preprint.


⁴ByteDance. Project page: https://henghuiding.github.io/MOSE

Figure 1. Examples of video clips from the coMplex video Object SEgmentation (MOSE) dataset. The selected target objects are masked in orange ◼. The most notable feature of MOSE is its complex scenes, including the disappearance and reappearance of objects, small or inconspicuous objects, heavy occlusions, and crowded environments. For example, the target player in the 2nd row turns around when reappearing in the 4th and 5th columns after disappearing in the 3rd column, which makes re-identifying him challenging. Most videos in MOSE contain crowded and occluded objects, and the target object is seldom the salient one. The goal of the MOSE dataset is to provide a platform that promotes the development of more comprehensive and robust video object segmentation algorithms.


“…Zhao et al. [67] proposed the first weakly supervised video salient object detection model based on "fixation guided scribble annotations". Some methods have also applied weakly supervised approaches to video object segmentation by fusing information across different frames [68]-[70]. In contrast, Zhou et al. [71] relied only on the current frame image and the corresponding optical flow data to achieve zero-shot video object segmentation.…”

Section: B. Weakly Supervised Salient Object Detection; classified as mentioning (confidence: 99%)