XMem: Long-Term Video Object Segmentation with an Atkinson-Shiffrin Memory Model (2022)
DOI: 10.1007/978-3-031-19815-1_37

Cited by 127 publications (84 citation statements) · References 52 publications
“…The experimental results show that videos of complex scenes make the current state-of-the-art VOS methods much less effective, especially in tracking objects that disappear for a while due to occlusions. For example, the J&F performance of XMem [10] on DAVIS 2016 is 92.0% but drops to 57.6% on MOSE, and the J&F performance of DeAOT [11] on DAVIS 2016 is 92.9% but drops to 59.4% on MOSE, which consistently reveals the difficulties brought by complex scenes.…”
Section: Introduction
confidence: 87%
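
The J&F score quoted above is the standard VOS metric: the mean of region similarity J (mask IoU) and contour accuracy F (a boundary F-measure). Below is a minimal sketch of the J side and the final average, assuming binary NumPy masks; the boundary matching behind F is omitted, and the function names are illustrative, not from any paper's codebase.

```python
import numpy as np

def region_similarity(pred: np.ndarray, gt: np.ndarray) -> float:
    """Region similarity J: intersection-over-union of two binary masks."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    union = np.logical_or(pred, gt).sum()
    if union == 0:
        return 1.0  # both masks empty: treat as a perfect match by convention
    return np.logical_and(pred, gt).sum() / union

def j_and_f(j_scores, f_scores) -> float:
    """J&F: average of mean region similarity and mean contour accuracy,
    each averaged over all objects and frames in the benchmark."""
    return (np.mean(j_scores) + np.mean(f_scores)) / 2.0
```

Under this definition, XMem's 92.0% on DAVIS 2016 versus 57.6% on MOSE means both mask overlap and boundary quality degrade sharply in the more crowded, occlusion-heavy videos.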
“…Annotators use the tool to load and preview videos and first-frame masks, annotate and visualize the segmentation masks in the subsequent frames, and save them. The annotation tool also has a built-in interactive object segmentation network, XMem [10], to assist annotators in producing high-quality masks. To ensure annotation quality under complex scenes, the annotators are required to keep tracking objects that disappear and reappear due to heavy occlusions and crowding.…”
Section: Video Collection and Annotation
confidence: 99%
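
The workflow this and the following statements describe — annotate the first frame, then let a VOS model carry the mask forward — can be sketched as below. The `segmenter` interface is hypothetical, standing in for an off-the-shelf model such as XMem; it is not the actual XMem API.

```python
import numpy as np

def propagate_masks(frames, first_mask, segmenter):
    """Semi-supervised mask propagation: memorize the annotated first
    frame, then predict a mask for every later frame.

    `segmenter` stands in for a VOS model such as XMem [10];
    `initialize` and `segment` are hypothetical method names."""
    segmenter.initialize(frames[0], first_mask)  # store reference frame + mask
    masks = [first_mask]
    for frame in frames[1:]:
        masks.append(segmenter.segment(frame))   # propagate to the next frame
    return masks
```

In an interactive annotation tool, an annotator would inspect the propagated masks and correct frames where the model loses an occluded object, then re-run propagation from the corrected frame.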
“…of the out-of-distribution (OOD) content for each frame, we aim to reconstruct the video with EG3D inversion and perform face editing. For each video, we label the first frame to obtain M₁, and use an off-the-shelf tracking algorithm [12] to propagate it and obtain the other masks M.…”
Section: Methods
confidence: 99%
“…After that, we convert them to EG3D's 5-point landmarks and crop the face out of the input frame. For the segmentation masks, we manually label the first frame and then use an off-the-shelf tracking algorithm [12] to get the masks for the rest of the frames.…”
Section: Methods
confidence: 99%
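
The 5-point landmark cropping step mentioned above typically means estimating a similarity transform that maps the detected landmarks onto a canonical template and warping the frame with it. Here is a minimal OpenCV sketch under that assumption; the template coordinates are illustrative values for a 112×112 crop, and EG3D's actual alignment convention may differ.

```python
import cv2
import numpy as np

# Canonical 5-point template: left eye, right eye, nose tip, left and
# right mouth corners. Values are illustrative, not EG3D's own template.
TEMPLATE = np.array([
    [38.3, 51.7],
    [73.5, 51.5],
    [56.0, 71.7],
    [41.5, 92.4],
    [70.7, 92.2],
], dtype=np.float32)

def crop_face(frame: np.ndarray, landmarks: np.ndarray, size: int = 112) -> np.ndarray:
    """Align a face crop from `frame` given 5-point landmarks, via a
    similarity transform estimated against the canonical template."""
    M, _ = cv2.estimateAffinePartial2D(landmarks.astype(np.float32), TEMPLATE)
    if M is None:
        raise ValueError("could not estimate alignment transform")
    return cv2.warpAffine(frame, M, (size, size))
```

Estimating only a similarity transform (rotation, uniform scale, translation) keeps the face undistorted, which matters when the crop is fed to a generator such as EG3D for inversion.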