2021
DOI: 10.48550/arxiv.2102.01558
Preprint

Occluded Video Instance Segmentation: A Benchmark

Abstract: Can our video understanding systems perceive objects when a heavy occlusion exists in a scene? To answer this question, we collect a large-scale dataset called OVIS for occluded video instance segmentation, that is, to simultaneously detect, segment, and track instances in occluded scenes. OVIS consists of 296k high-quality instance masks from 25 semantic categories, where object occlusions usually occur. While our human vision systems can understand those occluded instances by contextual reasoning and associat…

Cited by 19 publications (42 citation statements)
References 40 publications (79 reference statements)
“…Recently, more challenging benchmarks such as OVIS [51] and YouTube-VIS-2021 [71] are proposed to further promote the advancement of this field. CrossVIS is evaluated on three VIS benchmarks and shows competitive performances.…”
Section: Related Work (mentioning)
confidence: 99%
“…Instance Segmentation in Videos: Multi-instance Segmentation in Videos has recently emerged as a popular field due to its applicability in autonomous driving and robotics. Some of the popular tasks in this domain are Video Object Segmentation (VOS) [6,29], Video Instance Segmentation (VIS) [45], and the more recent Occluded Video Instance Segmentation (OVIS) [32]. Here the primary goal is to segment all object instances in a video and associate them over time.…”
Section: Related Work (mentioning)
confidence: 99%
“…OVIS. Occluded Video Instance Segmentation [32] comprises 5,233 videos with labeled masks for 25 known object classes. The dataset is similar to YouTube-VIS in that it also uses mean Average Precision (mAP) as the evaluation measure, but is more challenging since it comprises longer videos where objects undergo significant occlusion.…”
Section: Benchmarks (mentioning)
confidence: 99%
“…For video object identification, we require video object sequences where objects are associated across multiple frames. Hence, to train and evaluate our proposed approach, we used four video instance segmentation datasets: YouTube Video Instance Segmentation (YT-VIS) [51], Unidentified Video Objects (UVO) [47], Occluded Video Instance Segmentation (OVIS) [34], and Tracking Any Object with Video Object Segmentation (TAO-VOS) [8,43]. All these datasets contain a large object vocabulary and various challenging scenarios, including perceptually-aliased occluded objects, as described below:…”
Section: Datasets (mentioning)
confidence: 99%