2022
DOI: 10.1007/s11263-022-01629-1
Occluded Video Instance Segmentation: A Benchmark

Abstract: Can our video understanding systems perceive objects when a heavy occlusion exists in a scene? To answer this question, we collect a large-scale dataset called OVIS for occluded video instance segmentation, that is, to simultaneously detect, segment, and track instances in occluded scenes. OVIS consists of 296k high-quality instance masks from 25 semantic categories, where object occlusions usually occur. While our human vision systems can understand those occluded instances by contextual reasoning and associa…
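As a concrete illustration of working with the dataset, the sketch below reads an OVIS-style annotation file and tallies instance tracks per category. It assumes OVIS ships YouTube-VIS-style JSON annotations (top-level "videos", "categories", and "annotations" keys); the function name and file path are illustrative, not part of the official toolkit.

```python
import json

def summarize_annotations(path):
    """Count instance tracks per category in a YouTube-VIS-style
    annotation file (an assumption about the OVIS layout)."""
    with open(path) as f:
        data = json.load(f)
    # Map category ids to human-readable names.
    cat_names = {c["id"]: c["name"] for c in data["categories"]}
    counts = {}
    for ann in data["annotations"]:
        name = cat_names[ann["category_id"]]
        counts[name] = counts.get(name, 0) + 1
    return counts
```

A quick sanity check on a toy annotation file would confirm the per-category totals before running any heavier statistics.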

Cited by 65 publications (54 citation statements)
References 91 publications
“…On the other hand, video data in the real world carry additional temporal information compared to static image data, and the data at test time are videos, so it is natural to bring in more video segmentation datasets to improve performance. Benefiting from the recent release of several new video segmentation datasets, namely YouTube-VIS (more objects per video), OVIS [18] (significant occlusion), and VSPW [12] (dense annotations and high-resolution frames), we introduce them into the second training stage, significantly improving model performance.…”
Section: Data Matters
confidence: 99%
“…In the pre-training stage, several static image datasets, including COCO [9], ECSSD [19], MSRA10K [4], PASCAL-S [7], and PASCAL-VOC [6], are used for preliminary semantic learning. During the main training stage, video datasets including YouTube-VOS [23], DAVIS 2017 [17], YouTube-VIS [8], OVIS [18], and VSPW [12] are used to enhance the generalization and robustness of the model.…”
Section: Training Details
confidence: 99%
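The two-stage recipe quoted above (static-image pre-training followed by video main training) can be sketched as a simple training plan. The dataset names mirror those in the quote; the scheduler itself is a hypothetical illustration, not the cited authors' actual pipeline.

```python
# Hypothetical two-stage schedule mirroring the quoted recipe:
# static-image datasets first, then video datasets.
STAGES = {
    "pretrain": ["COCO", "ECSSD", "MSRA10K", "PASCAL-S", "PASCAL-VOC"],
    "main": ["YouTube-VOS", "DAVIS 2017", "YouTube-VIS", "OVIS", "VSPW"],
}

def training_plan(stages):
    """Yield (stage, dataset) pairs in training order."""
    for stage in ("pretrain", "main"):
        for name in stages[stage]:
            yield stage, name
```

Iterating the plan gives every pre-training dataset before any video dataset, matching the stage ordering the citing work describes.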
“…Amodal perception is of interest in many application fields of computer vision. Hence, datasets for amodal perception can be found in different fields, e.g., instance [17] and video instance segmentation [19], [20], human recognition and deocclusion [21]. The OVIS dataset [19] provides instance masks for videos while additionally labeling the occlusion level of each instance.…”
Section: A. Datasets for Amodal Perception
confidence: 99%
“…Hence, datasets for amodal perception can be found in different fields, e.g., instance [17] and video instance segmentation [19], [20], human recognition and deocclusion [21]. The OVIS dataset [19] provides instance masks for videos while additionally labeling the occlusion level of each instance. SAIL-VOS [20] is a synthetic video instance segmentation dataset with amodal instance segmentation masks.…”
Section: A. Datasets for Amodal Perception
confidence: 99%