2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2021
DOI: 10.1109/cvpr46437.2021.00691

Target-Aware Object Discovery and Association for Unsupervised Video Multi-Object Segmentation

Abstract: The crux of Referring Video Object Segmentation (RVOS) lies in modeling dense text-video relations to associate abstract linguistic concepts with dynamic visual contents at the pixel level. Current RVOS methods typically use vision and language models pre-trained independently as backbones. As images and texts are mapped to uncoupled feature spaces, they face the arduous task of learning Vision-Language (VL) relation modeling from scratch. Witnessing the success of Vision-Language Pretrained (VLP) models, we propos…

Cited by 42 publications (19 citation statements); references 87 publications.
“…The variants focus more on the VOS efficiency. Without online fine-tuning, the discussed variants (OSNM (Yang et al 2018), A-GAME (Johnander et al 2019), FRTM (Robinson et al 2020), LWL (Bhat et al 2020), and TAODA (Zhou et al 2021)) have developed to shift the network output domain with more efficient algorithms. Although achieving better efficiency, the accuracy gaps remain between the earlier variants (OSNM and A-GAME) and the extension works.…”
Section: Discussion (mentioning)
confidence: 99%
“…TAODA (Target-Aware Object Discovery and Association for UVOS, Zhou et al 2021) implements a similar target model to FRTM to generate coarse object masks. Differently, the target model is initialised with the instances predicted in the first frame due to no annotations available in UVOS.…”
Section: Variants (mentioning)
confidence: 99%
“…To cope with these situations and guarantee that region information is passed to the subsequent modules, we have developed computer vision algorithms that operate at the pixel level and are use-case agnostic. The works proposed in [29, 30] describe interesting approaches for the accurate and efficient segmentation of objects in video, taking advantage of motion and temporal information. However, in the current proposal, we deal with single still-shot images, which does not allow the applicability of these proposals.…”
Section: Semantic Information Extraction (mentioning)
confidence: 99%
“…Video Segmentation. A comprehensive overview [11] of multiple tasks in the field of video segmentation has recently been proposed, which broadly classifies video segmentation into eight tasks such as video object segmentation [6, 26, 27, 28], video instance segmentation [3, 18, 23, 24, 29, 30], and video panoptic segmentation [1, 5, 12], etc. Among them, it systematically describes the methods used in these tasks in recent years, the datasets used and the results achieved so far, as well as the future trends.…”
Section: Related Work (mentioning)
confidence: 99%