Target-Aware Object Discovery and Association for Unsupervised Video Multi-Object Segmentation

Zhou, Tianfei; Li, Jianwu; Li, Xueyi; Shao, Ling

doi:10.1109/cvpr46437.2021.00691

Cited by 42 publications (19 citation statements)

References 87 publications


“…The variants focus more on the VOS efficiency. Without online fine-tuning, the discussed variants (OSNM (Yang et al 2018), A-GAME (Johnander et al 2019), FRTM (Robinson et al 2020), LWL (Bhat et al 2020), and TAODA (Zhou et al 2021)) have developed to shift the network output domain with more efficient algorithms. Although achieving better efficiency, the accuracy gaps remain between the earlier variants (OSNM and A-GAME) and the extension works.…”

Section: Discussion (mentioning)

confidence: 99%


Deep learning for video object segmentation: a review. Gao, Zheng, et al. 2022. Artif Intell Rev.


As one of the fundamental problems in the field of video understanding, video object segmentation aims at segmenting objects of interest throughout the given video sequence. Recently, with the advancements of deep learning techniques, deep neural networks have shown outstanding performance improvements in many computer vision applications, with video object segmentation being one of the most advocated and intensively investigated. In this paper, we present a systematic review of the deep learning-based video segmentation literature, highlighting the pros and cons of each category of approaches. Concretely, we start by introducing the definition, background concepts and basic ideas of algorithms in this field. Subsequently, we summarise the datasets for training and testing a video object segmentation algorithm, as well as common challenges and evaluation metrics. Next, previous works are grouped and reviewed based on how they extract and use spatial and temporal features, where their architectures, contributions and the differences among each other are elaborated. At last, the quantitative and qualitative results of several representative methods on a dataset with many remaining challenges are provided and analysed, followed by further discussions on future research directions. This article is expected to serve as a tutorial and source of reference for learners intended to quickly grasp the current progress in this research area and practitioners interested in applying the video object segmentation methods to their problems. A public website is built to collect and track the related works in this field: https://github.com/gaomingqi/VOS-Review.


Section: Discussion (mentioning)

confidence: 99%

“…TAODA (Target-Aware Object Discovery and Association for UVOS, Zhou et al 2021) implements a similar target model to FRTM to generate coarse object masks. Differently, the target model is initialised with the instances predicted in the first frame due to no annotations available in UVOS.…”

Section: Variants (mentioning)

confidence: 99%

Deep learning for video object segmentation: a review. Gao, Zheng, et al. 2022. Artif Intell Rev.
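The snippet above says TAODA initialises an FRTM-style target model from the instances predicted in the first frame, because UVOS provides no annotation. Neither citing text specifies that model, so the following is only a minimal sketch under stated assumptions: each pixel is described by a hypothetical feature vector, the first-frame instance predictions arrive as integer labels (0 = background), and the target model is swapped for a closed-form ridge-regression classifier with one weight vector per class, not FRTM's iteratively optimised convolutional model. The names `solve`, `fit_target_model` and `predict` are illustrative only and do not come from any released code.

```python
from typing import List

Vec = List[float]


def solve(a: List[Vec], b: Vec) -> Vec:
    """Solve a @ x = b by Gaussian elimination with partial pivoting (a square, non-singular)."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix [a | b]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))  # pivot row for column c
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x


def fit_target_model(feats: List[Vec], labels: List[int], num_classes: int,
                     lam: float = 1e-2) -> List[Vec]:
    """Hypothetical stand-in target model: one ridge-regression weight vector per class.

    Class 0 is background; classes 1..K are the instances predicted in the first frame
    (no ground-truth annotation is assumed). Each w_k solves
    (X^T X + lam * I) w_k = X^T y_k, where y_k is the 0/1 indicator of class k.
    """
    d = len(feats[0])
    gram = [[sum(f[i] * f[j] for f in feats) + (lam if i == j else 0.0)
             for j in range(d)] for i in range(d)]
    weights = []
    for k in range(num_classes):
        rhs = [sum(f[i] for f, y in zip(feats, labels) if y == k) for i in range(d)]
        weights.append(solve(gram, rhs))
    return weights


def predict(weights: List[Vec], feats: List[Vec]) -> List[int]:
    """Coarse mask for a later frame: each pixel takes the class with the highest linear score."""
    def score(k: int, f: Vec) -> float:
        return sum(w * x for w, x in zip(weights[k], f))
    return [max(range(len(weights)), key=lambda k: score(k, f)) for f in feats]


if __name__ == "__main__":
    # Toy first frame: two background pixels, two pixels of instance 1, one of instance 2.
    first = [[1.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    model = fit_target_model(first, [0, 0, 1, 1, 2], num_classes=3)
    print(predict(model, [[0.9, 0.1, 0.0], [0.1, 0.8, 0.1], [0.0, 0.2, 0.9]]))  # [0, 1, 2]
```

With these synthetic one-hot features, the model fitted on first-frame labels `[0, 0, 1, 1, 2]` assigns slightly perturbed features in a later frame to classes `[0, 1, 2]`. A closed-form solve keeps the sketch dependency-free; a real target model would operate on learned feature maps and be updated as the video progresses.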


“…To cope with these situations and guarantee that region information is passed to the subsequent modules, we have developed computer vision algorithms that operate at the pixel level and are use-case agnostic. The works proposed in [ 29 , 30 ] describe interesting approaches for the accurate and efficient segmentation of objects in video, taking advantage of motion and temporal information. However, in the current proposal, we deal with single still-shot images, which does not allow the applicability of these proposals.…”

Section: Semantic Information Extraction (mentioning)

confidence: 99%

Photo2Video: Semantic-Aware Deep Learning-Based Video Generation from Still Content. et al. 2022.


Applying machine learning (ML), and especially deep learning, to understand visual content is becoming common practice in many application areas. However, little attention has been given to its use within the multimedia creative domain. It is true that ML is already popular for content creation, but the progress achieved so far addresses essentially textual content or the identification and selection of specific types of content. A wealth of possibilities are yet to be explored by bringing the use of ML into the multimedia creative process, allowing the knowledge inferred by the former to influence automatically how new multimedia content is created. The work presented in this article provides contributions in three distinct ways towards this goal: firstly, it proposes a methodology to re-train popular neural network models in identifying new thematic concepts in static visual content and attaching meaningful annotations to the detected regions of interest; secondly, it presents varied visual digital effects and corresponding tools that can be automatically called upon to apply such effects in a previously analyzed photo; thirdly, it defines a complete automated creative workflow, from the acquisition of a photograph and corresponding contextual data, through the ML region-based annotation, to the automatic application of digital effects and generation of a semantically aware multimedia story driven by the previously derived situational and visual contextual data. Additionally, it presents a variant of this automated workflow by offering to the user the possibility of manipulating the automatic annotations in an assisted manner. The final aim is to transform a static digital photo into a short video clip, taking into account the information acquired. The final result strongly contrasts with current standard approaches of creating random movements, by implementing an intelligent content- and context-aware video.


“…Video Segmentation. A comprehensive overview [11] of multiple tasks in the field of video segmentation has recently been proposed, which broadly classifies video segmentation into eight tasks such as video object segmentation [6,[26][27][28], video instance segmentation [3,18,23,24,29,30], and video panoptic segmentation [1,5,12], etc. Among them, it systematically describes the methods used in these tasks in recent years, the datasets used and the results achieved so far, as well as the future trends.…”

Section: Related Work (mentioning)

confidence: 99%

Hybrid Tracker with Pixel and Instance for Video Panoptic Segmentation. Ye, Lan, Ge, et al. 2022. Preprint.


Video Panoptic Segmentation (VPS) requires generating consistent panoptic segmentation and tracking identities to all pixels across video frames. Existing methods are mainly based on the trained instance embedding to maintain consistent panoptic segmentation. However, they inevitably struggle to cope with the challenges of small objects, similar appearance but inconsistent identities, occlusion, and strong instance contour deformations. To address these problems, we present HybridTracker, a lightweight and joint tracking model attempting to eliminate the limitations of the single tracker. HybridTracker performs pixel tracker and instance tracker in parallel to obtain the association matrices, which are fused into a matching matrix. In the instance tracker, we design a differentiable matching layer, ensuring the stability of inter-frame matching. In the pixel tracker, we compute the dice coefficient of the same instance of different frames given the estimated optical flow, forming the Intersection Over Union (IoU) matrix. We additionally propose mutual check and temporal consistency constraints during inference to settle the occlusion and contour deformation challenges. Extensive experiments demonstrate that HybridTracker outperforms state-of-the-art methods on Cityscapes-VPS and VIPER datasets.
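The pixel-tracker step described in the abstract above (propagate each instance with optical flow, score its overlap with the next frame's instances, then fuse the result with the instance tracker's association matrix) can be illustrated with a small pure-Python sketch. Everything here is an assumption for illustration, not HybridTracker's implementation: flow is an integer per-pixel displacement `(dy, dx)`, the overlap score is the Dice coefficient 2|A∩B|/(|A|+|B|), fusion is a fixed weighted average with a hypothetical weight `alpha`, and matching is greedy with a hypothetical threshold `thresh`. The abstract mentions both Dice and an IoU matrix; since Dice D and IoU J satisfy D = 2J/(1+J), either ranks candidate pairs identically. The differentiable matching layer, mutual check and temporal-consistency constraints are not reproduced.

```python
from typing import List, Tuple

Mask = List[List[int]]                # H x W binary mask (0/1)
Flow = List[List[Tuple[int, int]]]    # hypothetical integer flow (dy, dx) per pixel


def from_pixels(h: int, w: int, pixels: List[Tuple[int, int]]) -> Mask:
    """Build a binary mask from a list of (y, x) foreground pixels."""
    m = [[0] * w for _ in range(h)]
    for y, x in pixels:
        m[y][x] = 1
    return m


def warp_mask(mask: Mask, flow: Flow) -> Mask:
    """Forward-warp a mask to the next frame; pixels pushed out of frame are dropped."""
    h, w = len(mask), len(mask[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if mask[y][x]:
                dy, dx = flow[y][x]
                ny, nx = y + dy, x + dx
                if 0 <= ny < h and 0 <= nx < w:
                    out[ny][nx] = 1
    return out


def dice(a: Mask, b: Mask) -> float:
    """Dice coefficient 2|A∩B| / (|A| + |B|) of two equally sized binary masks."""
    inter = sum(p & q for ra, rb in zip(a, b) for p, q in zip(ra, rb))
    total = sum(map(sum, a)) + sum(map(sum, b))
    return 2.0 * inter / total if total else 0.0


def pixel_matrix(prev: List[Mask], curr: List[Mask], flow: Flow) -> List[List[float]]:
    """Rows: previous-frame instances warped by the flow; columns: current-frame instances."""
    warped = [warp_mask(m, flow) for m in prev]
    return [[dice(wm, c) for c in curr] for wm in warped]


def fuse(pixel: List[List[float]], instance: List[List[float]],
         alpha: float = 0.5) -> List[List[float]]:
    """Hypothetical fusion: element-wise weighted average of two association matrices."""
    return [[alpha * p + (1 - alpha) * q for p, q in zip(rp, rq)]
            for rp, rq in zip(pixel, instance)]


def greedy_match(score: List[List[float]], thresh: float = 0.3) -> List[Tuple[int, int]]:
    """Pair (prev, curr) indices by descending score; each index is used at most once."""
    cands = sorted(((s, i, j) for i, row in enumerate(score) for j, s in enumerate(row)),
                   reverse=True)
    used_i, used_j, pairs = set(), set(), []
    for s, i, j in cands:
        if s < thresh:
            break
        if i not in used_i and j not in used_j:
            pairs.append((i, j))
            used_i.add(i)
            used_j.add(j)
    return sorted(pairs)


if __name__ == "__main__":
    flow = [[(0, 1)] * 4 for _ in range(4)]  # every pixel moves one column right
    prev = [from_pixels(4, 4, [(0, 0), (1, 0)]), from_pixels(4, 4, [(2, 2), (3, 2)])]
    curr = [from_pixels(4, 4, [(2, 3), (3, 3)]), from_pixels(4, 4, [(0, 1), (1, 1)])]
    instance = [[0.2, 0.8], [0.7, 0.1]]      # stand-in for the instance tracker's output
    print(greedy_match(fuse(pixel_matrix(prev, curr, flow), instance)))  # [(0, 1), (1, 0)]
```

In the toy example the two instances swap order between frames, and the fused matrix re-associates them as `(0, 1)` and `(1, 0)`. Greedy matching keeps the sketch short; an optimal assignment such as the Hungarian algorithm is a common alternative when many instances compete for the same match.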

