One-Shot Video Object Segmentation

Caelles, Sergi; Maninis, Kevis-Kokitsi; Pont-Tuset, Jordi; Leal-Taixé, Laura; Cremers, Daniel; Gool, Luc Van

doi:10.48550/arxiv.1611.05198

Cited by 7 publications

(11 citation statements)

References 0 publications


“…So we use 5 video set AC, IU, JM, MS, VK in i2iDatabase dataset (results shown in Figure 2) and three measures used in DAVIS database to evaluate our system in the following terms: region similarity J (with respect to intersection of union -IoU), contour accuracy F and temporal stability T . Although our method focus on the case with no manual input, we compare our result with the state-of-the-art methods in both unsupervised (FST [28]) and semi-supervised techniques (BVS [31] and OSVOS [29]), the latter of which takes ground-truth of first frame as initial mask.…”

Section: Experiments and Results (mentioning)

confidence: 99%
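
The region similarity J named in the statement above is an intersection-over-union score between a predicted mask and a ground-truth mask. The sketch below illustrates that computation only; it is not the evaluation code of any of the cited papers. The function names, the nested-list mask representation, and the convention of scoring two empty masks as 1.0 are all assumptions made for the example.

```python
from typing import Sequence

# A binary mask as rows of 0/1 values (an assumed, illustrative representation).
Mask = Sequence[Sequence[int]]


def region_similarity(pred: Mask, gt: Mask) -> float:
    """Intersection over union (Jaccard index) of two same-sized binary masks."""
    inter = 0
    union = 0
    for pred_row, gt_row in zip(pred, gt):
        for p, g in zip(pred_row, gt_row):
            if p and g:
                inter += 1  # pixel is foreground in both masks
            if p or g:
                union += 1  # pixel is foreground in at least one mask
    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if union == 0 else inter / union


def mean_region_similarity(preds: Sequence[Mask], gts: Sequence[Mask]) -> float:
    """Average the per-frame score over a sequence of frames."""
    scores = [region_similarity(p, g) for p, g in zip(preds, gts)]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    pred = [[1, 1, 0],
            [0, 1, 0]]
    gt = [[1, 0, 0],
          [0, 1, 1]]
    # 2 shared foreground pixels, 4 foreground pixels in total.
    print(region_similarity(pred, gt))
```

Averaging the per-frame score over a video, as `mean_region_similarity` does, is one common way to report a single number per sequence; the cited papers may aggregate differently.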

“…For monocular video, algorithms are normally difficult to define the region of foreground by only color and motion information without human interaction. Therefore, most researches [4,5,31,29] have adapted the approach of providing manually the mask of key frames to facilitate segmentation. Based on this approach, video segmentation systems [1,2,3,6] which require gradually adding the user's input to correct the result during segmentation processing are built, and they could achieve considerable segmenting accuracy under human interaction.…”

Section: Introduction (mentioning)

confidence: 99%


Automatic Streaming Segmentation of Stereo Video Using Bilateral Space

Ke, Zhu, Yu (2017). Preprint.


In the field of video segmentation, the majority of methods are based on monocular video. Traditional unsupervised segmentation algorithms do not perform well in terms of time efficiency and accuracy, because foreground definition is a bottleneck. Semi-supervised segmentation algorithms aim to propagate the label information in one or more key frames, which are generated manually and used as masks during processing, to the whole video. They can achieve high accuracy, but they are not suitable for application scenarios without human interaction. In this paper, we take advantage of a binocular camera and propose an unsupervised algorithm to efficiently extract the foreground from stereo video. The depth information is embedded into a bilateral grid in the graph cut model, which achieves considerable segmentation accuracy without human interaction. A streaming processing model is integrated to enable on-line processing of stereo video of arbitrary length. The precision, time efficiency, and adaptation to complex natural scenarios of our algorithm are evaluated in experiments against state-of-the-art algorithms in both unsupervised and semi-supervised approaches.


“…Our approach outputs per-frame instance segmentation using a convnet architecture, inspired by works from other domains like [6,40,49]. A concurrent work [5] also exploits convnets for video object segmentation. Differently from our approach their segmentation is not guided, which might result in performance decay over time.…”

Section: Global Propagation (mentioning)

confidence: 99%

Learning Video Object Segmentation from Static Images

Khoreva, Perazzi, Benenson et al. (2016). Preprint.


Inspired by recent advances of deep learning in instance segmentation and object tracking, we introduce the video object segmentation problem as a concept of guided instance segmentation. Our model proceeds on a per-frame basis, guided by the output of the previous frame towards the object of interest in the next frame. We demonstrate that highly accurate object segmentation in videos can be enabled by using a convnet trained with static images only. The key ingredient of our approach is a combination of offline and online learning strategies, where the former serves to produce a refined mask from the previous frame's estimate and the latter captures the appearance of the specific object instance. Our method can handle different types of input annotations, both bounding boxes and segments, and can incorporate multiple annotated frames, making the system suitable for diverse applications. We obtain competitive results on three different datasets, independently of the type of input annotation.


“…e-mail: mennatul@ualberta.ca. 2 Mahmoud Gamal is with Cairo University, Egypt. 3 Mohamed El-Hoseiny is with Facebook AI Research.…”

Section: Introduction (mentioning)

confidence: 99%

“…(1) Abundance of the different poses of the object. (2) The existence of different instances/classes within the same category. (3) Different challenges introduced by cluttered backgrounds, different rigid and non-rigid transformations, occlusions and illumination changes.…”

Section: Introduction (mentioning)

confidence: 99%

Video Object Segmentation using Teacher-Student Adaptation in a Human Robot Interaction (HRI) Setting

Siam, Jiang et al. (2019). 2019 International Conference on Robotics and Automation (ICRA).


Video object segmentation is an essential task in robot manipulation to facilitate grasping and learning affordances. Incremental learning is important for robotics in unstructured environments. Inspired by how children learn, human robot interaction (HRI) can be utilized to teach robots about the world under human guidance, similar to how children learn from a parent or a teacher. A human teacher can show potential objects of interest to the robot, which is able to self-adapt to the teaching signal without being given manual segmentation labels. We propose a novel teacher-student learning paradigm to teach robots about their surrounding environment. A two-stream motion and appearance "teacher" network provides pseudo-labels to adapt an appearance "student" network. The student network is able to segment the newly learned objects in other scenes, whether they are static or in motion. We also introduce a carefully designed dataset that serves the proposed HRI setup, denoted as (I)nteractive (V)ideo (O)bject (S)egmentation. Our IVOS dataset contains teaching videos of different objects and manipulation tasks. Our proposed adaptation method outperforms the state-of-the-art on DAVIS and FBMS by 6.8% and 1.2% in F-measure respectively. It improves over the baseline on the IVOS dataset by 46.1% and 25.9% in mIoU.

