Sergi Caelles scite author profile

Figure 1. Example result of our technique: The segmentation of the first frame (red) is used to learn the model of the specific object to track, which is segmented in the rest of the frames independently (green). One every 20 frames shown of 90 in total.

Abstract: This paper tackles the task of semi-supervised video object segmentation, i.e., the separation of an object from the background in a video, given the mask of the first frame. We present One-Shot Video Object Segmentation (OSVOS), based on a fully-convolutional neural network architecture that is able to successively transfer generic semantic information, learned on ImageNet, to the task of foreground segmentation, and finally to learning the appearance of a single annotated object of the test sequence (hence one-shot). Although all frames are processed independently, the results are temporally coherent and stable. We perform experiments on two annotated video segmentation databases, which show that OSVOS is fast and improves the state of the art by a significant margin (79.8% vs 68.0%).

Deep Extreme Cut: From Extreme Points to Object Segmentation

Maninis

et al. 2018

Figure 1. Example results of DEXTR: The user provides the extreme clicks for an object, and the CNN produces the segmented masks.

Abstract: This paper explores the use of extreme points in an object (left-most, right-most, top, bottom pixels) as input to obtain precise object segmentation for images and videos. We do so by adding an extra channel to the image in the input of a convolutional neural network (CNN), which contains a Gaussian centered in each of the extreme points. The CNN learns to transform this information into a segmentation of an object that matches those extreme points. We demonstrate the usefulness of this approach for guided segmentation (grabcut-style), interactive segmentation, video object segmentation, and dense segmentation annotation. We show that we obtain the most precise results to date, also with less user input, in an extensive and varied selection of benchmarks and datasets. All our models and code are publicly available on
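
The extra input channel described in this abstract can be illustrated with a minimal NumPy sketch. It is not the authors' released code: the function names, the `sigma` value, the use of a pixel-wise maximum to combine the four Gaussians, and the example coordinates are all assumptions made for illustration. Any rescaling of the RGB values relative to the heatmap is also left out.

```python
import numpy as np


def extreme_point_heatmap(height, width, points, sigma=10.0):
    """Build one channel with a 2D Gaussian centered on each (x, y) point.

    Hypothetical helper: sigma and the max-combination are assumptions,
    not values taken from the paper.
    """
    ys, xs = np.mgrid[0:height, 0:width]
    heatmap = np.zeros((height, width), dtype=np.float64)
    for px, py in points:
        sq_dist = (xs - px) ** 2 + (ys - py) ** 2
        gaussian = np.exp(-sq_dist / (2.0 * sigma ** 2))
        # Keep the strongest response where Gaussians overlap.
        heatmap = np.maximum(heatmap, gaussian)
    return heatmap.astype(np.float32)


def add_heatmap_channel(image, points, sigma=10.0):
    """Append the heatmap to an H x W x 3 image, giving an H x W x 4 array."""
    height, width = image.shape[:2]
    heatmap = extreme_point_heatmap(height, width, points, sigma)
    return np.concatenate(
        [image.astype(np.float32), heatmap[..., None]], axis=-1
    )


# Made-up example: a blank 64 x 96 image and four extreme clicks,
# given as (x, y) in the order left-most, right-most, top, bottom.
image = np.zeros((64, 96, 3), dtype=np.uint8)
points = [(10, 30), (80, 32), (45, 5), (47, 60)]
net_input = add_heatmap_channel(image, points)
print(net_input.shape)
```

The resulting four-channel array is what a network with a four-channel first layer would consume; the abstract does not specify the network details, so none are sketched here.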

Video Object Segmentation without Temporal Information

Maninis

Caelles

Chen

et al. 2019

IEEE Trans. Pattern Anal. Mach. Intell.

296

209

Video Object Segmentation, and video processing in general, has been historically dominated by methods that rely on the temporal consistency and redundancy in consecutive video frames. When temporal smoothness is suddenly broken, such as when an object is occluded, the result of these methods can deteriorate significantly. This paper explores the orthogonal approach of processing each frame independently, i.e., disregarding temporal information. In particular, it tackles the task of semi-supervised video object segmentation: the separation of an object from the background in a video, given its mask in the first frame. We present Semantic One-Shot Video Object Segmentation (OSVOS-S), based on a fully-convolutional neural network architecture that is able to successively transfer generic semantic information, learned on ImageNet, to the task of foreground segmentation, and finally to learning the appearance of a single annotated object of the test sequence (hence one shot). We show that instance-level semantic information, when combined effectively, can dramatically improve the results of our previous method, OSVOS. We perform experiments on two recent single-object video segmentation databases, which show that OSVOS-S is both the fastest and most accurate method in the state of the art. Experiments on multi-object video segmentation show that OSVOS-S obtains competitive results.

Iterative Deep Retinal Topology Extraction

Ventura

Pont-Tuset

Caelles

et al. 2018

This paper tackles the task of estimating the topology of road networks from aerial images. Building on top of a global model that performs a dense semantical classification of the pixels of the image, we design a Convolutional Neural Network (CNN) that predicts the local connectivity among the central pixel of an input patch and its border points. By iterating this local connectivity we sweep the whole image and infer the global topology of the road network, inspired by a human delineating a complex network with the tip of their finger. We perform an extensive and comprehensive qualitative and quantitative evaluation on the road network estimation task, and show that our method also generalizes well when moving to networks of retinal vessels.

Deep Extreme Cut: From Extreme Points to Object Segmentation

Maninis¹,

Caelles²,

Pont-Tuset³

et al. 2017

Preprint

Video Object Segmentation Without Temporal Information

Maninis¹,

Caelles²,

Chen³

et al. 2017

Preprint

One-Shot Video Object Segmentation

Caelles¹,

Maninis²,

Pont-Tuset³

et al. 2016

Preprint

First real-time coherent MIMO-DSP for six coupled mode transmission

Randel¹,

Corteselli²,

Badini³

et al. 2015
