Fully Convolutional Instance-Aware Semantic Segmentation

Li, Yang; Qi, Haozhi; Dai, Jifeng; Ji, Xiangyang; Wei, Yichen

doi:10.1109/cvpr.2017.472

Cited by 991 publications

(720 citation statements)

References 36 publications


“…One-stage instance segmentation methods generate position sensitive maps that are assembled into final masks with position-sensitive pooling [3], [16] or combine semantic segmentation logits and direction prediction logits [17]. Though conceptually faster than two-stage methods, they still require repooling or other non-trivial computations (e.g., mask voting).…”

Section: Related Work (mentioning)

confidence: 99%
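The excerpt mentions position-sensitive assembly only in passing. The following is a minimal pure-Python sketch of the idea. It assumes k×k position-sensitive score maps stored row-major, a region of interest given in integer pixel coordinates, and a plain copy from the map responsible for each grid cell. It omits any resampling and the per-category channel layout of the actual FCIS model. Following our reading of FCIS, the mask probability is shown as a two-way softmax over paired inside/outside scores. Both function names are hypothetical.

```python
import math
from typing import List, Tuple

Grid = List[List[float]]


def assemble_position_sensitive(score_maps: List[Grid], k: int,
                                box: Tuple[int, int, int, int]) -> Grid:
    """Assemble k*k position-sensitive maps into a single ROI map.

    score_maps[i * k + j] is the (hypothetical) map responsible for grid
    cell (row i, column j) of every ROI. box = (x0, y0, x1, y1) in integer
    pixels, inclusive of x0/y0 and exclusive of x1/y1.
    """
    x0, y0, x1, y1 = box
    h, w = y1 - y0, x1 - x0
    out: Grid = []
    for y in range(y0, y1):
        # Which of the k grid rows of the ROI this pixel falls into.
        i = (y - y0) * k // h
        row = []
        for x in range(x0, x1):
            # Which of the k grid columns of the ROI this pixel falls into.
            j = (x - x0) * k // w
            # Copy the score from the map assigned to cell (i, j).
            row.append(score_maps[i * k + j][y][x])
        out.append(row)
    return out


def inside_probability(inside: Grid, outside: Grid) -> Grid:
    """Per-pixel two-way softmax: exp(s) / (exp(s) + exp(o))."""
    return [[1.0 / (1.0 + math.exp(o - s)) for s, o in zip(row_in, row_out)]
            for row_in, row_out in zip(inside, outside)]
```

Because each cell of the ROI reads from a different map, the same image pixel can receive different scores in different ROIs. This is how the scheme adds translation variance back into a fully convolutional network.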

“…Our approach might seem surprising, as the general consensus around instance segmentation is that because FCNs are translation invariant, the task needs translation variance added back in [3]. Thus methods like FCIS [3] and Mask R-CNN [2] try to explicitly add translation variance, whether it be by directional maps and position-sensitive repooling, or by putting the mask branch in the second stage so it does not have to deal with localizing instances. In our method, the only translation variance we add is to crop the final mask with the predicted bounding box.…”

Section: Emergent Behavior (mentioning)

confidence: 99%
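The cropping step in the excerpt amounts to zeroing mask values that fall outside the predicted box. The following is a minimal sketch that assumes integer pixel coordinates with exclusive right and bottom edges. `crop_mask` is a hypothetical helper, not YOLACT's code.

```python
from typing import List, Tuple

Mask = List[List[float]]


def crop_mask(mask: Mask, box: Tuple[int, int, int, int]) -> Mask:
    """Return a copy of mask with every value outside box set to 0.0.

    box = (x0, y0, x1, y1): x0/y0 inclusive, x1/y1 exclusive. Boxes that
    extend past the mask edges are handled by the comparisons themselves.
    """
    x0, y0, x1, y1 = box
    return [[v if (y0 <= y < y1 and x0 <= x < x1) else 0.0
             for x, v in enumerate(row)]
            for y, row in enumerate(mask)]
```

The box comes from the detection head, so this crop is the only point where location-dependent information is imposed on the otherwise fully convolutional mask output.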


YOLACT: Real-Time Instance Segmentation

Bolya, Zhou, Xiao, et al. 2019

2019 IEEE/CVF International Conference on Computer Vision (ICCV)


We present a simple, fully-convolutional model for real-time (> 30 fps) instance segmentation that achieves competitive results on MS COCO evaluated on a single Titan Xp, which is significantly faster than any previous state-of-the-art approach. Moreover, we obtain this result after training on only one GPU. We accomplish this by breaking instance segmentation into two parallel subtasks: (1) generating a set of prototype masks and (2) predicting per-instance mask coefficients. Then we produce instance masks by linearly combining the prototypes with the mask coefficients. We find that because this process doesn't depend on repooling, this approach produces very high-quality masks and exhibits temporal stability for free. Furthermore, we analyze the emergent behavior of our prototypes and show they learn to localize instances on their own in a translation variant manner, despite being fully-convolutional. We also propose Fast NMS, a drop-in replacement for standard NMS that is 12 ms faster with only a marginal performance penalty. Finally, by incorporating deformable convolutions into the backbone network, optimizing the prediction head with better anchor scales and aspect ratios, and adding a novel fast mask re-scoring branch, our YOLACT++ model can achieve 34.1 mAP on MS COCO at 33.5 fps, which is fairly close to the state-of-the-art approaches while still running at real-time.
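The two-branch design described in the abstract reduces mask assembly to a per-pixel weighted sum of prototypes. The following pure-Python sketch assumes prototypes given as k maps of size H×W and one length-k coefficient vector per instance. The sigmoid on the result follows our reading of the YOLACT paper and should be treated as an assumption here. `assemble_masks` is a hypothetical name.

```python
import math
from typing import List

Grid = List[List[float]]


def assemble_masks(prototypes: List[Grid],
                   coefficients: List[List[float]]) -> List[Grid]:
    """Linearly combine k prototype masks with per-instance coefficients.

    prototypes: k maps, each H x W.
    coefficients: one length-k vector per detected instance.
    Returns one H x W soft mask per instance: sigmoid(sum_c coef[c] * P_c).
    """
    k = len(prototypes)
    height, width = len(prototypes[0]), len(prototypes[0][0])
    masks: List[Grid] = []
    for coef in coefficients:
        if len(coef) != k:
            raise ValueError("each coefficient vector needs exactly k entries")
        mask: Grid = []
        for y in range(height):
            row = []
            for x in range(width):
                # Weighted sum of the k prototype values at pixel (y, x).
                logit = sum(coef[c] * prototypes[c][y][x] for c in range(k))
                # Squash the linear combination into a [0, 1] mask value.
                row.append(1.0 / (1.0 + math.exp(-logit)))
            mask.append(row)
        masks.append(mask)
    return masks
```

Stacking the prototypes as an (H·W)×k matrix P and the coefficients as an n×k matrix C, the loop above computes σ(P Cᵀ) for all n instances at once. The assembly is a single matrix product with no per-region repooling, which is why it is cheap.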


“…Many factors contribute to this; among them, large datasets play crucial roles. Visual datasets with labels are used to train and evaluate machine learning models and lead to success in computer vision with novel architectures, such as AlexNet [1], Faster-RCNN [2], and FCIS [3].…”

Section: Introduction Creating Machines That Can Solve Complex Pro (mentioning)

confidence: 99%

Comparison of Visual Datasets for Machine Learning

Gauen, Dailey, Laiman, et al. 2017

2017 IEEE International Conference on Information Reuse and Integration (IRI)


Abstract: One of the greatest technological improvements in recent years is the rapid progress using machine learning for processing visual data. Among all factors that contribute to this development, datasets with labels play crucial roles. Several datasets are widely reused for investigating and analyzing different solutions in machine learning. Many systems, such as autonomous vehicles, rely on components using machine learning for recognizing objects. This paper compares different visual datasets and frameworks for machine learning. The comparison is both qualitative and quantitative and investigates object detection labels with respect to size, location, and contextual information. This paper also presents a new approach creating datasets using real-time, geo-tagged visual data, greatly improving the contextual information of the data. The data could be automatically labeled by cross-referencing information from other sources (such as weather).


“…Instead of labeling all pixels, it focuses on the target objects and labels only pixels of those objects. FCIS [25] is a technique developed based on fully convolutional networks (FCN). Mask R-CNN [26] is also created on top of FCN but incorporates with a proposed joint formulation.…”

Section: Deep Learning For Semantic Segmentation (mentioning)

confidence: 99%

Road Segmentation of Remotely-Sensed Images Using Deep Convolutional Neural Networks with Landscape Metrics and Conditional Random Fields

et al. 2017


Object segmentation of remotely-sensed aerial (or very-high resolution, VHR) images and satellite (or high-resolution, HR) images has been applied to many application domains, especially road extraction, in which the segmented objects serve as a mandatory layer in geospatial databases. Several attempts at applying the deep convolutional neural network (DCNN) to extract roads from remote sensing images have been made; however, the accuracy is still limited. In this paper, we present an enhanced DCNN framework specifically tailored for road extraction of remote sensing images by applying landscape metrics (LMs) and conditional random fields (CRFs). To improve the DCNN, a modern activation function called the exponential linear unit (ELU) is employed in our network, resulting in a higher number of, and yet more accurate, extracted roads. To further reduce falsely classified road objects, a solution based on an adoption of LMs is proposed. Finally, to sharpen the extracted roads, a CRF method is added to our framework. The experiments were conducted on Massachusetts road aerial imagery as well as the Thailand Earth Observation System (THEOS) satellite imagery data sets. The results showed that our proposed framework outperformed SegNet, a state-of-the-art object segmentation technique, on both kinds of remote sensing imagery in most cases in terms of precision, recall, and F1.
