Compared with many other dense prediction tasks, e.g., semantic segmentation, instance segmentation is much more challenging because an image may contain an arbitrary number of instances. To predict a mask for each instance, mainstream approaches either follow the "detect-then-segment" strategy (e.g., Mask R-CNN) or first predict embedding vectors and then cluster pixels into individual instances. In this paper, we view the task of instance segmentation from a completely new perspective by introducing the notion of "instance categories", which assigns a category to each pixel within an instance according to the instance's location. With this notion, we propose segmenting objects by locations (SOLO), a simple, direct, and fast framework for instance segmentation with strong performance. Following this basic principle, we derive several SOLO variants (e.g., Vanilla SOLO, Decoupled SOLO, and Dynamic SOLO). Our method directly maps a raw input image to the desired object categories and instance masks, eliminating the need for grouping post-processing or bounding-box detection. Our approach achieves state-of-the-art results for instance segmentation in terms of both speed and accuracy, while being considerably simpler than existing methods. Beyond instance segmentation, our method yields state-of-the-art results in object detection (as a byproduct of its masks) and panoptic segmentation. We further demonstrate the flexibility and high segmentation quality of SOLO by extending it to one-stage instance-level image matting.
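To make "instance categories" concrete, the sketch below assumes the grid formulation used in the SOLO papers: the image is divided into an S×S grid, and the instance whose center falls in cell (i, j) is assigned to that cell, whose mask is predicted in channel i·S + j. It is a simplified illustration, not the authors' code: it uses only the center point (SOLO uses a scaled center region and multiple FPN levels), lets a later instance overwrite an earlier one in the same cell, and the function and parameter names are hypothetical.

```python
import numpy as np


def solo_grid_targets(centers, labels, image_size, S):
    """Assign each instance to the grid cell that contains its center.

    centers:    list of (x, y) instance centers in pixels
    labels:     list of semantic class ids, one per instance
    image_size: (height, width) of the image
    S:          number of grid cells per side (hypothetical name)

    Returns:
      cat_target: (S, S) int array of class ids, -1 for background cells
      channel_of: mask-channel index i * S + j for each instance
    """
    H, W = image_size
    cat_target = np.full((S, S), -1, dtype=np.int64)
    channel_of = []
    for (x, y), c in zip(centers, labels):
        # Row index from the vertical coordinate, column index from the
        # horizontal one; clamp so points on the far border stay in range.
        i = min(int(y / H * S), S - 1)
        j = min(int(x / W * S), S - 1)
        cat_target[i, j] = c  # a later instance in the same cell overwrites
        channel_of.append(i * S + j)
    return cat_target, channel_of


if __name__ == "__main__":
    cats, channels = solo_grid_targets(
        centers=[(10, 10), (90, 50)], labels=[3, 7], image_size=(100, 100), S=4
    )
    print(cats)
    print(channels)
```

Because the location alone decides which output channel holds an instance's mask, no box proposals or pixel clustering are needed to separate instances; the category and mask of a cell are simply read out together.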
Semantic segmentation suffers from the fact that densely annotated masks are expensive to obtain. To tackle this problem, we learn to segment using only scribbles, which are much easier to collect, as supervision. To fully exploit the limited pixel-level annotations provided by scribbles, we present a novel Boundary Perception Guidance (BPG) approach, which consists of two basic components, i.e., prediction refinement and boundary regression. Specifically, prediction refinement progressively improves the segmentation through iterative upsampling and a semantic feature enhancement strategy. In boundary regression, we use class-agnostic edge maps as supervision to guide the segmentation network in localizing the boundaries between different semantic regions, yielding finer-grained feature representations for semantic segmentation. Experimental results on PASCAL VOC 2012 show that the proposed BPG achieves an mIoU of 73.2% without a fully connected Conditional Random Field (CRF) and 76.0% with CRF, setting a new state of the art.
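The boundary-regression idea can be sketched as a per-pixel binary loss between a predicted edge map and a class-agnostic edge target. This is a minimal illustration under stated assumptions, not BPG's implementation: it assumes a plain binary cross-entropy on edge logits, and, only to keep the example self-contained, it builds the target from a small dense label map. The abstract does not say how BPG obtains its edge maps, and both function names are hypothetical.

```python
import numpy as np


def edge_map_from_labels(label_map):
    """Toy class-agnostic edge map: 1 where a pixel's label differs from a
    horizontal or vertical neighbour, 0 elsewhere. Which classes meet does
    not matter, only that the label changes."""
    labels = np.asarray(label_map)
    edges = np.zeros(labels.shape, dtype=np.float64)
    diff_h = labels[:, 1:] != labels[:, :-1]  # left/right neighbours differ
    diff_v = labels[1:, :] != labels[:-1, :]  # up/down neighbours differ
    # Basic slices are views, so the boolean assignments write into `edges`.
    edges[:, 1:][diff_h] = 1.0
    edges[:, :-1][diff_h] = 1.0
    edges[1:, :][diff_v] = 1.0
    edges[:-1, :][diff_v] = 1.0
    return edges


def boundary_regression_loss(edge_logits, edge_target):
    """Mean binary cross-entropy on logits (assumed objective), written in
    the numerically stable form max(z, 0) - z * y + log(1 + exp(-|z|))."""
    z = np.asarray(edge_logits, dtype=np.float64)
    y = np.asarray(edge_target, dtype=np.float64)
    per_pixel = np.maximum(z, 0.0) - z * y + np.log1p(np.exp(-np.abs(z)))
    return float(per_pixel.mean())


if __name__ == "__main__":
    labels = np.array([[0, 0, 1], [0, 0, 1], [2, 2, 2]])
    target = edge_map_from_labels(labels)
    print(target)
    print(boundary_regression_loss(np.zeros_like(target), target))  # log(2)
    print(boundary_regression_loss(20.0 * (2.0 * target - 1.0), target))
```

An uninformative predictor (all logits zero) pays log 2 per pixel, while confident logits that match the edge map drive the loss toward zero, which is the pressure that pushes the network to localize region boundaries.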