Bowen Cheng scite author profile

In this work, we introduce Panoptic-DeepLab, a simple, strong, and fast system for panoptic segmentation, aiming to establish a solid baseline for bottom-up methods that can achieve comparable performance of two-stage methods while yielding fast inference speed. In particular, Panoptic-DeepLab adopts the dual-ASPP and dual-decoder structures specific to semantic, and instance segmentation, respectively. The semantic segmentation branch is the same as the typical design of any semantic segmentation model (e.g., DeepLab), while the instance segmentation branch is class-agnostic, involving a simple instance center regression. As a result, our single Panoptic-DeepLab simultaneously ranks first at all three Cityscapes benchmarks, setting the new state-of-art of 84.2% mIoU, 39.0% AP, and 65.5% PQ on test set. Additionally, equipped with MobileNetV3, Panoptic-DeepLab runs nearly in real-time with a single 1025 × 2049 image (15.8 frames per second), while achieving a competitive performance on Cityscapes (54.1 PQ% on test set). On Mapillary Vistas test set, our ensemble of six models attains 42.7% PQ, outperforming the challenge winner in 2018 by a healthy margin of 1.5%. Finally, our Panoptic-DeepLab also performs on par with several topdown approaches on the challenging COCO dataset. For the first time, we demonstrate a bottom-up approach could deliver state-of-the-art results on panoptic segmentation.

show abstract

Masked-attention Mask Transformer for Universal Image Segmentation

Cheng

et al. 2022

View full text Add to dashboard Cite

HigherHRNet: Scale-Aware Representation Learning for Bottom-Up Human Pose Estimation

et al. 2020

View full text Add to dashboard Cite

TS$$^{2}$$C: Tight Box Mining with Surrounding Segmentation Context for Weakly Supervised Object Detection

et al. 2018

View full text Add to dashboard Cite

This work provides a simple approach to discover tight object bounding boxes with only image-level supervision, called Tight box mining with Surrounding Segmentation Context (TS 2 C). We observe that object candidates mined through current multiple instance learning methods are usually trapped to discriminative object parts, rather than the entire object. TS 2 C leverages surrounding segmentation context derived from weakly-supervised segmentation to suppress such lowquality distracting candidates and boost the high-quality ones. Specifically, TS 2 C is developed based on two key properties of desirable bounding boxes: 1) high purity, meaning most pixels in the box are with high object response, and 2) high completeness, meaning the box covers high object response pixels comprehensively. With such novel and computable criteria, more tight candidates can be discovered for learning a better object detector. With TS 2 C, we obtain 48.0% and 44.4% mAP scores on VOC 2007 and 2012 benchmarks, which are the new state-of-the-arts.

show abstract

Revisiting RCNN: On Awakening the Classification Power of Faster RCNN

et al. 2018

View full text Add to dashboard Cite

Recent region-based object detectors are usually built with separate classification and localization branches on top of shared feature extraction networks. In this paper, we analyze failure cases of state-ofthe-art detectors and observe that most hard false positives result from classification instead of localization. We conjecture that: (1) Shared feature representation is not optimal due to the mismatched goals of feature learning for classification and localization; (2) multi-task learning helps, yet optimization of the multi-task loss may result in sub-optimal for individual tasks; (3) large receptive field for different scales leads to redundant context information for small objects. We demonstrate the potential of detector classification power by a simple, effective, and widely-applicable Decoupled Classification Refinement (DCR) network. DCR samples hard false positives from the base classifier in Faster RCNN and trains a RCNN-styled strong classifier. Experiments show new stateof-the-art results on PASCAL VOC and COCO without any bells and whistles.

show abstract

Naive-Student: Leveraging Semi-Supervised Learning in Video Sequences for Urban Scene Segmentation

Chen

Lopes

Cheng³

et al. 2020

129

101

View full text Add to dashboard Cite

Masked-attention Mask Transformer for Universal Image Segmentation

Cheng¹,

Misra²,

Schwing³

et al. 2021

Preprint

View full text Add to dashboard Cite

Image segmentation is about grouping pixels with different semantics, e.g., category or instance membership, where each choice of semantics defines a task. While only the semantics of each task differ, current research focuses on designing specialized architectures for each task. We present Masked-attention Mask Transformer (Mask2Former), a new architecture capable of addressing any image segmentation task (panoptic, instance or semantic). Its key components include masked attention, which extracts localized features by constraining cross-attention within predicted mask regions. In addition to reducing the research effort by at least three times, it outperforms the best specialized architectures by a significant margin on four popular datasets. Most notably, Mask2Former sets a new state-of-the-art for panoptic segmentation (57.8 PQ on COCO), instance segmentation (50.1 AP on COCO) and semantic segmentation (57.7 mIoU on ADE20K).

show abstract

Boundary IoU: Improving Object-Centric Image Segmentation Evaluation

et al. 2021

View full text Add to dashboard Cite

scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.

Contact Info

hi@scite.ai

10624 S. Eastern Ave., Ste. A-614

Henderson, NV 89052, USA

Blog Terms and Conditions API Terms Privacy Policy Contact Cookie Preferences Do Not Sell or Share My Personal Information

Made with 💙 for researchers

Part of the Research Solutions Family.

Bowen Cheng

Panoptic-DeepLab: A Simple, Strong, and Fast Baseline for Bottom-Up Panoptic Segmentation

Masked-attention Mask Transformer for Universal Image Segmentation

HigherHRNet: Scale-Aware Representation Learning for Bottom-Up Human Pose Estimation

TS$$^{2}$$C: Tight Box Mining with Surrounding Segmentation Context for Weakly Supervised Object Detection

Revisiting RCNN: On Awakening the Classification Power of Faster RCNN

Naive-Student: Leveraging Semi-Supervised Learning in Video Sequences for Urban Scene Segmentation

Masked-attention Mask Transformer for Universal Image Segmentation

Boundary IoU: Improving Object-Centric Image Segmentation Evaluation

Contact Info

Product

Resources

About