Xiangtai Li scite author profile

Semantic segmentation produces a comprehensive understanding of a scene by densely predicting the category of each pixel. High-level features from deep convolutional neural networks have already proven effective for semantic segmentation. However, their coarse resolution often leads to inferior results on small or thin objects, where detailed information is important. It is natural to import low-level features to compensate for the detail lost in high-level features. Unfortunately, simply combining multi-level features suffers from the semantic gap among them. In this paper, we propose a new architecture, named Gated Fully Fusion (GFF), to selectively fuse features from multiple levels using gates in a fully connected way. Specifically, features at each level are enhanced by higher-level features with stronger semantics and lower-level features with more details, and gates control the propagation of useful information, which significantly reduces noise during fusion. We achieve state-of-the-art results on four challenging scene parsing datasets: Cityscapes, Pascal Context, COCO-stuff and ADE20K.
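The gating idea in this abstract can be illustrated with a small sketch. The abstract does not state the fusion formula, so the rule used here, `(1 + G_l) * X_l + (1 - G_l) * sum_{i != l} G_i * X_i`, is an assumption chosen for illustration, as are the function names and the scalar `gate_weights` standing in for learned gate parameters. Features are reduced to single-channel 1D lists that are assumed to have been resized to a common length already.

```python
import math


def sigmoid(z):
    """Logistic function, squashing a real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def gated_fuse(features, gate_weights):
    """Fuse every level with all other levels through per-position gates.

    features:     list of L levels, each a list of N floats (one per position).
    gate_weights: list of L scalars; a hypothetical stand-in for the learned
                  layer that would produce each level's gate map.
    Returns a list of L fused levels, each of length N.
    """
    num_levels = len(features)
    num_positions = len(features[0])

    # One gate value in (0, 1) per level and per position.
    gates = [
        [sigmoid(w * x) for x in level]
        for level, w in zip(features, gate_weights)
    ]

    fused = []
    for l in range(num_levels):
        out = []
        for p in range(num_positions):
            # Information offered by the other levels, each scaled by its own gate.
            others = sum(
                gates[i][p] * features[i][p]
                for i in range(num_levels)
                if i != l
            )
            # Assumed rule: a confident level (gate near 1) keeps its own
            # feature and accepts little from the other levels.
            g = gates[l][p]
            out.append((1.0 + g) * features[l][p] + (1.0 - g) * others)
        fused.append(out)
    return fused


if __name__ == "__main__":
    levels = [[1.0, 2.0], [3.0, 4.0]]
    # Zero weights make every gate exactly 0.5.
    print(gated_fuse(levels, [0.0, 0.0]))
```

With both gate weights set to zero, every gate is 0.5, so each level keeps 1.5 times its own value and receives a quarter of the other level's value. A real model would learn the gates from the feature maps rather than fix them.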

Involution: Inverting the Inherence of Convolution for Visual Recognition

Li, Hu, Wang et al. 2021

Preprint

Convolution has been the core ingredient of modern neural networks, triggering the surge of deep learning in vision. In this work, we rethink the inherent principles of standard convolution for vision tasks, specifically spatial-agnostic and channel-specific. Instead, we present a novel atomic operation for deep neural networks by inverting the aforementioned design principles of convolution, coined as involution. We additionally demystify the recent popular self-attention operator and subsume it into our involution family as an over-complicated instantiation. The proposed involution operator could be leveraged as fundamental bricks to build the new generation of neural networks for visual recognition, powering different deep learning models on several prevalent benchmarks, including ImageNet classification, COCO detection and segmentation, together with Cityscapes segmentation. Our involution-based models improve the performance of convolutional baselines using ResNet-50 by up to 1.6% top-1 accuracy, 2.5% and 2.4% bounding box AP, and 4.7% mean IoU absolutely while compressing the computational cost to 66%, 65%, 72%, and 57% on the above benchmarks, respectively. Code and pre-trained models for all the tasks are available at https://github.com/d-li14/involution.
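Inverting the two convolution principles named in this abstract gives an operation that is spatial-specific (each position has its own kernel) and channel-agnostic (that kernel is shared across channels). The sketch below illustrates this in one dimension. It is not the authors' implementation (see the repository linked above for that): the 1D setting, the single linear kernel generator `w_gen`, the zero padding, and all names are assumptions made for illustration.

```python
def involution_1d(x, w_gen, kernel_size=3):
    """Toy 1D involution-style operation with zero padding.

    x:      list of N positions, each a list of C channel values.
    w_gen:  kernel_size x C matrix (hypothetical) that maps the channels at
            one position to the kernel used at that position.
    Returns a list of N positions, each with C output channels.
    """
    num_positions = len(x)
    num_channels = len(x[0])
    half = kernel_size // 2

    out = []
    for p in range(num_positions):
        # Spatial-specific: the kernel is generated from the feature at p,
        # so different positions generally get different kernels.
        kernel = [
            sum(w_gen[k][ch] * x[p][ch] for ch in range(num_channels))
            for k in range(kernel_size)
        ]

        row = []
        for ch in range(num_channels):
            acc = 0.0
            for k in range(kernel_size):
                q = p + k - half  # neighbour index covered by kernel tap k
                if 0 <= q < num_positions:
                    # Channel-agnostic: every channel reuses the same kernel.
                    acc += kernel[k] * x[q][ch]
            row.append(acc)
        out.append(row)
    return out


if __name__ == "__main__":
    feats = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
    # Only the centre tap is active, and it equals channel 0 at each position.
    gen = [[0.0, 0.0], [1.0, 0.0], [0.0, 0.0]]
    print(involution_1d(feats, gen))
```

In this usage example the generated kernel has a single non-zero centre tap equal to channel 0 of the current position, so each output is the input at that position scaled by its own first channel. A standard convolution would instead apply one fixed kernel at every position and a separate kernel per channel.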

End-to-End Video Object Detection with Spatial-Temporal Transformers

Lű, Zhou et al. 2021

PointFlow: Flowing Semantics Through Points for Aerial Image Segmentation

et al. 2021

Pyrolyzing cobalt diethylenetriamine chelate on carbon (CoDETA/C) as a family of non-precious metal oxygen reduction catalyst

Zhang et al. 2014

International Journal of Hydrogen Energy


Xiangtai Li

Semantic Flow for Fast and Accurate Scene Parsing

Improving Semantic Segmentation via Decoupled Body and Edge Supervision

Gated Fully Fusion for Semantic Segmentation
