Semantic segmentation, which aims to assign a semantic label to every pixel, is widely applied in fields such as video surveillance, medical image analysis, and autonomous driving. However, the task faces two challenges: 1) a deficiency of rich contextual information, and 2) a lack of sufficient spatial information, both of which seriously degrade segmentation performance. To address these challenges, this paper proposes a global feature capturing module (GFCM) and a Conv Block, and uses them to build a new model with improved segmentation performance. Specifically, GFCM, which consists of a global encoding module (GEM) and a spatial attention module (SAM), is designed to extract adequate global contextual information and to build global spatial dependencies. The Conv Block, composed of three convolution layers, is designed to preserve rich spatial information. In the model built on GFCM and the Conv Block, a data-dependent upsampling operator (DUpsampling) is used to recover the pixel-wise prediction effectively. Extensive experiments demonstrate the effectiveness of the design: without any post-processing, the new model achieves 73.69% mIoU on the Cityscapes test set and 80.05% mIoU on the PASCAL VOC 2012 test set.
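The abstract does not describe the internals of the spatial attention module (SAM). As an illustration only, the sketch below assumes a DANet-style position attention: every position attends to every other position through a softmax over pairwise dot-product similarities, and the attended context is added back to the input, scaled by a factor `gamma`. To stay dependency-free it uses identity query, key, and value projections, whereas a real module would normally learn 1×1 convolutions for them. The function names are hypothetical.

```python
import math


def softmax(scores: list[float]) -> list[float]:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def spatial_attention(features: list[list[float]], gamma: float = 1.0) -> list[list[float]]:
    """Position attention over a flattened feature map (hypothetical SAM sketch).

    features: N = H * W position vectors, each with C channels.
    Returns N vectors: gamma * (attention-weighted context) + input.
    Query, key, and value projections are identity here (an assumption).
    """
    n = len(features)
    channels = len(features[0])
    output = []
    for i in range(n):
        # Similarity of position i to every position j, including itself.
        energy = [sum(a * b for a, b in zip(features[i], features[j])) for j in range(n)]
        weights = softmax(energy)
        # Aggregate features from all positions, weighted by attention.
        context = [sum(weights[j] * features[j][k] for j in range(n)) for k in range(channels)]
        # Residual connection with the context scaled by gamma.
        output.append([gamma * context[k] + features[i][k] for k in range(channels)])
    return output


if __name__ == "__main__":
    # Four positions (a 2 x 2 map, flattened) with two channels each.
    fmap = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
    for vec in spatial_attention(fmap, gamma=0.5):
        print([round(v, 3) for v in vec])
```

Because each position's weights span the whole map, a single layer can relate pixels that are arbitrarily far apart. This is the "global spatial dependency" a convolution with a small receptive field cannot capture directly.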
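The abstract only states that the Conv Block has three convolution layers. If, as an assumption modeled on BiSeNet's spatial path (which stacks three stride-2 convolution layers), each layer is a 3×3 convolution with stride 2 and padding 1, the block outputs a map at 1/8 of the input resolution. The standard output-size formula, floor((size + 2·padding − kernel) / stride) + 1, makes this concrete. The helper names below are hypothetical.

```python
def conv_out_size(size: int, kernel: int = 3, stride: int = 2, padding: int = 1) -> int:
    """Spatial size after one convolution: floor((size + 2p - k) / s) + 1."""
    return (size + 2 * padding - kernel) // stride + 1


def conv_block_out_shape(height: int, width: int, num_layers: int = 3) -> tuple[int, int]:
    """Apply the size formula once per (assumed) stride-2, 3x3, padding-1 layer."""
    for _ in range(num_layers):
        height = conv_out_size(height)
        width = conv_out_size(width)
    return height, width


if __name__ == "__main__":
    # A full-resolution Cityscapes frame is 1024 x 2048 pixels.
    print(conv_block_out_shape(1024, 2048))  # (128, 256)
```

Under these assumptions, a 1024 × 2048 input is reduced to 128 × 256. That is much finer than the 1/16 or 1/32 resolution typical of deep backbone features, which is why such a shallow branch can help keep spatial detail.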
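DUpsampling, introduced by Tian et al. (CVPR 2019), replaces bilinear upsampling with a learned linear projection. Each low-resolution feature vector is mapped by a matrix W to a ratio × ratio patch of class scores, and the patches are tiled into the full-resolution output. The sketch below shows only this project-and-rearrange step with plain lists. It is not the code used in this paper, the function name is hypothetical, and W is hand-written here. In the original work W is learned, derived from ground-truth label maps through a reconstruction loss.

```python
def dupsample(features: list[list[list[float]]],
              weight: list[list[float]],
              ratio: int,
              num_classes: int) -> list[list[list[float]]]:
    """Upsample an H x W x C map to (H*ratio) x (W*ratio) x num_classes.

    weight is a C x (ratio * ratio * num_classes) projection matrix.
    """
    height, width, channels = len(features), len(features[0]), len(features[0][0])
    patch_len = ratio * ratio * num_classes
    if len(weight) != channels or any(len(row) != patch_len for row in weight):
        raise ValueError("weight must have shape C x (ratio * ratio * num_classes)")

    out = [[[0.0] * num_classes for _ in range(width * ratio)] for _ in range(height * ratio)]
    for i in range(height):
        for j in range(width):
            f = features[i][j]
            # Project the C-dim feature vector onto ratio*ratio*num_classes values.
            proj = [sum(f[k] * weight[k][n] for k in range(channels)) for n in range(patch_len)]
            # Rearrange the projection into a ratio x ratio patch of class scores.
            for di in range(ratio):
                for dj in range(ratio):
                    start = (di * ratio + dj) * num_classes
                    out[i * ratio + di][j * ratio + dj] = proj[start:start + num_classes]
    return out


if __name__ == "__main__":
    # A 1 x 1 map with C = 2 channels, upsampled by ratio 2 into one class channel.
    fmap = [[[1.0, 2.0]]]
    w = [[1.0, 0.0, 0.0, 1.0],
         [0.0, 1.0, 1.0, 0.0]]
    print(dupsample(fmap, w, ratio=2, num_classes=1))  # [[[1.0], [2.0]], [[2.0], [1.0]]]
```

Unlike bilinear interpolation, each output pixel in a patch can receive a different combination of the low-resolution channels. This is what lets the decoder recover detail from a coarse feature map.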
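The reported scores are mean intersection-over-union (mIoU). For each class, IoU = TP / (TP + FP + FN), with counts accumulated over all evaluated pixels, and mIoU is the average IoU over classes. The helper below computes it from a confusion matrix. It is a generic illustration, not the official Cityscapes or PASCAL VOC evaluation code, which additionally handles ignored pixels and a fixed class list.

```python
def mean_iou(confusion: list[list[int]]) -> float:
    """mIoU from a square confusion matrix.

    confusion[t][p] counts pixels whose ground-truth class is t and predicted class is p.
    """
    num_classes = len(confusion)
    ious = []
    for c in range(num_classes):
        tp = confusion[c][c]
        fn = sum(confusion[c]) - tp                                  # class-c pixels predicted as something else
        fp = sum(confusion[t][c] for t in range(num_classes)) - tp   # other classes predicted as c
        denom = tp + fp + fn
        if denom > 0:  # skip classes absent from both ground truth and prediction
            ious.append(tp / denom)
    return sum(ious) / len(ious) if ious else 0.0


if __name__ == "__main__":
    cm = [[3, 1],
          [2, 4]]
    print(round(mean_iou(cm), 4))  # 0.5357
```

In this example, class 0 has IoU 3 / 6 = 0.5 and class 1 has IoU 4 / 7 ≈ 0.571, so mIoU ≈ 0.536.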