AlignSeg: Feature-Aligned Segmentation Networks

Huang, Zilong; Wei, Yunchao; Wang, Xinggang; Liu, Wenyu; Huang, Thomas S.

doi:10.48550/arxiv.2003.00872

Cited by 7 publications (9 citation statements)

References 51 publications (115 reference statements)

“…Unfortunately, the segmentation accuracy of these methods on street scene images, which usually contain many small objects (such as poles and traffic lights), is still far from being satisfactory. This is partly due to the fact that these methods ignore the misalignment between different levels of feature maps, which may lead to the misclassification of boundaries for small objects [23].…”

Section: Acknowledgment (mentioning)

confidence: 99%

“…For example, Guided Upsampling Network [24] adopts a guided upsampling module to enrich upsampling operators by learning a transformation based on high-resolution inputs. Huang et al [23] propose the Feature-Aligned Segmentation Networks (AlignNet), which mainly consist of an Aligned Feature Aggregation module (AlignFA) and an Aligned Context Modeling module (AlignCM), to deal with the misalignment problem. Similarly, Semantic Flow Network (SFNet) [25] develops the Flow Alignment Module (FAM) to align and aggregate different levels of feature maps.…”

Section: Feature Aggregation (mentioning)

confidence: 99%
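The alignment idea in these excerpts (resampling a coarse feature map with learned per-pixel offsets before fusing it with a finer map) can be illustrated with a minimal sketch. It is not the AlignFA or FAM implementation: it assumes single-channel maps stored as Python lists, takes the offsets as given (a real network would predict them, typically with a small convolution over both feature maps), and fuses by summation. The helper names `bilinear_sample`, `aligned_upsample` and `fuse` are hypothetical.

```python
import math


def bilinear_sample(feat, y, x):
    """Sample a 2-D map (list of rows) at real-valued (y, x), clamping to the border."""
    h, w = len(feat), len(feat[0])
    y = min(max(y, 0.0), h - 1.0)
    x = min(max(x, 0.0), w - 1.0)
    y0, x0 = int(math.floor(y)), int(math.floor(x))
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    dy, dx = y - y0, x - x0
    top = feat[y0][x0] * (1 - dx) + feat[y0][x1] * dx
    bottom = feat[y1][x0] * (1 - dx) + feat[y1][x1] * dx
    return top * (1 - dy) + bottom * dy


def aligned_upsample(coarse, out_h, out_w, offsets=None):
    """Upsample `coarse` to out_h x out_w. Output pixel (i, j) reads the coarse map at its
    regular grid position plus offsets[i][j] = (dy, dx), measured in coarse pixels.
    offsets=None gives plain fixed-grid bilinear upsampling (the misaligned baseline)."""
    h, w = len(coarse), len(coarse[0])
    sy = (h - 1) / (out_h - 1) if out_h > 1 else 0.0
    sx = (w - 1) / (out_w - 1) if out_w > 1 else 0.0
    out = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            dy, dx = offsets[i][j] if offsets is not None else (0.0, 0.0)
            row.append(bilinear_sample(coarse, i * sy + dy, j * sx + dx))
        out.append(row)
    return out


def fuse(fine, coarse, offsets=None):
    """Aggregate a fine (high-resolution) map with the aligned, upsampled coarse map by summation."""
    up = aligned_upsample(coarse, len(fine), len(fine[0]), offsets)
    return [[a + b for a, b in zip(rf, ru)] for rf, ru in zip(fine, up)]
```

With all offsets at zero the sketch reduces to ordinary bilinear upsampling; non-zero offsets let each output pixel read from a shifted location, which is how offset-based modules can compensate for spatial misalignment between feature levels.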

Stage-Aware Feature Alignment Network for Real-Time Semantic Segmentation of Street Scenes

Weng, Yan, Chen et al. 2022

IEEE Trans. Circuits Syst. Video Technol.

Real-time performance is a very important trait of semantic segmentation models aiming at applications in robotics and intelligent transportation systems. Most previous work in the field involves custom convolutional encoders trained from scratch, and decoders without lateral skip-connections. However, we argue that a better speed-accuracy trade-off is achieved with i) compact encoders designed for competitive ImageNet performance and ii) lightweight decoders with lateral skip-connections. Additionally, we propose a novel interleaved pyramidal fusion scheme which is able to further improve the results on large objects close to the camera. We provide a detailed analysis of prediction accuracy and processing time on Cityscapes and CamVid datasets for models based on ResNet-18 and MobileNetv2. Our Cityscapes test submission […] images at 39.9 Hz on a GTX1080Ti. To the best of our knowledge, this result outperforms all previous approaches aiming at real-time application. The source code is available at https://github.com/orsic/swiftnet.
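The decoder layout this abstract argues for (a compact encoder followed by a lightweight decoder with lateral skip-connections) can be sketched as a coarse-to-fine loop. This is an illustrative assumption, not the code in the linked repository: maps are single-channel lists, upsampling is nearest-neighbour, and the per-stage convolutions a real decoder would apply to the lateral and blended features are omitted, so lateral maps are simply added. The interleaved pyramidal fusion scheme is not modelled, and the function names are hypothetical.

```python
def upsample2x_nearest(feat):
    """Nearest-neighbour 2x upsampling of a 2-D map stored as a list of rows."""
    out = []
    for row in feat:
        wide = [v for v in row for _ in range(2)]  # repeat each column twice
        out.append(wide)
        out.append(list(wide))  # repeat each row twice
    return out


def add_maps(a, b):
    """Element-wise sum of two equally sized maps (the lateral skip-connection)."""
    if len(a) != len(b) or len(a[0]) != len(b[0]):
        raise ValueError("lateral and upsampled maps must have the same shape")
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def ladder_decoder(encoder_feats):
    """encoder_feats: maps ordered fine -> coarse, each half the size of the previous one.
    Start at the coarsest map, then repeatedly upsample and add the next-finer lateral map.
    A real decoder would also convolve the lateral input and the sum at every stage."""
    x = encoder_feats[-1]
    for lateral in reversed(encoder_feats[:-1]):
        x = add_maps(upsample2x_nearest(x), lateral)
    return x
```

Because each stage only upsamples and adds, the decoder's cost stays small relative to the encoder, which is the trade-off the abstract emphasises.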

“…Current state-of-the-art semantic segmentation approaches based on the fully convolutional network (FCN) [23] have made remarkable progress in several ways, e.g. by modeling context information [50,7,49,43,19], recovering the spatial details [8,36,20] or designing stronger networks [46,42,34]. The vast majority of semantic segmentation methods consider a static setting, i.e., the training data for all classes are available before training.…”

Section: Related Work (mentioning)

confidence: 99%

“…Semantic segmentation, aims at assigning semantic class labels to each pixel in a given image, provides significant impacts on various real-world applications, such as autonomous driving [14], augmented reality [1], etc. Specifically, current state-of-the-art semantic segmentation approaches based on the fully convolutional network (FCN) [23] have made remarkable progress in several ways, e.g., by modeling context information [50,7,49,43,19], recovering the spatial details [8,36,20] or designing stronger networks [46,42,34].…”

Section: Introduction (mentioning)

confidence: 99%

Half-Real Half-Fake Distillation for Class-Incremental Semantic Segmentation

Huang, Hao, Wang et al. 2021

Preprint

Self Cite

Despite their success for semantic segmentation, convolutional neural networks are ill-equipped for incremental learning, i.e., adapting the original segmentation model as new classes are available but the initial training data is not retained. In fact, they are vulnerable to the catastrophic forgetting problem. We try to address this issue by "inverting" the trained segmentation network to synthesize input images starting from random noise. To avoid setting detailed pixel-wise segmentation maps as the supervision manually, we propose the SegInversion to synthesize images using the image-level labels. To increase the diversity of synthetic images, the Scale-Aware Aggregation module is integrated into SegInversion for controlling the scale (the number of pixels) of synthetic objects. Along with real images of new classes, the synthesized images will be fed into the distillation-based framework to train the new segmentation model which retains the information about previously learned classes, whilst updating the current model to learn the new ones. The proposed method significantly outperforms other incremental learning methods, and obtains state-of-the-art performance on the PASCAL VOC 2012 and ADE20K datasets.
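A distillation term of the kind such frameworks rely on can be sketched as a temperature-scaled KL divergence between the old and new models' per-pixel predictions, restricted to the previously learned classes. This is a generic knowledge-distillation loss assumed for illustration; the abstract does not state the exact loss the paper uses. Logits are plain lists with one entry per class, the old classes are assumed to come first, and `kd_loss`, `num_old` and `t` are hypothetical names.

```python
import math


def softmax(logits, t=1.0):
    """Numerically stable softmax with temperature t."""
    m = max(logits)
    exps = [math.exp((z - m) / t) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]


def kd_loss(old_logits, new_logits, num_old, t=2.0):
    """Mean over pixels of KL(p_old || p_new), where both distributions are softmaxes over
    the first num_old classes only, so logits for newly added classes are ignored.
    The t*t factor is the usual rescaling that keeps gradient magnitudes comparable across t."""
    total = 0.0
    for zo, zn in zip(old_logits, new_logits):
        p = softmax(zo[:num_old], t)  # frozen old model (teacher)
        q = softmax(zn[:num_old], t)  # model being trained (student)
        total += sum(pi * (math.log(pi) - math.log(qi)) for pi, qi in zip(p, q) if pi > 0)
    return total * t * t / len(old_logits)
```

The loss is zero when the new model reproduces the old model's predictions on the old classes, which is how such a term discourages catastrophic forgetting while the usual segmentation loss teaches the new classes.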

“…Recently, deep learning has achieved great success in many computer vision tasks, such as in image classification [10,21], object detection [30,43], semantic segmentation [5,19] and deep learning applications in medicine [42,41,40] and agriculture [7,6] etc. Following this great success, the adoption of deep learning in image matting has also been widely explored in the past few years.…”

Section: Image Matting (mentioning)

confidence: 99%

High-Resolution Deep Image Matting

Yu, Xu, Huang et al. 2020

Preprint

Self Cite

Image matting is a key technique for image and video editing and composition. Conventionally, deep learning approaches take the whole input image and an associated trimap to infer the alpha matte using convolutional neural networks. Such approaches set the state of the art in image matting; however, they may fail in real-world matting applications due to hardware limitations, since real-world input images for matting are mostly of very high resolution. In this paper, we propose HDMatt, the first deep-learning-based image matting approach for high-resolution inputs. More concretely, HDMatt runs matting in a patch-based crop-and-stitch manner for high-resolution inputs with a novel module design to address the contextual dependency and consistency issues between different patches. Compared with vanilla patch-based inference which computes each patch independently, we explicitly model the cross-patch contextual dependency with a newly-proposed Cross-Patch Contextual module (CPC) guided by the given trimap. Extensive experiments demonstrate the effectiveness of the proposed method and its necessity for high-resolution inputs. Our HDMatt approach also sets new state-of-the-art performance on Adobe Image Matting and AlphaMatting benchmarks and produces impressive visual results on more real-world high-resolution images.
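The vanilla patch-based baseline that HDMatt improves on (cropping overlapping patches, running inference on each one independently, then stitching the results) can be sketched as follows. The sketch assumes a single-channel image stored as a list of rows, a hypothetical `infer` callback that returns a prediction the same size as its input patch, and plain averaging where patches overlap. It does not model the Cross-Patch Contextual module, which is precisely what the paper adds on top of this kind of independent per-patch inference.

```python
def patch_starts(length, patch, stride):
    """Start offsets so patches of size `patch` cover [0, length); the last one is flush with the end."""
    if patch >= length:
        return [0]
    starts = list(range(0, length - patch + 1, stride))
    if starts[-1] != length - patch:
        starts.append(length - patch)
    return starts


def crop_and_stitch(image, patch, stride, infer):
    """Run `infer` on overlapping patches of a 2-D image and average overlapping predictions."""
    h, w = len(image), len(image[0])
    acc = [[0.0] * w for _ in range(h)]  # running sum of predictions per pixel
    cnt = [[0] * w for _ in range(h)]  # number of patches covering each pixel
    for y0 in patch_starts(h, patch, stride):
        for x0 in patch_starts(w, patch, stride):
            crop = [row[x0:x0 + patch] for row in image[y0:y0 + patch]]
            pred = infer(crop)  # each patch is processed independently
            for dy, prow in enumerate(pred):
                for dx, v in enumerate(prow):
                    acc[y0 + dy][x0 + dx] += v
                    cnt[y0 + dy][x0 + dx] += 1
    return [[a / c for a, c in zip(ra, rc)] for ra, rc in zip(acc, cnt)]
```

Memory use is bounded by the patch size rather than the full image resolution, but each patch sees no context beyond its own borders, which is the consistency problem the abstract describes.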
