Proceedings of the 28th ACM International Conference on Multimedia 2020
DOI: 10.1145/3394171.3413593

Context-aware Feature Generation for Zero-shot Semantic Segmentation

Abstract: Existing semantic segmentation models rely heavily on dense pixel-wise annotations. To reduce the annotation burden, we focus on a challenging task named zero-shot semantic segmentation, which aims to segment unseen objects with zero annotations. This task can be accomplished by transferring knowledge across categories via semantic word embeddings. In this paper, we propose a novel context-aware feature generation method for zero-shot segmentation named CaGNet. In particular, with the observation that a pixel…
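
To make the feature-generation idea concrete, here is a minimal PyTorch sketch of a generator that maps a class word embedding plus a latent code to a synthetic pixel-wise feature; the names and dimensions (FeatureGenerator, embed_dim=300, feat_dim=512) are illustrative assumptions, not CaGNet's actual architecture.

```python
import torch
import torch.nn as nn

# Minimal sketch of word-embedding-conditioned feature generation
# (hypothetical names; not the authors' actual CaGNet code).
class FeatureGenerator(nn.Module):
    """Maps a class word embedding plus a latent code to a fake pixel feature."""
    def __init__(self, embed_dim=300, latent_dim=16, feat_dim=512):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(embed_dim + latent_dim, 256),
            nn.LeakyReLU(0.2),
            nn.Linear(256, feat_dim),
        )

    def forward(self, word_embed, z):
        # word_embed: (N, embed_dim) class embeddings; z: (N, latent_dim) latent code
        return self.net(torch.cat([word_embed, z], dim=1))

gen = FeatureGenerator()
word_embed = torch.randn(8, 300)   # e.g. word2vec vectors of class labels
z = torch.randn(8, 16)             # latent code (contextual in CaGNet)
fake_feats = gen(word_embed, z)    # (8, 512) synthesized pixel features
# A segmentation classifier can then be (re)trained on real seen-class
# features plus generated unseen-class features.
```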


Cited by 80 publications (86 citation statements: 0 supporting, 86 mentioning, 0 contrasting)
References 42 publications
“…Text embeddings of class labels play a central role in these works. Bucher et al. (2019) and Gu et al. (2020) propose to leverage word embeddings together with a generative model to generate visual features of unseen categories, while Xian et al. (2019) propose to project visual features into a simple word embedding space and to correlate the resulting embeddings to assign a label to a pixel. Other works propose uncertainty-aware learning to better handle noisy labels of seen classes, or introduce a structured learning approach to better exploit the relations between seen and unseen categories.…”
Section: Related Work
confidence: 99%
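
For contrast, the projection-based alternative attributed to Xian et al. (2019) above can be sketched as follows; the linear projection and cosine scoring here are assumptions chosen for illustration, not that paper's exact formulation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Rough sketch: project pixel features into a word-embedding space and
# assign each pixel the label of the most similar class embedding.
feat_dim, embed_dim, num_classes = 512, 300, 20
project = nn.Linear(feat_dim, embed_dim)

pixel_feats = torch.randn(4, feat_dim, 32, 32)   # backbone features (N, C, H, W)
class_embeds = F.normalize(torch.randn(num_classes, embed_dim), dim=1)

proj = project(pixel_feats.permute(0, 2, 3, 1))  # (N, H, W, embed_dim)
proj = F.normalize(proj, dim=-1)
scores = proj @ class_embeds.t()                 # cosine similarity per pixel
labels = scores.argmax(dim=-1)                   # (N, H, W) predicted classes
```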
“…However, these approaches still require labeled data that includes the novel classes in order to facilitate transfer. Zero-shot methods, on the other hand, commonly leverage word embeddings to discover or generate related features between seen and unseen classes (Bucher et al., 2019; Gu et al., 2020) without the need for additional annotations. Existing works in this space use standard word embeddings (Mikolov et al., 2013) and focus on the image encoder.…”
Section: Introduction
confidence: 99%
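
As a small illustration of the "standard word embeddings" mentioned above, class-label vectors can be fetched from a pretrained word2vec model (Mikolov et al., 2013), for example via gensim; treating each class label as a single vocabulary word is a simplifying assumption.

```python
# Sketch: fetching word2vec vectors for class labels via gensim's downloader.
import gensim.downloader as api

w2v = api.load("word2vec-google-news-300")    # pretrained word2vec (~1.6 GB)
class_names = ["cat", "dog", "train"]
embeds = [w2v[name] for name in class_names]  # each a 300-d numpy vector
```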
“…Recently, multi-scale feature fusion has achieved remarkable success in many computer vision fields such as object detection [22], salient object detection [7, 4], and instance segmentation [23]. However, previous methods mainly fused multi-scale features only in the encoder [22, 24] or only in the decoder [7, 25]. Moreover, most of them did not account for the redundant information induced when integrating multi-scale features.…”
Section: Multi-scale Feature Fusion
confidence: 99%
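
A toy sketch of the kind of multi-scale fusion discussed here: coarser feature maps are upsampled to the finest resolution and merged additively (an FPN-style merge; the channel counts and additive combination are assumptions for illustration, not any cited paper's exact design).

```python
import torch
import torch.nn.functional as F

# Upsample coarser maps to the finest resolution, then sum.
f1 = torch.randn(1, 256, 64, 64)   # fine scale
f2 = torch.randn(1, 256, 32, 32)   # mid scale
f3 = torch.randn(1, 256, 16, 16)   # coarse scale

fused = f1
for f in (f2, f3):
    fused = fused + F.interpolate(f, size=f1.shape[-2:], mode="bilinear",
                                  align_corners=False)
```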
“…in [12], without even knowing the number of classes a priori. Related to this, the intention of zero-shot segmentation is to segment non-annotated objects that have not previously been seen by a neural network, as in [23, 24], which, however, do not make use of scribbles.…”
Section: Related Work
confidence: 99%