2021
DOI: 10.48550/arxiv.2112.03185
Preprint

Semantic Segmentation In-the-Wild Without Seeing Any Segmentation Examples

Abstract: Semantic segmentation is a key computer vision task that has been actively researched for decades. In recent years, supervised methods have reached unprecedented accuracy; however, they require many pixel-level annotations for every new class category, which is very time-consuming and expensive. Additionally, the ability of current semantic segmentation networks to handle a large number of categories is limited. This means that images containing rare class categories are unlikely to be well segmented by current …

Cited by 2 publications (3 citation statements)
References 34 publications
“…The concurrently developed unpublished text-supervised semantic segmentation methods [29,86,90,96] also show promising results. One major difference between these methods and GroupViT is that they exploit vision-language models [32,61] pre-trained on well-prepared, large-scale 400M-1.8B image-text data, while our GroupViT is trained from scratch on much noisier data (30M images) to learn grouping and segmentation, and yet achieves competitive performance.…”
Section: Related Work
confidence: 95%
“…Propose language-driven semantic segmentation by matching pixel and text embeddings. SSIW [187]: introduces a test-time augmentation technique to refine the pseudo labels generated by CLIP. MaskCLIP+ [180] [code]…”
Section: Task Methods Contribution
confidence: 99%
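The "matching pixel and text embeddings" idea in the statement above can be sketched in a few lines: per-pixel features from a dense backbone are compared against text embeddings of the candidate class names (e.g., from a CLIP text encoder), and each pixel takes the best-matching class. The following is a minimal PyTorch sketch under assumed tensor shapes; the function name, shapes, and temperature are illustrative and not the actual API of SSIW, LSeg, or MaskCLIP+.

```python
import torch
import torch.nn.functional as F

def segment_by_text_matching(pixel_feats: torch.Tensor,
                             text_embs: torch.Tensor,
                             temperature: float = 0.01):
    """Label each pixel with the class whose text embedding it matches best.

    pixel_feats: (H, W, D) per-pixel embeddings from some dense backbone.
    text_embs:   (C, D) one embedding per class name (e.g., CLIP text encoder).
    Returns a (H, W) hard label map and (H, W, C) soft per-class scores.
    """
    H, W, D = pixel_feats.shape
    p = F.normalize(pixel_feats.reshape(-1, D), dim=-1)  # (H*W, D), unit norm
    t = F.normalize(text_embs, dim=-1)                   # (C, D), unit norm
    logits = (p @ t.T) / temperature                     # scaled cosine similarity
    probs = logits.softmax(dim=-1).reshape(H, W, -1)     # per-pixel class distribution
    return probs.argmax(dim=-1), probs
```

The hard argmax map can serve directly as a pseudo label, while the soft scores are what a refinement step (such as the test-time augmentation attributed to SSIW) would average.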
“…In addition, ZegCLIP [188] employs CLIP to generate semantic masks and introduces a relationship descriptor to mitigate overfitting on base classes in knowledge distillation. MaskCLIP+ [180] and SSIW [187] distill knowledge with VLM-predicted pixel-level pseudo labels, where SSIW improves the pseudo labels by test-time augmentation. Knowledge distillation for weakly-supervised semantic segmentation aims to leverage both VLMs and weak supervision (e.g., image-level labels) for semantic segmentation.…”
Section: Knowledge Distillation For Semantic Segmentation
confidence: 99%
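The pseudo-label refinement by test-time augmentation that both statements attribute to SSIW can be approximated as below: run the model on several augmented views, average the per-class probabilities, and take the argmax as the refined pseudo label. The choice of flips and scales here is an assumption for illustration, not the paper's exact recipe; `model` stands for any network returning per-pixel class scores.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def tta_pseudo_labels(model, image: torch.Tensor,
                      scales=(0.75, 1.0, 1.25)) -> torch.Tensor:
    """Refine pseudo labels by averaging predictions over augmented views.

    image: (3, H, W) input tensor; model maps (1, 3, h, w) -> (1, C, h, w).
    Returns a (H, W) integer map of refined pseudo labels.
    """
    _, H, W = image.shape
    acc = 0.0
    for s in scales:
        size = (int(H * s), int(W * s))
        for flip in (False, True):
            x = torch.flip(image, dims=[-1]) if flip else image  # horizontal flip
            x = F.interpolate(x.unsqueeze(0), size=size,
                              mode="bilinear", align_corners=False)
            logits = F.interpolate(model(x), size=(H, W),
                                   mode="bilinear", align_corners=False)
            if flip:  # undo the flip so predictions align spatially
                logits = torch.flip(logits, dims=[-1])
            acc = acc + logits.softmax(dim=1)
    return (acc / (2 * len(scales))).argmax(dim=1)[0]  # (H, W) label map
```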