2021
DOI: 10.48550/arxiv.2102.06191
Preprint

Unsupervised Semantic Segmentation by Contrasting Object Mask Proposals

Abstract: Being able to learn dense semantic representations of images without supervision is an important problem in computer vision. However, despite its significance, this problem remains rather unexplored, with a few exceptions that considered unsupervised semantic segmentation on small-scale datasets with a narrow visual domain. In this paper, we make a first attempt to tackle the problem on datasets that have been traditionally utilized for the supervised case. To achieve this, we introduce a novel two-step framew…

Citations: cited by 18 publications (29 citation statements)
References: 68 publications (117 reference statements)
“…Image segmentation was conducted to simplify the features of the cirrus clouds. A 128² x 3 (16,384 x 3) dimensional array, consisting of each pixel value of an image along with the pixel's coordinates on the image (ranging from 0 to 128), was fed into the K-Means Clustering algorithm with a K-value of 3.…”
Section: K-means Clustering For Unsupervised Cirrus Image Segmentation
Citation type: mentioning
confidence: 99%
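
The clustering step described in the statement above can be sketched as follows. This is a minimal illustration, not the citing authors' code: it assumes a single-channel 128 x 128 image, builds one row per pixel as (intensity, row, column), and runs a plain Lloyd-style K-means written in NumPy with K = 3. The function names `pixel_features` and `kmeans`, the random test image, the iteration count, and the seed are all hypothetical choices made for the example.

```python
import numpy as np

def pixel_features(image):
    """Stack (intensity, row, col) for every pixel into an (H*W, 3) array."""
    h, w = image.shape
    rows, cols = np.mgrid[0:h, 0:w]
    feats = np.stack([image.ravel(), rows.ravel(), cols.ravel()], axis=1)
    return feats.astype(np.float64)

def assign(points, centers):
    """Return the index of the nearest center for each point."""
    # Squared Euclidean distance from every point to every center: (N, k).
    d = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d.argmin(axis=1)

def kmeans(points, k=3, iters=20, seed=0):
    """Plain Lloyd iterations; returns (labels, centers)."""
    rng = np.random.default_rng(seed)
    # Initialise centers from k distinct, randomly chosen points.
    centers = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(iters):
        labels = assign(points, centers)
        for j in range(k):
            members = points[labels == j]
            if len(members):  # leave an empty cluster's center unchanged
                centers[j] = members.mean(axis=0)
    # Final assignment so the labels match the returned centers.
    return assign(points, centers), centers

# Hypothetical stand-in for a 128 x 128 grayscale image.
image = np.random.default_rng(0).random((128, 128)) * 255
features = pixel_features(image)          # shape (16384, 3)
labels, centers = kmeans(features, k=3)
segmentation = labels.reshape(image.shape)  # one cluster id per pixel
```

Because intensity and coordinates share one feature vector, their relative scales decide how much spatial proximity influences the clusters; the statement does not say whether the features were rescaled, so none is applied here.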
“…Early work by [10] enforce patch-level local features across different augmented views of an image to be similar to each other while being dissimilar to representations of other local regions within the image. More recent local contrastive learning methods extend this work by using surrogate semantic labels [17], [18] where the labels are estimated on unlabeled images using supervised techniques such as saliency maps estimation approaches [19], [20], [21], super-pixels [22], and image computable masks [23], [24]. These methods encourage local regions having the same semantic label to have similar representations while being dissimilar to regions with different labels.…”
Section: A Motivation
Citation type: mentioning
confidence: 99%
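
The idea in the statement above, pulling together local features that share a surrogate label and pushing apart those that do not, can be sketched as a prototype-based contrastive loss. This is an illustrative assumption, not the formulation of any of the cited methods: each pixel embedding is compared against the mean embedding of every surrogate region, and a softmax cross-entropy rewards similarity to its own region. The function name `region_contrastive_loss`, the temperature value, and the toy data are hypothetical.

```python
import numpy as np

def region_contrastive_loss(embeddings, labels, temperature=0.5):
    """Cross-entropy of each embedding against per-region mean prototypes.

    embeddings: (N, D) float array of local (pixel or patch) features.
    labels:     (N,) int array of surrogate region labels.
    """
    # L2-normalise so dot products are cosine similarities.
    z = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    classes = np.unique(labels)
    # One prototype per surrogate label: the normalised mean embedding.
    prototypes = np.stack([z[labels == c].mean(axis=0) for c in classes])
    prototypes /= np.linalg.norm(prototypes, axis=1, keepdims=True)
    logits = z @ prototypes.T / temperature           # (N, C)
    logits -= logits.max(axis=1, keepdims=True)       # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    target = np.searchsorted(classes, labels)         # column of own region
    return -log_prob[np.arange(len(labels)), target].mean()

# Toy data: two well-separated groups of 2-D embeddings.
rng = np.random.default_rng(0)
group_a = np.array([1.0, 0.0]) + 0.05 * rng.standard_normal((50, 2))
group_b = np.array([0.0, 1.0]) + 0.05 * rng.standard_normal((50, 2))
emb = np.concatenate([group_a, group_b])

consistent = np.repeat([0, 1], 50)   # labels agree with the groups
mixed = np.arange(100) % 2           # labels cut across the groups

loss_consistent = region_contrastive_loss(emb, consistent)
loss_mixed = region_contrastive_loss(emb, mixed)
```

With labels that match the embedding groups the loss is low; with labels that cut across the groups the two prototypes nearly coincide and the loss approaches that of a coin flip between regions.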
“…Some methods [10], [44], [45], [46] pre-train by matching patch/pixel-level representations across either across two augmented views of an image or positive pair of images defined using domain cues. Other methods [18], [17] obtain some surrogate masks of foreground objects using unsupervised techniques such as saliency maps [19], [20], [21], super-pixels [22], image computable masks [23], [24]. Later, the local level features of a chosen foreground object are optimized to be similar while other objects local features are optimized to be dissimilar.…”
Section: Related Work
Citation type: mentioning
confidence: 99%
“…Segsort [37] predicts segmentation by learning to group super-pixels of similar appearance and context from static images. Later work [38] contrasts holistic mask proposals obtained from traditional bottom-up grouping.…”
Section: Related Work
Citation type: mentioning
confidence: 99%