2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
DOI: 10.1109/cvpr52688.2022.01414
Self-Supervised Transformers for Unsupervised Object Discovery using Normalized Cut

Cited by 69 publications (38 citation statements)
References 34 publications
“…Unlike the previous attempts which only consider foreground-background partition [36,38,40], or divide each image into a fixed number of clusters [26], we argue that it is important to consider different images distinguishably, due to the complexity of various scenarios. We thus propose the Dynamic Clustering Network (DCN) to cluster the pixels into dynamic semantic groups for each image.…”
Section: Introduction
confidence: 85%
“…For example, Simeoni et al [36] proposed a series of hand-made rules to choose pixels belonging to the same object according to their feature similarity, achieving unsupervised object discovery and detection. Wang et al [40] introduced normalized cuts [33] on the affinity graph constructed from pixel embeddings from DINO to divide foreground and background in an image, for unsupervised object discovery and saliency detection tasks. For the semantic segmentation task, Hamilton et al [12] train a segmentation head by distilling the feature correspondences, which further encourages pixel features to form compact clusters and learn better pixel-level representations.…”
Section: Self-supervised Vision Transformers and Applications
confidence: 99%
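The citation above describes the core idea of the cited paper: build an affinity graph over self-supervised ViT patch embeddings and split it into foreground and background with a normalized cut. A minimal sketch of that spectral bipartitioning step, using the Fiedler vector of the normalized Laplacian; the thresholds `tau` and `eps` are illustrative assumptions, not the paper's exact values:

```python
import numpy as np

def ncut_bipartition(feats, tau=0.2, eps=1e-5):
    """Bipartition patch features with a normalized cut (sketch).

    feats: (N, D) array of patch embeddings (e.g. from a self-supervised
    ViT such as DINO). Returns a boolean mask over the N patches.
    `tau` (affinity threshold) and `eps` (floor weight keeping the graph
    connected) are illustrative choices, not the cited method's values.
    """
    # Cosine similarity between L2-normalized patch features
    f = feats / np.linalg.norm(feats, axis=1, keepdims=True)
    sim = f @ f.T
    # Binarized affinity graph with a small floor weight
    W = np.where(sim > tau, 1.0, eps)
    d = W.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    # Symmetric normalized Laplacian: I - D^{-1/2} W D^{-1/2}
    L = np.eye(len(d)) - D_inv_sqrt @ W @ D_inv_sqrt
    # Second-smallest eigenvector (Fiedler vector) gives the 2-way cut
    _, vecs = np.linalg.eigh(L)
    fiedler = vecs[:, 1]
    return fiedler > fiedler.mean()
```

Thresholding the Fiedler vector at its mean is one common heuristic; the cited work additionally decides which side of the cut is the foreground object, a step omitted from this sketch.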
“…Recently, vision transformers have been adapted to broader domains, including point cloud [63] and video understanding [2,61]. Another line of work focuses on transferring pretrained knowledge of vision transformers in an unsupervised or semi-supervised manner by leveraging self-distillation [8], semantics reallocation [21], seed propagation [36], and normalized cut [47].…”
Section: Related Work
confidence: 99%
“…First, we compare our PAVER model against competitive baselines for predicting 360° video saliency based on optical flow [49], gradient flow [46], generative adversarial networks [33], and class activation map with optical flow [11]. We also report the performance of unsupervised saliency detection and unsupervised object discovery models based on vision transformers, including TS-CAM [21], DINO [8], LOST [36] and TokenCut [47]. For a fair comparison with our approach, we use the identical local patch projection module in Sec.…”
Section: Experiments Setting
confidence: 99%