2022
DOI: 10.48550/arxiv.2210.16870
Preprint

A simple, efficient and scalable contrastive masked autoencoder for learning visual representations

Abstract: We introduce CAN, a simple, efficient and scalable method for self-supervised learning of visual representations. Our framework is a minimal and conceptually clean synthesis of (C) contrastive learning, (A) masked autoencoders, and (N) the noise prediction approach used in diffusion models. The learning mechanisms are complementary to one another: contrastive learning shapes the embedding space across a batch of image samples; masked autoencoders focus on reconstruction of the low-frequency spatial correlation…
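The abstract describes the objective only at a high level. As a minimal sketch of how such a three-term loss could be wired up (not the authors' implementation), the PyTorch-style snippet below sums an InfoNCE contrastive term over a batch, a reconstruction term, and a noise-prediction term; the `encoder`, `decoder`, and `proj_head` modules, the loss weights, and the noise scale are all assumptions made for this illustration, and the per-patch masking bookkeeping of a real masked autoencoder is omitted for brevity.

```python
# Minimal sketch (not the authors' code) of a CAN-style objective combining
# (C) a contrastive term, (A) a masked-reconstruction term, and (N) a
# noise-prediction term. Module names, weights, and the noise scale are
# assumptions made for this illustration only.
import torch
import torch.nn.functional as F


def can_style_loss(encoder, decoder, proj_head, view1, view2,
                   noise_std=0.25, temperature=0.1, weights=(1.0, 1.0, 1.0)):
    """view1, view2: two independently masked/augmented crops of the same
    batch of images, shape (batch, ...). Returns the weighted sum of losses."""
    # Add Gaussian noise to the inputs, in the spirit of denoising/diffusion training.
    noise1 = torch.randn_like(view1) * noise_std
    noise2 = torch.randn_like(view2) * noise_std
    h1, h2 = encoder(view1 + noise1), encoder(view2 + noise2)

    # (C) Contrastive term: symmetric InfoNCE between projected embeddings
    # of the two views; positives sit on the diagonal of the similarity matrix.
    z1 = F.normalize(proj_head(h1), dim=-1)
    z2 = F.normalize(proj_head(h2), dim=-1)
    logits = z1 @ z2.t() / temperature
    targets = torch.arange(z1.size(0), device=z1.device)
    contrastive = 0.5 * (F.cross_entropy(logits, targets) +
                         F.cross_entropy(logits.t(), targets))

    # (A) Reconstruction and (N) noise prediction: the decoder is assumed to
    # return an estimate of the clean input and an estimate of the added noise.
    rec1, pred_noise1 = decoder(h1)
    rec2, pred_noise2 = decoder(h2)
    reconstruction = F.mse_loss(rec1, view1) + F.mse_loss(rec2, view2)
    denoising = F.mse_loss(pred_noise1, noise1) + F.mse_loss(pred_noise2, noise2)

    w_c, w_r, w_n = weights
    return w_c * contrastive + w_r * reconstruction + w_n * denoising
```

Keeping the three terms as a weighted sum leaves the components decoupled, which matches the abstract's framing of the method as a conceptually clean synthesis rather than a single new loss.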

Cited by 2 publications (4 citation statements)
References 15 publications
“…CAN (Mishra et al., 2022) applied masks to both branches of a siamese network and optimized an InfoNCE loss (Oord et al., 2018), a reconstruction loss, and a denoising loss. CMAE (Huang et al., 2022) computed a reconstruction loss and a contrastive loss on the decoder's outputs between the online branch and the target branch.…”
Section: Related Work
Citation type: mentioning
confidence: 99%
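For context, the InfoNCE objective referenced in this excerpt (Oord et al., 2018) contrasts each embedding with its positive from the other view against the rest of the batch. A standard form, with cosine similarity sim, temperature τ, batch size N, and embeddings z_i, z_j⁺ from the two views (notation chosen here for illustration), is:

```latex
\mathcal{L}_{\text{InfoNCE}}
  = -\frac{1}{N}\sum_{i=1}^{N}
    \log\frac{\exp\!\left(\operatorname{sim}(z_i, z_i^{+})/\tau\right)}
             {\sum_{j=1}^{N}\exp\!\left(\operatorname{sim}(z_i, z_j^{+})/\tau\right)}
```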
“…It presents competitive results against different benchmarks in computer vision. We hope our method provides insight into transferring and adapting the knowledge from large-scale pre-trained models in a computationally efficient way. Recent studies (Chung et al., 2021; Mishra et al., 2022) attempt to combine the power of contrastive learning and masked modelling, yielding promising results. They suggest that both paradigms are complementary to each other and can deliver stronger representations when combined into a unified framework.…”
Citation type: mentioning
confidence: 99%
“…Furthermore, MAE methods treat each sample independently in the loss function, whereas Contrastive methods explicitly consider the relationships between all samples in a batch by adjusting embedding distances. Given these differences, we posit that these two approaches are complementary [42], which leads to extracting different discriminative features from a given input. On the other hand, the generalization capability of Contrastive learning is influenced by four crucial factors [13].…”
Section: Introduction
Citation type: mentioning
confidence: 99%