2022
DOI: 10.48550/arxiv.2202.03670
Preprint
How to Understand Masked Autoencoders

Abstract: "Masked Autoencoders (MAE) Are Scalable Vision Learners" revolutionizes self-supervised learning in that it not only achieves the state of the art for image pre-training, but is also a milestone bridging the gap between visual and linguistic masked-autoencoding (BERT-style) pre-training. However, to our knowledge, to date there is no theoretical perspective explaining the powerful expressivity of MAE. In this paper, we, for the first time, propose a unified theoretical framework that provides …

Cited by 10 publications (11 citation statements). References 33 publications (55 reference statements).
“…4 (PACS). In this setting, our DiMAE achieves a better performance than previous works on most tasks and gets significant gains over DIUL and other SSL methods on overall and average accuracy 3 . Compared with contrastive learning based methods, such as MoCo V2, SimCLR V2, BYOL, AdCo, our generative based methods improves the cross-domain generalization tasks by +3.98% and +2.42% for DomainNet and +8.07% and +0.23% for PACS on 1% and 5% fraction setting respectively, which is tested by linear evaluation.…”
Section: Results
confidence: 84%
“…In this paper, we tackle the self-supervised learning from multi-domain data from a different perspective, i.e., generative self-supervised learning, and propose a new Domain invariant Masked AutoEncoders (DiMAE) for learning domain-invariant features from multi-domain data, which is motivated by the recent generative-based self-supervised learning method Masked Auto-Encoders (MAE) [17]. Specifically, MAE eliminates the low-level information by masking large portion of image patches and drives the encoder to extract semantic information by reconstructing pixels from very few neighboring patches [3] with a light-weighted decoder. However, this design does not take the domain gaps into consideration and thus can not generalize well for the self-supervised learning from multi-domain tasks.…”
Section: Introduction
confidence: 99%
“…Unsupervised learning [75] is a potential solution to the difficult and expensive training necessary for data acquisition. MAE, a recently developed unsupervised learning model, has been widely recognized for its powerful data reconstruction ability and application prospect [76], [77]. Few have attempted to apply MAE for urban NPP estimation and existing MAE models cannot solve time series problems.…”
Section: NPP Function
confidence: 99%
“…[1,32] use a simple method to reconstruct the original image, and also learn rich features effectively. [28] gives a mathematical understanding of MAE. MSN [34], which is a concurrent work of ours, also discusses the invariance to mask.…”
Section: Related Work
confidence: 99%
“…Our motivation is, even though MIM obtains great success, it is still an open question how it works. Several works try to interpret MIM from different views, for example, [1] suggests MIM model learns "rich hidden representation" via reconstruction from masked images; afterwards, [28] gives a mathematical understanding for MAE [1]. However, what the model learns is still not obvious.…”
Section: Introduction
confidence: 99%