2022
DOI: 10.48550/arxiv.2208.04164
Preprint
Understanding Masked Image Modeling via Learning Occlusion Invariant Feature

Abstract: Recently, Masked Image Modeling (MIM) has achieved great success in self-supervised visual recognition. However, as a reconstruction-based framework, it remains an open question how MIM works, since MIM appears very different from previously well-studied siamese approaches such as contrastive learning. In this paper, we propose a new viewpoint: MIM implicitly learns occlusion-invariant features, which is analogous to other siamese methods, while the latter learn other invariances. By relaxing MIM formu…

Cited by 2 publications (3 citation statements) | References 37 publications
“…Moreover, the assumption of consistency between the time and frequency domains is used in [21]. To sum up, the objective of these contrastive learning methods is to learn features that are invariant to distortions of various augmented inputs [13], [20]. However, not only does the data augmentation strategy require many inductive biases, but the invariance assumption also does not always hold.…”
Section: Self-supervision For Time Series. Citation type: mentioning (confidence: 99%)
“…Among these current works, the contrastive learning paradigm [19] has nearly become the most prevalent solution. The common scheme underlying these contrastive methods is to learn embeddings that are invariant to distortions of variously scaled inputs, with the cooperation of data augmentation and negative sampling strategies [20]. Despite their effectiveness and prevalence, the invariance assumptions may not always hold in real-world scenarios.…”
Section: Introduction. Citation type: mentioning (confidence: 99%)
“…It remains unknown what is actually learned by different SSL models. Recently, many works attempting to demystify SSL from both theoretical and empirical perspectives have emerged, especially for contrastive learning [204], [267-271] and masked image modeling [272-277], due to their state-of-the-art performance. We expect that explainability can help researchers better understand the properties and mechanisms behind existing SSL approaches, thus providing insightful guidance for future development.…”
Section: Discussion. Citation type: mentioning (confidence: 99%)