2021
DOI: 10.48550/arxiv.2111.15340
Preprint

MC-SSL0.0: Towards Multi-Concept Self-Supervised Learning

Sara Atito,
Muhammad Awais,
Ammarah Farooq
et al.

Abstract: Self-supervised pretraining is the method of choice for natural language processing models and is rapidly gaining popularity in many vision tasks. Recently, self-supervised pretraining has been shown to outperform supervised pretraining for many downstream vision applications, marking a milestone in the area. This superiority is attributed to the negative impact of incomplete labelling of the training images, which convey multiple concepts but are annotated using a single dominant class label. Although Self-Superv…

Cited by 3 publications (4 citation statements)
References 37 publications
“…Both SIMMIM and MAE used the ViT-B [13] model for pretraining on ImageNet-1K and fine-tuned on ImageNet-1K using classification labels. SIMMIM achieved 83.8% by pretraining for 800 epochs, while MAE obtained a marginally lower performance of 83.6% while requiring twice as many epochs. Another difference is the so-called decoder for transformers.…”
Section: Comparison With Post Art (mentioning)
confidence: 97%
“…Two notable extensions of GMML are MC-SSL [3] and iBOT [4]. Both are generalisations of the notion of GMML to non-autoencoder based learning tasks and achieved remarkable performance.…”
Section: Comparison With Post Art (mentioning)
confidence: 99%
“…Transformers [29] have shown great success in various Natural Language Processing (NLP) and Computer Vision (CV) tasks [30][31][32][33][34][35][36][37] and are the basis of our proposed framework. Vision transformer [38] … The transformer encoder consists of L consecutive Multi-head Self-Attention (MSA) and Multi-Layer Perceptron (MLP) blocks.…”
Section: Vision Transformer (mentioning)
confidence: 99%
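The quoted description of the ViT encoder (L consecutive MSA and MLP blocks) can be illustrated with a minimal PyTorch sketch. This is not the cited authors' implementation; the pre-norm layout, dimensions, and class names are illustrative assumptions.

import torch
import torch.nn as nn

class EncoderBlock(nn.Module):
    # One ViT-style encoder block: Multi-head Self-Attention followed by an
    # MLP, each with a residual connection and LayerNorm (pre-norm layout).
    def __init__(self, dim=768, num_heads=12, mlp_ratio=4.0):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm2 = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(
            nn.Linear(dim, int(dim * mlp_ratio)),
            nn.GELU(),
            nn.Linear(int(dim * mlp_ratio), dim),
        )

    def forward(self, x):                       # x: (batch, tokens, dim)
        h = self.norm1(x)
        x = x + self.attn(h, h, h)[0]           # residual after MSA
        x = x + self.mlp(self.norm2(x))         # residual after MLP
        return x

# A depth-L encoder is simply L such blocks applied in sequence
# (here L = 12, as in ViT-B; 197 tokens = 196 image patches + one class token).
encoder = nn.Sequential(*[EncoderBlock() for _ in range(12)])
tokens = torch.randn(2, 197, 768)
out = encoder(tokens)                           # shape: (2, 197, 768)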
“…In this paper, we follow MAE [19] to adopt the most simple and intuitive raw pixels regression. In terms of masking strategies, SiT [2], MC-SSL0.0 [1] and BeiT [3] use a block-wise masking strategy, where a block of neighbouring tokens arranged spatially are masked. MAE [19] and SimMIM [37] use random masking with a large masked patch size or a large proportion of masked patches.…”
Section: Related Work (mentioning)
confidence: 99%
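The two masking strategies contrasted in this quote, block-wise masking of spatially neighbouring patches versus random masking of individual patches, can be sketched as follows. The grid size, block size, and masking ratio are illustrative assumptions, not values taken from any of the cited papers.

import numpy as np

def random_mask(grid=14, ratio=0.75, seed=0):
    # MAE/SimMIM-style: mask a large random fraction of individual patch tokens.
    rng = np.random.default_rng(seed)
    n = grid * grid
    mask = np.zeros(n, dtype=bool)
    mask[rng.choice(n, size=int(n * ratio), replace=False)] = True
    return mask.reshape(grid, grid)

def blockwise_mask(grid=14, block=4, num_blocks=6, seed=0):
    # SiT/MC-SSL0.0/BEiT-style: mask rectangular blocks of spatially
    # neighbouring patch tokens.
    rng = np.random.default_rng(seed)
    mask = np.zeros((grid, grid), dtype=bool)
    for _ in range(num_blocks):
        r = rng.integers(0, grid - block + 1)
        c = rng.integers(0, grid - block + 1)
        mask[r:r + block, c:c + block] = True
    return mask

print(random_mask().sum(), "of 196 patches masked at random")
print(blockwise_mask().sum(), "of 196 patches masked in spatial blocks")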