2021
DOI: 10.48550/arxiv.2110.06014
Preprint

Rethinking Supervised Pre-training for Better Downstream Transferring

Abstract: The pretrain-finetune paradigm has shown outstanding performance in many applications of deep learning, where a model is pre-trained on a large upstream dataset (e.g. ImageNet) and is then fine-tuned on different downstream tasks. Though in most cases the pre-training stage is conducted with supervised methods, recent works on self-supervised pre-training have shown powerful transferability and even outperform supervised pre-training on multiple downstream tasks. It thus remains an open question how to …

Cited by 5 publications (11 citation statements). References 32 publications.
“…Through the quantitative studies, we can check the importance of the spatial attention layer and the data augmentation strategy. We achieve results comparable to the SOTA accuracy of 89.04% on the Kaokore dataset, obtained after 90 epochs with the LOOK method (Feng et al, 2021): our system reaches 83.22% after 20 epochs with a model that requires fewer training parameters. By changing the proportion of p_1 and p_2, we can achieve 78.67% precision and 75.3% recall with a ResNet-50 (Shah and Harpale, 2018) backbone.…”
Section: Introduction (mentioning)
confidence: 71%
“…In models with smaller backbones like VGG-16 and VGG-19, the representative samples offer better augmentations, since excessive visual variation in styles can hurt model performance, as seen in (Zheng et al, 2019). [Table 3, a comparative study on the Kaokore dataset: (Islam et al, 2021) 88.92%, 47 M parameters; CE+SelfSupCon (Islam et al, 2021) 88.25%, 27.9 M parameters; LOOK (ResNet-50) (Feng et al, 2021) 89…] Note that our data augmentation method can be used on top of all existing state-of-the-art methods and boosts their performance.…”
Section: Quantitative Results (mentioning)
confidence: 99%
“…However, such methods make all the instances of the same class collapse into a narrow area near the class center, which reduces the intra-class diversity. Zhao et al (2020); Feng et al (2021) find that large intra-class diversity helps transfer knowledge to downstream tasks. Therefore, we relax the constraint by only pulling each sample's k-nearest neighbors close to each other.…”
Section: KNN Contrastive Pre-training (mentioning)
confidence: 99%
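The kNN relaxation quoted above (pull only each anchor's k nearest same-class neighbors together, rather than the entire class) can be made concrete with a small loss sketch. The PyTorch snippet below is a minimal illustration under stated assumptions, not the citing paper's actual implementation; the name `knn_contrastive_loss` and the hyperparameters `k` and `temperature` are illustrative only.

```python
# Sketch: a supervised contrastive loss whose positives are restricted to the
# k nearest same-class neighbors of each anchor, so the rest of the class is
# free to stay spread out (preserving intra-class diversity).
import torch
import torch.nn.functional as F

def knn_contrastive_loss(embeddings, labels, k=5, temperature=0.1):
    z = F.normalize(embeddings, dim=1)              # (N, D) unit-norm features
    sim = z @ z.t() / temperature                   # pairwise similarity logits
    n = z.size(0)
    eye = torch.eye(n, dtype=torch.bool, device=z.device)

    same_class = (labels.unsqueeze(0) == labels.unsqueeze(1)) & ~eye

    # Keep only the k most similar same-class samples as positives.
    masked_sim = sim.masked_fill(~same_class, float('-inf'))
    k_eff = min(k, n - 1)
    topk_idx = masked_sim.topk(k_eff, dim=1).indices
    pos_mask = torch.zeros_like(same_class)
    pos_mask.scatter_(1, topk_idx, True)
    pos_mask &= same_class                          # drop picks that were -inf padding

    # InfoNCE-style normalization over all non-anchor samples.
    logits = sim.masked_fill(eye, float('-inf'))
    log_prob = logits - torch.logsumexp(logits, dim=1, keepdim=True)

    loss = -(log_prob * pos_mask).sum(1) / pos_mask.sum(1).clamp(min=1)
    return loss[pos_mask.any(1)].mean()             # skip anchors with no positives
```

In this reading, setting k to the full class size recovers the usual "pull the whole class together" behavior, while a small k only enforces local neighborhood compactness.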
“…Although CE and SCL are effective for classifying known IND classes, the representations they learn transfer poorly downstream. Zhao et al (2020); Feng et al (2021) find that larger intra-class diversity helps transfer. CE and SCL tend to pull all samples from the same class together to form a narrow intra-class distribution, thus ignoring the intra-class diverse features, which makes the learned representations unfavorable for transfer to downstream OOD clustering.…”
Section: Introduction (mentioning)
confidence: 97%
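For contrast with the kNN-relaxed sketch above, the standard supervised contrastive (SupCon) objective that this quote criticizes treats every other same-class sample as a positive, which is what pushes the whole class into one tight cluster. The snippet below is a simplified sketch (single view per sample, illustrative names), not the exact loss used by the cited works.

```python
# Sketch: plain supervised contrastive loss where all same-class pairs are positives.
import torch
import torch.nn.functional as F

def supcon_loss(embeddings, labels, temperature=0.1):
    z = F.normalize(embeddings, dim=1)
    sim = z @ z.t() / temperature
    n = z.size(0)
    eye = torch.eye(n, dtype=torch.bool, device=z.device)
    pos_mask = (labels.unsqueeze(0) == labels.unsqueeze(1)) & ~eye

    logits = sim.masked_fill(eye, float('-inf'))
    log_prob = logits - torch.logsumexp(logits, dim=1, keepdim=True)
    loss = -(log_prob * pos_mask).sum(1) / pos_mask.sum(1).clamp(min=1)
    return loss[pos_mask.any(1)].mean()
```

Swapping this all-class positive set for the kNN-restricted one is the only change needed to express the relaxation discussed above.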