2022 International Joint Conference on Neural Networks (IJCNN)
DOI: 10.1109/ijcnn55064.2022.9892393
Effect of pre-training scale on intra- and inter-domain, full and few-shot transfer learning for natural and X-Ray chest images

Cited by 14 publications (7 citation statements)
References 21 publications
“…Even though the horizontal stride of the modified network is different, which can change the horizontal scale and appearance of features in deeper layers of the network, using existing ImageNet-pretrained weights is still a sensible initialization procedure. ImageNet pretraining has been shown to be consistently beneficial in a wide array of image classification tasks, some of which have different image dimensions, scales of objects appearing in the images, and even cover an entirely different domain of images than the ImageNet-1k dataset [56,57]. We empirically validate the contribution of pretraining for weight initialization in our experiments.…”
mentioning
confidence: 60%
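As a concrete illustration of the initialization procedure this statement refers to, the following sketch (our own, not from the cited paper) loads ImageNet-1k-pretrained ResNet-50 weights from torchvision and replaces the classification head for a new downstream task; the 14-class head size is a placeholder assumption.

import torch
import torch.nn as nn
from torchvision import models

# Initialize from ImageNet-1k pretrained weights rather than from scratch.
model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2)

# Swap the ImageNet head for one sized to the downstream task
# (14 classes is a placeholder; use the target dataset's class count).
model.fc = nn.Linear(model.fc.in_features, 14)

# All remaining layers keep their pretrained weights and are fine-tuned
# together with the new head on the downstream data.
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)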
“…There are factors known to impact transfer that we could not test for PLMs due to a lack of public models or to computational expense. First, pretraining dataset is important, both in terms of distance between the pretraining and downstream task data domains (Cherti & Jitsev, 2022) and data size (Abnar et al, 2022). PLMs pretrain on large databases of natural sequences.…”
Section: Discussion
mentioning
confidence: 99%
“…Finally, we only test linear probes on mean pooled representations to limit computational cost, but previous work shows that for many tasks finetuning the PLM end-to-end outperforms a linear probe or training a small neural network on top of the frozen pretrained weights (Dallago et al, 2021; Yang et al, 2022), and that mean-pooling is rarely optimal (Detlefsen et al, 2022; Goldman et al, 2022). In computer vision, models trained on different datasets (Cherti & Jitsev, 2022) and pretraining tasks (Grigg et al, 2021) exhibit different finetuning dynamics, and there is some evidence for this in proteins as well (Detlefsen et al, 2022).…”
Section: Discussion
mentioning
confidence: 99%
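To make the probing setup in this excerpt concrete, here is a minimal sketch of a linear probe trained on mean-pooled, frozen representations; the synthetic arrays stand in for per-token PLM embeddings, and the shapes and scikit-learn probe are our assumptions rather than the cited work's code.

import numpy as np
from sklearn.linear_model import LogisticRegression

# Stand-in for frozen per-token embeddings from a pretrained language model:
# embeddings[i] has shape (length_i, hidden_dim) for sequence i.
rng = np.random.default_rng(0)
embeddings = [rng.normal(size=(rng.integers(50, 200), 768)) for _ in range(100)]
labels = rng.integers(0, 2, size=100)

# Mean-pool each sequence over its length to a fixed-size vector.
pooled = np.stack([e.mean(axis=0) for e in embeddings])

# Train a linear probe on the frozen, mean-pooled representations.
probe = LogisticRegression(max_iter=1000).fit(pooled, labels)
print("train accuracy:", probe.score(pooled, labels))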
“…We measured the perplexity of each prompt using a surrogate language model, the underlying hypothesis here being that less likely prompts occur less frequently (if at all) in the training data of the surrogate language model, and will thus incur higher perplexity. Comparing the perplexity scores with the actual frequencies of prompt tokens in LAION-5B (Schuhmann et al, 2022) is an interesting avenue for future work. Figure 3 displays scatter plots of the intrinsic dimension and perplexity of these prompts.…”
Section: Methods
mentioning
confidence: 99%
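A minimal sketch of the perplexity measurement described above, assuming GPT-2 from Hugging Face transformers as the surrogate language model (the excerpt does not specify which model or code was actually used):

import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

# A small causal LM serves as the surrogate language model.
tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

def prompt_perplexity(prompt: str) -> float:
    # Perplexity = exp of the mean token-level negative log-likelihood.
    inputs = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        loss = model(**inputs, labels=inputs["input_ids"]).loss
    return torch.exp(loss).item()

print(prompt_perplexity("a photo of an astronaut riding a horse"))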