2022
DOI: 10.1609/aaai.v36i2.20079
Task-Customized Self-Supervised Pre-training with Scalable Dynamic Routing

Abstract: Self-supervised learning (SSL), especially contrastive methods, has attracted increasing attention recently as it learns effective, transferable representations without semantic annotations. A common practice for self-supervised pre-training is to use as much data as possible. For a specific downstream task, however, involving irrelevant data in pre-training may degrade downstream performance, as observed in our extensive experiments. On the other hand, for existing SSL methods, it is burdensome and infeasible to us…
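As a point of reference for what "contrastive pre-training" involves, below is a minimal SimCLR-style NT-Xent loss over cosine similarities between two augmented views of the same images. This is only an illustrative sketch of generic contrastive SSL, not the paper's Scalable Dynamic Routing method; the function name, batch size, and temperature are assumptions.

```python
# Minimal SimCLR-style contrastive objective (NT-Xent), shown only to
# illustrate the kind of contrastive pre-training the abstract refers to.
# NOT the paper's Scalable Dynamic Routing method; the temperature and
# toy batch size below are illustrative assumptions.
import torch
import torch.nn.functional as F

def nt_xent_loss(z1: torch.Tensor, z2: torch.Tensor, temperature: float = 0.5) -> torch.Tensor:
    """z1, z2: [N, D] embeddings of two augmented views of the same N images."""
    z1 = F.normalize(z1, dim=1)          # unit-normalize so dot products are cosine similarities
    z2 = F.normalize(z2, dim=1)
    z = torch.cat([z1, z2], dim=0)       # [2N, D]
    sim = z @ z.t() / temperature        # [2N, 2N] pairwise similarities
    n = z1.size(0)
    sim.fill_diagonal_(float("-inf"))    # a view is never its own positive or negative
    # positives: view i in z1 matches view i in z2 (and vice versa)
    targets = torch.cat([torch.arange(n) + n, torch.arange(n)]).to(z.device)
    return F.cross_entropy(sim, targets)

# usage with random "encoder" outputs, purely for shape checking
if __name__ == "__main__":
    z_a, z_b = torch.randn(8, 128), torch.randn(8, 128)
    print(nt_xent_loss(z_a, z_b).item())
```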

Cited by 5 publications (6 citation statements)
References 29 publications (40 reference statements)
“…lower performance of fine-tuned models compared to models trained from scratch on the new experimental data. Negative transfer has also been observed in other fields (Liu et al, 2022), and indicates that care must be taken when choosing pretraining tasks.…”
Section: Evaluation of Training Strategies (mentioning)
confidence: 85%
“…Ji et al (2021); Mo et al (2021); Benegas et al (2022); Zeng et al (2023)). Pretraining using task-relevant data can improve the performance of fine-tuned models (Gururangan et al, 2020), while pretraining using irrelevant data can hurt performance (Liu et al, 2022). For our application, there are many datasets that are closely related to promoter-driven expression, including MPRAs and endogenous gene expression datasets, as well as TF-binding data that may help models learn relevant sequence motifs that regulate expression when present in promoters.…”
Section: Introduction (mentioning)
confidence: 99%
“…For example, generative-based SSL such as the Masked autoencoder (He et al, 2021) learns to reconstruct images with only a small fraction of the pixels. Additionally, contrastive learning methods have been observed to suffer from the negative transfer phenomenon (Liu et al, 2022), where the learned features perform poorly on downstream tasks. Furthermore, the use of cosine similarity in contrastive learning has been noted to result in overly complex feature maps (Hu et al, 2022a), which can negatively impact out-of-distribution generalization.…”
Section: Observations (mentioning)
confidence: 99%
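For context on the masked-autoencoder idea mentioned in the passage above, here is a rough sketch of random patch masking, where the encoder sees only a small fraction of patches and the rest must be reconstructed. It is an illustration of the masking step only, not He et al.'s implementation; the helper name is hypothetical and the 75% mask ratio is the commonly reported default.

```python
# Rough sketch of masked-autoencoder-style random patch masking: keep a small
# fraction of patches and reconstruct the rest. Illustrative only, not He et
# al.'s actual implementation; `random_masking` is a hypothetical helper.
import torch

def random_masking(patches: torch.Tensor, mask_ratio: float = 0.75):
    """patches: [B, L, D] patch embeddings. Returns visible patches and kept indices."""
    B, L, D = patches.shape
    len_keep = int(L * (1 - mask_ratio))
    noise = torch.rand(B, L, device=patches.device)  # random score per patch
    ids_keep = noise.argsort(dim=1)[:, :len_keep]    # lowest-noise patches are kept
    visible = torch.gather(patches, 1, ids_keep.unsqueeze(-1).expand(-1, -1, D))
    return visible, ids_keep                         # the encoder sees only `visible`

# e.g. 196 patches from a 224x224 image with 16x16 patches -> 49 visible patches
vis, kept = random_masking(torch.randn(2, 196, 768))
print(vis.shape)  # torch.Size([2, 49, 768])
```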
“…For data that is already split at subject level, we can apply self-supervised learning directly [459]. When the data is not split, we can apply clustering to find subgroups in the data to apply self-supervised learning on [460,461]. Zhou et al [462] proposed a personalized search model which learns personalized preference based on user-level contrastive learning (e.g., users that click on the same document form a positive pair).…”
Section: Personalized Models (mentioning)
confidence: 99%
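To make the co-click pairing idea in the quoted passage concrete, the sketch below builds user-level positive pairs from a click log: two users who clicked the same document form a positive pair. This is not Zhou et al.'s actual model; the helper name and toy log are hypothetical, and the resulting pairs would feed a contrastive loss such as the NT-Xent sketch above, with other users in the batch acting as negatives.

```python
# Hedged sketch of the pair construction described above: users who clicked
# the same document form a positive pair for a user-level contrastive loss.
# Illustrative only; not Zhou et al.'s model, and the toy log is hypothetical.
from collections import defaultdict
from itertools import combinations

def co_click_positive_pairs(click_log):
    """click_log: iterable of (user_id, doc_id). Returns user pairs sharing a clicked doc."""
    clicks_by_doc = defaultdict(set)
    for user_id, doc_id in click_log:
        clicks_by_doc[doc_id].add(user_id)
    pairs = set()
    for users in clicks_by_doc.values():
        for u, v in combinations(sorted(users), 2):
            pairs.add((u, v))          # positive pair: co-clicked the same document
    return pairs

# toy log: users A and B both clicked doc d1, so (A, B) becomes a positive pair
log = [("A", "d1"), ("B", "d1"), ("C", "d2")]
print(co_click_positive_pairs(log))   # {('A', 'B')}
```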