2022
DOI: 10.1609/aaai.v36i2.20079
Task-Customized Self-Supervised Pre-training with Scalable Dynamic Routing

Abstract: Self-supervised learning (SSL), especially contrastive methods, has attracted increasing attention recently as it learns effective, transferable representations without semantic annotations. A common practice for self-supervised pre-training is to use as much data as possible. For a specific downstream task, however, involving irrelevant data in pre-training may degrade downstream performance, as observed in our extensive experiments. On the other hand, for existing SSL methods, it is burdensome and infeasible to us…
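As a point of reference for what "contrastive pre-training" involves, below is a minimal SimCLR-style NT-Xent loss over cosine similarities between two augmented views of the same images. This is only an illustrative sketch of generic contrastive SSL, not the paper's Scalable Dynamic Routing method; the function name, batch size, and temperature are assumptions.

```python
# Minimal SimCLR-style contrastive objective (NT-Xent), shown only to
# illustrate the kind of contrastive pre-training the abstract refers to.
# NOT the paper's Scalable Dynamic Routing method; the temperature and
# toy batch size below are illustrative assumptions.
import torch
import torch.nn.functional as F

def nt_xent_loss(z1: torch.Tensor, z2: torch.Tensor, temperature: float = 0.5) -> torch.Tensor:
    """z1, z2: [N, D] embeddings of two augmented views of the same N images."""
    z1 = F.normalize(z1, dim=1)          # unit-normalize so dot products are cosine similarities
    z2 = F.normalize(z2, dim=1)
    z = torch.cat([z1, z2], dim=0)       # [2N, D]
    sim = z @ z.t() / temperature        # [2N, 2N] pairwise similarities
    n = z1.size(0)
    sim.fill_diagonal_(float("-inf"))    # a view is never its own positive or negative
    # positives: view i in z1 matches view i in z2 (and vice versa)
    targets = torch.cat([torch.arange(n) + n, torch.arange(n)]).to(z.device)
    return F.cross_entropy(sim, targets)

# usage with random "encoder" outputs, purely for shape checking
if __name__ == "__main__":
    z_a, z_b = torch.randn(8, 128), torch.randn(8, 128)
    print(nt_xent_loss(z_a, z_b).item())
```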

Cited by 5 publications (6 citation statements)
References 29 publications (40 reference statements)
“…lower performance of fine-tuned models compared to models trained from scratch on the new experimental data. Negative transfer has also been observed in other fields (Liu et al, 2022), and indicates that care must be taken when choosing pretraining tasks.…”
Section: Evaluation of Training Strategies (mentioning)
confidence: 85%
“…Ji et al (2021); Mo et al (2021); Benegas et al (2022); Zeng et al (2023)). Pretraining using task-relevant data can improve the performance of fine-tuned models (Gururangan et al, 2020), while pretraining using irrelevant data can hurt performance (Liu et al, 2022). For our application, there are many datasets that are closely related to promoter-driven expression, including MPRAs and endogenous gene expression datasets, as well as TF-binding data that may help models learn relevant sequence motifs that regulate expression when present in promoters.…”
Section: Introduction (mentioning)
confidence: 99%
“…For example, generative-based SSL such as the Masked autoencoder (He et al, 2021) learns to reconstruct images with only a small fraction of the pixels. Additionally, contrastive learning methods have been observed to suffer from the negative transfer phenomenon (Liu et al, 2022), where the learned features perform poorly on downstream tasks. Furthermore, the use of cosine similarity in contrastive learning has been noted to result in overly complex feature maps (Hu et al, 2022a), which can negatively impact out-of-distribution generalization.…”
Section: Observations (mentioning)
confidence: 99%
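For context on the masked-autoencoder idea mentioned in the passage above, here is a rough sketch of random patch masking, where the encoder sees only a small fraction of patches and the rest must be reconstructed. It is an illustration of the masking step only, not He et al.'s implementation; the helper name is hypothetical and the 75% mask ratio is the commonly reported default.

```python
# Rough sketch of masked-autoencoder-style random patch masking: keep a small
# fraction of patches and reconstruct the rest. Illustrative only, not He et
# al.'s actual implementation; `random_masking` is a hypothetical helper.
import torch

def random_masking(patches: torch.Tensor, mask_ratio: float = 0.75):
    """patches: [B, L, D] patch embeddings. Returns visible patches and kept indices."""
    B, L, D = patches.shape
    len_keep = int(L * (1 - mask_ratio))
    noise = torch.rand(B, L, device=patches.device)  # random score per patch
    ids_keep = noise.argsort(dim=1)[:, :len_keep]    # lowest-noise patches are kept
    visible = torch.gather(patches, 1, ids_keep.unsqueeze(-1).expand(-1, -1, D))
    return visible, ids_keep                         # the encoder sees only `visible`

# e.g. 196 patches from a 224x224 image with 16x16 patches -> 49 visible patches
vis, kept = random_masking(torch.randn(2, 196, 768))
print(vis.shape)  # torch.Size([2, 49, 768])
```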
“…For data that is already split at subject level, we can apply self-supervised learning directly [459]. When the data is not split, we can apply clustering to find subgroups in the data to apply self-supervised learning on [460,461]. Zhou et al [462] proposed a personalized search model which learns personalized preference based on user-level contrastive learning (e.g., users that click on the same document form a positive pair).…”
Section: Personalized Models (mentioning)
confidence: 99%
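To make the co-click pairing idea in the quoted passage concrete, the sketch below builds user-level positive pairs from a click log: two users who clicked the same document form a positive pair. This is not Zhou et al.'s actual model; the helper name and toy log are hypothetical, and the resulting pairs would feed a contrastive loss such as the NT-Xent sketch above, with other users in the batch acting as negatives.

```python
# Hedged sketch of the pair construction described above: users who clicked
# the same document form a positive pair for a user-level contrastive loss.
# Illustrative only; not Zhou et al.'s model, and the toy log is hypothetical.
from collections import defaultdict
from itertools import combinations

def co_click_positive_pairs(click_log):
    """click_log: iterable of (user_id, doc_id). Returns user pairs sharing a clicked doc."""
    clicks_by_doc = defaultdict(set)
    for user_id, doc_id in click_log:
        clicks_by_doc[doc_id].add(user_id)
    pairs = set()
    for users in clicks_by_doc.values():
        for u, v in combinations(sorted(users), 2):
            pairs.add((u, v))          # positive pair: co-clicked the same document
    return pairs

# toy log: users A and B both clicked doc d1, so (A, B) becomes a positive pair
log = [("A", "d1"), ("B", "d1"), ("C", "d2")]
print(co_click_positive_pairs(log))   # {('A', 'B')}
```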