Findings of the Association for Computational Linguistics: ACL 2022
DOI: 10.18653/v1/2022.findings-acl.220

ELLE: Efficient Lifelong Pre-training for Emerging Data

Abstract: Current pre-trained language models (PLM) are typically trained with static data, ignoring that in real-world scenarios, streaming data of various sources may continuously grow. This requires PLMs to integrate the information from all the sources in a lifelong manner. Although this goal could be achieved by exhaustive pre-training on all the existing data, such a process is known to be computationally expensive. To this end, we propose ELLE, aiming at efficient lifelong pre-training for emerging data. Specifica…

Cited by 6 publications (2 citation statements)
References 14 publications (18 reference statements)
“…However, larger models require greater computational demands (Patterson et al., 2021). To this end, researchers propose to accelerate pre-training by mixed-precision training (Shoeybi et al., 2019), distributed training (Shoeybi et al., 2019), large batch optimization (You et al., 2020), etc. Another line of methods (Gong et al., 2019; Chen et al., 2022; Qin et al., 2022) proposes to pre-train larger PLMs progressively. They first train a small PLM, and then gradually increase the depth or width of the network based on parameter recycling (PR).…”
Section: Related Work
mentioning confidence: 99%
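The "grow the width based on parameter recycling" idea mentioned in the statement above can be illustrated with a Net2Net-style expansion. The sketch below is a minimal, hypothetical PyTorch example (the helper name `widen_pair` and its details are ours, not code from ELLE or the citing papers): it widens the hidden dimension between two linear layers while approximately preserving the function the small model computes, which is the core of reusing a small PLM's parameters to initialize a larger one.

```python
import torch
import torch.nn as nn

def widen_pair(fc1: nn.Linear, fc2: nn.Linear, new_width: int):
    """Widen the hidden dimension between two linear layers (Net2Net-style sketch).

    New hidden units are copies of existing ones, and fc2's incoming weights
    are rescaled so duplicated units are not double-counted, approximately
    preserving the function of the smaller model.
    """
    old_width = fc1.out_features
    assert new_width >= old_width
    # Map every new unit to a source unit: identity for the first old_width
    # units, randomly chosen existing units for the newly added ones.
    mapping = torch.cat([
        torch.arange(old_width),
        torch.randint(0, old_width, (new_width - old_width,)),
    ])
    counts = torch.bincount(mapping, minlength=old_width).float()

    wide_fc1 = nn.Linear(fc1.in_features, new_width)
    wide_fc2 = nn.Linear(new_width, fc2.out_features)
    with torch.no_grad():
        wide_fc1.weight.copy_(fc1.weight[mapping])   # copy rows of the small layer
        wide_fc1.bias.copy_(fc1.bias[mapping])
        # Divide each copied column by how often its source unit was reused,
        # so summing over duplicates reproduces the original pre-activation.
        wide_fc2.weight.copy_(fc2.weight[:, mapping] / counts[mapping])
        wide_fc2.bias.copy_(fc2.bias)
    return wide_fc1, wide_fc2
```

Depth growth in progressive methods follows a similar spirit, typically by stacking or duplicating whole layers of the already-trained small model rather than individual units.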
“…CPT is closely related to ELLE (Qin et al., 2022), which does continual pre-training. The key difference is that ELLE starts from random initialization, while our CPT starts from a pre-trained LM.…”
Section: Introduction
mentioning confidence: 99%
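The distinction drawn in the statement above, continuing pre-training from an already pre-trained LM versus from random initialization, can be sketched with the Hugging Face transformers API. This is an illustrative assumption about the setup, not code from either paper; the checkpoint name "roberta-base" is a placeholder, not necessarily the backbone either work uses.

```python
from transformers import AutoConfig, AutoModelForMaskedLM

# Continual pre-training starting from an already pre-trained LM
# (the CPT setting described in the quote above).
cpt_model = AutoModelForMaskedLM.from_pretrained("roberta-base")

# Starting from random initialization (the ELLE setting described above):
# same architecture, freshly initialized weights.
config = AutoConfig.from_pretrained("roberta-base")
scratch_model = AutoModelForMaskedLM.from_config(config)
```

In both cases the resulting model would then be trained further on the newly arriving data stream; only the starting point differs.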