Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)
DOI: 10.18653/v1/2020.emnlp-main.250

Meta Fine-Tuning Neural Language Models for Multi-Domain Text Mining

Abstract: Pre-trained neural language models bring significant improvement for various NLP tasks, by fine-tuning the models on task-specific training sets. During fine-tuning, the parameters are initialized from pre-trained models directly, which ignores how the learning process of similar NLP tasks in different domains is correlated and mutually reinforced. In this paper, we propose an effective learning procedure named Meta Fine-Tuning (MFT), serving as a meta-learner to solve a group of similar NLP tasks for neural l…
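The abstract describes the procedure only at a high level. As a rough orientation, the sketch below shows what jointly fine-tuning one shared encoder on several domains of the same task could look like; `encoder`, `classifier`, and `domain_loaders` are placeholder names, and the paper's actual MFT algorithm is more involved than this minimal joint-training loop.

```python
# Minimal sketch (not the paper's implementation): fine-tune one shared
# encoder on several in-domain datasets of the same task type, so that the
# learning signal from each domain reinforces the others before any
# domain-specific fine-tuning takes place.
import torch
from torch import nn
from itertools import cycle

def meta_fine_tune(encoder, classifier, domain_loaders, epochs=3, lr=2e-5):
    """encoder: a pre-trained LM returning pooled features (placeholder interface);
    classifier: a task head shared across domains;
    domain_loaders: dict mapping domain name -> DataLoader of (inputs, labels)."""
    params = list(encoder.parameters()) + list(classifier.parameters())
    optim = torch.optim.AdamW(params, lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    iters = {d: cycle(dl) for d, dl in domain_loaders.items()}
    steps = max(len(dl) for dl in domain_loaders.values())

    for _ in range(epochs):
        for _ in range(steps):
            optim.zero_grad()
            total = 0.0
            for domain, it in iters.items():   # one batch per domain per step
                inputs, labels = next(it)
                feats = encoder(inputs)        # shared representation
                logits = classifier(feats)
                total = total + loss_fn(logits, labels)
            total.backward()                   # all domains jointly shape the encoder
            optim.step()
    return encoder                             # meta-tuned starting point for
                                               # subsequent per-domain fine-tuning
```

The returned encoder is then used as the initialization for ordinary task-specific fine-tuning in each individual domain.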

Cited by 15 publications (12 citation statements). References: 39 publications.

“…One body of work approaches the problem by applying heuristic rules of perturbation to input sequences (Wallace et al., 2019; Jia and Liang, 2017), while another uses neural models to construct adversarial examples (Li et al., 2020, 2018) or manipulate inputs in embedding space (Jin et al., 2020). Our work also contributes to efforts to understand the impacts and outcomes of the fine-tuning process (Miaschi et al., 2020; Mosbach et al., 2020; Wang et al., 2020; Perez-Mayos et al., 2021).…”
Section: Related Work
Mentioning confidence: 98%
“…Meta-learning algorithms have been applied to few-shot NLP tasks such as text classification (Geng et al., 2020), relation extraction (Gao et al., 2019), question answering (Hua et al., 2020) and knowledge base completion (Sheng et al., 2020). Similar to Wang et al. (2020a), the proposed TransPrompt framework can be viewed as a combination of transfer learning and meta-learning, which learns transferable knowledge from similar tasks to improve the performance of few-shot text classification, either for existing tasks or new tasks.…”
Section: Transfer Learning and Meta-learning
Mentioning confidence: 99%
“…We employ standard BERT fine-tuning (Devlin et al., 2019), the LM-BFF prompting model (Gao et al., 2020) (with both manually compiled and automatically mined prompts), and P-tuning (Liu et al., 2021) (which produces state-of-the-art performance for PLM-based few-shot learning) as single-task baselines. Because we focus on learning knowledge across tasks, we also use the multi-task versions of BERT fine-tuning, LM-BFF (Gao et al., 2020) and P-tuning (Liu et al., 2021), and Meta Fine-Tuning (Wang et al., 2020a) as cross-task baselines. Specifically, we employ separate prompts (either discrete prompts or continuous prompt embeddings) for different tasks in the multi-task versions of LM-BFF and P-tuning.…”
Section: Datasets and Experimental Settings
Mentioning confidence: 99%
“…This "knowledge transfer" technique in KD has been proved efficient only when two domains are close to each other (Hu et al 2019). In reality, however, it is highly risky as teachers of other domains may pass nontransferable knowledge to the student model, which is irrelevant to the current domain and hence harms the overall performance (Tan et al 2017;Wang et al 2020). Besides, current studies find multi-task fine-tuning of BERT does not necessarily yield better performance across all the tasks ( Sun et al 2019a).…”
Section: Introduction
Mentioning confidence: 99%
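As a hypothetical illustration of the risk raised in the excerpt above, the sketch below shows a multi-teacher distillation loss in which each teacher's contribution is scaled by a domain-relevance weight. The function name, the weighting scheme, and the temperature are assumptions made here for illustration, not a method taken from any of the cited papers.

```python
# Hypothetical illustration (not from the cited papers): a weighted
# multi-teacher distillation loss, where teachers from distant domains are
# down-weighted so they contribute less non-transferable knowledge.
import torch
import torch.nn.functional as F

def multi_teacher_kd_loss(student_logits, teacher_logits_list, domain_weights, T=2.0):
    """teacher_logits_list: logits from teachers trained on different domains;
    domain_weights: assumed relevance of each teacher's domain to the target domain."""
    log_p_student = F.log_softmax(student_logits / T, dim=-1)
    loss = 0.0
    for w, t_logits in zip(domain_weights, teacher_logits_list):
        p_teacher = F.softmax(t_logits / T, dim=-1)
        # A teacher from a distant domain (small w) contributes little to the
        # student's objective, limiting harmful cross-domain transfer.
        loss = loss + w * F.kl_div(log_p_student, p_teacher, reduction="batchmean")
    return (T ** 2) * loss
```

With uniform weights this reduces to naive multi-teacher distillation, which is exactly the setting the excerpt warns can hurt performance when domains diverge.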