Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)
DOI: 10.18653/v1/2020.emnlp-main.250

Meta Fine-Tuning Neural Language Models for Multi-Domain Text Mining

Abstract: Pre-trained neural language models bring significant improvement for various NLP tasks, by fine-tuning the models on task-specific training sets. During fine-tuning, the parameters are initialized from pre-trained models directly, which ignores how the learning process of similar NLP tasks in different domains is correlated and mutually reinforced. In this paper, we propose an effective learning procedure named Meta Fine-Tuning (MFT), serving as a meta-learner to solve a group of similar NLP tasks for neural l…
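The abstract describes the procedure only at a high level. As a rough orientation, the sketch below shows what jointly fine-tuning one shared encoder on several domains of the same task could look like; `encoder`, `classifier`, and `domain_loaders` are placeholder names, and the paper's actual MFT algorithm is more involved than this minimal joint-training loop.

```python
# Minimal sketch (not the paper's implementation): fine-tune one shared
# encoder on several in-domain datasets of the same task type, so that the
# learning signal from each domain reinforces the others before any
# domain-specific fine-tuning takes place.
import torch
from torch import nn
from itertools import cycle

def meta_fine_tune(encoder, classifier, domain_loaders, epochs=3, lr=2e-5):
    """encoder: a pre-trained LM returning pooled features (placeholder interface);
    classifier: a task head shared across domains;
    domain_loaders: dict mapping domain name -> DataLoader of (inputs, labels)."""
    params = list(encoder.parameters()) + list(classifier.parameters())
    optim = torch.optim.AdamW(params, lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    iters = {d: cycle(dl) for d, dl in domain_loaders.items()}
    steps = max(len(dl) for dl in domain_loaders.values())

    for _ in range(epochs):
        for _ in range(steps):
            optim.zero_grad()
            total = 0.0
            for domain, it in iters.items():   # one batch per domain per step
                inputs, labels = next(it)
                feats = encoder(inputs)        # shared representation
                logits = classifier(feats)
                total = total + loss_fn(logits, labels)
            total.backward()                   # all domains jointly shape the encoder
            optim.step()
    return encoder                             # meta-tuned starting point for
                                               # subsequent per-domain fine-tuning
```

The returned encoder is then used as the initialization for ordinary task-specific fine-tuning in each individual domain.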

Cited by 15 publications (12 citation statements). References: 39 publications.

“…One body of work approaches the problem by applying heuristic rules of perturbation to input sequences (Wallace et al., 2019; Jia and Liang, 2017), while another uses neural models to construct adversarial examples (Li et al., 2020, 2018) or manipulate inputs in embedding space (Jin et al., 2020). Our work also contributes to efforts to understand the impacts and outcomes of the fine-tuning process (Miaschi et al., 2020; Mosbach et al., 2020; Wang et al., 2020; Perez-Mayos et al., 2021).…”
Section: Related Work
Mentioning confidence: 98%
“…Meta-learning algorithms have been applied to few-shot NLP tasks such as text classification (Geng et al., 2020), relation extraction (Gao et al., 2019), question answering (Hua et al., 2020) and knowledge base completion (Sheng et al., 2020). Similar to Wang et al. (2020a), the proposed TransPrompt framework can be viewed as a combination of transfer learning and meta-learning, which learns transferable knowledge from similar tasks to improve the performance of few-shot text classification, either for existing tasks or new tasks.…”
Section: Transfer Learning and Meta-learning
Mentioning confidence: 99%
“…We employ standard BERT fine-tuning (Devlin et al., 2019), the LM-BFF prompting model (Gao et al., 2020) (with both manually compiled and automatically mined prompts), and P-tuning (Liu et al., 2021) (which produces state-of-the-art performance for PLM-based few-shot learning) as single-task baselines. Because we focus on learning knowledge across tasks, we also use the multi-task versions of BERT fine-tuning, LM-BFF (Gao et al., 2020) and P-tuning (Liu et al., 2021), and Meta Fine-Tuning (Wang et al., 2020a) as cross-task baselines. Specifically, we employ separate prompts (either discrete prompts or continuous prompt embeddings) for different tasks in the multi-task versions of LM-BFF and P-tuning.…”
Section: Datasets and Experimental Settings
Mentioning confidence: 99%
“…This "knowledge transfer" technique in KD has been proved efficient only when two domains are close to each other (Hu et al 2019). In reality, however, it is highly risky as teachers of other domains may pass nontransferable knowledge to the student model, which is irrelevant to the current domain and hence harms the overall performance (Tan et al 2017;Wang et al 2020). Besides, current studies find multi-task fine-tuning of BERT does not necessarily yield better performance across all the tasks ( Sun et al 2019a).…”
Section: Introduction
Mentioning confidence: 99%
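As a hypothetical illustration of the risk raised in the excerpt above, the sketch below shows a multi-teacher distillation loss in which each teacher's contribution is scaled by a domain-relevance weight. The function name, the weighting scheme, and the temperature are assumptions made here for illustration, not a method taken from any of the cited papers.

```python
# Hypothetical illustration (not from the cited papers): a weighted
# multi-teacher distillation loss, where teachers from distant domains are
# down-weighted so they contribute less non-transferable knowledge.
import torch
import torch.nn.functional as F

def multi_teacher_kd_loss(student_logits, teacher_logits_list, domain_weights, T=2.0):
    """teacher_logits_list: logits from teachers trained on different domains;
    domain_weights: assumed relevance of each teacher's domain to the target domain."""
    log_p_student = F.log_softmax(student_logits / T, dim=-1)
    loss = 0.0
    for w, t_logits in zip(domain_weights, teacher_logits_list):
        p_teacher = F.softmax(t_logits / T, dim=-1)
        # A teacher from a distant domain (small w) contributes little to the
        # student's objective, limiting harmful cross-domain transfer.
        loss = loss + w * F.kl_div(log_p_student, p_teacher, reduction="batchmean")
    return (T ** 2) * loss
```

With uniform weights this reduces to naive multi-teacher distillation, which is exactly the setting the excerpt warns can hurt performance when domains diverge.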