Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics 2020
DOI: 10.18653/v1/2020.acl-main.542
Curriculum Learning for Natural Language Understanding

Abstract: With the great success of pre-trained language models, the pretrain-finetune paradigm has become the dominant solution for natural language understanding (NLU) tasks. At the fine-tuning stage, target task data is usually introduced in a completely random order and treated equally. However, examples in NLU tasks can vary greatly in difficulty, and, similar to the human learning process, language models can benefit from an easy-to-difficult curriculum. Based on this idea, we propose our Curriculum Learni…
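To make the easy-to-difficult idea concrete, below is a minimal Python sketch of a "baby steps" style curriculum for fine-tuning. It is not the paper's implementation: the paper's own difficulty measure is not reproduced here, and sentence length, the helper names, and the placeholder `train_step` are assumptions made purely for illustration.

```python
# A minimal sketch of an easy-to-difficult curriculum, NOT the paper's exact
# method. `difficulty` here is a simple proxy (sentence length), chosen only
# for illustration.
import random
from typing import Callable, List, Tuple

Example = Tuple[str, int]  # (sentence, label)

def baby_step_curriculum(
    examples: List[Example],
    difficulty: Callable[[Example], float],
    train_step: Callable[[List[Example]], None],
    num_stages: int = 5,
    epochs_per_stage: int = 1,
    batch_size: int = 32,
) -> None:
    """Sort examples by difficulty and train on progressively larger,
    easy-first subsets of the data."""
    ordered = sorted(examples, key=difficulty)
    for stage in range(1, num_stages + 1):
        # Stage k sees only the easiest k/num_stages fraction of the data.
        cutoff = max(1, len(ordered) * stage // num_stages)
        subset = ordered[:cutoff]
        for _ in range(epochs_per_stage):
            random.shuffle(subset)  # shuffle within the current difficulty bucket
            for i in range(0, len(subset), batch_size):
                train_step(subset[i:i + batch_size])

# Usage with a dummy training step; in practice this would wrap an optimizer
# update for the pre-trained encoder being fine-tuned.
if __name__ == "__main__":
    data = [(("token " * n).strip(), n % 2) for n in range(1, 101)]
    baby_step_curriculum(
        data,
        difficulty=lambda ex: len(ex[0].split()),  # proxy: longer = harder
        train_step=lambda batch: None,
    )
```

The number of stages and epochs per stage are tunable; the key point is only that the model sees easier examples before harder ones rather than a fully random order.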

Cited by 112 publications (103 citation statements)
References 25 publications
“…Curriculum Learning is a learning strategy firstly proposed by Bengio et al (2009) that trains a neural network better through increasing data complexity of training data. It is broadly adopted in many NLP domains (Platanios et al, 2019;Huang and Du, 2019;Xu et al, 2020). In this work, since data with rich related arguments is easier to be learned than those without extra inputs, we promote the training of our student model by gradually increasing the learning complexity of the distillation process by decreasing the proportion of given arguments.…”
Section: Related Work
confidence: 99%
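The schedule sketched below illustrates the kind of curriculum described in the statement above: the share of training instances that keep their extra "given arguments" input decays over training, so the distillation task becomes gradually harder. The linear decay, the field name `arguments`, and the helper functions are illustrative assumptions, not the cited paper's code.

```python
# Hedged sketch: anneal the proportion of examples that receive the extra
# "given arguments" input, making the distillation data gradually harder.
import random
from typing import Dict, List

def argument_proportion(step: int, total_steps: int,
                        start: float = 1.0, end: float = 0.0) -> float:
    """Linearly decay the share of examples that keep their given arguments."""
    frac = min(step / max(total_steps, 1), 1.0)
    return start + (end - start) * frac

def build_batch(batch: List[Dict], step: int, total_steps: int) -> List[Dict]:
    """Randomly withhold the extra arguments from a growing fraction of examples."""
    keep = argument_proportion(step, total_steps)
    out = []
    for ex in batch:
        ex = dict(ex)  # copy so the original data is untouched
        if random.random() > keep:
            ex["arguments"] = []  # no extra inputs: a harder training instance
        out.append(ex)
    return out

# Example: at the midpoint of training, roughly half the batch keeps its arguments.
batch = [{"text": "premise sentence", "arguments": ["arg1", "arg2"]} for _ in range(4)]
masked = build_batch(batch, step=500, total_steps=1000)
```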
“…The above implementations lack thinking about the learning process. The process of human learning often goes from easy to difficult (Xu et al, 2020). Especially for the correlated tasks, humans can dig into the hidden knowledge and extract them from the easy tasks for completing the hard ones.…”
Section: Progressive Tasks
confidence: 99%
“…For Subtask 2, they include several tokens and embeddings based on document structure into input representation for BART. Instead of random order of the training instances, they propose to apply curriculum learning (Xu et al, 2020) based on the computed task difficulty level for each task respectively. The final submission on Subtask 2 is based on the span prediction by a single model.…”
Section: KU NLP
confidence: 99%