2021 · DOI: 10.1609/aaai.v35i8.16910

Learning to Augment for Data-scarce Domain BERT Knowledge Distillation

Abstract: Although pre-trained language models such as BERT have achieved appealing performance on a wide range of Natural Language Processing (NLP) tasks, they are computationally expensive to deploy in real-time applications. A typical remedy is knowledge distillation, which compresses these large pre-trained models (teacher models) into small student models. However, for a target domain with scarce training data, the teacher can hardly pass useful knowledge to the student, which yields performance degradation fo…
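For context on the distillation setup the abstract refers to, the sketch below shows a standard response-based knowledge distillation loss for a BERT-style classifier. It is a generic illustration, not the paper's objective; the temperature, mixing weight, and function names are assumptions.

```python
# Minimal sketch of response-based knowledge distillation for a classifier
# (generic illustration; temperature and alpha are assumed values, not the
# paper's configuration).
import torch
import torch.nn.functional as F

def distillation_loss(student_logits: torch.Tensor,
                      teacher_logits: torch.Tensor,
                      labels: torch.Tensor,
                      temperature: float = 2.0,
                      alpha: float = 0.5) -> torch.Tensor:
    """Blend the soft teacher-matching term with the hard-label cross-entropy term."""
    # Soft distributions from teacher and student, softened by the temperature
    soft_teacher = F.log_softmax(teacher_logits / temperature, dim=-1)
    soft_student = F.log_softmax(student_logits / temperature, dim=-1)
    # KL divergence between student and teacher distributions, scaled by T^2
    kd_term = F.kl_div(soft_student, soft_teacher, reduction="batchmean",
                       log_target=True) * (temperature ** 2)
    # Ordinary supervised loss on the (scarce) labeled target-domain data
    ce_term = F.cross_entropy(student_logits, labels)
    return alpha * kd_term + (1.0 - alpha) * ce_term
```

In a data-scarce target domain the supervised term carries little signal, which is exactly the regime the paper targets by augmenting the data available for distillation.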

Cited by 7 publications (1 citation statement) · References 21 publications (40 reference statements)
“…Feng et al. [7] proposed learning to augment for data-scarce domain BERT knowledge distillation: a cross-domain manipulation scheme that automatically augments the target domain with the help of resource-rich source domains, tackling the performance degradation caused by data scarcity in the target domain. Aiming at the same problem, Ma et al. [8] presented a two-step domain-adaptation framework based on curriculum learning and domain-discriminative data selection.…”
Section: Related Work
confidence: 99%
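The citation statement above summarizes the core idea at a high level. As a point of reference only, the following is a minimal sketch of a naive, similarity-based way to borrow source-domain examples for a scarce target domain before distillation; the paper instead learns a cross-domain manipulation scheme end to end, and the encoder choice, scoring, and cutoff below are illustrative assumptions.

```python
# Hedged sketch: pick source-domain sentences that look closest to the scarce
# target domain and add them to the distillation training set. This is NOT the
# learned cross-domain manipulation scheme from Feng et al. [7]; the encoder,
# cosine-similarity scoring, and top-k cutoff are illustrative assumptions.
import torch
import torch.nn.functional as F
from sentence_transformers import SentenceTransformer  # assumed dependency

def select_source_augmentations(source_texts, target_texts, top_k=1000):
    encoder = SentenceTransformer("all-MiniLM-L6-v2")  # illustrative encoder
    src = F.normalize(torch.as_tensor(encoder.encode(source_texts)), dim=-1)
    tgt = F.normalize(torch.as_tensor(encoder.encode(target_texts)), dim=-1)
    # Score each source sentence by its best cosine similarity to any target sentence
    scores = (src @ tgt.T).max(dim=1).values
    keep = scores.topk(min(top_k, len(source_texts))).indices.tolist()
    return [source_texts[i] for i in keep]

# The selected sentences would then receive soft labels from the teacher and be
# mixed with the target-domain data when training the student.
```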