Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), 2019
DOI: 10.18653/v1/d19-1441

Patient Knowledge Distillation for BERT Model Compression

Abstract: Pre-trained language models such as BERT have proven to be highly effective for natural language processing (NLP) tasks. However, the high demand for computing resources in training such models hinders their application in practice. In order to alleviate this resource hunger in large-scale model training, we propose a Patient Knowledge Distillation approach to compress an original large model (teacher) into an equally-effective lightweight shallow network (student). Different from previous knowledge distillati…
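
The truncated abstract describes a teacher-student compression setup. As a rough illustration only (the exact objective is not given here), the sketch below shows how a "patient" distillation loss of this kind is commonly assembled in PyTorch: a soft-label distillation term on the output logits plus an extra term that matches intermediate [CLS] hidden states of selected student and teacher layers. The function name, hyperparameters, and layer pairing are assumptions for illustration, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      student_hidden, teacher_hidden,
                      temperature=2.0, alpha=0.5, beta=10.0):
    """Illustrative teacher-student loss: soft-label KD plus an
    intermediate-layer ("patient") term. Hyperparameters are placeholders."""
    # Standard cross-entropy against the ground-truth labels.
    ce = F.cross_entropy(student_logits, labels)

    # Soft-label term: KL divergence between temperature-scaled
    # teacher and student output distributions.
    kd = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * (temperature ** 2)

    # "Patient" term: match normalized [CLS] hidden states of paired
    # student/teacher layers (pairing and equal hidden sizes assumed).
    pt = 0.0
    for s_h, t_h in zip(student_hidden, teacher_hidden):
        s_cls = F.normalize(s_h[:, 0], dim=-1)
        t_cls = F.normalize(t_h[:, 0], dim=-1)
        pt = pt + F.mse_loss(s_cls, t_cls)

    return (1 - alpha) * ce + alpha * kd + beta * pt
```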

Cited by 444 publications (467 citation statements) · References 30 publications

Citation statements (ordered by relevance):
“…Before the birth of BERT, KD had been applied to several specific tasks like machine translation (Kim and Rush, 2016; Tan et al., 2019) in NLP. While the recent studies of distilling large pre-trained models focus on finding general distillation methods that work on various tasks and are receiving more and more attention (Sanh et al., 2019; Jiao et al., 2019; Sun et al., 2019a; Tang et al., 2019; Clark et al., 2019; …).…”
Section: Introduction · Citation type: mentioning · Confidence: 99%
“…Furthermore, using BERT for inference poses latency challenges in a production system. A promising direction of future work that we plan on investigating is leveraging distilled versions of BERT (Sun et al., 2019; Wang et al., 2020) for the task. Table 6 shows the results of the NN analysis.…”
Section: Discussion · Citation type: mentioning · Confidence: 99%
“…As such, we look into Knowledge Distillation (KD) (Hinton et al., 2015) to transfer the language modeling capability of BART while keeping its copying behavior. Transferring the language model of massive pre-trained models into smaller models has been of high interest recently (Sanh et al., 2019; Turc et al., 2020; Sun et al., 2019). Knowledge transfer to simple models has also been discussed to a lesser extent (Tang et al., 2019; Mukherjee and Awadallah, 2019).…”
Section: Distilling BART · Citation type: mentioning · Confidence: 99%