2020
DOI: 10.48550/arxiv.2004.10171
Preprint

Knowledge Distillation for Multilingual Unsupervised Neural Machine Translation

Cited by 6 publications (4 citation statements)
References 26 publications
“…The explicated Knowledge Distillation framework has shown its efficiency in a tremendous number of tasks, such as Neural Machine Translation (Tan et al. 2019; Wang et al. 2021; Li and Li 2021; Sun et al. 2020), Question Answering (Hu et al. 2018; Arora, Khapra, and Ramaswamy 2019; Yang et al. 2020b), Image Classification (Yang et al. 2020a; Chen, Chang, and Lee 2018; Fu et al. 2020), etc. Nonetheless, its application for Neural Cross-Lingual Summarization has received little interest.…”
Section: Background: Neural Cross-Lingual Summarization
Citation type: mentioning (confidence: 99%)
“…While the focus was originally on single-label image classification, KD has also been extended to the multi-label setting (Liu et al., 2018b). In NLP, KD has usually been applied in supervised settings (Kim and Rush, 2016; Huang et al., 2018; Yang et al., 2020), but also in some unsupervised tasks (usually using an unsupervised teacher for a supervised student) (Sun et al., 2020). Xu et al. (2018) use word embeddings jointly learned with a topic model in a procedure they term distillation, but do not follow the method from Hinton et al. (2015) that we employ (instead opting for joint-learning).…”
Section: Related Work
Citation type: mentioning (confidence: 99%)
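For readers unfamiliar with the Hinton et al. (2015) formulation referenced in the excerpt above, the following is a minimal sketch of the soft-label distillation objective. The distillation_loss function, the temperature of 2.0, and the mixing weight alpha are illustrative assumptions for this sketch, not details taken from the cited papers.

# Minimal sketch of the soft-label knowledge distillation objective from
# Hinton et al. (2015). Function name, temperature, and alpha are
# illustrative assumptions, not drawn from the cited papers.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=2.0, alpha=0.5):
    """Blend a soft-label KL term (teacher -> student) with the usual
    hard-label cross-entropy, weighted by alpha."""
    # Soften both distributions with the temperature.
    soft_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    log_soft_student = F.log_softmax(student_logits / temperature, dim=-1)
    # KL divergence between teacher and student, scaled by T^2 as in Hinton et al.
    kd_term = F.kl_div(log_soft_student, soft_teacher,
                       reduction="batchmean") * temperature ** 2
    # Standard supervised loss on the gold labels.
    ce_term = F.cross_entropy(student_logits, labels)
    return alpha * kd_term + (1.0 - alpha) * ce_term

# Example usage with random logits over a 10-class output space.
student_logits = torch.randn(4, 10)
teacher_logits = torch.randn(4, 10)
labels = torch.randint(0, 10, (4,))
print(distillation_loss(student_logits, teacher_logits, labels))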
“…• We do not have any parallel data among any of the language pairs, as considered in (Liu et al., 2020; Sun et al., 2020).…”
Section: Terminology
Citation type: mentioning (confidence: 99%)