2019
DOI: 10.48550/arxiv.1909.12744
Preprint

On the use of BERT for Neural Machine Translation

Cited by 7 publications (7 citation statements)
References 0 publications

“…Since the birth of BERT (Devlin et al., 2019), there has been continuing advancement in language model pre-training, such as XLNet, RoBERTa, ALBERT (Lan et al., 2019), UniLM, and T5 (Raffel et al., 2019), which epitomize the power of large-scale pre-training. Alongside BERT, there are also studies on model compression (Sun et al., 2019c; Jiao et al., 2019; Shen et al., 2019) and on extending from understanding to generation (Chen et al., 2019a; Clinchant et al., 2019; Wang and Cho, 2019).…”
Section: Model Pre-training
confidence: 99%
“…Enhancing NMT systems with external knowledge has been a promising research direction in recent years. For example, in Lu et al. (2018), knowledge graphs are incorporated into the machine translation task, and in Zhu et al. (2020), Clinchant et al. (2019), Yang et al. (2020), and Shavarani and Sarkar (2021), pre-trained language models are used to enhance translation. In this work, named entity information is leveraged to boost performance.…”
Section: Related Work
confidence: 99%
“…A successful NLP model must understand the structure and context of language, learned via supervised or unsupervised methods. Pre-trained language models have been used to boost performance in other NLP tasks (Clinchant et al., 2019; Zhu et al., 2020), with models such as BERT (Devlin et al., 2018) achieving state-of-the-art performance. Zhu et al. (2020) tried to fuse the embeddings of BERT into a traditional Transformer architecture using attention, increasing translation performance by approximately 2 BLEU points.…”
Section: Related Work
confidence: 99%
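
The attention-based fusion mentioned in the last statement can be illustrated with a short sketch. The code below is a minimal, assumed reading of how frozen BERT hidden states might be fused into a Transformer encoder layer through an extra attention branch, in the spirit of the BERT-fused approach attributed to Zhu et al. (2020); the class name, parameter values, and the equal mixing of the two branches are illustrative choices, not the authors' reference implementation.

```python
# Minimal sketch (PyTorch): fusing frozen BERT hidden states into a
# Transformer encoder layer via an extra attention branch. All names
# and hyperparameters here are hypothetical.
import torch
import torch.nn as nn


class BertFusedEncoderLayer(nn.Module):
    def __init__(self, d_model=512, d_bert=768, nhead=8, dim_ff=2048, dropout=0.1):
        super().__init__()
        # Standard self-attention over the NMT encoder states.
        self.self_attn = nn.MultiheadAttention(d_model, nhead, dropout=dropout,
                                               batch_first=True)
        # Extra attention: queries from the NMT encoder, keys/values from BERT.
        self.bert_attn = nn.MultiheadAttention(d_model, nhead, kdim=d_bert,
                                               vdim=d_bert, dropout=dropout,
                                               batch_first=True)
        self.ffn = nn.Sequential(nn.Linear(d_model, dim_ff), nn.ReLU(),
                                 nn.Linear(dim_ff, d_model))
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        self.drop = nn.Dropout(dropout)

    def forward(self, x, bert_states):
        # x: (batch, src_len, d_model) NMT encoder states
        # bert_states: (batch, bert_len, d_bert) frozen BERT outputs
        h, _ = self.self_attn(x, x, x)
        b, _ = self.bert_attn(x, bert_states, bert_states)
        # Average the two attention branches before the residual connection
        # (one simple fusion choice among several possible).
        x = self.norm1(x + self.drop(0.5 * (h + b)))
        x = self.norm2(x + self.drop(self.ffn(x)))
        return x


if __name__ == "__main__":
    layer = BertFusedEncoderLayer()
    src = torch.randn(2, 10, 512)    # toy NMT encoder states
    bert = torch.randn(2, 12, 768)   # toy BERT hidden states
    print(layer(src, bert).shape)    # torch.Size([2, 10, 512])
```

The key design point the sketch tries to capture is that the BERT encoder is kept separate (and typically frozen), with its representations injected through a dedicated cross-attention branch rather than by replacing the NMT encoder's own embeddings.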