Incorporating Named Entity Information into Neural Machine Translation (2020)
DOI: 10.1007/978-3-030-60450-9_31

Cited by 11 publications (17 citation statements)
References 14 publications
“…Bidirectional contextual representations like BERT come at the expense of being "true" language models $P_{\mathrm{LM}}(W)$, as there may appear no way to generate text (sampling) or produce sentence probabilities (density estimation) from these models. This handicapped their use in generative tasks, where they at best served to bootstrap encoder-decoder models (Clinchant et al., 2019; Zhu et al., 2020) or unidirectional LMs.…”
Section: Pseudolikelihood Estimation (mentioning, confidence: 99%)
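As background for the pseudolikelihood estimation referenced above: a masked LM can still assign a sentence score via a pseudo-log-likelihood (PLL) that sums masked-token conditionals in place of a left-to-right factorization. This is a sketch of the general idea, not necessarily the exact formulation used by the citing paper:

$$\mathrm{PLL}(W) = \sum_{t=1}^{|W|} \log P_{\mathrm{MLM}}\left(w_t \mid W_{\setminus t}\right)$$

where $W_{\setminus t}$ denotes the sentence with token $w_t$ replaced by [MASK].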
“…Existing uses of pretrained MLMs in sequence-to-sequence models for automatic speech recognition (ASR) or neural machine translation (NMT) involve integrating their weights (Clinchant et al., 2019) or representations (Zhu et al., 2020) into the encoder and/or decoder during training. In contrast, we train a sequence model independently, then rescore its n-best outputs with an existing MLM.…”
Section: Introduction (mentioning, confidence: 99%)
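The n-best rescoring described in the statement above can be illustrated with the PLL score defined earlier. This is a minimal sketch assuming a HuggingFace BERT checkpoint; the hypothesis fields (`text`, `seq_score`) and the interpolation weight `lam` are hypothetical names, and this is not the citing paper's implementation:

```python
# Minimal sketch: rescore n-best hypotheses from an independently trained
# sequence model with a masked LM's pseudo-log-likelihood (PLL).
# Model name, hypothesis fields, and interpolation weight are assumptions.
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
mlm = AutoModelForMaskedLM.from_pretrained("bert-base-cased")
mlm.eval()

def pll(sentence: str) -> float:
    """Sum of log P(w_t | sentence with w_t masked), one forward pass per token."""
    ids = tokenizer(sentence, return_tensors="pt")["input_ids"][0]
    score = 0.0
    with torch.no_grad():
        for t in range(1, ids.size(0) - 1):           # skip [CLS] and [SEP]
            masked = ids.clone()
            masked[t] = tokenizer.mask_token_id
            logits = mlm(masked.unsqueeze(0)).logits[0, t]
            score += torch.log_softmax(logits, dim=-1)[ids[t]].item()
    return score

def rescore(nbest, lam=0.5):
    """Pick the hypothesis maximizing a mix of the sequence model's score and PLL."""
    return max(nbest, key=lambda h: (1 - lam) * h["seq_score"] + lam * pll(h["text"]))
```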
“…The original usage of BERT mainly focused on NLP tasks, spanning token-level and sequence-level classification tasks, including question answering [9,10], document summarization [11,12], information retrieval [13,14], and machine translation [15,16], just to name a few. There have also been attempts to combine BERT with ASR, including rescoring [17,18] or generating soft labels for training [19].…”
Section: BERT (mentioning, confidence: 99%)
“…Unfortunately, we did not observe good performance. For the second strategy, following the practice of [9], we use BERT to extract context-aware embeddings and fuse them into each layer of the transformer encoder and decoder via an attention mechanism.…”
Section: Neural ITN (mentioning, confidence: 99%)
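The fusion strategy quoted above (attending to BERT's context-aware embeddings inside each transformer layer) can be sketched roughly as follows. The dimensions, the projection of BERT states, the averaging of the two attention streams, and the layer-norm placement are all illustrative assumptions rather than the exact architecture of the citing paper or of [9]:

```python
# Rough sketch of a "BERT-fused" encoder layer: each layer runs standard
# self-attention plus an extra attention over frozen BERT embeddings, then
# averages the two streams before the feed-forward block.
import torch
import torch.nn as nn

class BertFusedEncoderLayer(nn.Module):
    def __init__(self, d_model=512, d_bert=768, nhead=8, d_ff=2048):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(d_model, nhead, batch_first=True)
        self.bert_proj = nn.Linear(d_bert, d_model)    # map BERT states to model width
        self.bert_attn = nn.MultiheadAttention(d_model, nhead, batch_first=True)
        self.ffn = nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model))
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, x, bert_states):
        # x: (batch, src_len, d_model); bert_states: (batch, bert_len, d_bert)
        b = self.bert_proj(bert_states)
        h_self, _ = self.self_attn(x, x, x)            # attend to the layer's own states
        h_bert, _ = self.bert_attn(x, b, b)            # attend to the BERT context
        x = self.norm1(x + 0.5 * (h_self + h_bert))    # fuse by simple averaging (assumption)
        return self.norm2(x + self.ffn(x))

# Example with random tensors:
# layer = BertFusedEncoderLayer()
# out = layer(torch.randn(2, 10, 512), torch.randn(2, 12, 768))
```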
“…While RNNs are powerful for sequence-to-sequence tasks, transformer-based models [8] offer pretraining abilities using vast amounts of data. However, incorporating pretrained models is not trivial and is often specific to the task [9].…”
Section: Introduction (mentioning, confidence: 99%)