Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing
DOI: 10.18653/v1/d18-1325
Document-Level Neural Machine Translation with Hierarchical Attention Networks

Abstract: Neural Machine Translation (NMT) can be improved by including document-level contextual information. For this purpose, we propose a hierarchical attention model to capture the context in a structured and dynamic manner. The model is integrated into the original NMT architecture as another level of abstraction, conditioning on the NMT model's own previous hidden states. Experiments show that hierarchical attention significantly improves the BLEU score over a strong NMT baseline with the state-of-the-art in contex…
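The abstract describes a two-level (hierarchical) attention over the model's own previous hidden states: word-level attention summarises each previous sentence, and sentence-level attention combines those summaries into a document-context vector. The snippet below is a minimal sketch of that idea in PyTorch; the class name, the single-head attention, and the gated combination are illustrative assumptions, not the authors' exact architecture.

```python
import torch
import torch.nn as nn

class HierarchicalContextAttention(nn.Module):
    """Illustrative two-level attention over previous-sentence hidden states.

    Word level: for each previous sentence, attend over its token states with
    the current hidden state as query, yielding one summary per sentence.
    Sentence level: attend over those summaries to get a document context.
    """

    def __init__(self, d_model: int):
        super().__init__()
        self.word_attn = nn.MultiheadAttention(d_model, num_heads=1, batch_first=True)
        self.sent_attn = nn.MultiheadAttention(d_model, num_heads=1, batch_first=True)
        self.gate = nn.Linear(2 * d_model, d_model)  # merges context with current state

    def forward(self, query, prev_sent_states):
        # query:            (batch, 1, d_model)  current hidden state
        # prev_sent_states: list of (batch, len_k, d_model), one tensor per previous sentence
        summaries = []
        for states in prev_sent_states:
            s, _ = self.word_attn(query, states, states)      # (batch, 1, d_model)
            summaries.append(s)
        sent_keys = torch.cat(summaries, dim=1)               # (batch, K, d_model)
        context, _ = self.sent_attn(query, sent_keys, sent_keys)
        # gated combination of the document context and the current state
        lam = torch.sigmoid(self.gate(torch.cat([query, context], dim=-1)))
        return lam * query + (1.0 - lam) * context

if __name__ == "__main__":
    d = 16
    han = HierarchicalContextAttention(d)
    q = torch.randn(2, 1, d)                                  # current state, batch of 2
    prev = [torch.randn(2, 7, d), torch.randn(2, 5, d)]       # two previous sentences
    print(han(q, prev).shape)                                 # torch.Size([2, 1, 16])
```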

Cited by 207 publications (317 citation statements)
References 18 publications (28 reference statements)
“…Experiments show that our approach improves upon the Transformer by an overall +1.34, +2.06 and +1.18 BLEU for TED Talks, News-Commentary and Europarl, respectively. It also outperforms two recent context-aware baselines (Miculicich et al., 2018) in the majority of cases.…”
Section: Introduction
confidence: 75%
“…Training: The document-conditioned NMT model P_θ(y_j | x_j, D_{−j}) is realised using a neural architecture and is usually trained via a two-step procedure (Miculicich et al., 2018). The first step involves pre-training a standard sentence-level NMT model, and the second step involves optimising the parameters of the whole model, i.e., both the document-level and the sentence-level parameters.…”
Section: Document-level Machine Translation
confidence: 99%
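The two-step procedure quoted above can be illustrated with a short training sketch. Everything in the snippet is assumed for illustration: model.sentence_parameters(), the context keyword argument, loss_fn, and the learning rates are hypothetical interfaces rather than the cited papers' actual code; only the schedule (pre-train the sentence-level NMT, then optimise all parameters jointly with document context) reflects the description.

```python
import torch

def train_two_step(model, sent_batches, doc_batches, loss_fn,
                   epochs_sent=10, epochs_doc=5):
    """Sketch of two-step training for a document-conditioned NMT model."""
    # Step 1: pre-train the sentence-level parameters, document context unused.
    opt = torch.optim.Adam(model.sentence_parameters(), lr=1e-4)  # hypothetical accessor
    for _ in range(epochs_sent):
        for src, tgt in sent_batches:
            opt.zero_grad()
            loss = loss_fn(model(src, tgt, context=None), tgt)
            loss.backward()
            opt.step()

    # Step 2: optimise the whole model (sentence- and document-level parameters)
    # on batches that carry the surrounding document context.
    opt = torch.optim.Adam(model.parameters(), lr=1e-5)
    for _ in range(epochs_doc):
        for src, tgt, context in doc_batches:
            opt.zero_grad()
            loss = loss_fn(model(src, tgt, context=context), tgt)
            loss.backward()
            opt.step()
```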
“…Main Results: Table 2 shows that our model surpasses all the context-agnostic (Vaswani et al., 2017) and context-aware (Zhang et al., 2018a; Miculicich et al., 2018; Maruf and Haffari, 2018) baselines on the TED and Europarl datasets. For the TED dataset, the performance of our model greatly exceeds that of all other baselines and is better than Miculicich et al. (2018) by +0.59 BLEU and +0.61 Meteor. For the Europarl dataset, our model obtains a gain of +0.07 BLEU, but its Meteor score is +0.64 higher than that of Maruf et al. (2019), which utilises the whole document as contextual information, whereas we use only the 3 previous sentences.…”
Section: Results and Analysis
confidence: 90%