Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing
DOI: 10.18653/v1/d18-1325
Document-Level Neural Machine Translation with Hierarchical Attention Networks

Abstract: Neural Machine Translation (NMT) can be improved by including document-level contextual information. For this purpose, we propose a hierarchical attention model to capture the context in a structured and dynamic manner. The model is integrated into the original NMT architecture as another level of abstraction, conditioning on the NMT model's own previous hidden states. Experiments show that hierarchical attention significantly improves the BLEU score over a strong NMT baseline with the state-of-the-art in contex…
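The abstract describes a two-level (hierarchical) attention over the model's own previous hidden states: word-level attention summarises each previous sentence, and sentence-level attention combines those summaries into a document-context vector. The snippet below is a minimal sketch of that idea in PyTorch; the class name, the single-head attention, and the gated combination are illustrative assumptions, not the authors' exact architecture.

```python
import torch
import torch.nn as nn

class HierarchicalContextAttention(nn.Module):
    """Illustrative two-level attention over previous-sentence hidden states.

    Word level: for each previous sentence, attend over its token states with
    the current hidden state as query, yielding one summary per sentence.
    Sentence level: attend over those summaries to get a document context.
    """

    def __init__(self, d_model: int):
        super().__init__()
        self.word_attn = nn.MultiheadAttention(d_model, num_heads=1, batch_first=True)
        self.sent_attn = nn.MultiheadAttention(d_model, num_heads=1, batch_first=True)
        self.gate = nn.Linear(2 * d_model, d_model)  # merges context with current state

    def forward(self, query, prev_sent_states):
        # query:            (batch, 1, d_model)  current hidden state
        # prev_sent_states: list of (batch, len_k, d_model), one tensor per previous sentence
        summaries = []
        for states in prev_sent_states:
            s, _ = self.word_attn(query, states, states)      # (batch, 1, d_model)
            summaries.append(s)
        sent_keys = torch.cat(summaries, dim=1)               # (batch, K, d_model)
        context, _ = self.sent_attn(query, sent_keys, sent_keys)
        # gated combination of the document context and the current state
        lam = torch.sigmoid(self.gate(torch.cat([query, context], dim=-1)))
        return lam * query + (1.0 - lam) * context

if __name__ == "__main__":
    d = 16
    han = HierarchicalContextAttention(d)
    q = torch.randn(2, 1, d)                                  # current state, batch of 2
    prev = [torch.randn(2, 7, d), torch.randn(2, 5, d)]       # two previous sentences
    print(han(q, prev).shape)                                 # torch.Size([2, 1, 16])
```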

Cited by 207 publications (317 citation statements)
References 18 publications (28 reference statements)
“…Experiments show that our approach improves upon the Transformer by an overall +1.34, +2.06 and +1.18 BLEU for TED Talks, News-Commentary and Europarl, respectively. It also outperforms two recent context-aware baselines (Miculicich et al., 2018) in the majority of cases.…”
Section: Introduction
confidence: 75%
“…Training: The document-conditioned NMT model P_θ(y_j | x_j, D_{−j}) is realised using a neural architecture and is usually trained via a two-step procedure (Miculicich et al., 2018). The first step involves pre-training a standard sentence-level NMT model, and the second step involves optimising the parameters of the whole model, i.e., both the document-level and the sentence-level parameters.…”
Section: Document-level Machine Translation
confidence: 99%
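The two-step procedure quoted above can be illustrated with a short training sketch. Everything in the snippet is assumed for illustration: model.sentence_parameters(), the context keyword argument, loss_fn, and the learning rates are hypothetical interfaces rather than the cited papers' actual code; only the schedule (pre-train the sentence-level NMT, then optimise all parameters jointly with document context) reflects the description.

```python
import torch

def train_two_step(model, sent_batches, doc_batches, loss_fn,
                   epochs_sent=10, epochs_doc=5):
    """Sketch of two-step training for a document-conditioned NMT model."""
    # Step 1: pre-train the sentence-level parameters, document context unused.
    opt = torch.optim.Adam(model.sentence_parameters(), lr=1e-4)  # hypothetical accessor
    for _ in range(epochs_sent):
        for src, tgt in sent_batches:
            opt.zero_grad()
            loss = loss_fn(model(src, tgt, context=None), tgt)
            loss.backward()
            opt.step()

    # Step 2: optimise the whole model (sentence- and document-level parameters)
    # on batches that carry the surrounding document context.
    opt = torch.optim.Adam(model.parameters(), lr=1e-5)
    for _ in range(epochs_doc):
        for src, tgt, context in doc_batches:
            opt.zero_grad()
            loss = loss_fn(model(src, tgt, context=context), tgt)
            loss.backward()
            opt.step()
```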
“…Main Results: Table 2 shows that our model surpasses all the context-agnostic (Vaswani et al., 2017) and context-aware (Zhang et al., 2018a; Miculicich et al., 2018; Maruf and Haffari, 2018) baselines on the TED and Europarl datasets. For the TED dataset, the performance of our model greatly exceeds that of all other baselines and is better than Miculicich et al. (2018) by +0.59 BLEU and +0.61 Meteor. For the Europarl dataset, our model obtains a gain of +0.07 BLEU, but its Meteor score is +0.64 higher than that of Maruf et al. (2019), which utilises the whole document as contextual information, whereas we use only the 3 previous sentences.…”
Section: Results and Analysis
confidence: 90%