Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), 2019
DOI: 10.18653/v1/d19-1168

Hierarchical Modeling of Global Context for Document-Level Neural Machine Translation

Abstract: Document-level machine translation (MT) remains challenging due to the difficulty in efficiently using document context for translation. In this paper, we propose a hierarchical model to learn the global context for document-level neural machine translation (NMT). This is done through a sentence encoder to capture intra-sentence dependencies and a document encoder to model document-level inter-sentence consistency and coherence. With this hierarchical architecture, we feed back the extracted global document context…
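The abstract outlines a two-level design: a sentence encoder for intra-sentence dependencies, a document encoder over sentence representations for inter-sentence consistency, and a top-down feedback of the global context to each word. The PyTorch sketch below illustrates that flow under stated assumptions; the mean-pooling, gated fusion, and hyper-parameters are illustrative guesses, not the authors' implementation.

```python
import torch
import torch.nn as nn

class HierarchicalContextEncoder(nn.Module):
    """Sketch of the hierarchical global-context encoder described in the
    abstract. Layer sizes, pooling, and the gated fusion are assumptions."""

    def __init__(self, d_model=512, n_heads=8, n_layers=2):
        super().__init__()
        # Sentence encoder: intra-sentence dependencies over word states.
        self.sent_encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True),
            n_layers)
        # Document encoder: inter-sentence consistency over sentence vectors.
        self.doc_encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True),
            n_layers)
        self.gate = nn.Linear(2 * d_model, d_model)

    def forward(self, doc_word_emb):
        # doc_word_emb: (n_sents, sent_len, d_model), one document per call.
        word_states = self.sent_encoder(doc_word_emb)
        # Mean-pool words into one vector per sentence (pooling choice is
        # an assumption; the paper's aggregation may differ).
        sent_vecs = word_states.mean(dim=1).unsqueeze(0)  # (1, n_sents, d_model)
        doc_ctx = self.doc_encoder(sent_vecs).squeeze(0)  # (n_sents, d_model)
        # Feed the global document context back to each word (top-down)
        # via a simple gated fusion.
        ctx = doc_ctx.unsqueeze(1).expand_as(word_states)
        g = torch.sigmoid(self.gate(torch.cat([word_states, ctx], dim=-1)))
        return g * word_states + (1 - g) * ctx
```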

Cited by 47 publications (66 citation statements); References 16 publications.
“…Researchers propose various context-aware networks to utilize contextual information to improve the performance of DocNMT models on translation quality (Jean et al., 2017; Tu et al., 2018; Kuang et al., 2018) or discourse phenomena (Bawden et al., 2018; Voita et al., 2019b,a). However, most methods roughly leverage all context sentences within a fixed size that is tuned on development sets (Wang et al., 2017; Miculicich et al., 2018; Yang et al., 2019; Xu et al., 2020), or the full context of the entire document (Maruf and Haffari, 2018; Tan et al., 2019; Kang and Zong, 2020; Zheng et al., 2020). They ignore the individualized needs for context when translating different source sentences.…”
Section: Related Work
Mentioning confidence: 99%
“…The majority of existing DocNMT models set the context size or scope to be fixed. They utilize all of the previous k context sentences (Miculicich et al., 2018; Voita et al., 2019b; Yang et al., 2019; Xu et al., 2020), or the full context of the entire document (Maruf and Haffari, 2018; Tan et al., 2019; Zheng et al., 2020). As a result, inadequate or redundant contextual information is almost inevitable.…”
Section: Introduction
Mentioning confidence: 99%
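A minimal sketch of the two fixed-scope strategies this quote contrasts: taking the previous k sentences versus taking the whole document as context. The helper name and signature are hypothetical, not from any cited implementation.

```python
def select_context(doc_sents, i, k=None):
    """Context for sentence i: its k predecessors when k is given,
    otherwise every other sentence in the document. Both scopes are
    fixed in advance, which is the quote's criticism."""
    if k is not None:
        return doc_sents[max(0, i - k):i]      # fixed window of k sentences
    return doc_sents[:i] + doc_sents[i + 1:]   # full-document context
```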
“…Large-context encoder-decoder models: Large-context encoder-decoder models that can capture long-range linguistic contexts beyond sentence or utterance boundaries have received significant attention in E2E-ASR [7,8], machine translation [14,15], and some natural language generation tasks [16,17]. In recent studies, transformer-based large-context encoder-decoder models have been introduced in machine translation [18,19]. In addition, a fully transformer-based hierarchical architecture similar to ours has been proposed [20].…”
Section: Related Work
Mentioning confidence: 99%
“…Voita et al. [2019c] propose CADec, which demonstrates major gains over a context-agnostic baseline on their benchmarks without sacrificing BLEU. Tan et al. [2019] propose a hierarchical model consisting of a sentence encoder to capture intra-sentence dependencies and a document encoder to model document-level information.…”
Section: Related Work
Mentioning confidence: 99%
“…Despite the great success of these sequence-to-sequence models, they translate in a sentence-by-sentence manner, utilizing large amounts of sentence-level parallel data while totally ignoring extra-sentential context and inter-sentence consistency. This issue has recently attracted wide attention, and many context-aware translation approaches [Wang et al., 2017; Tiedemann and Scherrer, 2017; Bawden et al., 2018; Voita et al., 2018; Maruf and Haffari, 2018; Kuang et al., 2018; Kuang and Xiong, 2018; Läubli et al., 2018; Miculicich et al., 2018; Voita et al., 2019c; Voita et al., 2019b; Xiong et al., 2019; Tan et al., 2019] have been proposed.…”
Section: Introduction
Mentioning confidence: 99%