2021
DOI: 10.1145/3441691
A Survey on Document-level Neural Machine Translation

Abstract: Machine translation (MT) is an important task in natural language processing (NLP), as it automates the translation process and reduces the reliance on human translators. With the resurgence of neural networks, translation quality has surpassed that of translations obtained using statistical techniques for most language pairs. Until a few years ago, almost all neural translation models translated sentences independently, without incorporating the wider documen…

Cited by 71 publications (52 citation statements)
References 96 publications
“…Context-aware Machine Translation There have been many works in the literature that try to incorporate context into NMT systems. Tiedemann and Scherrer (2017) first proposed the simple approach of concatenating the previous sentences, on both the source and the target side, to the input of the system; Jean et al. (2017) and Bawden et al. (2018) used an additional context-specific encoder to extract contextual features from the previous sentences; Maruf and Haffari (2018) and Tu et al. (2018b) proposed further context-integration mechanisms. For a more detailed overview, Maruf et al. (2019b) extensively describe the different approaches and how they leverage context. While these models lead to improvements with small training sets, Lopes et al. (2020) showed that the improvements are negligible compared with the concatenation baseline when using larger datasets.…”
Section: Related Work (citation type: mentioning)
confidence: 99%
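To make the concatenation approach mentioned in the statement above concrete, here is a minimal sketch in the spirit of Tiedemann and Scherrer (2017): previous sentences are prepended to the current sentence on both the source and the target side. The separator token `<SEP>`, the one-sentence context window, and the function names are assumptions for illustration, not the exact setup of any surveyed system.

```python
# Illustrative sketch of concatenation-based context-aware NMT data
# preparation (in the spirit of Tiedemann and Scherrer, 2017).
# The "<SEP>" marker and the default context window are assumptions.

from typing import List, Tuple

SEP = "<SEP>"  # assumed boundary marker between context and current sentence


def build_concat_examples(
    src_doc: List[str], tgt_doc: List[str], context_size: int = 1
) -> List[Tuple[str, str]]:
    """Prepend up to `context_size` previous sentences to each source and
    target sentence, separated by SEP, yielding document-aware pairs."""
    examples = []
    for i, (src, tgt) in enumerate(zip(src_doc, tgt_doc)):
        src_ctx = src_doc[max(0, i - context_size):i]
        tgt_ctx = tgt_doc[max(0, i - context_size):i]
        src_in = f" {SEP} ".join(src_ctx + [src])
        tgt_out = f" {SEP} ".join(tgt_ctx + [tgt])
        examples.append((src_in, tgt_out))
    return examples


if __name__ == "__main__":
    src = ["Der Hund schläft .", "Er träumt ."]
    tgt = ["The dog is sleeping .", "It is dreaming ."]
    for s, t in build_concat_examples(src, tgt):
        print(s, "=>", t)
```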
“…According to length, sentences are divided into three categories: short sentences (1–9 words), medium-length sentences (10–25 words), and long sentences (more than 25 words). In constructed corpora, the average sentence length of English text is 26.26 words in the FLOB Corpus and 32.48 words in the Brown Corpus, both of which already fall within the definition of a long sentence [5]. Taking long news sentences as an example, the book News Reporting and Writing holds that short sentences carry too little information and easily cause ambiguity, while long news sentences are difficult to understand, so it is best to keep the introductory (lead) sentence of a news item within 35 words.…”
Section: Definition Of An English Long Statement (citation type: mentioning)
confidence: 99%
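A minimal sketch of the length categories quoted above (short: 1 to 9 words, medium: 10 to 25 words, long: more than 25 words); the whitespace-based word count is an assumption made for illustration only.

```python
# Minimal sketch of the sentence-length categories quoted above.
# Whitespace tokenization is an assumption; real corpora use proper tokenizers.

def length_category(sentence: str) -> str:
    """Classify an English sentence by its word count."""
    n_words = len(sentence.split())
    if n_words <= 9:
        return "short"
    if n_words <= 25:
        return "medium"
    return "long"


if __name__ == "__main__":
    example = ("The committee agreed that the proposal, despite several "
               "reservations raised during the earlier meetings, should "
               "nevertheless be forwarded to the executive board for a final "
               "decision at the next quarterly meeting.")
    print(length_category(example))  # expected: "long" (more than 25 words)
```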
“…We also evaluate the models from the viewpoint of interpolation, which we define as the ability to generate sequences whose lengths are seen during training. Specifically, we evaluate interpolation using long sequences since, first, the generation of long sequences is an important research topic in NLP (Zaheer et al., 2020; Maruf et al., 2021) and, second, in datasets with long sequences the position distribution of each token becomes increasingly sparse. In other words, tokens in the validation and test sets become unlikely to be observed in the training set at the corresponding positions; we expect that shift invariance is crucial for addressing such position sparsity.…”
Section: (iii) Interpolate (citation type: mentioning)
confidence: 99%
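The position-sparsity argument in the statement above can be made concrete with a small sketch: count how many (token, position) pairs in a held-out set never occur at the same absolute position in the training data. This is an illustrative reading of the quoted intuition, not the evaluation code of the cited work; the toy data and function names are assumptions.

```python
# Illustrative sketch of "position sparsity": the fraction of held-out
# (token, position) pairs that were never observed at that position in training.

from typing import Iterable, List, Set, Tuple


def seen_token_positions(train: Iterable[List[str]]) -> Set[Tuple[str, int]]:
    """Collect every (token, absolute position) pair occurring in training."""
    seen = set()
    for seq in train:
        for pos, tok in enumerate(seq):
            seen.add((tok, pos))
    return seen


def position_sparsity(train: Iterable[List[str]],
                      held_out: Iterable[List[str]]) -> float:
    """Fraction of held-out (token, position) pairs never seen in training."""
    seen = seen_token_positions(train)
    total = unseen = 0
    for seq in held_out:
        for pos, tok in enumerate(seq):
            total += 1
            unseen += (tok, pos) not in seen
    return unseen / total if total else 0.0


if __name__ == "__main__":
    train = [["the", "dog", "sleeps"], ["a", "cat", "sleeps", "too"]]
    valid = [["the", "cat", "sleeps", "here", "now"]]
    # Longer held-out sequences reach positions rarely (or never) seen in
    # training, so sparsity grows with sequence length.
    print(f"position sparsity: {position_sparsity(train, valid):.2f}")
```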