Proceedings of the 23rd Conference on Computational Natural Language Learning (CoNLL) 2019
DOI: 10.18653/v1/k19-1031

On the Relation between Position Information and Sentence Length in Neural Machine Translation

Abstract: Long sentences have been one of the major challenges in neural machine translation (NMT). Although some approaches such as the attention mechanism have partially remedied the problem, we found that the current standard NMT model, Transformer, has difficulty in translating long sentences compared to the former standard, Recurrent Neural Network (RNN)-based model. One of the key differences of these NMT models is how the model handles position information which is essential to process sequential data. In this st…
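The abstract refers to how the Transformer injects position information into its inputs. As context, here is a minimal NumPy sketch of the sinusoidal absolute positional encoding of the standard Transformer (Vaswani et al., 2017); the function name and the chosen dimensions are illustrative, not taken from the paper.

```python
# Minimal sketch (not the authors' code): sinusoidal absolute positional
# encoding as defined for the standard Transformer.
import numpy as np

def sinusoidal_position_encoding(max_len: int, d_model: int) -> np.ndarray:
    """Return an array of shape (max_len, d_model) with
    PE[pos, 2i]   = sin(pos / 10000^(2i/d_model)),
    PE[pos, 2i+1] = cos(pos / 10000^(2i/d_model))."""
    positions = np.arange(max_len)[:, None]        # (max_len, 1)
    dims = np.arange(0, d_model, 2)[None, :]       # (1, d_model/2)
    angles = positions / np.power(10000.0, dims / d_model)
    pe = np.zeros((max_len, d_model))
    pe[:, 0::2] = np.sin(angles)
    pe[:, 1::2] = np.cos(angles)
    return pe

# Example: encodings for the first 512 positions of a 512-dimensional model.
pe = sinusoidal_position_encoding(max_len=512, d_model=512)
```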

Cited by 36 publications (42 citation statements). References 17 publications.
“…They stated that this degradation in quality was caused by the short length of the translations. Additionally, Neishi and Yoshinaga (2019) propose using relative position information instead of absolute position information to mitigate the performance drop of NMT models on long sentences. They conducted an analysis of translation quality and sentence length on length-controlled English-to-Japanese parallel data and showed that absolute positional information sharply reduces the BLEU score of the Transformer model (Vaswani et al., 2017) when translating sentences longer than those in the training data.…”
Section: Related Work
confidence: 99%
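A minimal sketch of the intuition behind this finding, under assumed lengths and a Shaw et al. (2018)-style clipping window (the constants are illustrative, not from the cited papers): absolute position indices of a long test sentence fall outside the range seen during training, whereas clipped relative offsets never do.

```python
# Illustrative sketch: why relative position information extrapolates to
# longer sentences while absolute position indices do not.
import numpy as np

TRAIN_MAX_LEN = 50     # assumed maximum sentence length seen during training
CLIP_DISTANCE = 16     # assumed clipping window for relative offsets

test_len = 120         # a test sentence longer than anything seen in training

# Absolute position indices: positions 50..119 were never observed in
# training, so their encodings are effectively untrained.
absolute_ids = np.arange(test_len)
unseen = absolute_ids >= TRAIN_MAX_LEN
print(f"{unseen.sum()} of {test_len} absolute positions are outside the training range")

# Relative offsets between query position i and key position j, clipped to a
# fixed window: every value already occurred in training, regardless of length.
i = np.arange(test_len)[:, None]
j = np.arange(test_len)[None, :]
relative_ids = np.clip(j - i, -CLIP_DISTANCE, CLIP_DISTANCE)
print(f"relative offsets range from {relative_ids.min()} to {relative_ids.max()}")
```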
“…They stated that this degradation in quality was caused by the short length of the translations. Additionally, Neishi and Yoshinaga (2019) propose using relative position information instead of absolute position information to mitigate the performance drop of NMT models on long sentences. They conducted an analysis of translation quality and sentence length on length-controlled English-to-Japanese parallel data and showed that absolute positional information sharply reduces the BLEU score of the Transformer model when translating sentences longer than those in the training data.…”
Section: Discussion
confidence: 99%
“…RPE outperforms APE on data that is out of distribution in terms of sequence length owing to its innate shift invariance (Rosendahl et al., 2019; Neishi and Yoshinaga, 2019; Narang et al., 2021; Wang et al., 2021). However, the self-attention mechanism of RPE involves more computation than that of APE.…”
Section: Relative Position Embedding (RPE)
confidence: 99%
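The extra computation mentioned here comes from the per-pair position term inside attention. The sketch below contrasts single-head attention with APE (position added to the inputs once) against an RPE variant that adds a bias indexed by the clipped offset j − i to every attention logit; a T5-style scalar bias per offset is used for brevity rather than the vector formulation of Shaw et al. (2018), and all shapes and names are assumptions for illustration.

```python
# Illustrative sketch (assumed shapes, not a specific library's API):
# single-head self-attention with APE vs. with a relative position bias (RPE).
import numpy as np

rng = np.random.default_rng(0)
L, d = 8, 16                              # sequence length, head dimension
x = rng.normal(size=(L, d))               # token representations

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

# APE: position information is added to the inputs once; attention is unchanged.
ape = rng.normal(size=(L, d))             # stand-in for absolute position encodings
q = k = v = x + ape
logits_ape = q @ k.T / np.sqrt(d)

# RPE: every attention logit gets an extra bias indexed by the clipped offset
# j - i, i.e. an additional (L x L) lookup on top of the dot products.
max_dist = 4
rel_bias_table = rng.normal(size=(2 * max_dist + 1,))   # one learned scalar per offset
offsets = np.clip(np.arange(L)[None, :] - np.arange(L)[:, None], -max_dist, max_dist)
logits_rpe = (x @ x.T) / np.sqrt(d) + rel_bias_table[offsets + max_dist]

out_ape = softmax(logits_ape) @ v
out_rpe = softmax(logits_rpe) @ x
```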
“…RPE outperforms APE on sequence-to-sequence tasks (Narang et al., 2021; Neishi and Yoshinaga, 2019) due to extrapolation, i.e., the ability to generalize to sequences that are longer than those observed during training (Newman et al., 2020). Wang et al. (2021) reported that one of the key properties contributing to RPE's superior performance is shift invariance, the property of a function not to change its output even if its input is shifted.…”
Section: Introduction
confidence: 99%
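A toy demonstration of the shift invariance referred to above (not code from the cited papers): relative offsets j − i are unchanged when the whole window of positions is shifted by k, while the absolute indices themselves are not.

```python
# Shift invariance of relative offsets vs. absolute positions (toy example).
import numpy as np

L, k = 6, 100                                       # window length and shift (arbitrary)
i = np.arange(L)
rel = i[None, :] - i[:, None]                       # offsets within the original window
rel_shifted = (i + k)[None, :] - (i + k)[:, None]   # offsets after shifting by k

assert np.array_equal(rel, rel_shifted)             # relative offsets: shift-invariant
assert not np.array_equal(i, i + k)                 # absolute positions: not invariant
```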