Proceedings of the Second Workshop on Subword/Character LEvel Models 2018
DOI: 10.18653/v1/w18-1207

Meaningless yet meaningful: Morphology grounded subword-level NMT

Abstract: We explore the use of two independent subsystems, namely Byte Pair Encoding (BPE) and Morfessor, as basic units for subword-level neural machine translation (NMT). We have shown that for linguistically distant language pairs the Morfessor-based segmentation algorithm produces significantly better translation quality than BPE. However, for close language pairs, BPE-based subword NMT may translate better than Morfessor-based subword NMT. We have proposed a combined approach of these two segmentation algorithms Morfess…
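
For readers unfamiliar with BPE, the following is a minimal sketch of how BPE merge operations are typically learned and applied. It is a toy illustration, not the paper's implementation and not its combined BPE and Morfessor approach; the function names (`learn_bpe`, `merge_pair`, `segment`), the `</w>` end-of-word marker, and the toy word frequencies are all assumptions made for the example.

```python
from collections import Counter

END = "</w>"  # assumed end-of-word marker, hypothetical choice for this sketch


def merge_pair(symbols, pair):
    """Replace every adjacent occurrence of `pair` in `symbols` with one merged symbol."""
    out, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return tuple(out)


def learn_bpe(word_freqs, num_merges):
    """Learn up to `num_merges` merge operations from a {word: frequency} dict."""
    # Start from characters, with an end-of-word marker appended to each word.
    vocab = {tuple(word) + (END,): freq for word, freq in word_freqs.items()}
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by word frequency.
        pairs = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break  # every word is already a single symbol
        # Most frequent pair; ties broken by the pair itself for determinism.
        best = max(pairs, key=lambda p: (pairs[p], p))
        merges.append(best)
        new_vocab = {}
        for symbols, freq in vocab.items():
            merged = merge_pair(symbols, best)
            new_vocab[merged] = new_vocab.get(merged, 0) + freq
        vocab = new_vocab
    return merges


def segment(word, merges):
    """Segment a word by replaying the learned merges in order."""
    symbols = tuple(word) + (END,)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return list(symbols)


if __name__ == "__main__":
    toy_freqs = {"low": 5, "lower": 2, "newest": 6, "widest": 3}  # toy data
    merges = learn_bpe(toy_freqs, 10)
    print(segment("lowest", merges))
```

Because the merges are chosen purely by frequency, the resulting units need not coincide with morphemes, which is the gap that morphology-aware segmenters such as Morfessor are meant to address.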

Cited by 48 publications (27 citation statements)
References: 9 publications
“…This could be one of the reasons why the BPE-based NMT model is found to be underperforming in this translation task. This finding is corroborated by Banerjee and Bhattacharyya[55] who in their work found that the Morfessor-based segmentation can yield better translation quality than the BPE-based segmentation for linguistically distant language-pairs, and other way round for the close language-pairs.…”
supporting
confidence: 65%
“…This could be one of the reasons why the BPE-based NMT model was found to be underperforming in this translation task. This finding was corroborated by Banerjee and Bhattacharyya [60], who in their work found that the Morfessor-based segmentation could yield better translation quality than the BPE-based segmentation for linguistically distant language pairs, and the other way round for close language pairs.…”
Section: The BPE Segmentation on the Hindi-to-Tamil Translation
supporting
confidence: 59%
“…However, these three papers only apply segmentation on the string level and cannot properly handle fusional morphology. Addressing morphology in NMT, Banerjee and Bhattacharyya (2018) combine BPE with a morphological analyzer to "guide" the segmentation of surface forms into substrings. Their approach does not result in morphemes, for example googling → googl|ing, which does not match with google, while in our work we match such morphemes.…”
Section: Related Work
mentioning
confidence: 99%