Proceedings of the Fourth Arabic Natural Language Processing Workshop 2019
DOI: 10.18653/v1/w19-4602

Morphology-aware Word-Segmentation in Dialectal Arabic Adaptation of Neural Machine Translation

Abstract: Parallel corpora available for building machine translation (MT) models for dialectal Arabic (DA) are rather limited. The scarcity of resources has prompted the use of Modern Standard Arabic (MSA) abundant resources to complement the limited dialectal resource. However, clitics often differ between MSA and DA. This paper compares morphology-aware DA word segmentation to other word segmentation approaches like Byte Pair Encoding (BPE) and Sub-word Regularization (SR). A set of experiments conducted on Egyptian A…

Cited by 11 publications (8 citation statements)
References: 10 publications
“…Our work is closely related to their idea while ours are more simple and realizable. Tawfik et al (Tawfik et al, 2019) confirmed that there is some advantage from using a high accuracy dialectal segmenter jointly with a language independent word segmentation method like BPE. The main difference is that their approach needs sufficient monolingual data additionally to train a segmentation model while ours do not need any external resources, which is very convenient for word segmentation on the low-resource and morphologically-rich agglutinative languages.…”
Section: Related Work (mentioning)
confidence: 98%
“…Banerjee and Bhattacharyya (2018) combined an off-the-shelf morphological segmenter and BPE in Hindi and Bengali translations against English. Tawfik et al (2019) used a retrained version of linguistically motivated segmentation model along with statistical segmentation methods for Arabic. Pinnis et al (2017) adopted linguistic guidance to BPE for English-Latvian translation.…”
Section: Related Work (mentioning)
confidence: 99%
“…Several works on morphologically-rich NMT have focused on using morphological analysis to pre-process the training data (Luong et al, 2016; Huck et al, 2017; Tawfik et al, 2019). Gulcehre et al (2015) segment each Turkish sentence into a sequence of morpheme units and remove any nonsurface morphemes for Turkish-English translation.…”
Section: Related Work (mentioning)
confidence: 99%