Proceedings of the Fourth Arabic Natural Language Processing Workshop 2019
DOI: 10.18653/v1/w19-4615

Morphologically Annotated Corpora for Seven Arabic Dialects: Taizi, Sanaani, Najdi, Jordanian, Syrian, Iraqi and Moroccan

Abstract: We present a collection of morphologically annotated corpora for seven Arabic dialects: Taizi Yemeni, Sanaani Yemeni, Najdi, Jordanian, Syrian, Iraqi and Moroccan Arabic. The corpora collectively cover over 200,000 words, and are all manually annotated in a common set of standards for orthography, diacritized lemmas, tokenization, morphological units and English glosses. These corpora will be publicly available to serve as benchmarks for training and evaluating systems for Arabic dialect morphological analysis…

citations
Cited by 8 publications
(4 citation statements)
references
References 9 publications
“…NLP has several difficulties when working with dialectal Arabic. Arabic has a rich temporal affixal and inflectional morphology with several classes of attachable clitics (Al-Shargi et al, 2016). Arabic is the official language and is widely spoken in Yemeni official institutions (Al-Hamzi, 2021).…”
Section: The Yemeni Linguistic Situation
type: mentioning
confidence: 99%
“…Over the years, there has been a lot of descriptive work on the same subject. The number of theoretical and descriptive linguistic work on Yemeni Arabic was assessed by (Peter Behnstedt, 2017); (Jastrow, 1984); (Abu-Haidar, 1994); (Al-Shargi et al, 2016); (Naïm-Sanbar, 1994) and Behnstedt, 2006). Rubin (2018) examined the morphology of the Mehri Qishn dialect, one of the most widely-spoken dialects in Yemeni Arabic.…”
Section: Introduction
type: mentioning
confidence: 99%
“…Al-Shargi et al. (2016) presented morphologically annotated corpora for Moroccan and Sanaani Yemeni Arabic. The corpora data were collected from both online and print materials such as internet comments, forums, oral interviews, folktales, sermons, textbooks, blogs and Facebook posts.…”
Section: NLP Resources For Arabic Dialects
type: mentioning
confidence: 99%
“…Some of the earlier efforts worked on rule-based approaches to model dialectal morphology directly (Habash and Rambow, 2006; Habash et al, 2012), or exploiting existing MSA resources (Salloum and Habash, 2014). Later, a number of annotation efforts have led to the creation of varying sizes of dialectal annotated corpora following the style of the PATB (Maamouri et al, 2014; Jarrar et al, 2016; Al-Shargi et al, 2016; Alshargi et al, 2019). The created annotations supported models for dialectal Arabic analysis, disambiguation and tokenization building on the same successful approaches in MSA (Eskander et al, 2016a; Habash et al, 2013; Pasha et al, 2014; Zalmout and Habash, 2019).…”
Section: Dialectal Arabic Models Work On Dialectal
type: mentioning
confidence: 99%