Proceedings of the EMNLP 2014 Workshop on Arabic Natural Language Processing (ANLP) 2014
DOI: 10.3115/v1/w14-3603

Building a Corpus for Palestinian Arabic: a Preliminary Study

Abstract: This paper presents preliminary results in building an annotated corpus of the Palestinian Arabic dialect. The corpus consists of about 43K words drawn from diverse sources. The paper discusses some linguistic facts about the Palestinian dialect compared with Modern Standard Arabic, especially in terms of morphological, orthographic, and lexical variations, and suggests some directions for resolving the challenges these differences pose to the annotation goal. Furthermore, we present two pilot studies …

Cited by 42 publications (50 citation statements); references 10 publications (5 reference statements).

“…CALIMA EGY performs much better than SAMA which is also consistent with previous results (Khalifa et al, 2016; Jarrar et al, 2014). CALIMA GLF outperforms both SAMA and CALIMA EGY on all measured conditions.…”
Section: Results (supporting)

“…proposed a Conventional Orthography for Dialectal Arabic (CODA) as part of a solution allowing different researchers to agree on a set of DA orthographic conventions for computational purposes. CODA was first defined for EGY, but has been extended to Palestinian, Tunisian, Algerian, Maghrebi and Gulf Arabic (Jarrar et al, 2014; Zribi et al, 2014; Saadane and Habash, 2015; Turki et al, 2016; Khalifa et al, 2016). We follow the conventions defined by Khalifa et al (2016) for CODA GLF.…”
Section: Dialectal Orthography (mentioning)

“…Habash et al (2012) proposed a Conventional Orthography for Dialectal Arabic (or CODA) targeting Egyptian Arabic for computational modeling purposes and demonstrated how to map to it in and (Pasha et al, 2014;. CODAs for other dialects have also been proposed (Zribi et al, 2014; Jarrar et al, 2014). In our current annotation task we neither address dialectal Arabic spelling normalization, nor do we systematically translate dialectal words into Standard Arabic (Salloum and Habash, 2013).…”
Section: Dialectal Usage Errors (mentioning)

“…Text in CODA can be read perfectly in DA given the specific dialect and its CODA map. CODA has been designed for the Egyptian Dialect [7] as well as the Tunisian Dialect [15] and the Palestinian Levantine Dialect [9]. For a full presentation of CODA and an explanation of its choices, see ( [7], [15]).…”
Section: CODA (mentioning)