Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers) 2014
DOI: 10.3115/v1/p14-2034

Word Segmentation of Informal Arabic with Domain Adaptation

Abstract: Segmentation of clitics has been shown to improve accuracy on a variety of Arabic NLP tasks. However, state-of-the-art Arabic word segmenters are either limited to formal Modern Standard Arabic, performing poorly on Arabic text featuring dialectal vocabulary and grammar, or rely on linguistic knowledge that is hand-tuned for each dialect. We extend an existing MSA segmenter with a simple domain adaptation technique and new features in order to segment informal and dialectal Arabic text. Experiments show that o…

Cited by 59 publications (45 citation statements)
References 7 publications
“…This adaptation scheme is attractive because of its simplicity and ease of use as a pre-processing step, and also because it performs quite well despite its simplicity. It has been used in various NLP tasks such as word segmentation (Monroe et al, 2014), machine translation , word sense disambiguation (Zhong et al, 2008), and short answer scoring (Heilman and Madnani, 2013). Our work is an extension of this scheme in the sense that our work is a generalization of EasyAdapt.…”
Section: Domain Adaptation (mentioning)
confidence: 99%
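
The "pre-processing step" mentioned in the statement above refers to feature augmentation in the style of EasyAdapt (Daumé III, 2007): every feature is duplicated into a shared copy and a domain-specific copy before training. The following is only a minimal sketch of that idea, not the implementation used by any of the cited papers; the function name `augment`, the `general:` prefix, the domain labels, and the example feature names are all hypothetical.

```python
from typing import Dict


def augment(features: Dict[str, float], domain: str) -> Dict[str, float]:
    """Duplicate each feature into a shared copy and a domain-specific copy.

    Assumption: features are a sparse name -> value mapping and `domain`
    is a short label such as "msa" or "dialect" (hypothetical labels).
    """
    augmented: Dict[str, float] = {}
    for name, value in features.items():
        augmented[f"general:{name}"] = value   # copy shared across all domains
        augmented[f"{domain}:{name}"] = value  # copy seen only in this domain
    return augmented


# Hypothetical feature vectors for one token from each domain.
msa = augment({"prefix=w": 1.0, "len": 3.0}, "msa")
dialect = augment({"prefix=w": 1.0}, "dialect")

# Only the shared copies overlap between the two domains.
shared = set(msa) & set(dialect)
```

Because the augmented vectors are ordinary feature maps, they can be passed to an unmodified learner, which is why the scheme works as a pre-processing step: the learner can put weight on the shared copy when the domains agree and on the domain-specific copy when they differ.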
“…For this task, we utilize the Stanford Word Segmenter (Monroe et al 2014). The Arabic texts were POS tagged and parsed using the Stanford Arabic Parser.…”
Section: Arabic (mentioning)
confidence: 99%
“…D is the same baseline as Green et al 5 We tokenized the English with Stanford CoreNLP according to the Penn Treebank standard (Marcus et al, 1993), the Arabic with the Stanford Arabic segmenter (Monroe et al, 2014) according to the Penn Arabic Treebank standard (Maamouri et al, 2008), and the Chinese with the Stanford Chinese segmenter (Chang et al, 2008) according to the Penn Chinese Treebank standard (Xue et al, 2005). 6 Data sources: tune, MT023568; dev, MT04; dev-dom, domain adaptation dev set is MT04 and all wb and bn data from LDC2007E61; test1, MT09 (Ar-En) and MT12 (Zh-En); test2, Progress0809 which was revealed in the OpenMT 2012 evaluation; test3, MetricsMATR08-10.…”
Section: Results (mentioning)
confidence: 99%