Proceedings of the Second Workshop on Natural Language Processing for Social Media (SocialNLP) 2014
DOI: 10.3115/v1/w14-5904

Automatic Identification of Arabic Language Varieties and Dialects in Social Media

Abstract: Modern Standard Arabic (MSA) is the formal language in most Arabic countries. Arabic Dialects (AD) or daily language differs from MSA especially in social media communication. However, most Arabic social media texts have mixed forms and many variations especially between MSA and AD. This paper aims to bridge the gap between MSA and AD by providing a framework for AD classification using probabilistic models across social media datasets. We present a set of experiments using the character n-gram Markov language…

Cited by 55 publications (42 citation statements)
References 10 publications (2 reference statements)
“…2 Examples of Arabic dialect identification on speech data include the work by ), Biadsy (2011), and Bahari et al (2014). Identifying Arabic dialects in text also became a popular research topic in recent years with several studies published about it (Zaidan and Callison-Burch, 2014; Sadat et al, 2014; Tillmann et al, 2014; Malmasi et al, 2015).…”
Section: Related Work (mentioning)
confidence: 99%
“…user-generated content) and pose a number of challenges for NLP applications. Several studies on dialectal variation of Arabic have been published including corpus compilation for Arabic dialects (Al-Sabbagh and Girju, 2012; Cotterell and Callison-Burch, 2014), parsing (Chiang et al, 2006), machine translation of Arabic dialects (Zbib et al, 2012), and finally, the topic of the ADI shared task, Arabic dialect identification (Zaidan and Callison-Burch, 2014; Sadat et al, 2014; Malmasi et al, 2015).…”
Section: Introduction (mentioning)
confidence: 99%
“…They collected 1,000 news articles and applied different features such as word and character n-grams to them. Similarly, in [21] the authors differentiate between six different varieties of Arabic in blogs and forums using character n-gram features. Concerning Spanish language varieties, in [9] the authors collected a dataset from Twitter, focusing on varieties from Argentina, Chile, Colombia, Mexico and Spain.…”
Section: Related Work (mentioning)
confidence: 99%
“…The main motive of MT is to make a language L1 intelligible to whom who do not speak it by presenting it in a language L2, which might be the audiences' own language or a language which they are able to understand. However, there are several languages such as Chinese, Arabic, and Kurdish that encompass several dialects which are mutually unintelligible (Tang et al, 2008; Farghaly and Shaalan, 2009; Sadat et al, 2014). In this respect, the translation between the dialects are of the intralanguage nature rather than interlanguage.…”
Section: Introduction (mentioning)
confidence: 99%