This paper describes the process of creating a novel resource, a parallel Arabizi-Arabic script corpus of SMS/Chat data. The language used in social media differs in many ways from that of other written genres: its vocabulary is informal, with intentional deviations from standard orthography such as repeated letters for emphasis; typos and nonstandard abbreviations are common; and nonlinguistic content, such as laughter, sound representations, and emoticons, is written out. This situation is exacerbated in the case of Arabic social media for two reasons. First, Arabic dialects, commonly used in social media, are quite different from Modern Standard Arabic phonologically, morphologically and lexically, and most importantly, they lack standard orthographies. Second, Arabic speakers on social media, as well as in discussion forums, SMS messaging and online chat, often use a non-standard romanization called Arabizi. In the context of natural language processing of social media Arabic, transliterating from Arabizi of various dialects to Arabic script is a necessary step, since many of the existing state-of-the-art resources for Arabic dialect processing expect Arabic script input. The corpus described in this paper is expected to support Arabic NLP by providing this resource.
In this paper we address the following questions from our experience of the last two and a half years in developing a large-scale corpus of Arabic text annotated for morphological information, part-of-speech, English gloss, and syntactic structure: (a) How did we 'leapfrog' through the stumbling blocks of both methodology and training in setting up the Penn Arabic Treebank (ATB) annotation? (b) How did we reconcile the Penn Treebank annotation principles and practices with both traditional and more recent grammatical concepts of Modern Standard Arabic (MSA)? (c) What are the current issues and nagging problems? (d) What has been achieved and what are our future expectations?
Nodular fasciitis is a benign fibroblastic pseudotumor. It is a rare entity. Half of the cases reported in the literature involve the upper limbs. The head and neck region is affected in 20% of cases, and localization to the oral cavity is rare. What sets this lesion apart is its rapid growth and high cellularity, which can lead it to be mistaken for a sarcoma. We report an original and rare case of nodular fasciitis involving the inner surface of the cheek, and we outline the diagnostic and therapeutic features of this condition on the basis of a review of the literature. CASE REPORT A 50-year-old woman presented to the outpatient clinic with an intraoral foreign-body sensation, which she quickly linked to the appearance, one month earlier, of a