We present MAGEAD, a morphological analyzer and generator for the Arabic language family. Our work is novel in that it explicitly addresses the need for processing the morphology of the dialects. MAGEAD provides an analysis to a root+pattern representation, it has separate phonological and orthographic representations, and it allows for combining morphemes from different dialects.
We describe the phenomenon of schwa-deletion in Hindi and how it is handled in the pronunciation component of a multilingual concatenative text-to-speech system. Each of the consonants in written Hindi is associated with an "inherent" schwa vowel which is not represented in the orthography. For instance, the Hindi word pronounced as [namak] ('salt') is represented in the orthography using the consonantal characters for [n], [m], and [k]. Two main factors complicate the issue of schwa pronunciation in Hindi. First, not every schwa following a consonant is pronounced within the word. Second, in multimorphemic words, the presence of a morpheme boundary can block schwa deletion where it might otherwise occur. We propose a model for schwa-deletion which combines a general purpose schwa-deletion rule proposed in the linguistics literature (Ohala, 1983), with additional morphological analysis necessitated by the high frequency of compounds in our database. The system is implemented in the framework of finite-state transducer technology.
This paper presents a computational model for nonlinear morphology with illustrations from Syriac and Arabic. The model is a multitiered one in that it allows for multiple lexical representations corresponding to the multiple tiers of autosegmental phonology. The model consists of three main components: (i) a lexicon, which is made of sublexica, with each sublexicon representing lexical material from a specific tier, (ii) a rewrite rules component that maps multiple lexical representations into one surface form and vice versa, and (iii) a morphotactic component that employs regular grammars. The system is finite-state in that lexica and rules can be represented by multitape finite-state machines.
This paper describes an algorithm for the compilation of a two (or more) level orthographic or phonological rule notation into finite state transducers. The notation is an alternative to the standard one deriving from Koskenniemi's work: it is believed to have some practical descriptive advantages, and is quite widely used, but has a different interpretation. Efficient interpreters exist for the notation, but until now it has not been clear how to compile to equivalent automata in a transparent way. The present paper shows how to do this, using some of the conceptual tools provided by Kaplan and Kay's regular relations calculus. * Supported by SERC studentship no. 92313384. † Supported by a Benefactors' Studentship from St John's College.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.