The relationship between orthography, phonology and morphology varies across languages and writing systems. These relationships are by no means random: they follow rules, albeit with exceptions, even in relatively irregular languages like English. In this paper, we present the PolyOrth approach to representing these relationships, which defines orthographic forms in terms of their phonological and morphological correspondences within inheritance lexicons. The approach involves defining Finite State Transducers (FSTs), but in a much more subtle way than traditional speech-to-text or text-to-speech transducers. We define FSTs to provide phoneme-to-grapheme mappings for onsets, peaks and codas, as well as a grapheme-to-grapheme FST which defines spelling rules. We demonstrate the approach applied to English, Dutch and German. These three languages are interesting because they share many features at all three levels (orthography, morphology and phonology) whilst also demonstrating significant differences. This allows us to illustrate not only a range of different orthography/phonology/morphology relationships within languages but also the possibility of sharing information about such mappings across languages.
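As a rough illustration of the idea of position-sensitive phoneme-to-grapheme mappings followed by a grapheme-to-grapheme spelling rule, the sketch below uses plain lookup tables and one regular expression. It is not the PolyOrth implementation and does not build real FSTs; the tables `ONSET`, `PEAK` and `CODA`, the `+` morpheme-boundary marker, and both function names are assumptions made up for this example.

```python
import re

# Hypothetical toy tables (not taken from the paper): the same phoneme
# can be spelled differently depending on its position in the syllable.
ONSET = {"k": "c", "b": "b", "m": "m", "": ""}
PEAK = {"æ": "a"}
CODA = {"k": "ck", "t": "t", "": ""}


def spell_syllable(onset, peak, coda):
    """Map one syllable's onset, peak and coda phonemes to graphemes."""
    return ONSET[onset] + PEAK[peak] + CODA[coda]


def apply_spelling_rules(form):
    """Apply a toy grapheme-to-grapheme rule to a form with '+' boundaries.

    Assumed rule: a final 'e' after a consonant letter is dropped before a
    vowel-initial suffix. The boundary marker is then removed.
    """
    form = re.sub(r"(?<=[^aeiou])e\+(?=[aeiou])", "", form)
    return form.replace("+", "")


print(spell_syllable("k", "æ", "t"))     # /k/ in the onset is spelled "c"
print(spell_syllable("b", "æ", "k"))     # /k/ in the coda is spelled "ck"
print(apply_spelling_rules("make+ing"))  # e-deletion before a vowel suffix
```

A dictionary lookup is the simplest stand-in for a transducer here; a real FST would also handle context, ambiguity and exceptions.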
The BBC Voices project of 2005 resulted in a large repository of lexical, phonological and grammatical data from the UK, which included geographical references. To investigate the relationship between language and geography, various clustering algorithms have been applied to the BBC Voices data. Results show a clear spatial relationship, with well-defined, contiguous regions of UK language being identified. To test the clustering methodology, Bayesian models have been generated for each region, and these have been tested using a set of non-standard expressions contributed by a small number of participants. Results of this second stage indicate that the models are, in most cases, able to identify the geographical region of each test participant based on the linguistic items they use.
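The second stage, identifying a participant's region from the items they use, can be sketched with one simple Bayesian classifier. The example below is a multinomial naive Bayes model with add-one smoothing, which is an assumption and not necessarily the model used in the study. The region labels, the items `w1` to `w5` and the function names are invented placeholders, not BBC Voices data.

```python
import math
from collections import Counter


def train(samples):
    """Count item frequencies per region from (region, items) pairs."""
    item_counts = {}
    region_counts = Counter()
    vocab = set()
    for region, items in samples:
        item_counts.setdefault(region, Counter()).update(items)
        region_counts[region] += 1
        vocab.update(items)
    return item_counts, region_counts, vocab


def classify(model, items):
    """Return the region with the highest posterior log-probability."""
    item_counts, region_counts, vocab = model
    n_samples = sum(region_counts.values())
    best_region, best_score = None, -math.inf
    for region, counts in item_counts.items():
        total = sum(counts.values())
        score = math.log(region_counts[region] / n_samples)  # log prior
        for item in items:
            # Add-one (Laplace) smoothing so unseen items do not zero out
            score += math.log((counts[item] + 1) / (total + len(vocab)))
        if score > best_score:
            best_region, best_score = region, score
    return best_region


# Invented placeholder data: each sample is one participant's items.
samples = [
    ("region_A", ["w1", "w2"]),
    ("region_A", ["w1", "w3"]),
    ("region_B", ["w4", "w5"]),
    ("region_B", ["w4", "w3"]),
]
model = train(samples)
print(classify(model, ["w1"]))
```

In this setup the regions themselves would come from the earlier clustering step; here they are simply given as labels.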
This article was originally prepared to support the SCIP Competitiveness Task Force. Jan Herring was the principal author, with major contributions from Jim Leonard, Guiliana Lavendel, Amego Ware, Jim Thomas, and Bob Margulies.