The first task in Tibetan Natural Language Processing is word segmentation. We present a lightweight segmentation tool based on lexical resources. It runs within InDesign, and users can update it with their manual corrections of its output. We then propose a semi-automated workflow for syntactic analysis that uses utterance simplification and intonation cues to obtain precise information about the syntactic structure of Tibetan. Native speakers, even non-specialists, can thus provide precise information about the structure of utterances. This will give the scientific community resources for studying Tibetan syntax. Moreover, the extra task we have included makes it easy to generate educational materials that benefit the informants.
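To make the idea of lexicon-based segmentation concrete, here is a minimal Python sketch of greedy longest-match segmentation over syllables. It is an illustration under stated assumptions, not the tool described above: the function names, the `max_len` limit, and the choice of greedy longest match are all hypothetical, and the sample lexicon uses Latin transliterations as stand-ins for Tibetan syllables. The only Tibetan-specific fact it relies on is that syllables are delimited by the tsheg mark (U+0F0B).

```python
TSHEG = "\u0f0b"  # Tibetan intersyllabic mark (tsheg)


def split_syllables(text):
    """Split a string into syllables on the tsheg, dropping empty pieces."""
    return [s for s in text.split(TSHEG) if s]


def segment(text, lexicon, max_len=4):
    """Greedy longest-match segmentation (an assumed strategy).

    At each position, try the longest run of up to `max_len` syllables
    that appears in `lexicon`; fall back to a single syllable otherwise.
    """
    syllables = split_syllables(text)
    words = []
    i = 0
    while i < len(syllables):
        longest = min(max_len, len(syllables) - i)
        for n in range(longest, 0, -1):
            candidate = TSHEG.join(syllables[i:i + n])
            # A single syllable is always accepted as a last resort.
            if n == 1 or candidate in lexicon:
                words.append(candidate)
                i += n
                break
    return words


# Hypothetical lexicon; Latin transliterations stand in for Tibetan syllables.
lexicon = {
    TSHEG.join(["bkra", "shis"]),
    TSHEG.join(["bde", "legs"]),
}
text = TSHEG.join(["bkra", "shis", "bde", "legs", "yin"]) + TSHEG
words = segment(text, lexicon)
```

In a sketch like this, a user's manual correction could be folded back in simply by adding the corrected word to `lexicon`, so that later runs match it.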
This document presents our research on the correct formation of a Classical Tibetan syllable. It was triggered by attempts to define the boundaries of well-formed syllables in Classical Tibetan for spell-checking purposes. Formalizing the formation of the syllable led us to inspect the small differences among grammar books, in both Western and Tibetan languages. We then checked these differences against the Tibetan dictionaries we consider reliable, and also against the Kangyur. Our inquiry finally led us to study how to decompose a syllable, discussing the ambiguous cases, as well as the formation of the Dzongkha syllable.