As part of the ongoing AGGREGATION project, which infers grammars from interlinear glossed text (IGT), we explore how morphological patterns extracted from IGT data can be integrated with inferred syntactic properties to create implemented linguistic grammars. We present a case study of Chintang, in which we evaluate the accuracy of these predictions by using them to generate a grammar and parse running text. Coverage over the corpus is low because the lexicon produced by our system includes only nouns and intransitive and transitive verbs, but the inferred grammar outperforms an expert-built oracle grammar of similar scope.
In this paper I present a k-means clustering approach to inferring morphological position classes (morphotactics) from Interlinear Glossed Text (IGT), a kind of data collection available for some endangered and low-resource languages. The experiment is not restricted to low-resource languages, but they are the target domain. Specifically, the approach is intended for field linguists who do not necessarily know how many position classes the language they work with has, or what those classes are, but who have the expertise to evaluate competing hypotheses. It builds on an existing approach (Wax, 2014) but replaces the core heuristic with a clustering algorithm. The results illustrate two main points. First, they are largely negative, which shows that the baseline algorithm (summarized in the paper) uses a highly predictive feature to determine whether affixes belong to the same position class, namely edge overlap in the affix graph. Second, unlike the baseline method, which relies entirely on a single feature, k-means clustering can take several features into account and helps discover more morphological phenomena, e.g. circumfixation. I conclude that unsupervised learning algorithms such as k-means clustering can in principle be used for morphotactics inference, though the algorithm should probably weight certain features more heavily than others. Most importantly, I conclude that clustering is a promising approach for diverse morphotactics and as such can facilitate linguistic analysis of field languages.
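The general idea of clustering affixes into candidate position classes can be illustrated with a minimal sketch. This is not the paper's implementation: the affix names, the binary feature vectors (each dimension standing for whether an affix has an edge to some node in an affix graph), and the choice of k = 2 are all assumptions made for illustration.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_vector(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def kmeans(vectors, k, max_iterations=50, seed=0):
    """Plain k-means; returns one cluster index per input vector."""
    rng = random.Random(seed)
    centroids = rng.sample(vectors, k)  # k distinct items as initial centroids
    assignment = None
    for _ in range(max_iterations):
        new_assignment = [
            min(range(k), key=lambda c: squared_distance(v, centroids[c]))
            for v in vectors
        ]
        if new_assignment == assignment:
            break  # assignments are stable, so the centroids are too
        assignment = new_assignment
        for c in range(k):
            members = [v for v, a in zip(vectors, assignment) if a == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = mean_vector(members)
    return assignment


# Hypothetical affixes; each dimension is an assumed affix-graph edge feature.
affix_features = {
    "affix_1": [1, 1, 1, 0, 0, 0],
    "affix_2": [1, 1, 0, 0, 0, 0],
    "affix_3": [0, 0, 0, 1, 1, 1],
    "affix_4": [0, 0, 0, 0, 1, 1],
}

names = list(affix_features)
labels = kmeans([affix_features[name] for name in names], k=2)
position_classes = dict(zip(names, labels))
```

In this toy input, affixes with largely overlapping edges end up in the same cluster. Extra features could be appended to each vector, and scaling a dimension up or down is one simple way to weight it more or less heavily.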
We present a system that automatically groups verb stems into inflection classes, performing a case study of Abui verbs. Starting from a relatively small number of fully glossed Abui sentences, we train a morphological precision grammar and use it to automatically analyze and gloss words from the unglossed portion of our corpus. Then we group stems into classes based on their cooccurrence patterns with several prefix series of interest. We compare our results to a curated collection of elicited examples and illustrate how our approach can be useful for field linguists as it can help them refine their analysis by accounting for more patterns in the data.
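The grouping step can be sketched as follows, assuming the analyzed words have already been reduced to (stem, prefix series) pairs. The stem names and series labels below are placeholders rather than Abui data, and grouping by the exact set of attested series is one simple possibility, not necessarily the criterion the system uses.

```python
from collections import defaultdict


def group_stems_by_prefix_series(observations):
    """Group stems by the set of prefix series they were observed with.

    `observations` is an iterable of (stem, prefix_series) pairs.
    Returns a dict mapping each attested set of series (a frozenset)
    to the sorted list of stems showing exactly that set.
    """
    series_by_stem = defaultdict(set)
    for stem, series in observations:
        series_by_stem[stem].add(series)

    classes = defaultdict(list)
    for stem, series_set in series_by_stem.items():
        classes[frozenset(series_set)].append(stem)

    return {key: sorted(stems) for key, stems in classes.items()}


# Placeholder observations; labels are hypothetical.
observations = [
    ("stem_a", "series_I"), ("stem_a", "series_II"),
    ("stem_b", "series_I"), ("stem_b", "series_II"),
    ("stem_c", "series_III"),
    ("stem_d", "series_I"),
]

inflection_classes = group_stems_by_prefix_series(observations)
```

Stems that share the same co-occurrence pattern fall into one class, so a class with unexpected members points to a pattern worth checking against the elicited examples.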
We present a web-based system that facilitates the exploration of complex morphological patterns found in morphologically rich languages. The need for better understanding of such patterns is urgent for linguistics and important for cross-linguistically applicable natural language processing. We give an overview of the system architecture and describe a sample case study on Abui [abz], a Trans-New Guinea language spoken in Indonesia.
We present an analysis of multiple question fronting in a restricted variant of the HPSG formalism (DELPH-IN) where unification is the only natively defined operation. Analysing multiple fronting in this formalism is challenging, because it requires carefully handling list appends, something that HPSG analyses of question fronting heavily rely on. Our analysis uses the append list type to address this challenge. We focus the testing of our analysis on Russian, although we also integrate it into the Grammar Matrix customization system where it serves as a basis for cross-linguistic modeling. In this context, we discuss the relationship of our analysis to lexical threading and conclude that, while lexical threading has its advantages, modeling multiple extraction cross-linguistically is easier without the lexical threading assumption.