We present an algorithm that takes an unannotated corpus as its input, and returns a ranked list of probable morphologically related pairs as its output. The algorithm tries to discover morphologically related pairs by looking for pairs that are both orthographically and semantically similar, where orthographic similarity is measured in terms of minimum edit distance, and semantic similarity is measured in terms of mutual information. The procedure does not rely on a morpheme concatenation model, nor on distributional properties of word substrings (such as affix frequency). Experiments with German and English input give encouraging results, both in terms of precision (proportion of good pairs found at various cutoff points of the ranked list), and in terms of a qualitative analysis of the types of morphological patterns discovered by the algorithm.
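The pairing idea in this abstract can be illustrated with a minimal sketch. It assumes sentence-level co-occurrence as the basis for mutual information, pointwise mutual information as the specific measure, and a simple filter-then-sort combination of the two similarity scores; none of these details are given in the abstract, and the names `edit_distance`, `pmi_table`, `rank_pairs`, and `max_distance` are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def edit_distance(a: str, b: str) -> int:
    """Minimum edit distance (Levenshtein) with unit costs."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def pmi_table(sentences):
    """Pointwise mutual information for word pairs that share a sentence.

    Assumption: co-occurrence means "appears in the same sentence".
    """
    n = len(sentences)
    word_counts = Counter()
    pair_counts = Counter()
    for sentence in sentences:
        types = sorted(set(sentence))
        word_counts.update(types)
        pair_counts.update(combinations(types, 2))
    return {
        (w1, w2): math.log2(c * n / (word_counts[w1] * word_counts[w2]))
        for (w1, w2), c in pair_counts.items()
    }


def rank_pairs(sentences, max_distance=2):
    """Keep orthographically close pairs with positive PMI, best PMI first."""
    scored = []
    for (w1, w2), pmi in pmi_table(sentences).items():
        distance = edit_distance(w1, w2)
        if 0 < distance <= max_distance and pmi > 0:
            scored.append((w1, w2, distance, pmi))
    scored.sort(key=lambda item: (-item[3], item[2]))
    return scored
```

On a realistic corpus the candidate set would need pruning before the pairwise comparison, since scoring every co-occurring pair grows quickly with vocabulary size.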
Communication and information exchange are vital factors in human society, and communication disorders severely impair quality of life. Whereas experienced typists produce some 300 keystrokes per minute, persons with motor impairments achieve much lower rates. Predictive typing systems for English-speaking areas have proven useful and efficient, but for all other European languages there are no predictive typing programs powerful enough to substantially improve the communication rate and IT access of disabled persons. FASTY aims to offer a communication support system that significantly increases typing speed and is adaptable to users with different languages and strongly varying needs. In this way the large group of non-English-speaking disabled citizens will be supported in living a more independent and self-determined life.
In word prediction systems for augmentative and alternative communication (AAC), productive word-formation processes such as compounding pose a serious problem. We present a model that predicts German nominal compounds by splitting them into their modifier and head components, instead of trying to predict them as a whole. The model is improved further by the use of class-based modifier-head bigrams constructed using semantic classes automatically extracted from a corpus. The evaluation shows that the split compound model with class bigrams leads to an improvement in keystroke savings of more than 15% over a no-split compound baseline model. We also present preliminary results obtained with a word prediction model integrating compound and simple word prediction.
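One way to picture class-based modifier-head bigrams is the sketch below. It assumes the standard class-bigram factorization, P(head | modifier) ≈ P(class(head) | class(modifier)) · P(head | class(head)), which may differ from the formulation used in the paper. The semantic classes are supplied by hand here rather than extracted from a corpus, and `train_class_bigrams`, `rank_heads`, and the toy data are hypothetical.

```python
from collections import Counter


def train_class_bigrams(compounds, word_class):
    """Collect counts from (modifier, head) pairs of already split compounds."""
    class_pair = Counter()       # (modifier class, head class) -> count
    mod_class_total = Counter()  # modifier class -> count
    head_count = Counter()       # head word -> count
    head_class_total = Counter() # head class -> count
    for modifier, head in compounds:
        cm, ch = word_class[modifier], word_class[head]
        class_pair[(cm, ch)] += 1
        mod_class_total[cm] += 1
        head_count[head] += 1
        head_class_total[ch] += 1
    return class_pair, mod_class_total, head_count, head_class_total


def rank_heads(modifier, prefix, model, word_class):
    """Rank known heads that start with the typed prefix, given the modifier."""
    class_pair, mod_class_total, head_count, head_class_total = model
    cm = word_class.get(modifier)
    if cm is None or mod_class_total[cm] == 0:
        return []  # no class evidence for this modifier
    scores = {}
    for head, count in head_count.items():
        if not head.startswith(prefix):
            continue
        ch = word_class[head]
        p_class = class_pair[(cm, ch)] / mod_class_total[cm]
        p_head = count / head_class_total[ch]
        scores[head] = p_class * p_head
    return sorted(scores, key=lambda h: (-scores[h], h))


# Toy data: "Birne" never occurs as a modifier, but shares a class with "Apfel".
classes = {"Apfel": "FRUIT", "Orange": "FRUIT", "Birne": "FRUIT",
           "Saft": "DRINK", "Kuchen": "FOOD", "Auto": "VEHICLE", "Bahn": "ROAD"}
training = [("Apfel", "Saft"), ("Orange", "Saft"), ("Apfel", "Kuchen"), ("Auto", "Bahn")]
model = train_class_bigrams(training, classes)
print(rank_heads("Birne", "", model, classes))
```

The point of the class level is generalization: a modifier unseen in training can still receive sensible head predictions through the class it shares with observed modifiers.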
In this paper, we report on some preliminary experiments in which we tried to improve the performance of a state-of-the-art predictive typing system for the German language by adding a collocation-based prediction component. This component tries to exploit the fact that texts have a topic and are semantically coherent. Thus, the appearance of a certain word in a text can be a cue that other, semantically related words are likely to appear soon. The collocation-based module exploits this kind of topical/semantic relatedness by relying on statistics about the co-occurrence of words within a large window of text in the training corpus. Our current experimental results indicate that using the collocation-based prediction module has a small but consistent positive effect on the performance of the system.
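The window-based co-occurrence idea can be sketched roughly as follows. The abstract does not say how the collocation statistics are combined with the base predictor, so the linear interpolation, the `weight` parameter, the window size, and the function names `train_cooccurrence`, `trigger_score`, and `rerank` are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def train_cooccurrence(tokens, window=50):
    """Count how often two different words occur within `window` tokens."""
    cooc = defaultdict(Counter)
    for i, word in enumerate(tokens):
        for other in tokens[max(0, i - window):i]:
            if other != word:
                cooc[other][word] += 1
                cooc[word][other] += 1
    return cooc


def trigger_score(word, recent_words, cooc):
    """Average relative co-occurrence frequency of `word` with recent words."""
    scores = []
    for recent in recent_words:
        row = cooc.get(recent)
        if row:
            scores.append(row[word] / sum(row.values()))
    return sum(scores) / len(scores) if scores else 0.0


def rerank(candidates, recent_words, cooc, weight=0.5):
    """Interpolate base probabilities with the collocation-based score.

    `candidates` maps each candidate word to the probability assigned by
    some base predictor (for example an n-gram model, assumed here).
    """
    combined = {
        word: (1 - weight) * base + weight * trigger_score(word, recent_words, cooc)
        for word, base in candidates.items()
    }
    return sorted(combined, key=lambda w: (-combined[w], w))
```

With equal base probabilities, a candidate that co-occurred with recently typed words in training moves ahead of one that did not, which is the kind of topical cue the abstract describes.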