Results of a quantitative analysis of texts produced by six Russian schizophrenic patients are examined. The analysis shows that there exist statistical parameters that reflect two major types of verbal-mental disorder. In the first type, an obsession reorders the patient's verbal-mental activity. Consequently, the text is filled mainly with words and word combinations related to the obsessional topic. The variety of lexical units employed is restricted, and there are many repetitions. This naturally leads to rapid saturation, which is reflected in the parabolic form of Zipf's curve. Disorders of the second type are characterized by multiple topics and the absence of a consistent subject; the lexicon here is varied and chaotic. Thus such a text represents unsaturated sets having Zipf's parameter ( 1 and small values of Herdan's parameter U.
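To make the rank-frequency idea concrete, the following is a minimal sketch, not the authors' procedure. It assumes that Zipf's parameter can be approximated by the negated least-squares slope of log frequency against log rank. The function names `rank_frequency`, `zipf_exponent` and `type_token_ratio` are hypothetical. The type-token ratio is only a rough stand-in for lexical variety; Herdan's parameter U is not computed, because the abstract does not define it.

```python
import math
from collections import Counter


def rank_frequency(tokens):
    """Return word frequencies sorted in descending order (rank 1 first)."""
    return sorted(Counter(tokens).values(), reverse=True)


def zipf_exponent(freqs):
    """Estimate a Zipf-style exponent from a rank-frequency list.

    Assumption: the exponent is taken as the negated least-squares slope
    of log(frequency) against log(rank). Needs at least two ranks.
    """
    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return -sxy / sxx


def type_token_ratio(tokens):
    """Share of distinct words among all words (a crude variety measure)."""
    return len(set(tokens)) / len(tokens)
```

A text dominated by one topic and many repetitions would be expected to show a low type-token ratio, while a varied, unfocused text would show a higher one; the thresholds separating the two types are not given in the abstract.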
We present a novel quantitative approach to classifying authors' style and gender differences based on the extraction of word collocations. The proposed algorithm mitigates previously described problems of text processing with vector models. We demonstrate the approach by analyzing a corpus of Russian prose. We discuss different approaches to classifying and identifying an author's style as implemented in currently available software and libraries for morphological analysis, parameterization methods, text indexing, artificial intelligence algorithms and knowledge extraction. Our results demonstrate the efficiency and relative advantage of regression decision tree methods in identifying informative frequency indexes in a way that lends itself to logical interpretation. We develop a toolkit for conducting comparative experiments to assess the effectiveness of classifying natural language text data using three models of text representation: the vector model, the set-theoretic model, and the authors' set-theoretic model with collocation extraction. Comparing the ability of different methods to identify the style and gender differences of authors of fiction, we find that the proposed approach incorporating collocation information alleviates some of the previously identified deficiencies and yields overall improvements in classification accuracy.
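As an illustration of what extracting word collocations can look like, here is a minimal sketch. The abstract does not specify how the authors score collocations, so the use of adjacent word pairs ranked by pointwise mutual information (PMI) is an assumption, and the function name `bigram_pmi` and the `min_count` threshold are hypothetical.

```python
import math
from collections import Counter


def bigram_pmi(tokens, min_count=2):
    """Score adjacent word pairs by pointwise mutual information.

    Assumption: a collocation is an adjacent pair seen at least
    `min_count` times; the paper's own criterion is not stated.
    """
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_uni = len(tokens)
    n_bi = max(len(tokens) - 1, 1)
    scores = {}
    for (w1, w2), count in bigrams.items():
        if count < min_count:
            continue  # skip rare pairs, whose PMI is unreliable
        p_pair = count / n_bi
        p1 = unigrams[w1] / n_uni
        p2 = unigrams[w2] / n_uni
        scores[(w1, w2)] = math.log2(p_pair / (p1 * p2))
    return scores
```

The resulting pairs could then serve as additional features next to single-word frequencies when training a classifier such as a decision tree.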