Automatic emotive text analysis has demonstrated its relevance in recent years. In this paper, we address the problem of identifying emotions in informal Russian-language internet discourse. We consider text messages collected from Telegram and VK. Because this advanced form of sentiment analysis is difficult, we propose an integrated approach that combines linguistic methods and machine learning. As a result, we design an automatic classifier that labels text messages by the emotions they express. In testing, our model is estimated to reach near-human performance.
In this paper, we address the issue of identifying emotions in Russian informal text messages. For this purpose, a new large dataset of text messages from the most popular Russian messaging and social networking services (Telegram, VK) was compiled semi-automatically. Emojis contained in the text messages were used to annotate the data with the emotions expressed. This paper proposes an integrated approach to text-based emotion classification that combines linguistic methods and machine learning. The approach relies on morphological, lexical, and stylistic features of the text, and also takes the level of expressiveness into account. As a result, an emotion classification model demonstrating near-human performance was designed. We also report on the importance of different linguistic features of the text messages for the task of automatic emotive analysis. Additionally, we perform an error analysis and identify ways to improve the model in the future.
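The emoji-based annotation step mentioned above could look roughly like the following minimal sketch. The emoji-to-emotion mapping, the label names, and the rule of discarding messages with conflicting emojis are all assumptions made for illustration; the abstract does not specify the authors' actual label set or filtering rules.

```python
from collections import Counter

# Hypothetical emoji-to-emotion mapping (illustrative only; not the paper's label set).
EMOJI_TO_EMOTION = {
    "😂": "joy",
    "😊": "joy",
    "😢": "sadness",
    "😭": "sadness",
    "😡": "anger",
    "😱": "fear",
}


def label_message(text):
    """Return (text_without_known_emojis, emotion), or None if no unambiguous label exists."""
    votes = Counter(EMOJI_TO_EMOTION[ch] for ch in text if ch in EMOJI_TO_EMOTION)
    if len(votes) != 1:
        # Either no known emoji, or emojis that point at different emotions.
        return None
    emotion = next(iter(votes))
    # Strip the emojis so a classifier cannot simply memorize the label source.
    cleaned = "".join(ch for ch in text if ch not in EMOJI_TO_EMOTION)
    return " ".join(cleaned.split()), emotion
```

Under these assumptions, `label_message("Отличный день 😊😊")` yields a cleaned text paired with the label `"joy"`, while a message with no known emoji, or with emojis from two different emotion classes, is skipped.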
The study tests a specialized text corpus, using as an example a number of cognitive linguistic terms with the hypernym ‘frame’. The corpus includes a subcorpus of scientific texts and a subcorpus of journalistic texts. The former is represented by 13 journals indexed in the RSCI; the latter is represented by 10 significant Russian newspapers and magazines. The collected texts were lemmatized and tokenized, and automatically annotated using the Universal Dependencies standard. The corpus is used for creating a learner’s dictionary of cognitive linguistic terms. This lexicographic source includes 60 major terms from university disciplines within the cognitive sciences. The novelty of the approach lies in the thesaurus-encyclopedic type of dictionary, which allows scholars to describe the word both as a term (a minimal component of scientific knowledge) and as a unit of scientific text in its various collocations and ontological relations (synonymy, quasi-synonymy, class-subclass, polysemy, etc.). The basis for describing the systemic relations of a term is corpus statistics: analysis of concordances, collocations, and n-grams. The results of using a specialized text corpus are presented using the example of a terminological field with the dominant ‘frame’. With the help of concordance lists, contexts of term usage are revealed and the term’s derivational relations are established. The semantic and grammatical relationships of the term are characterized through n-gram analysis. Synonyms, hyponyms, and hypernyms of the term are described based on the study of collocations. The Dashinimaeva-Wang hypothesis about the links between the terms ‘concept’, ‘frame’, ‘gestalt’, and ‘image’ was also tested. The semantic transformations of the scientific term in media discourse are briefly characterized.
In addition, the systemic relations of the terms identified on the basis of corpus statistics were confirmed by interpretative analysis of the contexts obtained from concordance lists. The data from automatically selected association measures also largely agree with the results of the associative experiment carried out within the study.
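The corpus statistics named above (concordances, collocations, n-grams) can be illustrated with a small sketch. It assumes the input is an already tokenized and lemmatized list of strings, and it uses pointwise mutual information (PMI) over adjacent bigrams as one common association measure; the abstract does not say which measures the study actually used, and the function names `kwic` and `bigram_pmi` are hypothetical.

```python
import math
from collections import Counter


def kwic(tokens, keyword, window=2):
    """Keyword-in-context lines: up to `window` tokens on each side of every hit."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            lines.append((left, tok, right))
    return lines


def bigram_pmi(tokens):
    """PMI (log base 2) for every adjacent token pair in the sequence."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_uni = len(tokens)
    n_bi = max(len(tokens) - 1, 1)
    return {
        (a, b): math.log2(
            (count / n_bi) / ((unigrams[a] / n_uni) * (unigrams[b] / n_uni))
        )
        for (a, b), count in bigrams.items()
    }
```

On a real corpus, a frequency threshold would normally be applied before ranking by PMI, since the measure overrates pairs that occur only once.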