In this article we present an extended version of PolEmo, a corpus of consumer reviews from four domains: medicine, hotels, products and school. The current version (PolEmo 2.0) contains 8,216 reviews comprising 57,466 sentences. Each text and sentence was manually annotated with sentiment in a 2+1 scheme, which gives a total of 197,046 annotations. We obtained a high value of Positive Specific Agreement: 0.91 for texts and 0.88 for sentences. PolEmo 2.0 is publicly available under a Creative Commons copyright license. We explored recent deep learning approaches for the recognition of sentiment, such as Bidirectional Long Short-Term Memory (BiLSTM) and Bidirectional Encoder Representations from Transformers (BERT).
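The Positive Specific Agreement reported above can be illustrated with a minimal sketch. It assumes the standard definition for two annotators and one category, PSA = 2a / (2a + b + c), where a counts items both annotators assigned to the category and b and c count items only one of them did. The function name and the toy labels are hypothetical, and the abstract does not say how the authors aggregated the score across categories.

```python
from typing import Hashable, Sequence


def positive_specific_agreement(
    labels_a: Sequence[Hashable],
    labels_b: Sequence[Hashable],
    category: Hashable,
) -> float:
    """PSA for one category between two annotators (standard definition, assumed)."""
    if len(labels_a) != len(labels_b):
        raise ValueError("Both annotators must label the same items.")

    pairs = list(zip(labels_a, labels_b))
    # a: both annotators chose the category
    both = sum(1 for a, b in pairs if a == category and b == category)
    # b: only the first annotator chose it
    only_a = sum(1 for a, b in pairs if a == category and b != category)
    # c: only the second annotator chose it
    only_b = sum(1 for a, b in pairs if a != category and b == category)

    denominator = 2 * both + only_a + only_b
    if denominator == 0:
        raise ValueError("Neither annotator used this category.")
    return 2 * both / denominator


# Toy labels (illustrative only, not taken from PolEmo)
annotator_1 = ["pos", "pos", "neg", "neu", "pos"]
annotator_2 = ["pos", "neg", "neg", "pos", "pos"]
psa_pos = positive_specific_agreement(annotator_1, annotator_2, "pos")
print(round(psa_pos, 3))
```

Unlike raw percentage agreement, this measure ignores items that neither annotator assigned to the category, so it is not inflated when a category is rare.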
In this article, we present a novel multidomain dataset of Polish text reviews, annotated with sentiment at two levels: sentences and whole documents. The annotation was performed by linguists in a 2+1 scheme (with inter-annotator agreement analysis). We present a preliminary approach to the classification of labelled data using logistic regression, bidirectional long short-term memory recurrent neural networks (BiLSTM) and bidirectional encoder representations from transformers (BERT).
The goal of the paper is to show how external situations (cultural, institutional, social) of the suicide note genre and its component speech acts (especially: Last Will Instructions, Saying 'I love you' and Thanking) impact the choice of apologizing strategies. The results of the research conducted using the Polish Corpus of Suicide Notes were compared to the description of apologizing found in the Polish part of the Cross-Cultural Speech Act Realization Patterns project. The natural communicative situation reflected in the analyzed texts reveals a complex structure of the apologizing speech act with its overlapping strategies. The most noticeable differences between everyday apologizing and apologizing in the suicide note emerge in the strategy of Taking of Responsibility, whose form depends not only on the type of the offense but also on the recipients. In suicide notes, the author knows the rules of apologizing but adjusts them to his situation and subordinates the text to his goals. The most important person in the sender-recipient relationship appears to be the author himself, and the injured party remains in the shadow. In the sender's mind, neither the acceptance of the apology nor direct contact is necessary.
In this paper, we study the language used by suicidal users on the Reddit social media platform. To do that, we first collect a large-scale dataset of Reddit posts and have it annotated by highly trained expert annotators under a rigorous annotation scheme. Next, we perform a multifaceted analysis of the dataset, including: (1) an analysis of user activity before and after posting a suicidal message, and (2) a pragmalinguistic study of the vocabulary used by suicidal users. In the second part of the analysis, we apply LIWC, a dictionary-based toolset widely used in psychology and linguistic research, which provides a wide range of linguistic category annotations on text. However, since raw LIWC scores are not sufficiently reliable or informative, we propose a procedure that reduces the risk of unreliable LIWC scores leading to misleading conclusions by analyzing each category not separately, but in pairs with other categories. The analysis of the results supported the validity of the proposed approach by revealing valuable information on the vocabulary used by suicidal users and helped to pinpoint false predictors. For example, we were able to specify that death-related words, typically associated with suicidal posts in the majority of the literature, become false predictors when they co-occur with apostrophes, even in high-risk subreddits. On the other hand, the category-pair based disambiguation helped to specify that death becomes a predictor only when co-occurring with future-focused language, informal language, discrepancy, or 1st person pronouns. The approach was additionally analyzed for its limitations: we found that although LIWC is a useful and easily applicable tool, the lack of any contextual processing makes it unsuitable for application in psychological and linguistic studies.
We conclude that the disadvantages of LIWC can be easily overcome by creating a number of high-performance AI-based classifiers trained to annotate categories similar to those of LIWC, which we plan to pursue in future work.
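The category-pair idea described in this abstract can be sketched roughly as follows. This is only an illustration under stated assumptions: `TOY_LEXICON` is a tiny hypothetical word list, not the actual LIWC dictionary (which also uses wildcard stems and many more categories), and the function names are invented here. The authors' actual procedure is not specified in the abstract.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical toy lexicon; category names only loosely mirror LIWC labels.
TOY_LEXICON = {
    "death": {"die", "dead", "death", "kill"},
    "focusfuture": {"will", "soon", "tomorrow"},
    "i": {"i", "me", "my"},
}


def categories_in(text: str, lexicon: dict[str, set[str]]) -> set[str]:
    """Return the categories with at least one matching word in the text."""
    tokens = set(re.findall(r"[a-z]+", text.lower()))
    return {cat for cat, words in lexicon.items() if tokens & words}


def pair_counts(posts: list[str], lexicon: dict[str, set[str]]) -> Counter:
    """Count, per unordered category pair, the posts in which both occur."""
    counts: Counter = Counter()
    for post in posts:
        cats = sorted(categories_in(post, lexicon))
        counts.update(combinations(cats, 2))
    return counts


# Toy posts (illustrative only)
posts = [
    "I will die soon",
    "The battery is dead",
    "My plan for tomorrow",
]
counts = pair_counts(posts, TOY_LEXICON)
print(counts[("death", "focusfuture")])  # posts where both categories occur
```

In the toy data, the second post matches the death category on its own, which is the kind of context-free hit that looking at category pairs is meant to help separate from posts where death co-occurs with future-focused or first-person language.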