Even though suicide is one of the top three causes of death among young people, no reliable methods of identifying suicidal behavior have been developed. One promising direction of research is the quantitative analysis of speech. It is now common to process texts by suicidal individuals (mostly suicide notes or literary texts by famous people, e.g., poets and writers) and texts by individuals from a control group using software (mostly LIWC), and to design models that classify texts as written by suicidal individuals or not. This kind of analysis has mainly been performed for English texts, which generally have a number of restrictions due to their linguistic nature. The authors are the first to attempt to design a mathematical model that classifies texts as written by suicidal or nonsuicidal individuals using numerical values of linguistic parameters as features. Texts (blogs by young people who committed suicide, similar in both genre and topic to those by individuals of an age-matched control group) were processed using the Russian version of LIWC with user dictionaries. Unlike current studies, in designing the model we mostly made use of features that do not depend significantly on content, because not all individuals who committed suicide are known to address the topic in their texts. The resulting model was shown to be 71.5% accurate, which is comparable with the state of the art for English texts.
Psychology studies show that people detect deception no more accurately than chance, and it is therefore important to develop tools that enable the detection of deception. The problem of deception detection has been studied for a long time; however, in the last 10-15 years methods of computational linguistics have been employed with increasing frequency. Texts are processed using different NLP tools and then classified as deceptive or truthful using modern machine learning methods. While most of this research has been performed for English, Slavic languages have never been the focus of deception detection studies. This paper deals with deception detection in Russian narratives on the theme "How I Spent Yesterday". It employs a specially designed corpus of truthful and deceptive texts on the same topic from each respondent (N = 113). The texts were processed using the Linguistic Inquiry and Word Count (LIWC) software, which is used in most studies of text-based deception detection. A number of parameters were computed, a majority of which were related to part-of-speech, lexical-semantic group, and other frequencies. Using standard statistical analysis, statistically significant differences between deceptive and truthful Russian texts were uncovered. On the basis of the chosen parameters, our classifier reached an accuracy of 68.3%. The accuracy of the model was found to depend on the author's gender.
Differences in the frequencies of some parts of speech (POS), particularly function words, and in lexical diversity between male and female speech have been pointed out in a number of papers. Classifiers using exclusively context-independent parameters have proved to be highly effective. However, there are still issues to be addressed, as many studies are performed for English and the genre and topic of texts are sometimes neglected. The aim of this paper is to investigate the association between context-independent parameters of Russian written texts and the gender of their authors and to design predictive regression models. A number of correlations were found. The obtained data are in good agreement with the results obtained for other languages. A model based on the 5 parameters with the highest correlation coefficients was designed.
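The selection step described in the last abstract (rank parameters by their correlation with author gender, keep the five strongest, fit a regression model) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration: the data are synthetic, Pearson correlation and ordinary least squares are stand-ins for whatever statistics the authors actually used, and the function names `top_k_by_correlation`, `fit_linear_model`, and `predict` are hypothetical.

```python
import numpy as np

def top_k_by_correlation(X, y, k=5):
    """Return indices of the k columns of X with the largest |Pearson r| against y.

    Assumes no column of X is constant (a constant column would divide by zero).
    """
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    denom = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
    r = (Xc * yc[:, None]).sum(axis=0) / denom
    order = np.argsort(-np.abs(r))[:k]
    return order, r

def fit_linear_model(X, y):
    """Ordinary least squares with an intercept; returns the coefficient vector."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict(coef, X):
    """Apply the fitted linear model to new rows of X."""
    A = np.column_stack([np.ones(len(X)), X])
    return A @ coef

# Synthetic stand-in for a table of text parameters (NOT real corpus data):
# 200 texts, 20 numeric parameters, gender coded as 0/1.
rng = np.random.default_rng(0)
n_texts, n_params = 200, 20
y = rng.integers(0, 2, size=n_texts).astype(float)
X = rng.normal(size=(n_texts, n_params))
X[:, :5] += 0.8 * y[:, None]  # make the first five parameters informative

selected, r = top_k_by_correlation(X, y, k=5)
coef = fit_linear_model(X[:, selected], y)

# Threshold the regression output at 0.5 to obtain a class label.
accuracy = np.mean((predict(coef, X[:, selected]) >= 0.5) == (y == 1))
print("selected parameter indices:", selected)
print("training accuracy:", accuracy)
```

The accuracy printed here is measured on the same synthetic data the model was fitted to, so it says nothing about the figures reported in the abstracts; a real evaluation would use held-out texts or cross-validation.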