Data-driven Approaches to Author’s Profiling Identification for Russian Texts on Base of Complex Machine Learning Models in Combinations with Siamese Networks
Abstract. In this work, data-driven approaches to author profiling for Russian texts are investigated on the basis of a unified data corpus. The corpus was specially collected by crowdsourcing and currently contains texts from 1161 men and 2043 women. The adaptation of complex models based on convolutional neural networks, gradient boosting methods, LSTM, and Siamese networks, along with different input data and features (morphological data, vector of character n-gram frequencies, Linguistic …
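One of the inputs listed above is a vector of character n-gram frequencies. The abstract does not say which n, vocabulary size, or normalization the authors used. The sketch below is therefore only an illustration under stated assumptions: it lower-cases the text, counts overlapping n-grams, normalizes by the total count, and projects onto a fixed vocabulary of the most frequent n-grams in a training corpus. The helper names (`char_ngram_frequencies`, `build_vocabulary`, `to_vector`) and the `top_k` parameter are hypothetical and are not taken from the paper.

```python
from collections import Counter
from typing import Dict, Iterable, List


def char_ngram_frequencies(text: str, n: int = 3) -> Dict[str, float]:
    """Relative frequencies of overlapping character n-grams (lower-cased; assumption)."""
    text = text.lower()
    grams = [text[i:i + n] for i in range(len(text) - n + 1)]
    if not grams:
        return {}
    counts = Counter(grams)
    total = len(grams)
    return {gram: count / total for gram, count in counts.items()}


def build_vocabulary(texts: Iterable[str], n: int = 3, top_k: int = 1000) -> List[str]:
    """Hypothetical helper: the top_k most frequent n-grams across a training corpus."""
    counts: Counter = Counter()
    for text in texts:
        t = text.lower()
        counts.update(t[i:i + n] for i in range(len(t) - n + 1))
    return [gram for gram, _ in counts.most_common(top_k)]


def to_vector(text: str, vocabulary: List[str], n: int = 3) -> List[float]:
    """Fixed-length feature vector: one relative frequency per vocabulary n-gram."""
    freqs = char_ngram_frequencies(text, n)
    return [freqs.get(gram, 0.0) for gram in vocabulary]


if __name__ == "__main__":
    corpus = ["Мама мыла раму.", "Папа читал газету."]
    vocab = build_vocabulary(corpus, n=3, top_k=20)
    print(to_vector(corpus[0], vocab))
```

Fixing the vocabulary on the training data gives every text a vector of the same length. That fixed length is what a classifier such as gradient boosting, or one branch of a Siamese network, would need as input.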