This article presents a method of text classification using a convolutional neural network. The text classification problem is formulated, the architecture and parameters of a convolutional neural network for solving it are described, and the solution steps and classification results are given. The convolutional network was trained to classify the texts of news messages from Internet news portals. Semantic preprocessing of the text and conversion of words into feature vectors are performed with the open word2vec model. The dependence of classification quality on the parameters of the neural network is analyzed. Using the network yielded a classification accuracy of about 84%. When estimating classification accuracy, the texts were checked for membership in a group of semantically similar classes. This approach made it possible to analyze news messages in cases where the text topics and the number of classes differ between the training and control samples.
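The abstract does not give the network's exact layers, so the following is only a minimal sketch of the general technique it names, not the authors' model. It assumes a common single-layer text CNN: each word is already a vector (in the article these come from word2vec; here they are tiny hand-made 2-d vectors), each filter slides over windows of consecutive words, a ReLU and max-over-time pooling give one feature per filter, and a linear layer with softmax picks the class. It also sketches one reading of the "group of semantically similar classes" check: a prediction counts as correct when it falls into the same group as the true class. All function names, weights, class names and groups are hypothetical.

```python
import math
from typing import Dict, List, Sequence

Vector = List[float]


def conv1d_max_pool(embeddings: Sequence[Vector],
                    filters: Sequence[List[Vector]],
                    biases: Sequence[float]) -> Vector:
    """For each filter (k word-vectors wide), slide it over the word sequence,
    apply ReLU and keep the largest response (max-over-time pooling)."""
    features = []
    for filt, b in zip(filters, biases):
        k = len(filt)
        best = 0.0  # ReLU outputs are >= 0, so 0.0 is a safe starting maximum
        for start in range(len(embeddings) - k + 1):
            s = b
            for offset in range(k):
                word = embeddings[start + offset]
                s += sum(w * x for w, x in zip(filt[offset], word))
            best = max(best, max(0.0, s))  # ReLU, then running max
        features.append(best)
    return features


def softmax(z: Sequence[float]) -> List[float]:
    m = max(z)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def classify(features: Vector, weights: Sequence[Vector],
             biases: Sequence[float]) -> int:
    """Linear layer + softmax; returns the index of the most probable class."""
    logits = [sum(w * f for w, f in zip(row, features)) + b
              for row, b in zip(weights, biases)]
    probs = softmax(logits)
    return max(range(len(probs)), key=probs.__getitem__)


def group_accuracy(predicted: Sequence[str], actual: Sequence[str],
                   group_of: Dict[str, str]) -> float:
    """Share of texts whose predicted class is in the same semantic group
    as the true class (hypothetical grouping supplied by the caller)."""
    hits = sum(1 for p, a in zip(predicted, actual) if group_of[p] == group_of[a])
    return hits / len(actual)


if __name__ == "__main__":
    words = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # toy "embeddings"
    filters = [[[1.0, 0.0], [0.0, 1.0]],  # width-2 filter
               [[-1.0, -1.0]]]            # width-1 filter (always negative here)
    feats = conv1d_max_pool(words, filters, [0.0, 0.0])
    print(feats, classify(feats, [[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]))

    groups = {"football": "sport", "hockey": "sport",
              "elections": "politics", "taxes": "economy"}
    print(group_accuracy(["hockey", "elections", "taxes"],
                         ["football", "taxes", "taxes"], groups))
```

With the toy inputs, the first filter responds most strongly (2.0) to the first two words, so class 0 wins; the group check scores 2 of 3 because "hockey" and "football" share a group while "elections" and "taxes" do not. Grouping classes this way is what lets a classifier be evaluated when the training and control samples use different class sets.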
This paper describes the application of basic methods of semantic analysis of text data (Porter stemming, frequency semantic analysis, latent semantic analysis, and syntactic semantic analysis) in an automated system that can analyze a text with each of these methods. The characteristics and implementation features of the methods are considered, together with the results of applying them to texts of low complexity. The study reveals how the suitability of each method depends on the purpose of the text analysis.
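As an illustration of the frequency-analysis step, here is a minimal sketch, not the system described in the paper: it lowercases and tokenizes a text, drops a small hypothetical stop-word list, reduces each word to a stem, and counts stems. The `naive_stem` helper is a deliberately crude suffix stripper standing in for the Porter algorithm, which has many more rules (a real implementation could use NLTK's `PorterStemmer` instead). All names here are hypothetical.

```python
import re
from collections import Counter
from typing import List, Tuple

# Hypothetical, minimal stop-word list; real systems use much larger ones.
STOP_WORDS = frozenset({"the", "a", "an", "of", "and", "to", "in",
                        "is", "are", "on", "for"})


def tokenize(text: str) -> List[str]:
    """Lowercase the text and keep runs of Latin letters as tokens."""
    return re.findall(r"[a-z]+", text.lower())


def naive_stem(word: str) -> str:
    """Crude suffix stripping, NOT the Porter algorithm: remove the first
    matching suffix if at least 3 characters of the word remain."""
    for suffix in ("ing", "es", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def frequency_profile(text: str, top: int = 5) -> List[Tuple[str, int]]:
    """Return the `top` most frequent stems after stop-word removal."""
    stems = [naive_stem(t) for t in tokenize(text) if t not in STOP_WORDS]
    return Counter(stems).most_common(top)


if __name__ == "__main__":
    print(frequency_profile("Parsing texts and parsed text: the parser parses text."))
```

Here "parsing", "parsed" and "parses" collapse to one stem, while "parser" is left alone by the crude rules, which shows why the choice of stemmer affects the frequency profile that later analysis steps build on.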