Neural networks have become important in natural language processing (NLP) research and are widely used for the semantic analysis of texts in different languages. Because processing big data in the Kazakh language has become increasingly relevant, a neural network was built for deep learning. The object of this study is the training process of a deep neural network that evaluates the algorithm for constructing an LDA model. One of the most difficult steps is determining the correct arguments that, when the model is compiled, yield an estimate of the algorithm's performance. The study used the compile() method from the Keras modular library, whose main arguments are the loss function, the optimizer, and the metrics. The neural network is implemented in the Python programming language. The central task is to select the compiler arguments so that deep learning of the neural network gives a correct evaluation of the algorithm of the constructed LDA model. A Kazakh-language text corpus of no more than 8000 words serves as the training data. Using these methods, an experiment was carried out on selecting the arguments for the model compiler when training on the Kazakh text corpus. The SGD optimizer, the binary_crossentropy loss function, and the cosine_proximity evaluation metric were chosen as the optimal arguments. After training, the loss (error) tended toward 0, reaching 0.1984, and cosine_proximity (learning accuracy) reached 0.2239; both are considered acceptable training measures. The results indicate that the compilation arguments were chosen correctly. These arguments can be applied in deep learning of a neural network where the sample data are pairs of «topic and keywords».
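To make the two reported quantities concrete, the following is a minimal pure-Python sketch of what a binary cross-entropy loss and a cosine-similarity metric compute. It is an illustration under stated assumptions, not the study's code: the function names `binary_crossentropy` and `cosine_similarity` are defined here for the example only, the clipping constant `eps` is an assumed value, and the sign convention of Keras's own `cosine_proximity` may differ between library versions.

```python
import math


def binary_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean binary cross-entropy between 0/1 targets and predicted probabilities.

    eps is an assumed clipping constant that keeps log() away from 0.
    """
    total = 0.0
    for t, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1.0 - eps)  # clip the prediction into (0, 1)
        total += -(t * math.log(p) + (1 - t) * math.log(1.0 - p))
    return total / len(y_true)


def cosine_similarity(y_true, y_pred):
    """Cosine of the angle between the target and prediction vectors."""
    dot = sum(t * p for t, p in zip(y_true, y_pred))
    norm_true = math.sqrt(sum(t * t for t in y_true))
    norm_pred = math.sqrt(sum(p * p for p in y_pred))
    if norm_true == 0.0 or norm_pred == 0.0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_true * norm_pred)


# Hypothetical targets and predictions, for illustration only.
targets = [1, 0, 1, 0]
predictions = [0.9, 0.2, 0.7, 0.1]
print("loss:", binary_crossentropy(targets, predictions))
print("cosine similarity:", cosine_similarity(targets, predictions))
```

A loss near 0 means the predicted probabilities are close to the targets, while a cosine similarity near 1 means the prediction vector points in the same direction as the target vector.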
Research in semantic text analysis begins with the study of the structure of natural language. Kazakh is an agglutinative language, which makes it distinctive and calls for careful study. The object of this study is text in the Kazakh language. Existing approaches to the semantic analysis of Kazakh text do not consider text analysis using topic modeling and neural network training. The purpose of this study is to determine, through neural network training, the quality of a topic model based on the LDA (Latent Dirichlet Allocation) method with Gibbs sampling. The LDA model can determine the semantic probability of the keywords of a single document and assign them a rating score. The neural network was built on LSTM, a widely used architecture that has proven itself in natural language processing (NLP). The training results show how well the network learned the text and how the semantic analysis of the Kazakh text performed. The system, developed on the basis of the LDA model and neural network training, groups the detected keywords into separate topics. Overall, the experimental results showed that deep neural networks give the expected results when assessing the quality of the LDA model in processing the Kazakh language. The developed neural network model helps assess the semantic accuracy of the Kazakh text used. The results can be applied in systems for processing text data, for example, when checking whether the content of submitted texts (abstracts, term papers, theses, and other works) matches their stated topic.
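As an illustration of LDA with Gibbs sampling, the following is a minimal sketch of a collapsed Gibbs sampler in pure Python. It is not the study's implementation: the function name `lda_gibbs`, the hyperparameter values `alpha` and `beta`, the iteration count, and the toy corpus of integer word ids are all assumptions made for the example.

```python
import random


def lda_gibbs(docs, num_topics, vocab_size, alpha=0.1, beta=0.01, iters=50, seed=0):
    """Collapsed Gibbs sampling for LDA; docs is a list of lists of word ids.

    Returns (doc_topic, topic_word) count tables.
    alpha and beta are assumed symmetric Dirichlet hyperparameters.
    """
    rng = random.Random(seed)
    doc_topic = [[0] * num_topics for _ in docs]                 # tokens of doc d in topic k
    topic_word = [[0] * vocab_size for _ in range(num_topics)]   # tokens of word w in topic k
    topic_total = [0] * num_topics                               # tokens in topic k
    assignments = []

    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(num_topics)
            z_d.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assignments.append(z_d)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                k = assignments[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # Unnormalized conditional probability of each topic for this token.
                weights = [
                    (doc_topic[d][j] + alpha)
                    * (topic_word[j][w] + beta)
                    / (topic_total[j] + vocab_size * beta)
                    for j in range(num_topics)
                ]
                # Resample the topic and restore the counts.
                k = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1

    return doc_topic, topic_word


# Toy corpus with a 4-word vocabulary (hypothetical data, for illustration only).
toy_docs = [[0, 1, 0, 1], [2, 3, 2, 3, 2]]
doc_topic_counts, topic_word_counts = lda_gibbs(toy_docs, num_topics=2, vocab_size=4)
print(doc_topic_counts)
print(topic_word_counts)
```

The rows of the topic-word table, once normalized, give per-topic word probabilities; the highest-probability words of each topic would play the role of the keywords in the «topic and keywords» pairs.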