Coherence evaluation of texts is a natural language processing task. Evaluating a text's coherence means estimating its semantic and logical integrity; this property can be used in multidisciplinary tasks (SEO analysis, medicine, detection of fake texts, etc.). In this paper, different state-of-the-art coherence evaluation methods based on machine learning models are analyzed, and their effectiveness for estimating the coherence of Polish texts is investigated. The impact of a text's features on the output coherence value is analyzed using different variants of a semantic similarity graph. Two neural networks, one based on LSTM layers and the other on a pre-trained BERT model, are designed and trained to estimate the coherence of input texts. The results suggest that both lexical and semantic components should be taken into account when evaluating the coherence of Polish documents; moreover, it is advisable to analyze such documents sentence by sentence, taking word order into account. Given the accuracy achieved by the proposed neural networks, the suggested models may be used to solve typical coherence estimation tasks for a Polish corpus.
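The semantic similarity graph mentioned in the abstract above can be illustrated with a minimal sketch. It assumes one simple variant in which each sentence is linked only to its successor and the coherence value is the mean edge weight. The function names (`bow_vector`, `cosine`, `graph_coherence`) and the bag-of-words sentence representation are assumptions made for illustration; the paper itself may use other graph variants and learned sentence embeddings.

```python
import math
from collections import Counter


def bow_vector(sentence):
    """Toy sentence representation: lowercase bag-of-words counts.
    (Assumption: a real system would use trained embeddings instead.)"""
    return Counter(sentence.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def graph_coherence(sentences):
    """Mean weight of the edges linking each sentence to the next one.
    Texts with fewer than two sentences have no edges and score 0.0."""
    vectors = [bow_vector(s) for s in sentences]
    if len(vectors) < 2:
        return 0.0
    weights = [cosine(vectors[i], vectors[i + 1])
               for i in range(len(vectors) - 1)]
    return sum(weights) / len(weights)
```

With this sketch, a text whose neighbouring sentences share vocabulary scores higher than one whose sentences have nothing in common. Because the edges connect sentences in order, reordering the sentences can change the score, which is one way a sentence-by-sentence analysis enters the estimate.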
The detection of coreferent pairs within a text is one of the basic tasks in natural language processing (NLP). State-of-the-art methods of coreference resolution are based on machine learning algorithms. Their key idea is to detect regularities between the semantic or grammatical features of text entities. In this paper, a comparative analysis of current methods of coreference resolution in English and Ukrainian texts is performed. A key disadvantage of many methods is that they treat coreference resolution as a classification problem. The result of coreferent pair detection, however, is a set of groups whose elements refer to a common entity, so it is advisable to treat coreference resolution as a clustering task. A method of coreference resolution that uses a set of filtering sieves and a convolutional neural network is suggested. The set of filtering sieves, which finds candidates for coreferent pairs, has been implemented. A multichannel convolutional neural network has been trained on an annotated Ukrainian corpus. The multichannel structure makes it possible to analyze different components of text units: the semantic, lexical, and grammatical features of words and sentences. Furthermore, the convolutional layer makes it possible to process input data of variable size (the words or sentences of a text). The output of the method is a set of clusters. To form the clusters, the previous steps of the model's workflow must be taken into account. However, this dependence on earlier decisions does not fit the traditional machine learning setup, in which each prediction is made independently. The network has therefore been trained using the SEARN algorithm, which allows tasks with variable output structures to be solved using a classifier model. The method has been examined experimentally on a corpus of Ukrainian news.
To estimate the accuracy of the method, common metrics for clustering tasks have been calculated. The results suggest that the method can be used to find coreferent pairs within Ukrainian texts. The method can also be easily adapted and applied to other natural languages.
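The relation between pairwise decisions and the clustered output described in the abstract above can be sketched as follows. This is not the authors' SEARN-based procedure; it only shows, under simple assumptions, how a list of pairs judged coreferent can be merged into groups of mentions that refer to a common entity. The function name `cluster_mentions` and the example mentions are hypothetical.

```python
def cluster_mentions(mentions, coreferent_pairs):
    """Merge pairwise coreference links into clusters using union-find.

    mentions         -- list of hashable mention identifiers
    coreferent_pairs -- iterable of (mention_a, mention_b) judged coreferent
    Returns a list of clusters, each a list of mentions in input order.
    """
    parent = {m: m for m in mentions}

    def find(m):
        # Follow parent links to the cluster representative,
        # shortening the path along the way.
        while parent[m] != m:
            parent[m] = parent[parent[m]]
            m = parent[m]
        return m

    for a, b in coreferent_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two clusters

    clusters = {}
    for m in mentions:
        clusters.setdefault(find(m), []).append(m)
    return list(clusters.values())


# Hypothetical example: five mentions, three pairwise links.
mentions = ["Kyiv", "the city", "it", "the mayor", "he"]
pairs = [("Kyiv", "the city"), ("the city", "it"), ("the mayor", "he")]
result = cluster_mentions(mentions, pairs)
# result == [["Kyiv", "the city", "it"], ["the mayor", "he"]]
```

Note that the link between "Kyiv" and "it" is never stated directly; it follows from the other two links. This transitivity is the reason a cluster-level view of the output is more natural than a set of independent pair classifications.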
The estimation of text coherence is one of the most topical tasks in computational linguistics. Analysis of text coherence is widely used in the writing and selection of documents, since a coherent text conveys the author's idea clearly to the reader. The importance of this task is confirmed by recent works dedicated to solving it. Automated methods for the estimation of text coherence are based on machine learning. These methods rely on a formal representation of the text, followed by the detection of regularities that are used to generate the output result. The purpose of this work is to perform an analytic review of automated methods for the estimation of text coherence; to justify the selection of a method and adapt it to the features of the Ukrainian language; and to verify experimentally the effectiveness of the suggested method on a Ukrainian corpus. In this paper, a comparative analysis of machine learning methods for the estimation of the coherence of English texts is performed. The expediency of applying methods that are based on trained universal models for the formalized representation of text components is justified. Models based on neural networks with different architectures can be considered, namely recurrent and convolutional networks. These types of networks are widely used for text processing because they can process input data with a variable structure, such as sentences or words. Although recurrent neural networks can take previous data into account (which resembles the way a reader perceives a text), a convolutional neural network has been chosen for the experimental research. This choice was made because convolutional neural networks can detect relations between entities regardless of the distance between them.
In this paper, the principle of the method based on a convolutional neural network and the corresponding architecture are described. A software application for verifying the effectiveness of the suggested method has been created. The formalized representation of text elements is obtained using a previously trained model for the semantic representation of words; this model was trained on a corpus of Ukrainian scientific abstracts. The networks have been trained using this pre-trained model. The effectiveness of the method on the document discrimination task and the insertion task has been verified experimentally on a set of scientific articles. The results suggest that the method based on convolutional neural networks can be used for further estimation of the coherence of Ukrainian texts.
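The document discrimination task named in the abstract above is commonly set up by comparing each original document with shuffled copies of itself and counting how often the model prefers the original. The sketch below assumes that setup. The names `discrimination_accuracy` and `toy_order_score`, the number of permutations, and the seed are all assumptions for illustration. The `score` argument stands in for any coherence model, such as the trained network, which is not reproduced here.

```python
import random


def discrimination_accuracy(documents, score, permutations_per_doc=5, seed=0):
    """Share of (original, shuffled) pairs in which the original document
    receives a strictly higher coherence score than its shuffled copy.

    documents -- list of documents, each a list of sentence strings
    score     -- callable mapping a list of sentences to a number
    """
    rng = random.Random(seed)
    wins = total = 0
    for sentences in documents:
        original = list(sentences)
        if len(set(original)) < 2:
            continue  # no reordering can differ from the original
        original_score = score(original)
        for _ in range(permutations_per_doc):
            shuffled = list(original)
            while shuffled == original:  # resample identity permutations
                rng.shuffle(shuffled)
            total += 1
            if original_score > score(shuffled):
                wins += 1
    return wins / total if total else 0.0


def toy_order_score(sentences):
    """Hypothetical stand-in for a trained model: counts neighbouring
    sentences whose leading number increases."""
    return sum(1 for a, b in zip(sentences, sentences[1:])
               if int(a.split()[0]) < int(b.split()[0]))
```

A scorer that ignores sentence order cannot win any comparison under this measure, so the resulting accuracy reflects how well a model captures the ordering of sentences rather than their content alone.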