Coherence evaluation of texts is a natural language processing task. Evaluating a text's coherence means estimating its semantic and logical integrity; this property can be used in multidisciplinary tasks (SEO analysis, medicine, detection of fake texts, etc.). In this paper, different state-of-the-art coherence evaluation methods based on machine learning models are analyzed, and their effectiveness for estimating the coherence of Polish texts is investigated. The impact of a text's features on the output coherence value is analyzed using different variants of a semantic similarity graph. Two neural networks, one based on LSTM layers and one on a pre-trained BERT model, have been designed and trained to estimate the coherence of input texts. The results obtained may indicate that both lexical and semantic components should be taken into account when evaluating the coherence of Polish documents; moreover, it is advisable to analyze such documents sentence by sentence, taking word order into account. Given the accuracy achieved by the proposed neural networks, the suggested models may be used to solve typical coherence estimation tasks for a Polish corpus.
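As a rough illustration of the semantic similarity graph idea, the sketch below treats sentences as graph nodes, links each sentence to its successor, and reports the mean edge weight as a coherence score. The bag-of-words vectors, the function names, and the choice of adjacent-only edges are assumptions made for the example; the paper's own graph variants and embeddings are not specified in the abstract.

```python
import math
from collections import Counter


def sentence_vector(sentence):
    # Toy bag-of-words vector (assumption); a real system would use trained embeddings.
    return Counter(sentence.lower().split())


def cosine(a, b):
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def coherence(sentences):
    # Mean weight of the edges between neighbouring sentences.
    vectors = [sentence_vector(s) for s in sentences]
    if len(vectors) < 2:
        return 0.0  # no edges to score
    sims = [cosine(vectors[i], vectors[i + 1]) for i in range(len(vectors) - 1)]
    return sum(sims) / len(sims)


original = ["the cat sat", "the cat slept", "a dog barked"]
reordered = ["the cat sat", "a dog barked", "the cat slept"]
print(coherence(original), coherence(reordered))
```

Because only neighbouring sentences are compared, reordering the same sentences changes the score, which is one simple way a sentence-by-sentence analysis can reflect ordering.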
An innovative algorithm for forming minimum convex hulls of graphs using the GPU is proposed. The method achieves high speed and linear complexity by distributing the graph's vertices into separate units and filtering them. The key factor in the algorithm's performance is the massively parallel formation of local hulls on video accelerators. The computational process is controlled by means of auxiliary matrices. A number of experimental studies of the algorithm have been carried out, and its suitability for hull processing in large-scale problems has been demonstrated. The new method is 10-20 times faster than the corresponding functions of the professional mathematical package Wolfram Mathematica.
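The "units, local hulls, filtering" structure can be sketched sequentially on the CPU as follows. This is only a stand-in under stated assumptions: it uses Andrew's monotone chain for every hull, splits the sorted points into fixed-size units, and does not reproduce the paper's auxiliary matrices, GPU kernels, or linear-time behaviour. The names `hull_by_units` and `unit_size` are hypothetical.

```python
def cross(o, a, b):
    # Z-component of the cross product (a - o) x (b - o).
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    # Andrew's monotone chain; returns hull vertices in counter-clockwise order.
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def hull_by_units(points, unit_size=4):
    # Split the points into units; each unit could be handled by a separate
    # parallel worker (here they are simply processed in a loop).
    pts = sorted(set(points))
    units = [pts[i:i + unit_size] for i in range(0, len(pts), unit_size)]
    survivors = []
    for unit in units:
        # Filtering step: points strictly inside a local hull cannot be on the global hull.
        survivors.extend(convex_hull(unit))
    # Merge step: the hull of the surviving points is the hull of all points.
    return convex_hull(survivors)


grid = [(x, y) for x in range(5) for y in range(5)]
print(hull_by_units(grid))
```

The merge is valid because the convex hull of a union equals the convex hull of the union of the local hulls, so discarding interior points of each unit never removes a global hull vertex.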
The detection of coreferent pairs within a text is one of the basic tasks in natural language processing (NLP). State-of-the-art methods of coreference resolution are based on machine learning algorithms. The key idea of these methods is to detect regularities between the semantic or grammatical features of text entities. In the paper, a comparative analysis of current methods of coreference resolution in English and Ukrainian texts has been performed. The key disadvantage of many methods is that they treat coreference resolution as a classification problem. The result of coreferent pair detection is a set of groups whose elements refer to a common entity, so it is advisable to treat coreference resolution as a clustering task. A method of coreference resolution using a set of filtering sieves and a convolutional neural network has been suggested. The set of filtering sieves that finds candidates for coreferent pairs has been implemented. A multichannel convolutional neural network has been trained on an annotated Ukrainian corpus. The multichannel structure makes it possible to analyze different components of text units: semantic, lexical, and grammatical features of words and sentences. Furthermore, a convolutional layer makes it possible to process input data of variable size (words or sentences of a text). The output of the method is a set of clusters. To form the clusters, the previous steps of the model's workflow must be taken into account, which contradicts the traditional methodology of machine learning. Thus, the network has been trained using the SEARN algorithm, which allows tasks with variable output structures to be solved using a classifier model. An experimental examination of the method on a corpus of Ukrainian news has been performed. To estimate the accuracy of the method, common metrics for clustering tasks have been calculated. The results obtained may indicate that the suggested method can be used to find coreferent pairs within Ukrainian texts. The method can also be easily adapted and applied to other natural languages.
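To make the step from pairs to clusters concrete, here is a minimal sketch that merges pairwise coreference decisions into groups with a union-find structure. It is not the authors' SEARN-trained model: the function name, the example mentions, and the assumption that pair decisions are already available are all illustrative.

```python
def cluster_mentions(mentions, coreferent_pairs):
    # Union-find over mentions: every accepted pair merges two groups.
    parent = {m: m for m in mentions}

    def find(m):
        # Follow parent links to the group representative, compressing the path.
        while parent[m] != m:
            parent[m] = parent[parent[m]]
            m = parent[m]
        return m

    for a, b in coreferent_pairs:
        parent[find(a)] = find(b)

    clusters = {}
    for m in mentions:
        clusters.setdefault(find(m), []).append(m)
    # Sort for a stable, readable result.
    return sorted(sorted(c) for c in clusters.values())


mentions = ["Kyiv", "the city", "it", "Olena", "she"]
pairs = [("Kyiv", "the city"), ("the city", "it"), ("Olena", "she")]
print(cluster_mentions(mentions, pairs))
```

The sketch also shows why a pure pairwise classifier is awkward here: a single wrong pair silently merges two entities, which is one motivation for scoring decisions in the context of the clusters built so far.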
A method for studying the characteristics of systems that use high-performance computing is proposed, based on the apparatus of transition systems (a discrete model of computation). Two variants of constraints on the synchronous product of these transition systems are proposed that model the approach used in the Nvidia CUDA architecture. Transition systems representing two types of instructions, the execution of an instruction by a warp, and the operation of the warp scheduler are described. The execution model of a GPGPU application is formalized. A specification of this approach is obtained and its correctness is rigorously proved. The specification is reduced to two variants of Petri nets, which allow design errors to be detected in automatic or semi-automatic mode. Keywords: Nvidia CUDA, GPGPU, SAA, Petri nets, transition system.
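One way a Petri net can expose a design error automatically is by enumerating reachable markings and reporting those in which no transition can fire. The sketch below does this for a small, entirely hypothetical net (two workers acquiring two shared resources in opposite order); it is not one of the nets derived in the paper, and the place and transition names are invented for the example.

```python
from collections import deque


def enabled(marking, transition):
    # A transition is enabled when every input place holds enough tokens.
    consume, _produce = transition
    return all(marking.get(place, 0) >= n for place, n in consume.items())


def fire(marking, transition):
    # Remove tokens from input places and add tokens to output places.
    consume, produce = transition
    new = dict(marking)
    for place, n in consume.items():
        new[place] -= n
    for place, n in produce.items():
        new[place] = new.get(place, 0) + n
    return new


def dead_markings(initial, transitions, limit=10000):
    # Breadth-first search over reachable markings (bounded by `limit`).
    # A dead marking has no enabled transition: either normal termination or a deadlock.
    seen, dead = set(), []
    queue = deque([initial])
    while queue and len(seen) < limit:
        marking = queue.popleft()
        key = frozenset((p, n) for p, n in marking.items() if n)
        if key in seen:
            continue
        seen.add(key)
        successors = [fire(marking, t) for t in transitions.values() if enabled(marking, t)]
        if not successors:
            dead.append(dict(key))
        queue.extend(successors)
    return dead


# Hypothetical net: w1 takes A then B, w2 takes B then A.
transitions = {
    "w1_take_A": ({"w1_start": 1, "A": 1}, {"w1_hasA": 1}),
    "w1_take_B": ({"w1_hasA": 1, "B": 1}, {"w1_done": 1, "A": 1, "B": 1}),
    "w2_take_B": ({"w2_start": 1, "B": 1}, {"w2_hasB": 1}),
    "w2_take_A": ({"w2_hasB": 1, "A": 1}, {"w2_done": 1, "A": 1, "B": 1}),
}
initial = {"w1_start": 1, "w2_start": 1, "A": 1, "B": 1}
for marking in dead_markings(initial, transitions):
    print(marking)
```

Of the dead markings found, the one where both workers hold a resource and neither has finished is the design error; the one where both are done is ordinary termination, so a real checker would need a way to tell the two apart.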
The estimation of text coherence is one of the most topical tasks of computational linguistics. Analysis of text coherence is widely used in writing and selecting documents, since a coherent text conveys the author's idea clearly to the reader. The importance of this task is confirmed by the number of recent works dedicated to solving it. Automated methods for the estimation of text coherence are based on machine learning: they build a formal representation of the text and then detect regularities in it to generate an output result. The purpose of this work is to perform an analytic review of different automated methods for the estimation of text coherence; to justify the selection of a method and adapt it to the features of the Ukrainian language; and to verify experimentally the effectiveness of the suggested method on a Ukrainian corpus. In this paper, a comparative analysis of machine learning methods for estimating the coherence of English texts has been performed. The expediency of applying methods based on trained universal models for the formalized representation of text components has been justified. Models using neural networks with different architectures can be considered, namely recurrent and convolutional networks. These types of networks are widely used for text processing because they can process input data of variable structure, such as sentences or words. Although recurrent neural networks can take previous data into account (behavior similar to how a reader perceives a text), the convolutional neural network has been chosen for the experimental research. This choice has been made because of the ability of convolutional neural networks to detect relations between entities regardless of the distance between them. 
In this paper, the principle of the method based on the convolutional neural network and the corresponding architecture has been described. A software application for verifying the effectiveness of the suggested method has been created. The formalized representation of text elements has been built using a previously trained model for the semantic representation of words; this model has been trained on a corpus of Ukrainian scientific abstracts. The resulting networks have been trained using the pre-trained model. The effectiveness of the method on the document discrimination task and the insertion task has been verified experimentally on a set of scientific articles. The results obtained may indicate that the method using convolutional neural networks can be used for further estimation of the coherence of Ukrainian texts.
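The document discrimination task is commonly set up by pairing each original document with shuffled copies of itself and counting how often a model scores the original higher. The harness below sketches that evaluation under stated assumptions: `score` is a placeholder for any coherence model (the paper's convolutional network is not reproduced), and `per_doc`, `seed`, and the helper names are hypothetical.

```python
import random


def shuffled_copy(sentences, rng):
    # Return a sentence order that differs from the original.
    # Assumes at least two distinct sentences; otherwise no different order exists.
    perm = sentences[:]
    while perm == sentences:
        rng.shuffle(perm)
    return perm


def discrimination_accuracy(documents, score, per_doc=5, seed=0):
    # Fraction of (original, shuffled) pairs in which the original scores strictly higher.
    rng = random.Random(seed)
    wins = total = 0
    for doc in documents:
        for _ in range(per_doc):
            wins += score(doc) > score(shuffled_copy(doc, rng))
            total += 1
    return wins / total


def ordered_pairs(doc):
    # Toy scorer for the demo: counts neighbouring pairs in ascending order.
    return sum(a < b for a, b in zip(doc, doc[1:]))


docs = [["s1", "s2", "s3"], ["a", "b", "c", "d"]]
print(discrimination_accuracy(docs, ordered_pairs))
```

The insertion task can be evaluated with a similar harness by removing one sentence and checking whether the model scores its original position highest.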