This work presents the first fully-fledged discourse parser for Russian based on the Rhetorical Structure Theory of Mann and Thompson (1988). We employ deep learning models for segmentation, discourse tree construction, and discourse relation classification. Using multiple word embedding techniques, we achieve a new state of the art for discourse segmentation of Russian texts. We found that neural classifiers using contextual word representations outperform previously proposed feature-based models for discourse relation classification. By ensembling the two kinds of models, we further improve discourse relation classification, achieving a new state of the art for Russian.
Sentence packaging is an important task in natural language text generation that can be treated as a particular kind of community detection problem. We propose an approach to it based on a genetic algorithm and predictive machine learning models. The approach can handle large ontological and semantic structures represented as a graph in order to produce well-formed sentences. Experiments showed that the genetic algorithm optimizing the modularity measure gives results comparable to those of a traditional community detection algorithm and outperforms it on a collection of relatively short texts. The design of the approach allows linguistic characteristics to be introduced into the fitness function later, which gives it high potential to increase the quality of the detected packages while taking the specifics of the domain into account.
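To make the idea of a genetic algorithm optimizing modularity more concrete, here is a minimal sketch in Python. It is not the authors' implementation: the abstract does not specify the chromosome encoding, the genetic operators, or the exact fitness function, so everything below is an assumption chosen for illustration. That includes the label-per-node encoding, uniform crossover, the neighbour-label mutation, and the names `modularity` and `genetic_communities`. The fitness is plain Newman modularity for an undirected, unweighted graph with at least one edge.

```python
import random
from collections import defaultdict


def modularity(edges, labels):
    """Newman modularity Q of a partition; labels[node] is the community id.

    Assumes an undirected, unweighted graph with at least one edge.
    """
    m = len(edges)
    inside = defaultdict(int)     # edges with both endpoints in community c
    total_deg = defaultdict(int)  # sum of node degrees in community c
    for u, v in edges:
        total_deg[labels[u]] += 1
        total_deg[labels[v]] += 1
        if labels[u] == labels[v]:
            inside[labels[u]] += 1
    return sum(inside[c] / m - (total_deg[c] / (2 * m)) ** 2 for c in total_deg)


def genetic_communities(edges, n_nodes, pop_size=40, generations=200,
                        mutation_rate=0.1, seed=0):
    """Illustrative GA: each individual assigns one community label per node."""
    rng = random.Random(seed)
    neighbours = defaultdict(list)
    for u, v in edges:
        neighbours[u].append(v)
        neighbours[v].append(u)

    def fitness(labels):
        # Hypothetical extension point: linguistic terms could be added here.
        return modularity(edges, labels)

    population = [[rng.randrange(n_nodes) for _ in range(n_nodes)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        # Elitist selection: the better half survives unchanged.
        population.sort(key=fitness, reverse=True)
        survivors = population[:pop_size // 2]
        children = []
        while len(survivors) + len(children) < pop_size:
            a, b = rng.sample(survivors, 2)
            # Uniform crossover: take each node's label from either parent.
            child = [rng.choice(pair) for pair in zip(a, b)]
            # Mutation: a node adopts the label of a random neighbour.
            for node in range(n_nodes):
                if neighbours[node] and rng.random() < mutation_rate:
                    child[node] = child[rng.choice(neighbours[node])]
            children.append(child)
        population = survivors + children
    return max(population, key=fitness)


if __name__ == "__main__":
    # Two triangles joined by a single bridge edge (2, 3).
    edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
    best = genetic_communities(edges, n_nodes=6)
    print(best, modularity(edges, best))
```

In this sketch, domain-specific or linguistic criteria would enter only through `fitness`, which is one way to read the abstract's remark about extending the fitness function. A real system would also cache fitness values instead of recomputing them during sorting.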