2020
DOI: 10.1609/aaai.v34i05.6321

Top-Down RST Parsing Utilizing Granularity Levels in Documents

Abstract: Some downstream NLP tasks exploit discourse dependency trees converted from RST trees. To obtain better discourse dependency trees, we need to improve the accuracy of RST trees at the upper parts of the structures. Thus, we propose a novel neural top-down RST parsing method. Then, we exploit three levels of granularity in a document, paragraphs, sentences and Elementary Discourse Units (EDUs), to parse a document accurately and efficiently. The parsing is done in a top-down manner for each granularity level, b…
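The abstract describes one top-down pass per granularity level: a document is split over paragraphs, each paragraph over sentences, and each sentence over EDUs. A minimal sketch of that decomposition, assuming pre-segmented input (the function names `split_point`, `parse_span`, and `parse_document` are hypothetical illustrations, not the paper's API, and the middle-split heuristic stands in for the paper's neural split scorer):

```python
def split_point(units):
    """Hypothetical scorer: pick a boundary at which to split a span.
    A trained model would score every boundary; here we split in the middle."""
    return len(units) // 2

def parse_span(units):
    """Recursively build a binary tree over `units`, top-down."""
    if len(units) == 1:
        return units[0]
    k = split_point(units)
    return (parse_span(units[:k]), parse_span(units[k:]))

def parse_document(paragraphs):
    """One top-down pass per granularity level: sentences over EDUs,
    paragraphs over sentence trees, the document over paragraph trees."""
    para_trees = []
    for para in paragraphs:                               # para: list of sentences
        sent_trees = [parse_span(sent) for sent in para]  # sent: list of EDUs
        para_trees.append(parse_span(sent_trees))
    return parse_span(para_trees)

# Two paragraphs; the first has two sentences, the second has one.
doc = [[["e1", "e2"], ["e3"]], [["e4", "e5"]]]
tree = parse_document(doc)  # -> ((('e1', 'e2'), 'e3'), ('e4', 'e5'))
```

Restricting each pass to one granularity level keeps every split decision within a short span, which is what makes the top-down parse both accurate at the upper tree levels and efficient.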

Cited by 39 publications (71 citation statements)
References 23 publications
“…In [16], the segmenter and the sentence-level parser are trained jointly as parts of a unified encoder-decoder architecture, achieving superior parsing accuracy as well as parsing speed compared to previous end-to-end sentence-level bottom-up discourse parsers, namely SPADE [26] and DCRF [13]. Kobayashi et al [15] have recently proposed a top-down method that takes into account granularity levels of spans, namely document, paragraph, and sentence. The authors show that the granularity levels are important features for discourse parsing and achieve 60% micro F score on the RST-DT corpus.…”
Section: Related Work
confidence: 99%
“…In English, RST-DT (Carlson et al, 2003) is one of the popular discourse corpora (Subba and Di Eugenio, 2009; Zeldes, 2017; Kolhatkar and Taboada, 2017), which annotates the discourse structure, nuclearity, and relations of a document. Most previous studies have focused on complete discourse parsing and can be mainly categorized into the shift-reduce algorithm (Ji and Eisenstein, 2014; Wang et al, 2017; Yu et al, 2018; Jia et al, 2018), the probabilistic CKY-like algorithm (Joty et al, 2013; Li et al, 2014a; Li et al, 2016), and the bottom-up algorithm (Hernault et al, 2010; Feng and Hirst, 2014; Kobayashi et al, 2019; Kobayashi et al, 2020). Recently, the generative algorithm (Mabona et al, 2019) and the top-down algorithm (Liu et al, 2019; Lin et al, 2019) have also been applied to discourse parsing.…”
Section: Related Work
confidence: 99%
“…In RST-style discourse parsing, the parser first identifies whether there is a rhetorical relationship between discourse units to construct a naked tree and then recognizes the nuclearity and relation labels for each relationship, as shown in Figure 1. According to the granularity of the leaf nodes, the discourse tree is divided into three levels: clause level, sentence level and paragraph level (Kobayashi et al, 2020). This paper focuses on constructing paragraph-level Chinese discourse trees where the leaf node is a paragraph.…”
Section: Introduction
confidence: 99%
“…In most cases, RST parsers have been developed on the basis of supervised learning algorithms (Wang et al, 2017b; Yu et al, 2018; Kobayashi et al, 2020; Lin et al, 2019; Zhang et al, 2020), which require a high-quality annotated corpus of sufficient size. Generally, they train the following three components of the RST parsing: (1) structure prediction by splitting a text span consisting of contiguous EDUs into two smaller ones or merging two adjacent spans into a larger one, (2) nuclearity status prediction for two adjacent spans by solving a 3-class classification problem, and (3) relation label prediction for two adjacent spans by solving an 18-class classification problem (see Section 3.3 for details).…”
Section: Introduction
confidence: 99%
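The three components named in the excerpt above (structure, nuclearity, relation prediction) compose naturally in a top-down parser: pick the best split, label the resulting pair of spans, and recurse. A minimal sketch under stated assumptions, with the stand-in functions `split_score`, `nuclearity`, and `relation` as hypothetical placeholders for trained classifiers (the toy split score simply prefers balanced splits):

```python
NUCLEARITY = ["NN", "NS", "SN"]  # the 3 nuclearity classes
RELATIONS = ["Elaboration", "Attribution", "Contrast"]  # RST-DT has 18; 3 shown

def split_score(edus, k):
    """Hypothetical model score for splitting `edus` at boundary k."""
    return -abs(k - len(edus) / 2)  # toy heuristic: prefer balanced splits

def nuclearity(left, right):
    """Hypothetical 3-class nuclearity classifier for two adjacent spans."""
    return NUCLEARITY[1]  # "NS"

def relation(left, right):
    """Hypothetical 18-class relation classifier (RST-DT label set)."""
    return RELATIONS[0]  # "Elaboration"

def parse(edus):
    """Top-down parse: choose the best split, label it, recurse on halves."""
    if len(edus) == 1:
        return edus[0]
    k = max(range(1, len(edus)), key=lambda b: split_score(edus, b))
    left, right = parse(edus[:k]), parse(edus[k:])
    return {"nuc": nuclearity(left, right),
            "rel": relation(left, right),
            "children": [left, right]}

tree = parse(["e1", "e2", "e3", "e4"])
```

A bottom-up parser would instead score merges of adjacent spans (component (1)'s merging variant), but the labeling steps (2) and (3) are the same classification problems in either direction.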