Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing
DOI: 10.18653/v1/d17-1258
Does syntax help discourse segmentation? Not so much

Abstract: Discourse segmentation is the first step in building discourse parsers. Most work on discourse segmentation does not scale to real-world discourse parsing across languages, for two reasons: (i) models rely on constituent trees, and (ii) experiments have relied on gold standard identification of sentence and token boundaries. We therefore investigate to what extent constituents can be replaced with universal dependencies, or left out completely, as well as how state-of-the-art segmenters fare in the absence of …

Cited by 11 publications (9 citation statements) · References 22 publications
“…Thus, bad parse trees contribute only partially to this error, and we suspect better trees may not provide much benefit. This finding is consistent with the little help dependency trees provided for cross-lingual discourse segmentation in Braud et al. (2017b). We further note the tokenizer for TWO-PASS makes no errors on the medical data, but conversely has a higher proportion of punctuation errors.…”
Section: Errors Between Segmenters (supporting)
confidence: 86%
“…While earlier studies investigated the usefulness of various sources of information, notably syntactic information from chunkers (Sporleder and Lapata, 2005) or full trees (Fisher and Roark, 2007; Braud et al., 2017b), recent studies mostly rely on word embeddings as input to neural sequential architectures (Wang et al., 2018; …).…”
Section: Related Work (mentioning)
confidence: 99%
“…The first results at the document level were presented by Braud et al. (2017a), who investigated cross-lingual and cross-domain training, and by Braud et al. (2017b) in a study focused on the use of syntactic information. In these studies, the best-performing system for the English RST-DT obtained an F1 of 89.5%, showing that the task is harder when sentence boundaries are not given.…”
Section: Related Work (mentioning)
confidence: 99%
“…Hernault et al. (2010) used an SVM model with features corresponding to token and POS trigrams at and preceding a potential segmentation point, as well as features encoding the lexical head of each token's parent phrase in a phrase-structure syntax tree and the same features for the sibling node on the right. More recently, Braud et al. (2017b) used a bi-LSTM-CRF sequence-labeling approach on dependency parses, with words, POS tags, dependency relations and the same features for each word's parent and grandparent tokens, as well as the direction of attachment (left or right), achieving F-scores of .89 on segmenting RST-DT with parser-predicted syntax, and scores in the 80s, near or above previous state-of-the-art results, for a number of other corpora and languages.…”
Section: Previous Work (mentioning)
confidence: 99%
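The feature set described in the snippet above maps naturally onto a token-level sequence-labelling setup. The following is a minimal, hypothetical sketch (not the authors' implementation) of a bi-LSTM tagger over dependency-based features: each token's word, POS tag and dependency relation, the same features for its head and grandparent, and the attachment direction, predicting B/I segment-boundary labels. The toy sentence, feature slots, and hyperparameters are illustrative assumptions, and the CRF output layer used in Braud et al. (2017b) is replaced by a plain softmax for brevity.

import torch
import torch.nn as nn

# One toy sentence with a pre-computed dependency parse.
# Each token: (word, pos, deprel, head_index); head_index -1 means root.
SENTENCE = [
    ("Although", "SCONJ", "mark", 2),
    ("it", "PRON", "nsubj", 2),
    ("rained", "VERB", "advcl", 5),
    (",", "PUNCT", "punct", 5),
    ("we", "PRON", "nsubj", 5),
    ("left", "VERB", "root", -1),
]
# Gold labels: B = token begins a discourse segment, I = inside a segment.
LABELS = ["B", "I", "I", "I", "B", "I"]

def token_features(sent, i):
    """String features for token i: its own word/POS/deprel, the head's and
    grandparent's word/POS, and the direction of attachment (left/right)."""
    word, pos, rel, head = sent[i]
    head_word, head_pos = ("ROOT", "ROOT") if head < 0 else sent[head][:2]
    grand = -1 if head < 0 else sent[head][3]
    grand_word, grand_pos = ("ROOT", "ROOT") if grand < 0 else sent[grand][:2]
    direction = "left" if 0 <= head < i else "right"
    return [word, pos, rel, head_word, head_pos, grand_word, grand_pos, direction]

# Build a single vocabulary over all feature strings (toy-sized on purpose).
feats = [token_features(SENTENCE, i) for i in range(len(SENTENCE))]
vocab = {s: idx for idx, s in enumerate(sorted({f for row in feats for f in row}))}
label_ids = {"B": 0, "I": 1}

class BiLSTMSegmenter(nn.Module):
    def __init__(self, vocab_size, n_feats=8, emb_dim=16, hidden=32, n_labels=2):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        # One embedding per feature slot, concatenated into the LSTM input.
        self.lstm = nn.LSTM(emb_dim * n_feats, hidden,
                            bidirectional=True, batch_first=True)
        self.out = nn.Linear(2 * hidden, n_labels)

    def forward(self, x):          # x: (batch, seq_len, n_feats) of feature ids
        e = self.emb(x)            # (batch, seq_len, n_feats, emb_dim)
        e = e.flatten(2)           # concatenate the per-slot embeddings
        h, _ = self.lstm(e)
        return self.out(h)         # (batch, seq_len, n_labels) logits

x = torch.tensor([[vocab[f] for f in row] for row in feats]).unsqueeze(0)
y = torch.tensor([label_ids[l] for l in LABELS]).unsqueeze(0)

model = BiLSTMSegmenter(len(vocab))
optim = torch.optim.Adam(model.parameters(), lr=0.01)
loss_fn = nn.CrossEntropyLoss()
for _ in range(50):                # over-fit the toy example to sanity-check shapes
    optim.zero_grad()
    loss = loss_fn(model(x).view(-1, 2), y.view(-1))
    loss.backward()
    optim.step()
print(model(x).argmax(-1))         # predicted B/I boundary tags for the toy sentence

In a realistic pipeline the parse would come from a predicted (not gold) dependency parser, which is exactly the setting whose limited benefit the cited paper quantifies.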
“…However, as recent work (Braud et al., 2017b) has shown, performance on smaller or less homogeneous corpora than the RST-DT, and especially in the absence of gold syntax trees (which are realistically unavailable at test time in practical applications), hovers around the mid-80s, making it problematic for full discourse parsing in practice. This is even more critical for languages and domains with relatively small datasets, making the application of generic neural models less promising.…”
Section: Introduction (mentioning)
confidence: 99%