Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing
DOI: 10.18653/v1/d17-1258
Does syntax help discourse segmentation? Not so much

Abstract: Discourse segmentation is the first step in building discourse parsers. Most work on discourse segmentation does not scale to real-world discourse parsing across languages, for two reasons: (i) models rely on constituent trees, and (ii) experiments have relied on gold standard identification of sentence and token boundaries. We therefore investigate to what extent constituents can be replaced with universal dependencies, or left out completely, as well as how state-of-the-art segmenters fare in the absence of …

Cited by 11 publications (9 citation statements) · References 22 publications
“…Thus, bad parse trees contribute only partially to this error, and we suspect better trees may not provide much benefit. This finding is consistent with the little help dependency trees provided for cross-lingual discourse segmentation in Braud et al. (2017b). We further note the tokenizer for TWO-PASS makes no errors on the medical data, but conversely has a higher proportion of punctuation errors.…”
Section: Errors Between Segmenters (supporting)
confidence: 86%
“…While earlier studies investigated the usefulness of various sources of information, notably syntactic information from chunkers (Sporleder and Lapata, 2005) or full trees (Fisher and Roark, 2007; Braud et al., 2017b), recent studies mostly rely on word embeddings as input to neural sequential architectures (Wang et al., 2018; …).…”
Section: Related Work (mentioning)
confidence: 99%
“…The first results at the document level were presented by Braud et al. (2017a), who investigated cross-lingual and cross-domain training, and by Braud et al. (2017b) in a study focused on the use of syntactic information. In these studies, the best-performing system for the English RST-DT obtained an F1 of 89.5%, showing that the task is harder when sentence boundaries are not given.…”
Section: Related Work (mentioning)
confidence: 99%
“…Hernault et al. (2010) used an SVM model with features corresponding to token and POS trigrams at and preceding a potential segmentation point, as well as features encoding the lexical head of each token's parent phrase in a phrase-structure syntax tree and the same features for the sibling node on the right. More recently, Braud et al. (2017b) used a bi-LSTM-CRF sequence-labeling approach on dependency parses, with words, POS tags, dependency relations and the same features for each word's parent and grandparent tokens, as well as the direction of attachment (left or right), achieving F-scores of .89 on segmenting RST-DT with parser-predicted syntax, and scores in the 80s, near or above previous state-of-the-art results, for a number of other corpora and languages.…”
Section: Previous Work (mentioning)
confidence: 99%
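The feature set described in the snippet above maps naturally onto a token-level sequence-labelling setup. The following is a minimal, hypothetical sketch (not the authors' implementation) of a bi-LSTM tagger over dependency-based features: each token's word, POS tag and dependency relation, the same features for its head and grandparent, and the attachment direction, predicting B/I segment-boundary labels. The toy sentence, feature slots, and hyperparameters are illustrative assumptions, and the CRF output layer used in Braud et al. (2017b) is replaced by a plain softmax for brevity.

import torch
import torch.nn as nn

# One toy sentence with a pre-computed dependency parse.
# Each token: (word, pos, deprel, head_index); head_index -1 means root.
SENTENCE = [
    ("Although", "SCONJ", "mark", 2),
    ("it", "PRON", "nsubj", 2),
    ("rained", "VERB", "advcl", 5),
    (",", "PUNCT", "punct", 5),
    ("we", "PRON", "nsubj", 5),
    ("left", "VERB", "root", -1),
]
# Gold labels: B = token begins a discourse segment, I = inside a segment.
LABELS = ["B", "I", "I", "I", "B", "I"]

def token_features(sent, i):
    """String features for token i: its own word/POS/deprel, the head's and
    grandparent's word/POS, and the direction of attachment (left/right)."""
    word, pos, rel, head = sent[i]
    head_word, head_pos = ("ROOT", "ROOT") if head < 0 else sent[head][:2]
    grand = -1 if head < 0 else sent[head][3]
    grand_word, grand_pos = ("ROOT", "ROOT") if grand < 0 else sent[grand][:2]
    direction = "left" if 0 <= head < i else "right"
    return [word, pos, rel, head_word, head_pos, grand_word, grand_pos, direction]

# Build a single vocabulary over all feature strings (toy-sized on purpose).
feats = [token_features(SENTENCE, i) for i in range(len(SENTENCE))]
vocab = {s: idx for idx, s in enumerate(sorted({f for row in feats for f in row}))}
label_ids = {"B": 0, "I": 1}

class BiLSTMSegmenter(nn.Module):
    def __init__(self, vocab_size, n_feats=8, emb_dim=16, hidden=32, n_labels=2):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        # One embedding per feature slot, concatenated into the LSTM input.
        self.lstm = nn.LSTM(emb_dim * n_feats, hidden,
                            bidirectional=True, batch_first=True)
        self.out = nn.Linear(2 * hidden, n_labels)

    def forward(self, x):          # x: (batch, seq_len, n_feats) of feature ids
        e = self.emb(x)            # (batch, seq_len, n_feats, emb_dim)
        e = e.flatten(2)           # concatenate the per-slot embeddings
        h, _ = self.lstm(e)
        return self.out(h)         # (batch, seq_len, n_labels) logits

x = torch.tensor([[vocab[f] for f in row] for row in feats]).unsqueeze(0)
y = torch.tensor([label_ids[l] for l in LABELS]).unsqueeze(0)

model = BiLSTMSegmenter(len(vocab))
optim = torch.optim.Adam(model.parameters(), lr=0.01)
loss_fn = nn.CrossEntropyLoss()
for _ in range(50):                # over-fit the toy example to sanity-check shapes
    optim.zero_grad()
    loss = loss_fn(model(x).view(-1, 2), y.view(-1))
    loss.backward()
    optim.step()
print(model(x).argmax(-1))         # predicted B/I boundary tags for the toy sentence

In a realistic pipeline the parse would come from a predicted (not gold) dependency parser, which is exactly the setting whose limited benefit the cited paper quantifies.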
“…However, as recent work (Braud et al., 2017b) has shown, performance on smaller or less homogeneous corpora than the RST-DT, and especially in the absence of gold syntax trees (which are realistically unavailable at test time in practical applications), hovers around the mid-80s, making it problematic for full discourse parsing in practice. This is even more critical for languages and domains with relatively small datasets, making the application of generic neural models less promising.…”
Section: Introduction (mentioning)
confidence: 99%