Proceedings of the 17th Annual Meeting of the Special Interest Group on Discourse and Dialogue 2016
DOI: 10.18653/v1/w16-3617

The Role of Discourse Units in Near-Extractive Summarization

Abstract: Although human-written summaries of documents tend to involve significant edits to the source text, most automated summarizers are extractive and select sentences verbatim. In this work we examine how elementary discourse units (EDUs) from Rhetorical Structure Theory can be used to extend extractive summarizers to produce a wider range of human-like summaries. Our analysis demonstrates that EDU segmentation is effective in preserving human-labeled summarization concepts within sentences and also aligns with ne…

Cited by 42 publications (21 citation statements)
References 25 publications

“…Several recent works (See et al., 2017; Paulus et al., 2018; Li et al., 2018) have used CNN-DM to build and evaluate abstractive systems. Conversely, NYT has been used to build extractive systems (Hong and Nenkova, 2014; Li et al., 2016). Given our findings, we find both of these trends to be inconsistent with dataset properties and suboptimal given other preferable datasets for these purposes: CNN-DM is one of the least abstractive datasets and there are larger and more extractive alternatives to NYT such as NWS.…”
Section: Results and Analysis (mentioning)
confidence: 69%
“…Many of these approaches are syntax-driven, though end-to-end neural models have been proposed as well (Filippova et al., 2015; Wang et al., 2017). Past non-neural work on summarization has used both syntax-based (Berg-Kirkpatrick et al., 2011; Woodsend and Lapata, 2011) and discourse-based (Carlson et al., 2001; Hirao et al., 2013; Li et al., 2016) compressions. Our approach follows in the syntax-driven vein.…”
Section: Compression In Summarization (mentioning)
confidence: 99%
“…The Guardian provides all their content via an API called OpenPlatform, launched in 2009 (Anderson, 2009). This data source has seen only tangential use in the scientific community (Li et al., 2016; Guimarães and Figueira, 2017; Murukannaiah et al., 2017) and has not been used for diachronic models before.…”
Section: Data (mentioning)
confidence: 99%