The Role of Discourse Units in Near-Extractive Summarization

Li, Junyi Jessy; Thadani, Kapil; Stent, Amanda

doi:10.18653/v1/w16-3617

Cited by 42 publications

(21 citation statements)

References 25 publications

“…Several recent works (See et al, 2017; Paulus et al, 2018; Li et al, 2018) have used CNN-DM to build and evaluate abstractive systems. Conversely, NYT has been used to build extractive systems (Hong and Nenkova, 2014; Li et al, 2016). Given our findings, we find both of these trends to be inconsistent with dataset properties and suboptimal given other preferable datasets for these purposes: CNN-DM is one of the least abstractive datasets and there are larger and more extractive alternatives to NYT such as NWS.…”

Section: Results and Analysis (mentioning)

confidence: 69%

Intrinsic Evaluation of Summarization Datasets

Bommasani, Cardie

2020

Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

High quality data forms the bedrock for building meaningful statistical models in NLP. Consequently, data quality must be evaluated either during dataset construction or post hoc. Almost all popular summarization datasets are drawn from natural sources and do not come with inherent quality assurance guarantees. In spite of this, data quality has gone largely unquestioned for many recent summarization datasets. We perform the first large-scale evaluation of summarization datasets by introducing 5 intrinsic metrics and applying them to 10 popular datasets. We find that data usage in recent summarization research is sometimes inconsistent with the underlying properties of the datasets employed. Further, we discover that our metrics can serve the additional purpose of being inexpensive heuristics for detecting generically low quality examples.

“…Many of these approaches are syntax-driven, though end-to-end neural models have been proposed as well (Filippova et al, 2015; Wang et al, 2017). Past non-neural work on summarization has used both syntax-based (Berg-Kirkpatrick et al, 2011; Woodsend and Lapata, 2011) and discourse-based (Carlson et al, 2001; Hirao et al, 2013; Li et al, 2016) compressions. Our approach follows in the syntax-driven vein.…”

Section: Compression In Summarization (mentioning)

confidence: 99%

Neural Extractive Text Summarization with Syntactic Compression

Xu, Durrett

2019

Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conferen

Recent neural network approaches to summarization are largely either selection-based extraction or generation-based abstraction. In this work, we present a neural model for single-document summarization based on joint extraction and syntactic compression. Our model chooses sentences from the document, identifies possible compressions based on constituency parses, and scores those compressions with a neural model to produce the final summary. For learning, we construct oracle extractive-compressive summaries, then learn both of our components jointly with this supervision. Experimental results on the CNN/Daily Mail and New York Times datasets show that our model achieves strong performance (comparable to state-of-the-art systems) as evaluated by ROUGE. Moreover, our approach outperforms an off-the-shelf compression module, and human and manual evaluation shows that our model's output generally remains grammatical.

“…The Guardian provides all their content via an API called OpenPlatform, launched in 2009 (Anderson, 2009). This data source has seen only tangential use in the scientific community (Li et al, 2016; Guimarães and Figueira, 2017; Murukannaiah et al, 2017) and has not been used for diachronic models before.…”

Section: Data (mentioning)

confidence: 99%

Diachronic Embeddings for People in the News

Hennig, Wilson

2020

Proceedings of the Fourth Workshop on Natural Language Processing and Computational Social Science

Previous English-language diachronic change models based on word embeddings have typically used single tokens to represent entities, including names of people. This leads to issues with both ambiguity (resulting in one embedding representing several distinct and unrelated people) and unlinked references (leading to several distinct embeddings which represent the same person). In this paper, we show that using named entity recognition and heuristic name linking steps before training a diachronic embedding model leads to more accurate representations of references to people, as compared to the token-only baseline. Using a large news corpus of articles from The Guardian, we provide examples of several types of analysis that can be performed using these new embeddings. Further, we show that real world events and context changes can be detected using our proposed model, with a focus on the examples of UK prime ministers and role changes in the football domain.
