Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), 2019
DOI: 10.18653/v1/d19-3036

A Summarization System for Scientific Documents

Abstract: We present a novel system providing summaries for Computer Science publications. Through a qualitative user study, we identified the most valuable scenarios for discovery, exploration and understanding of scientific documents. Based on these findings, we built a system that retrieves and summarizes scientific documents for a given information need, either in form of a free-text query or by choosing categorized values such as scientific tasks, datasets and more. Our system ingested 270,000 papers, and its summa…

Cited by 43 publications (28 citation statements)
References 17 publications
“…Most studies on query-based text summarization focus on the multi-document level (Dang, 2005; Baumel et al., 2016) and use extractive approaches (Feigenblat et al., 2017; Xu and Lapata, 2020). In the scientific literature domain, Erera et al. (2019) apply an unsupervised extractive approach to generate a summary for each section of a paper. In contrast to previous work, we construct a challenging QSS dataset for scientific paper-slide pairs and apply an abstractive approach to generate slide contents for a given slide title.…”
Section: Related Work
Confidence: 99%
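The unsupervised extractive approach mentioned in the citation above can be sketched in generic terms: score each sentence of a section by the weight of its terms and keep the top-ranked ones. The sketch below uses plain TF-IDF sentence scoring purely as an illustration of the technique; the function names and the scoring scheme are assumptions, not the actual algorithm of the summarization system.

```python
# Minimal, generic sketch of unsupervised extractive summarization:
# rank sentences by their average TF-IDF term weight and keep the
# top-k, preserving document order. Illustrative only -- not the
# scoring used by the system described in the abstract.
import math
import re
from collections import Counter

def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())

def extractive_summary(sentences, k=2):
    """Return the top-k sentences ranked by average TF-IDF weight."""
    docs = [tokenize(s) for s in sentences]
    n = len(docs)
    # document frequency: in how many sentences each term occurs
    df = Counter(w for d in docs for w in set(d))

    def score(doc):
        tf = Counter(doc)
        total = sum(tf[w] * math.log(n / df[w]) for w in tf)
        return total / (len(doc) or 1)

    ranked = sorted(range(n), key=lambda i: score(docs[i]), reverse=True)
    chosen = sorted(ranked[:k])  # restore original sentence order
    return [sentences[i] for i in chosen]

section = [
    "We present a summarization system for scientific papers.",
    "The cat sat on the mat.",
    "Our system summarizes each section of a scientific paper.",
]
summary = extractive_summary(section, k=2)
```

A real system would add preprocessing (stopword removal, stemming) and a redundancy penalty; this sketch only shows the core score-and-select loop.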
“…The most studied task is argument mining, i.e., the identification of argumentative units, argument components (e.g., conclusion and premise), and structures of text documents. However, despite a wealth of Natural Language Processing (NLP) research on extracting information from scientific literature—including entity extraction (Augenstein et al., 2017; Hou et al., 2019), relation identification (Luan et al., 2018), question answering (Demner-Fushman and Lin, 2007), and summarization (Erera et al., 2019)—relatively few attempts have been made to model argumentative structures in science.…”
Section: Introduction
Confidence: 99%
“…The LongSumm task strives to learn how to cover the salient information conveyed in a given scientific document, taking into account the characteristics and the structure of the text. The motivation for LongSumm was first demonstrated by the IBM Science Summarizer system (Erera et al., 2019), which retrieves and creates long summaries of scientific documents. While Erera et al. (2019) studied some use cases and proposed a summarization approach with some human evaluation, the authors stressed the need for a large dataset that would unleash research in this domain.…”
Section: Introduction
Confidence: 99%
“…The motivation for LongSumm was first demonstrated by the IBM Science Summarizer system (Erera et al., 2019), which retrieves and creates long summaries of scientific documents. While Erera et al. (2019) studied some use cases and proposed a summarization approach with some human evaluation, the authors stressed the need for a large dataset that would unleash research in this domain. LongSumm aims at filling this gap by providing a large dataset of long summaries based on blogs written by Machine Learning and NLP experts.…”
Section: Introduction
Confidence: 99%