Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, 2021
DOI: 10.18653/v1/2021.acl-short.137
Bringing Structure into Summaries: a Faceted Summarization Dataset for Long Scientific Documents

Abstract: Faceted summarization provides briefings of a document from different perspectives. Readers can quickly comprehend the main points of a long document with the help of a structured outline. However, little research has been conducted on this subject, partially due to the lack of large-scale faceted summarization datasets. In this study, we present FacetSum, a faceted summarization benchmark built on Emerald journal articles, covering a diverse range of domains. Different from traditional document-summary pairs, …
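The structure the abstract describes can be sketched as a data record in which each document is paired with several facet-specific summaries rather than one flat summary. The sketch below is illustrative only: the facet names follow the Emerald structured-abstract sections the benchmark is built on, but the field and class names are hypothetical, not the dataset's actual schema.

```python
from dataclasses import dataclass

@dataclass
class FacetedSummary:
    """One summary per facet, mirroring Emerald structured abstracts."""
    purpose: str   # why the study was conducted
    method: str    # how the study was carried out
    findings: str  # what the study discovered
    value: str     # why the findings matter

@dataclass
class Example:
    """A single document-summary pair in a faceted dataset."""
    full_text: str
    summary: FacetedSummary

ex = Example(
    full_text="<body of a long scientific article>",
    summary=FacetedSummary(
        purpose="Why the study was conducted.",
        method="How the study was carried out.",
        findings="What the study discovered.",
        value="Why the findings matter.",
    ),
)

# Each facet is a separate target, so a model can be trained or
# evaluated per facet instead of on one undifferentiated summary.
for facet in ("purpose", "method", "findings", "value"):
    print(facet, getattr(ex.summary, facet))
```

Keeping the facets as distinct fields, rather than concatenating them, is what lets a system answer a reader's request for one perspective (say, only the findings) without generating the whole summary.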

Cited by 15 publications (20 citation statements); references 24 publications.
“…Each article consists of a background paragraph about the issue, along with a set of questions about the issue and short answers to those questions. FacetSum (Meng et al., 2021) is a found dataset consisting of a corpus of scientific papers paired with author-written summaries focusing on different aspects of the paper. WikiAsp (Hayashi et al., 2021) and AQuaMuSe (Kulkarni et al., 2020) are two heuristically created, multi-document QFS datasets derived from Wikipedia.…”
Section: Question-focused Summarization
Confidence: 99%
See 1 more Smart Citation
“…Each article consists of a background paragraph about the issue, along with a set of questions about the issue and short answers to those questions. FacetSum (Meng et al, 2021) is a found dataset consisting of a corpus of scientific papers paired with author-written summaries focusing on different aspects of the paper. WikiAsp (Hayashi et al, 2021) and AQuaMuSe (Kulkarni et al, 2020) are two heuristically created, multidocument QFS datasets derived from Wikipedia.…”
Section: Question-focused Summarizationmentioning
confidence: 99%
“…For example, many researchers and organizations are unwilling to host or distribute the CNN/DailyMail dataset, despite it being one of the most popular summarization datasets to experiment on. Similarly, several recent summarization datasets built on data such as scientific journal papers (Meng et al., 2021) or SparkNotes book summaries (Ladhak et al., 2020) have never been made available to researchers, with the dataset creators instead asking potential data users to rescrape them individually, which can be a serious obstacle to reproducibility.…”
Section: Introduction
Confidence: 99%
“…Yasunaga et al. [57] efficiently create a dataset for the computational linguistics domain by manually exploiting the structure of papers. Meng et al. [38] present a dataset that contains four aspect-specific summaries for each paper, making it possible to provide summaries tailored to user requests. Lu et al. [35] present a large-scale dataset for multi-document summarization of scientific papers, for which models must summarize multiple documents at once.…”
Section: Related Work
Confidence: 99%
“…While hierarchical encoding has been investigated (Zhang et al., 2019; Balachandran et al., 2021), the need to train large numbers of additional parameters increases the memory footprint and thus limits the allowed input length. As for the output, the structure of single-document summaries remains largely "flat", such as a list of aspects (Meng et al., 2021). We argue that it is imperative to develop systems that can output summaries with rich structures to support knowledge acquisition, which is especially critical for long documents that cover numerous subjects with varying details (Huang et al., 2021; Kryściński et al., 2021).…”
Section: Introduction
Confidence: 99%