SAMSum Corpus: A Human-annotated Dialogue Dataset for Abstractive Summarization

Gliwa, Bogdan; Mochol, Iwona; Biesek, Maciej; Wawer, Aleksander

doi:10.18653/v1/d19-5409

Cited by 257 publications

(366 citation statements)

References 15 publications

Citation types: Supporting, Mentioning (278), Contrasting, Unclassified


“…Global View and Discrete View. In addition to the aforementioned two structured views, conversations can also be naturally viewed from a relatively coarse perspective, i.e., a global view that concatenates all utterances into one giant block (Gliwa et al, 2019), and a discrete view that separates each utterance into a distinct block (Gliwa et al, 2019).…”

Section: Conversation View Extraction (mentioning)

confidence: 99%
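The two coarse views described in the statement above can be illustrated with a minimal sketch. The function names `global_view` and `discrete_view` and the sample dialogue are hypothetical, chosen only for illustration; they are not taken from the cited paper's code, and the list-of-blocks representation is an assumption.

```python
from typing import List

# A "view" is modelled here as a list of blocks, where each block is a
# list of utterances. This representation is an assumption made for
# illustration, not the cited authors' implementation.
View = List[List[str]]


def global_view(utterances: List[str]) -> View:
    """Global view: all utterances grouped into one single block."""
    return [list(utterances)]


def discrete_view(utterances: List[str]) -> View:
    """Discrete view: every utterance becomes its own block."""
    return [[utterance] for utterance in utterances]


if __name__ == "__main__":
    # Invented sample dialogue, for demonstration only.
    dialogue = [
        "Anna: Are we still meeting today?",
        "Ben: Yes, at 5 pm.",
        "Anna: Great, see you then.",
    ]
    print(len(global_view(dialogue)))    # number of blocks in the global view
    print(len(discrete_view(dialogue)))  # number of blocks in the discrete view
```

Under this representation the global view always contains exactly one block, while the discrete view contains as many blocks as there are utterances.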

“…We evaluate our model on a large-scale dialogue summary dataset SAMSum (Gliwa et al, 2019) that has 14732 dialogues with human-written summaries. The data statistics are shown in Table 3.…”

Section: Dataset and Baselines (mentioning)

confidence: 99%

“…There has been some recent research on conversation summarization such as directly deploying existing document summarization models (Gliwa et al, 2019) and exploring multi-sentence compression (Shang et al, 2018), however, most of them haven't utilized specific conversational structures, which refer to the way utterances are organized in order to make the conversation meaningful, enjoyable and understandable (Sacks et al, 1978), in dialogues, a key factor that differentiates dialogues from structured documents. As a way of using language socially of "doing things with words" together with other persons, the conversation has its own dynamic structures that organize utterances in certain orders to make the conversation meaningful, enjoyable, and understandable (Sacks et al, 1978).…”

Section: Introduction (mentioning)

confidence: 99%

“…As a way of using language socially of "doing things with words" together with other persons, the conversation has its own dynamic structures that organize utterances in certain orders to make the conversation meaningful, enjoyable, and understandable (Sacks et al, 1978). Although there are a few exceptions such as utilizing topic segmentation (Liu et al, 2019b; [...]), dialogue acts (Goo and Chen, 2018) or key point sequence (Liu et al, 2019a), [...] extensive expert annotations of discourse acts (Goo and Chen, 2018; Liu et al, 2019a), or only encode conversations based on their topics (Liu et al, 2019b), which fails to capture rich conversation structures in dialogues. [Interleaved caption fragment:] [...] (Gliwa et al, 2019) with its topic view and stage view (extracted by our methods), and the human annotated summary.…”

Section: Introduction (mentioning)

confidence: 99%

“…(2) We design a multi-view sequence-to-sequence model that consists of a conversation encoder to encode different views and a multi-view decoder with multi-view attention to generate dialogue summaries. (3) We perform experiments on a large-scale conversation summarization dataset, SAMSum (Gliwa et al, 2019), and demonstrate the effectiveness of our proposed methods. (4) We conduct thorough error analyses and discuss specific challenges that current approaches faced with this task.…”

Section: Introduction (mentioning)

confidence: 99%


Multi-View Sequence-to-Sequence Models with Conversational Structure for Abstractive Dialogue Summarization

Chen, Yang. 2020

Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)


Text summarization is one of the most challenging and interesting problems in NLP. Although much attention has been paid to summarizing structured text like news reports or encyclopedia articles, summarizing conversations (an essential part of human-human/machine interaction where most important pieces of information are scattered across various utterances of different speakers) remains relatively under-investigated. This work proposes a multi-view sequence-to-sequence model by first extracting conversational structures of unstructured daily chats from different views to represent conversations and then utilizing a multi-view decoder to incorporate different views to generate dialogue summaries. Experiments on a large-scale dialogue summarization corpus demonstrated that our methods significantly outperformed previous state-of-the-art models via both automatic evaluations and human judgment. We also discussed specific challenges that current approaches faced with this task. We have publicly released our code at https://github.com/GT-SALT/Multi-View-Seq2Seq.


CLTS: A New Chinese Long Text Summarization Dataset

Liu, Zhang, Chen, et al. 2020

Lecture Notes in Computer Science


The abstractive methods' lack of creative ability is a particular problem in automatic text summarization. The summaries generated by models are mostly extracted from the source articles. One of the main causes of this problem is the lack of datasets with abstractiveness, especially for Chinese. In order to solve this problem, we paraphrase the reference summaries in CLTS, the Chinese Long Text Summarization dataset, correct errors of factual inconsistencies, and propose the first Chinese Long Text Summarization dataset with a high level of abstractiveness, CLTS+, which contains more than 180K article-summary pairs and is available online. Additionally, we introduce an intrinsic metric based on co-occurrence words to evaluate the dataset we constructed. We analyze the extraction strategies used in CLTS+ summaries against other datasets to quantify the abstractiveness and difficulty of our new data and train several baselines on CLTS+ to verify its utility for improving the creative ability of models.


Incorporating Commonsense Knowledge into Abstractive Dialogue Summarization via Heterogeneous Graph Networks

Feng, Qin. 2021

Lecture Notes in Computer Science


Abstractive dialogue summarization is the task of capturing the highlights of a dialogue and rewriting them into a concise version. In this paper, we present a novel multi-speaker dialogue summarizer to demonstrate how large-scale commonsense knowledge can facilitate dialogue understanding and summary generation. In detail, we consider utterance and commonsense knowledge as two different types of data and design a Dialogue Heterogeneous Graph Network (D-HGN) for modeling both types of information. Meanwhile, we also add speakers as heterogeneous nodes to facilitate information flow. Experimental results on the SAMSum dataset show that our model can outperform various methods. We also conduct zero-shot setting experiments on the Argumentative Dialogue Summary Corpus; the results show that our model can better generalize to the new domain.

Footnote 1: We pre-define the useless relation list, including Antonym, EtymologicallyDerivedFrom, NotHasProperty, DistinctFrom, NotCapableOf, EtymologicallyRelatedTo and NotDesires.

