Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing
DOI: 10.18653/v1/d15-1229
LCSTS: A Large Scale Chinese Short Text Summarization Dataset

Abstract: Automatic text summarization is widely regarded as a highly difficult problem, partly because of the lack of large text summarization datasets. Given the great challenge of constructing large-scale summaries for full texts, in this paper we introduce a large Chinese short text summarization dataset constructed from the Chinese microblogging website Sina Weibo, which is released to the public. This corpus consists of over 2 million real Chinese short texts with short summaries given by th…

Citations: cited by 238 publications (220 citation statements)
References: 17 publications
“…Additionally, we also report other standard language generation metrics (as motivated recently by ): METEOR (Denkowski and Lavie, 2014), BLEU-4 (Papineni et al., 2002), and CIDEr-D (Vedantam et al., 2015), based on the MS-COCO evaluation script (Chen et al., 2015).…”
Section: Discussion (mentioning)
confidence: 99%
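For a standalone check of one of these metrics, sentence-level BLEU-4 can be computed with NLTK. Below is a minimal sketch, assuming NLTK is installed; the token sequences are hypothetical examples, not outputs of the cited MS-COCO evaluation script.

```python
# Minimal sketch: sentence-level BLEU-4 with NLTK (hypothetical token lists).
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

reference = ["the", "cat", "sat", "on", "the", "mat"]   # tokenized gold summary
hypothesis = ["the", "cat", "is", "on", "the", "mat"]   # tokenized system output

# BLEU-4 uses uniform weights over 1- to 4-gram precisions; smoothing
# avoids zero scores when short sentences lack higher-order n-gram matches.
smooth = SmoothingFunction().method1
score = sentence_bleu([reference], hypothesis,
                      weights=(0.25, 0.25, 0.25, 0.25),
                      smoothing_function=smooth)
print(f"BLEU-4: {score:.3f}")
```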
“…Automatic abstractive summarization can be considered one of the most challenging variants of automatic summarization (Gambhir and Gupta, 2017). But with recent advancements in the field of deep learning, new ground was broken using various kinds of neural network models (Rush et al., 2015; Hu et al., 2015; Chopra et al., 2016; See et al., 2017). The performance of these kinds of summarization models strongly depends on large amounts of suitable training data.…”
Section: Introduction (mentioning)
confidence: 99%
“…Next to the English resources listed in Table 1, the LCSTS dataset collected by Hu et al. (2015) is perhaps closest to our own work, both in terms of text genre and collection method. Their dataset comprises 2.5 million content-summary pairs collected from the Chinese social media platform Weibo, a service similar to Twitter in that a post is limited to 140 characters.…”
Section: Related Work (mentioning)
confidence: 99%
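To give a concrete picture of working with such content-summary pairs, here is a minimal loading sketch. It assumes a hypothetical tab-separated layout ("summary<TAB>text", one pair per line); the actual LCSTS release uses its own markup, so the format here is illustrative only.

```python
# Minimal sketch: stream (text, summary) pairs from a hypothetical TSV layout.
import io

def load_pairs(lines, max_chars=140):
    """Yield (text, summary) pairs, keeping texts within the Weibo length limit."""
    for line in lines:
        summary, text = line.rstrip("\n").split("\t", 1)
        if len(text) <= max_chars:  # posts are limited to 140 characters
            yield text, summary

# In-memory demo; with a real file, pass open("pairs.tsv", encoding="utf-8").
sample = io.StringIO("A short summary\tThe full Weibo post being summarized\n")
for text, summary in load_pairs(sample):
    print(summary, "<-", text)
```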
“…K. Lopyrev built an abstractive headline-generation model in 2015 on the encoder-decoder framework, using an RNN (Recurrent Neural Network) with LSTM (Long Short-Term Memory) units [5] and an attention mechanism to generate news headlines [6]. Secondly, two papers [7, 8] published between 2015 and 2016 by Rush et al. of Facebook AI Research addressed the abstractive summarization task: building on the encoder-decoder architecture, they proposed encoder variants based on CNNs (Convolutional Neural Networks) and attention mechanisms, with a decoder based on an RNNLM (Recurrent Neural Network Language Model). Hu et al. [9] applied the RNN-based encoder-decoder architecture to Chinese text summarization and constructed the Chinese summarization dataset LCSTS to facilitate research on Chinese abstractive summarization. This paper mainly studies sentence-level abstractive summarization of Chinese short texts and builds a summary generation model on the LCSTS dataset.…”
Section: Introduction (mentioning)
confidence: 99%
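The encoder-decoder summarizers cited above share a common shape. Below is a minimal sketch, assuming PyTorch, of an RNN encoder-decoder with dot-product attention; the vocabulary size, dimensions, and attention form are illustrative choices, not the cited papers' exact models.

```python
# Minimal sketch: GRU encoder-decoder with dot-product attention (PyTorch).
import torch
import torch.nn as nn

class Seq2SeqAttn(nn.Module):
    def __init__(self, vocab_size=4000, emb=128, hidden=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb)
        self.encoder = nn.GRU(emb, hidden, batch_first=True)
        self.decoder = nn.GRUCell(emb + hidden, hidden)
        self.out = nn.Linear(hidden, vocab_size)

    def forward(self, src, tgt):
        enc_out, h = self.encoder(self.embed(src))       # (B, S, H), (1, B, H)
        state = h.squeeze(0)                             # decoder init state (B, H)
        logits = []
        for t in range(tgt.size(1)):                     # teacher forcing over tgt
            # Dot-product attention: score each encoder state against the decoder state.
            scores = torch.bmm(enc_out, state.unsqueeze(2)).squeeze(2)          # (B, S)
            ctx = torch.bmm(scores.softmax(dim=1).unsqueeze(1), enc_out).squeeze(1)
            inp = torch.cat([self.embed(tgt[:, t]), ctx], dim=1)
            state = self.decoder(inp, state)
            logits.append(self.out(state))
        return torch.stack(logits, dim=1)                # (B, T, vocab_size)

# Usage demo with random token indices (hypothetical character-level vocabulary):
model = Seq2SeqAttn()
src = torch.randint(0, 4000, (2, 140))   # two source texts, Weibo-length
tgt = torch.randint(0, 4000, (2, 20))    # teacher-forced summary prefixes
print(model(src, tgt).shape)             # torch.Size([2, 20, 4000])
```

In a real training setup, the logits would feed a cross-entropy loss against the gold summary tokens; decoding at test time would replace teacher forcing with greedy or beam search.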