“…To name just a few examples, experiments are most often conducted on different benchmark datasets, all of which differ in domain, size, language or quality of the gold standard (that is, the reference keyphrases supplied by authors, readers or professional indexers). This not only makes the reported results hard to compare, but also has a profound impact on trained model performance [15]. In addition, since there is no consensus as to which evaluation metric is most reliable for keyphrase extraction [21,24,49], a diverse range of measures is used in the literature, further preventing direct comparison.…”
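As a concrete illustration of why metric choices hinder comparison (this sketch is ours, not from the quoted paper), the snippet below computes F1@k, one commonly reported keyphrase evaluation measure, under exact matching after a simple lowercasing step. The function names, the choice of k, and the whitespace-only normalization are illustrative assumptions; published benchmarks vary in all three (for example, many apply stemming before matching), which is precisely the source of incomparability described above.

```python
# Illustrative sketch of one common keyphrase evaluation measure: F1@k with
# exact string matching after a simple normalization. Real setups differ in
# normalization (e.g. stemming), in the cutoff k, and in how scores are
# averaged across documents, so reported numbers are rarely comparable.

def normalize(phrase: str) -> str:
    """Lowercase and collapse whitespace; a stand-in for stemming pipelines."""
    return " ".join(phrase.lower().split())

def f1_at_k(predicted: list[str], gold: list[str], k: int = 10) -> float:
    """F1 of the top-k predicted keyphrases against the gold keyphrases."""
    top_k = [normalize(p) for p in predicted[:k]]
    gold_set = {normalize(g) for g in gold}
    matches = sum(1 for p in top_k if p in gold_set)
    if matches == 0:
        return 0.0
    precision = matches / len(top_k)
    recall = matches / len(gold_set)
    return 2 * precision * recall / (precision + recall)

if __name__ == "__main__":
    predicted = ["keyphrase extraction", "Neural Networks", "benchmarks"]
    gold = ["keyphrase extraction", "neural networks", "evaluation metrics"]
    # 2 of 3 predictions match 2 of 3 gold phrases: P = R = F1 = 2/3.
    print(f"F1@10 = {f1_at_k(predicted, gold):.3f}")
```

Note that swapping the lowercasing step for a stemmer, changing k, or macro-averaging per document instead of pooling matches would each yield a different score on the same system output, which is the incomparability the passage points to.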