Findings of the Association for Computational Linguistics: NAACL 2022
DOI: 10.18653/v1/2022.findings-naacl.67
Learning Rich Representation of Keyphrases from Text

Abstract: In this work, we explore how to train task-specific language models aimed at learning rich representations of keyphrases from text documents. We experiment with different masking strategies for pre-training transformer language models (LMs) in discriminative as well as generative settings. In the discriminative setting, we introduce a new pre-training objective, Keyphrase Boundary Infilling with Replacement (KBIR), showing large gains in performance (up to 8.16 points in F1) over SOTA, when the LM pre-traine…
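The corruption step behind a KBIR-style objective can be illustrated with a minimal sketch. This is not the authors' implementation: the function name, the single-token span collapse, and the replacement probability are all simplifications chosen for illustration. The idea it shows is the one the abstract names: keyphrase spans are either collapsed into a mask (so the model must infill the span and recover its length) or swapped for a random token (so the model must detect and replace the corruption).

```python
import random

MASK = "[MASK]"

def corrupt_keyphrases(tokens, keyphrase_spans, vocab, p_replace=0.3, seed=0):
    """Sketch of a KBIR-style corruption step over one document.

    Each (start, end) keyphrase span is either collapsed into a single
    [MASK] token (infilling: the model must also recover the span length)
    or swapped for a random vocabulary token (replacement: the model must
    detect and correct it). Spans are assumed non-overlapping.
    """
    rng = random.Random(seed)
    out, i = [], 0
    for start, end in sorted(keyphrase_spans):
        out.extend(tokens[i:start])          # copy tokens before the span
        if rng.random() < p_replace:
            out.append(rng.choice(vocab))    # replacement corruption
        else:
            out.append(MASK)                 # span infilling
        i = end
    out.extend(tokens[i:])                   # copy the tail
    return out

tokens = "we train language models to learn rich keyphrase representations".split()
spans = [(2, 4), (7, 9)]  # "language models", "keyphrase representations"
print(corrupt_keyphrases(tokens, spans, vocab=["foo", "bar"]))
```

A real pre-training pipeline would operate on subword IDs and emit the original span as the reconstruction target; here the corrupted token list stands in for both.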

Cited by 28 publications (21 citation statements) | References 35 publications
“…Song et al. [217] evaluated ChatGPT on multiple datasets from news and scientific literature domains having both short and long documents. Experiment results showed that ChatGPT outperforms KeyBART [227], the SOTA model, on all the datasets.…”
Section: Keyphrase Generation
confidence: 95%
“…The primary advantage of KPG over keyphrase extraction is the ability to generate both extractive and abstractive keyphrases. Keyphrase generation is approached as a sequence-to-sequence generation task [12], [226], [227] in the existing works. The current state-of-the-art model for keyphrase generation is KeyBART [227], which is based on BART and trained using the text-to-text generation paradigm.…”
Section: Keyphrase Generation
confidence: 99%
“…The task of keyphrase generation is introduced to predict both present and absent keyphrases. (Swaminathan et al., 2020), hierarchical decoding (Chen et al., 2020b), graphs (Ye et al., 2021a), dropout (Ray Chowdhury et al., 2022), and pretraining (Kulkarni et al., 2022; Wu et al., 2022a) to improve keyphrase generation. Furthermore, there have been several attempts to unify KE and KG tasks into a single learning framework.…”
Section: Keyphrase Generation
confidence: 99%
“…Previous KPE works include supervised and unsupervised approaches. Supervised approaches model KPE as sequence tagging (Sahrawat et al., 2019; Alzaidy et al., 2019; Martinc et al., 2020; Santosh et al., 2020; Nikzad-Khasmakhi et al., 2021) or sequence generation tasks (Kulkarni et al., 2021) and require large-scale annotated data to perform well. Since KPE annotations are expensive and large-scale KPE annotated data is scarce, unsupervised KPE approaches, such as TextRank (Mihalcea and Tarau, 2004), YAKE (Campos et al., 2018), EmbedRank (Bennani-Smires et al., 2018), are the mainstay in industry deployment.…”
Section: Introduction
confidence: 99%
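The graph-based unsupervised approach that TextRank exemplifies can be sketched in a few lines. This is a simplified toy, not the published algorithm: real implementations POS-filter candidate words, operate on lemmas, and merge adjacent high-scoring words into multi-word keyphrases, none of which is done here.

```python
from collections import defaultdict

def textrank_keywords(words, window=2, d=0.85, iters=50):
    """Toy TextRank: build an undirected co-occurrence graph over words that
    appear within a sliding window of each other, then run PageRank-style
    score propagation and return words sorted by final score."""
    graph = defaultdict(set)
    for i in range(len(words)):
        for j in range(i + 1, min(i + window + 1, len(words))):
            if words[i] != words[j]:
                graph[words[i]].add(words[j])
                graph[words[j]].add(words[i])
    scores = {w: 1.0 for w in graph}
    for _ in range(iters):
        scores = {
            w: (1 - d) + d * sum(scores[u] / len(graph[u]) for u in graph[w])
            for w in graph
        }
    return sorted(scores, key=scores.get, reverse=True)

text = ("keyphrase extraction ranks candidate keyphrase units by graph "
        "centrality so frequent well connected keyphrase candidates win")
print(textrank_keywords(text.split())[:3])
```

Because scoring needs only the document itself, such methods require no annotated training data, which is exactly why the citation above notes their dominance in industry deployment despite supervised models scoring higher on benchmarks.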