Exploring Word Embeddings in CRF-based Keyphrase Extraction from Research Papers

Patel, Krutarth; Caragea, Cornelia

doi:10.1145/3360901.3364447

Cited by 10 publications (6 citation statements)

References 37 publications (50 reference statements)


“…This model inserts a Bi-LSTM layer between the output and input layers in order to exploit dependencies in the text. Patel et al [76] build a complex labeling model using the Bi-LSTM-CRF network, which incorporates long-distance information about the input sequence as well as about the output sequence. Sahrawat et al [77] formulate AKE as sequence tagging using a BiLSTM-CRF, where phrases from the input text are represented using deep embeddings.…”

Section: Model Vector Dimension (mentioning)

confidence: 99%
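The quoted works frame keyphrase extraction as sequence tagging with a CRF output layer. As a minimal sketch of what that layer does at inference time, the Python below runs Viterbi decoding over a B/I/O tag set. The tag set, the emission scores (which a BiLSTM or embedding features would normally produce) and the transition scores are made-up assumptions for illustration, not values from any of the cited models.

```python
from typing import List, Sequence

TAGS = ["B", "I", "O"]  # B: first token of a keyphrase, I: inside one, O: outside


def viterbi(emissions: Sequence[Sequence[float]],
            transitions: Sequence[Sequence[float]]) -> List[str]:
    """Highest-scoring tag sequence under a linear-chain CRF.

    emissions[t][j]: score of tag j at token t (e.g. produced by a BiLSTM)
    transitions[i][j]: score of moving from tag i to tag j
    """
    n_tags = len(TAGS)
    best = list(emissions[0])  # best path score ending in each tag at token 0
    backpointers: List[List[int]] = []
    for t in range(1, len(emissions)):
        new_best, pointers = [], []
        for j in range(n_tags):
            # previous tag that maximizes (path score so far + transition into j)
            i_star = max(range(n_tags), key=lambda i: best[i] + transitions[i][j])
            new_best.append(best[i_star] + transitions[i_star][j] + emissions[t][j])
            pointers.append(i_star)
        best = new_best
        backpointers.append(pointers)
    # trace the back-pointers from the best final tag
    j = max(range(n_tags), key=lambda k: best[k])
    path = [j]
    for pointers in reversed(backpointers):
        j = pointers[j]
        path.append(j)
    return [TAGS[k] for k in reversed(path)]


NEG = -1e4  # large penalty: an I tag should not follow an O tag
transitions = [[0.0, 0.0, 0.0],   # from B
               [0.0, 0.0, 0.0],   # from I
               [0.0, NEG, 0.0]]   # from O
# made-up emission scores for "we study keyphrase extraction"
emissions = [[0, 0, 2], [0, 0, 2], [2, 1, 0], [0, 2, 1]]
print(viterbi(emissions, transitions))  # ['O', 'O', 'B', 'I']
```

A real CRF learns the transition scores from labeled data; here they are fixed by hand only to show how a penalty on O-to-I transitions shapes the decoded path.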

A Systematic Literature Review of Keyphrases Extraction Approaches

Ajallouda, Fagroud, Zellou et al. 2022

Int. J. Interact. Mob. Technol.


The keyphrases of a document are the textual units that characterize its content, such as the topics it addresses, its ideas and its field. Thousands of books, articles and web pages are published every day, and manually extracting keyphrases is a tedious and time-consuming task. Automatic keyphrase extraction is an area of text mining that aims to identify the most useful and important phrases that give meaning to the content of a document. Keyphrases can be used in many Natural Language Processing (NLP) applications, such as text summarization, text clustering and text classification. This article provides a Systematic Literature Review (SLR) to investigate, analyze, and discuss existing relevant contributions and efforts that use new concepts and tools to improve keyphrase extraction. We have studied the supervised and unsupervised keyphrase extraction approaches published in the period 2015-2022. We have also identified the steps most commonly used by the different approaches. Additionally, we looked at the criteria that should be evaluated to improve the accuracy of keyphrase extraction. Each selected approach was evaluated for its ability to extract keyphrases. Our findings highlight the importance of keyphrase extraction and provide researchers and practitioners with information about proposed solutions and their limitations, which can help them extract keyphrases effectively and meaningfully.


“…The intuition behind this weighting scheme is to give higher weight to words appearing in the beginning of a document since in scientific writing, authors tend to use keyphrases very early in the document (even from the title) (Florescu and Caragea, 2017). Based on these considerations, the first position of a phrase/word and its relative position are also used in many supervised approaches as powerful features (Patel and Caragea, 2019; Hulth, 2003; Wu et al, 2005). To calculate the weight $w_i$ for $n_i$, we perform multiplication of both the theme score ($ts_i$) and the positional score ($ps_i$). The intuition is that we give preference to words that appear near the beginning of the document and are more frequent, as compared with less frequent words appearing later in the document, even though both words may be equally close to the theme of the document or may have similar theme scores.…”

Section: Biased PageRank (mentioning)

confidence: 99%
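The passage states only that $w_i$ is the product of the theme score $ts_i$ and the positional score $ps_i$, and that these weights bias PageRank. A minimal Python sketch of that idea follows, under assumptions not taken from the quote: $ps_i$ is computed PositionRank-style as the sum of inverse positions of a word's occurrences (which rewards words that are both early and frequent), theme scores are supplied by the caller, the graph links words that co-occur within a small window, and the damping factor is 0.85.

```python
from collections import defaultdict
from typing import Dict, List


def positional_scores(tokens: List[str]) -> Dict[str, float]:
    """Assumed PositionRank-style score: sum of 1/position over a word's occurrences."""
    ps: Dict[str, float] = defaultdict(float)
    for pos, word in enumerate(tokens, start=1):
        ps[word] += 1.0 / pos
    return dict(ps)


def biased_pagerank(tokens: List[str], theme_scores: Dict[str, float],
                    window: int = 2, alpha: float = 0.85,
                    iters: int = 100) -> Dict[str, float]:
    """Power iteration for PageRank biased towards words with high w_i = ts_i * ps_i."""
    ps = positional_scores(tokens)
    nodes = sorted(ps)
    # w_i = ts_i * ps_i, then normalized into the teleport (bias) distribution
    weights = {n: theme_scores.get(n, 0.0) * ps[n] for n in nodes}
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("at least one word needs a positive theme score")
    bias = {n: w / total for n, w in weights.items()}

    # undirected co-occurrence graph: link words at most window - 1 tokens apart
    adj: Dict[str, Dict[str, float]] = {n: defaultdict(float) for n in nodes}
    for i, u in enumerate(tokens):
        for v in tokens[i + 1:i + window]:
            if u != v:
                adj[u][v] += 1.0
                adj[v][u] += 1.0
    out_weight = {n: sum(adj[n].values()) for n in nodes}

    score = dict(bias)
    for _ in range(iters):
        score = {
            n: (1 - alpha) * bias[n]
            + alpha * sum(score[m] * adj[m][n] / out_weight[m] for m in adj[n])
            for n in nodes
        }
    return score


tokens = "keyphrase extraction from papers keyphrase ranking".split()
theme = {w: 1.0 for w in tokens}  # hypothetical: every word equally on-theme
scores = biased_pagerank(tokens, theme)
print(max(scores, key=scores.get))  # keyphrase
```

With equal theme scores, the ranking is driven by position and graph structure alone; supplying real embedding-based theme scores would shift the teleport mass towards on-topic words.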

“…ACM (Patel and Caragea, 2019) This dataset …
[Table: F1@5 and F1@10 of KPRank(SB) on SemEval, Inspec, Krapivin, NUS and ACM]
Figure 2: Keyphrase extraction confusion matrices of KPRank(SB) using @5 predictions on all the datasets. The darker the blue on the main diagonal, the more accurate the model is.…”

Section: Data (mentioning)

confidence: 99%
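The F1@5 and F1@10 columns mentioned in this excerpt score only the top 5 or top 10 ranked keyphrases. A small sketch of that metric follows, assuming exact string matching (published evaluations often stem or otherwise normalize phrases first, which is omitted here) and dividing precision by the number of predictions actually returned rather than by k.

```python
from typing import List, Set


def f1_at_k(predicted: List[str], gold: Set[str], k: int) -> float:
    """F1 of the top-k ranked predictions against the gold keyphrases (exact match)."""
    top_k = predicted[:k]
    correct = sum(1 for phrase in top_k if phrase in gold)
    if correct == 0:
        return 0.0
    precision = correct / len(top_k)  # assumption: divide by predictions returned, not k
    recall = correct / len(gold)
    return 2 * precision * recall / (precision + recall)


predicted = ["keyphrase extraction", "word embeddings", "crf",
             "research papers", "f1"]
gold = {"keyphrase extraction", "crf", "sequence labeling"}
print(round(f1_at_k(predicted, gold, 5), 3))  # 0.5
```

Two of the five predictions match, so precision is 0.4 and recall is 2/3, giving an F1 of 0.5.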

“…Usually, the performance of the supervised keyphrase extraction models is better than the unsupervised models (Kim et al, 2013). We compare the performance of KPRank(SB) with the CRF-based sequence classification model for keyphrase extraction (Patel and Caragea, 2019) that uses word embeddings as features along with document-specific features. The CRF model outperforms KPRank(SB) on all five datasets, e.g., CRF model achieves an F1 of 45.73% as compared with 25.76% achieved by KPRank(SB) on SemEval.…”

Section: Organization Design: the Continuing Influence Of Information Technology (mentioning)

confidence: 99%
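The cited CRF model is described as using word embeddings together with document-specific features. The sketch below shows one plausible way to build per-token feature dictionaries of that kind. The particular document features (relative position, title membership, in-document frequency) and the toy embedding table are hypothetical choices, not the features of Patel and Caragea (2019). Dictionaries mapping string keys to numeric values are an input format accepted by CRF toolkits such as sklearn-crfsuite, but the training call is not shown.

```python
from collections import Counter
from typing import Dict, List, Sequence, Set


def document_features(tokens: List[str],
                      embeddings: Dict[str, Sequence[float]],
                      title_words: Set[str],
                      dim: int) -> List[Dict[str, float]]:
    """One feature dict per token: embedding dimensions plus document features."""
    lowered = [t.lower() for t in tokens]
    counts = Counter(lowered)
    n = len(tokens)
    features = []
    for i, word in enumerate(lowered):
        vec = embeddings.get(word, [0.0] * dim)  # zero vector for out-of-vocabulary words
        feats = {f"emb_{d}": float(v) for d, v in enumerate(vec)}
        # hypothetical document-specific features
        feats["rel_position"] = i / max(n - 1, 1)  # 0.0 at the start, 1.0 at the end
        feats["in_title"] = 1.0 if word in title_words else 0.0
        feats["doc_freq"] = counts[word] / n
        features.append(feats)
    return features


toy_embeddings = {"keyphrase": [0.1, 0.2], "extraction": [0.3, -0.1]}  # hypothetical
tokens = ["Keyphrase", "extraction", "with", "CRFs"]
feats = document_features(tokens, toy_embeddings, {"keyphrase"}, dim=2)
print(feats[0])  # {'emb_0': 0.1, 'emb_1': 0.2, 'rel_position': 0.0, 'in_title': 1.0, 'doc_freq': 0.25}
```

A list of such dictionaries per document, paired with a list of B/I/O labels, is the shape a linear-chain CRF trainer would consume.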


Exploiting Position and Contextual Word Embeddings for Keyphrase Extraction from Scientific Papers

Patel, Caragea

2021

Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume

Self Cite


Keyphrases associated with research papers provide an effective way to find useful information in the large and growing scholarly digital collections. In this paper, we present KPRank, an unsupervised graph-based algorithm for keyphrase extraction that incorporates both positional information and contextual word embeddings into a biased PageRank. Our experimental results on five benchmark datasets show that KPRank, which uses contextual word embeddings with an additional position signal, outperforms previous approaches and strong baselines for this task.
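The abstract says contextual word embeddings feed KPRank's biased PageRank. One common way to turn embeddings into a per-word theme signal is cosine similarity to a document-level theme vector. The sketch below assumes the theme vector is the mean of some reference embeddings (for example, those of title words) and uses one fixed vector per word instead of per-occurrence contextual vectors; both are simplifying assumptions, not KPRank's published definition.

```python
import math
from typing import Dict, List, Sequence


def mean_vector(vectors: List[Sequence[float]]) -> List[float]:
    dim = len(vectors[0])
    return [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 0.0 if norm_u == 0 or norm_v == 0 else dot / (norm_u * norm_v)


def theme_scores(candidates: Dict[str, Sequence[float]],
                 theme_refs: List[Sequence[float]]) -> Dict[str, float]:
    """ts_i: cosine similarity between each word vector and the mean theme vector."""
    theme = mean_vector(theme_refs)
    return {word: cosine(vec, theme) for word, vec in candidates.items()}


theme_refs = [[1.0, 0.0], [0.8, 0.2]]  # hypothetical title-word embeddings
candidates = {"keyphrase": [0.9, 0.1], "weather": [0.0, 1.0]}
ts = theme_scores(candidates, theme_refs)
print(ts["keyphrase"] > ts["weather"])  # True
```

Scores produced this way could serve as the theme scores that a biased PageRank multiplies by a positional score.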


“…With the success of neural models, recent works try to address SKIC using neural architectures while exploiting the BIO schema. Although both tasks, keyphrase identification and keyphrase classification according to their types, are very important, many works focused only on keyphrase extraction/generation or identification/segmentation (Meng et al, 2017; Xiong et al, 2019; Patel and Caragea, 2019; Alzaidy et al, 2019; Chen et al, 2020). The classification task is less explored, possibly due to a lack of a large number of gold-label keyphrase classification datasets.…”

Section: Related Work (mentioning)

confidence: 99%
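The BIO schema referred to in this excerpt extends to keyphrase classification by attaching the keyphrase type to each B- and I- tag. A minimal sketch of encoding typed spans as tags and decoding them back follows. The type labels "Task" and "Material" are illustrative placeholders, and the decoder's choice to open a new span on a stray I- tag is one common convention, not something specified in the quoted work.

```python
from typing import List, Optional, Tuple

Span = Tuple[int, int, str]  # (first token, one past the last token, keyphrase type)


def spans_to_bio(n_tokens: int, spans: List[Span]) -> List[str]:
    tags = ["O"] * n_tokens
    for start, end, label in spans:
        tags[start] = f"B-{label}"
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"
    return tags


def bio_to_spans(tags: List[str]) -> List[Span]:
    spans: List[Span] = []
    start: Optional[int] = None
    label: Optional[str] = None
    for i, tag in enumerate(tags + ["O"]):  # sentinel "O" closes a trailing span
        if start is not None and tag == f"I-{label}":
            continue  # current span goes on
        if start is not None:
            spans.append((start, i, label))
            start, label = None, None
        if tag.startswith(("B-", "I-")):
            # convention: a stray I- tag (or any B- tag) opens a new span
            start, label = i, tag[2:]
    return spans


tokens = ["we", "improve", "keyphrase", "extraction", "on", "papers"]
spans = [(2, 4, "Task"), (5, 6, "Material")]  # hypothetical type labels
tags = spans_to_bio(len(tokens), spans)
print(tags)  # ['O', 'O', 'B-Task', 'I-Task', 'O', 'B-Material']
print(bio_to_spans(tags) == spans)  # True
```

Identification alone corresponds to dropping the type suffix; classification keeps it, so one tagger can predict both at once.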

Scientific Keyphrase Identification and Classification by Pre-Trained Language Models Intermediate Task Transfer Learning

Park, Caragea

2020

Proceedings of the 28th International Conference on Computational Linguistics

Self Cite


Scientific keyphrase identification and classification is the task of detecting and classifying keyphrases from scholarly text with their types from a set of predefined classes. This task has a wide range of benefits, but it is still challenging in performance due to the lack of large amounts of labeled data required for training deep neural models. In order to overcome this challenge, we explore pre-trained language models BERT and SciBERT with intermediate task transfer learning, using 42 data-rich related intermediate-target task combinations. We reveal that intermediate task transfer learning on SciBERT induces a better starting point for target task fine-tuning compared with BERT and achieves competitive performance in scientific keyphrase identification and classification compared to both previous works and strong baselines. Interestingly, we observe that BERT with intermediate task transfer learning fails to improve the performance of scientific keyphrase identification and classification potentially due to significant catastrophic forgetting. This result highlights that scientific knowledge achieved during the pre-training of language models on large scientific collections plays an important role in the target tasks. We also observe that sequence tagging related intermediate tasks, especially syntactic structure learning tasks such as POS Tagging, tend to work best for scientific keyphrase identification and classification.
