Proceedings of the 10th International Conference on Knowledge Capture 2019
DOI: 10.1145/3360901.3364447

Exploring Word Embeddings in CRF-based Keyphrase Extraction from Research Papers

Abstract: Keyphrases associated with research papers provide an effective way to find useful information in the large and growing scholarly digital collections. However, keyphrases are not always provided with the papers, but they need to be extracted from their content. In this paper, we explore keyphrase extraction formulated as sequence labeling and utilize the power of Conditional Random Fields in capturing label dependencies through a transition parameter matrix consisting of the transition probabilities from one l…
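
The role of the transition matrix mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the B/I/O tag set, the function name viterbi, and all numeric scores below are assumptions made up for illustration, and the scores are treated as additive (log-scale) values rather than raw probabilities. The sketch shows how a transition score that penalizes moving from O directly to I changes the decoded label sequence compared with picking the best label for each token independently.

```python
from typing import List

# Hypothetical tag set: B = begins a keyphrase, I = inside one, O = outside.
TAGS = ["B", "I", "O"]

def viterbi(emissions: List[List[float]], transitions: List[List[float]]) -> List[int]:
    """Return the highest-scoring tag index sequence.

    emissions[t][j]   : score of tag j at token t (assumed, log-scale)
    transitions[i][j] : score of moving from tag i to tag j (assumed, log-scale)
    """
    n_tags = len(transitions)
    score = list(emissions[0])          # best score of a path ending in each tag
    backpointers = []
    for emit in emissions[1:]:
        new_score, ptrs = [], []
        for j in range(n_tags):
            # Best previous tag for reaching tag j at this token.
            best_i = max(range(n_tags), key=lambda i: score[i] + transitions[i][j])
            new_score.append(score[best_i] + transitions[best_i][j] + emit[j])
            ptrs.append(best_i)
        score = new_score
        backpointers.append(ptrs)
    # Follow back-pointers from the best final tag.
    best = max(range(n_tags), key=lambda j: score[j])
    path = [best]
    for ptrs in reversed(backpointers):
        best = ptrs[best]
        path.append(best)
    path.reverse()
    return path

# Made-up scores for three tokens; rows and columns follow the order B, I, O.
transitions = [
    [-1.0, 1.0, 0.0],    # from B
    [-1.0, 0.5, 0.0],    # from I
    [0.0, -10.0, 0.5],   # from O: O -> I is strongly penalized
]
emissions = [
    [0.0, 0.0, 1.0],     # token 0 looks like O
    [0.8, 1.0, 0.0],     # token 1 looks slightly more like I than B
    [0.0, 1.0, 0.2],     # token 2 looks like I
]

greedy = [TAGS[max(range(3), key=lambda j: e[j])] for e in emissions]
decoded = [TAGS[i] for i in viterbi(emissions, transitions)]
print(greedy)   # per-token choice, ignores label dependencies
print(decoded)  # joint choice, uses the transition matrix
```

With these made-up numbers, the per-token choice yields O, I, I, which starts a phrase with I, while joint decoding yields O, B, I because the transition scores make the O to I step too costly.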

Citations: cited by 10 publications (6 citation statements)
References: 37 publications (50 reference statements)
“…This model inserts a Bi-LSTM layer between the output and input layers in order to exploit dependencies in the text. Patel et al. [76] build a complex labeling model using the Bi-LSTM-CRF network, which incorporates long-distance information about an input sequence as well as about the output sequence. Sahrawat et al. [77] formulate AKE as a sequence tagging using a BiLSTM-CRF, where phrases from the input text are represented at the using deep embedding.…”
Section: Model Vector Dimension (mentioning)
confidence: 99%
“…The intuition behind this weighting scheme is to give higher weight to words appearing in the beginning of a document since in scientific writing, authors tend to use keyphrases very early in the document (even from the title) (Florescu and Caragea, 2017). Based on these considerations, the first position of a phrase/word and its relative position are also used in many supervised approaches as powerful features (Patel and Caragea, 2019; Hulth, 2003; Wu et al., 2005). To calculate the weight w_i for n_i, we perform multiplication of both the theme score (ts_i) and the positional score (ps_i). The intuition is that we give preference to words that appear near the beginning of the document and are more frequent as compared with less frequent words appearing later in document even though both words may be equally close to the theme of the document or may have similar theme score.…”
Section: Biased Pagerank (mentioning)
confidence: 99%
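
The weighting described in the statement above (theme score multiplied by positional score) can be sketched as follows. The excerpt does not define how the positional score is computed, so the sum of inverse positions used here is only an assumption chosen because it rewards both early and frequent occurrences; the function names, theme scores, and word positions are likewise hypothetical.

```python
from typing import List

def positional_score(positions: List[int]) -> float:
    """Assumed positional score: sum of 1/position over all occurrences.

    Positions are 1-based token offsets. Early occurrences contribute more,
    and every additional occurrence adds to the score.
    """
    return sum(1.0 / p for p in positions)

def word_weight(theme_score: float, positions: List[int]) -> float:
    """Weight of a word: theme score multiplied by positional score."""
    return theme_score * positional_score(positions)

# Two hypothetical words with the same theme score (0.5).
early_frequent = word_weight(0.5, [2, 10])   # appears twice, near the start
late_rare = word_weight(0.5, [50])           # appears once, later on

print(early_frequent, late_rare)
```

Under this assumed definition, two words with equal theme scores still receive different weights, and the word that occurs earlier and more often is preferred, which matches the intuition stated in the excerpt.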
“…ACM (Patel and Caragea, 2019) [table residue: datasets SemEval, Inspec, Krapivin, NUS, ACM, each with F1@5 and F1@10 columns; row KPRank(SB); values not recoverable] Figure 2: Keyphrase extraction confusion matrices of KPRank(SB) using @5 predictions on all the datasets. The darker the blue on the main diagonal, the more accurate the model is.…”
Section: Data (mentioning)
confidence: 99%
“…With the success of neural models, recent works try to address SKIC using neural architectures while exploiting the BIO schema. Although both tasks, keyphrase identification and keyphrase classification according to their types, are very important, many works focused only on keyphrase extraction/generation or identification/segmentation (Meng et al., 2017; Xiong et al., 2019; Patel and Caragea, 2019; Alzaidy et al., 2019; Chen et al., 2020). The classification task is less explored possibly due to a lack of a large number of gold-label keyphrase classification datasets.…”
Section: Related Work (mentioning)
confidence: 99%