PositionRank: An Unsupervised Approach to Keyphrase Extraction from Scholarly Documents

Florescu, Corina; Caragea, Cornelia

doi:10.18653/v1/p17-1102

Cited by 258 publications

(193 citation statements)

References 23 publications

Supporting

Mentioning

190

Contrasting

Unclassified


“…Finally, the top T ranked candidate phrases are selected as keyphrases for the document. In this vein, the more recent methods SGRank (Danesh, Sumner, & Martin, 2015) and PositionRank (PR) (Florescu & Caragea, 2017b) utilize statistical, positional, and, word co-occurrence information, thus improving the overall performance. In particular, SGRank (Danesh et al, 2015), first, extracts all possible n-grams from the input text, eliminating those that contain punctuation marks or whose words are anything different than noun, adjective or verb.…”

Section: Graph-based Ranking Methods (mentioning)

confidence: 99%

A review of keyphrase extraction

Papagiannopoulou

Tsoumakas

2019

WIREs Data Min & Knowl

128


Keyphrase extraction is a textual information processing task concerned with the automatic extraction of representative and characteristic phrases from a document that express all the key aspects of its content. Keyphrases constitute a succinct conceptual summary of a document, which is very useful in digital information management systems for semantic indexing, faceted search, document clustering and classification. This article introduces keyphrase extraction, provides a well‐structured review of the existing work, offers interesting insights on the different evaluation approaches, highlights open issues and presents a comparative experimental study of popular unsupervised techniques on five datasets. This article is categorized under: Ensemble Methods > Web Mining Ensemble Methods > Text Mining
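
The candidate-generation step quoted in the citation statement above (extract all n-grams, drop those containing punctuation or words that are not nouns, adjectives, or verbs, then keep the top T ranked candidates) can be illustrated with a minimal sketch. This is not SGRank's or PositionRank's actual implementation: the function names, the `(word, pos)` input format, the coarse tag names `NOUN`/`ADJ`/`VERB`, and the `max_n` limit are all assumptions made for illustration, and the scoring that produces the ranking is left out entirely.

```python
import string

# Assumed coarse part-of-speech tags; the tagger and tag set are hypothetical.
ALLOWED_POS = {"NOUN", "ADJ", "VERB"}


def is_punctuation(word):
    """True when the token consists only of punctuation characters."""
    return all(ch in string.punctuation for ch in word)


def candidate_ngrams(tagged_tokens, max_n=3):
    """Return n-grams whose tokens are all nouns, adjectives, or verbs.

    tagged_tokens is a list of (word, pos) pairs produced by some
    upstream tagger (an assumption of this sketch).
    """
    candidates = []
    for n in range(1, max_n + 1):
        for start in range(len(tagged_tokens) - n + 1):
            window = tagged_tokens[start:start + n]
            if all(pos in ALLOWED_POS and not is_punctuation(word)
                   for word, pos in window):
                candidates.append(" ".join(word for word, _ in window))
    return candidates


def top_t(scores, t):
    """Select the t highest-scoring candidates from a phrase -> score dict."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [phrase for phrase, _ in ranked[:t]]


tagged = [("graph", "NOUN"), ("ranking", "NOUN"), (",", "PUNCT"), ("fast", "ADJ")]
print(candidate_ngrams(tagged, max_n=2))
print(top_t({"a": 0.1, "b": 0.9, "c": 0.5}, 2))
```

Any n-gram that spans the comma is discarded, so only contiguous runs of allowed words survive as candidates.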


“…In supervised approaches, a model is trained to learn to classify keyphrases from training data that is annotated with keyphrases [7,13,18,28,56,57,62]. Many unsupervised keyphrase extraction techniques had also been previously proposed [10,15,24,27,33,47,49,53]. They usually extract candidate keyphrases and rank them based on term frequencies, word co-occurrences, and other similar features.…”

Section: Related Work (mentioning)

confidence: 99%

Keyphrase Extraction from Disaster-related Tweets

Ray

Caragea

2019

The World Wide Web Conference

Self Cite


While keyphrase extraction has received considerable attention in recent years, relatively few studies exist on extracting keyphrases from social media platforms such as Twitter, and even fewer for extracting disaster-related keyphrases from such sources. During a disaster, keyphrases can be extremely useful for filtering relevant tweets that can enhance situational awareness. Previously, joint training of two different layers of a stacked Recurrent Neural Network for keyword discovery and keyphrase extraction had been shown to be effective in extracting keyphrases from general Twitter data. We improve the model's performance on both general Twitter data and disaster-related Twitter data by incorporating contextual word embeddings, POS-tags, phonetics, and phonological features. Moreover, we discuss the shortcomings of the often used F1-measure for evaluating the quality of predicted keyphrases with respect to the ground truth annotations. Instead of the F1-measure, we propose the use of embedding-based metrics to better capture the correctness of the predicted keyphrases. In addition, we also present a novel extension of an embedding-based metric. The extension allows one to better control the penalty for the difference in the number of ground-truth and predicted keyphrases.
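
The abstract above does not say which shortcomings of the F1-measure it has in mind. One commonly noted limitation, assumed here purely for illustration, is that F1 over exact string matches gives no credit to near-identical phrases. The sketch below computes set-based exact-match F1; the function name and the lowercase/strip normalisation are hypothetical choices, not taken from the paper.

```python
def exact_match_f1(predicted, gold):
    """F1 over exact matches between predicted and gold keyphrases.

    Phrases are compared after lowercasing and stripping whitespace
    (an assumed normalisation; no stemming is applied).
    """
    predicted_set = {p.lower().strip() for p in predicted}
    gold_set = {g.lower().strip() for g in gold}
    true_positives = len(predicted_set & gold_set)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted_set)
    recall = true_positives / len(gold_set)
    return 2 * precision * recall / (precision + recall)


# "flood warning" and "flood warnings" differ by one character,
# yet only "rescue" counts as a match.
print(exact_match_f1(["flood warning", "rescue"], ["flood warnings", "rescue"]))  # 0.5
```

In this example the prediction is penalised as heavily for the singular/plural mismatch as it would be for an unrelated phrase.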


“…The methods that are based on statistical information and structural information, for example tf-idf (term frequency-inverse document frequency), phrase position, and topic proportion, are language independent [8,27,[35][36][37][38]. However, weighting more to single terms than multiword terms and overlooking the semantics, are their main drawbacks.…”

Section: Related Work (mentioning)

confidence: 99%

Key Concept Identification: A Comprehensive Analysis of Frequency and Topical Graph-Based Approaches

et al. 2018


Automatic key concept extraction from text is the main challenging task in information extraction, information retrieval and digital libraries, ontology learning, and text analysis. The statistical frequency and topical graph-based ranking are the two kinds of potentially powerful and leading unsupervised approaches in this area, devised to address the problem. To utilize the potential of these approaches and improve key concept identification, a comprehensive performance analysis of these approaches on datasets from different domains is needed. The objective of the study presented in this paper is to perform a comprehensive empirical analysis of selected frequency and topical graph-based algorithms for key concept extraction on three different datasets, to identify the major sources of error in these approaches. For experimental analysis, we have selected TF-IDF, KP-Miner and TopicRank. Three major sources of error, i.e., frequency errors, syntactical errors and semantical errors, and the factors that contribute to these errors are identified. Analysis of the results reveals that performance of the selected approaches is significantly degraded by these errors. These findings can help us develop an intelligent solution for key concept extraction in the future.
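
TF-IDF, one of the frequency-based baselines named above, can be sketched in a few lines. Many weighting variants exist; this one uses relative term frequency multiplied by the natural log of (number of documents / document frequency), which is an assumption of the sketch and not necessarily the variant evaluated in the study. The function name and the tokenised-list input format are also hypothetical.

```python
import math
from collections import Counter


def tf_idf(documents, doc_index):
    """Score each term of one document against a small corpus.

    documents is a list of token lists; doc_index selects the document
    to score. Returns a dict mapping term -> tf-idf weight.
    """
    num_docs = len(documents)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter()
    for tokens in documents:
        doc_freq.update(set(tokens))

    target = documents[doc_index]
    counts = Counter(target)
    scores = {}
    for term, count in counts.items():
        tf = count / len(target)
        idf = math.log(num_docs / doc_freq[term])
        scores[term] = tf * idf
    return scores


corpus = [["graph", "rank"], ["graph", "topic"]]
print(tf_idf(corpus, 0))
```

A term that occurs in every document, such as "graph" here, receives an IDF of zero under this variant, so only the distinguishing term "rank" gets a positive weight.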

