A Text Feature Based Automatic Keyword Extraction Method for Single Documents

Campos, Ricardo; Mangaravite, Vítor; Pasquali, Arian; Jorge, Alípio Mário; Nunes, Célia; Jatowt, Adam

doi:10.1007/978-3-319-76941-7_63

Cited by 90 publications

(66 citation statements)

References 5 publications


“…The great importance of using both statistics and contexts info is confirmed by recent methods such as YAKE (Campos et al, 2018b) and the method proposed by Won, Martins, and Raimundo (2019). YAKE, besides the term's position/frequency, also uses new statistical metrics that capture context information and the spread of the terms in the document.…”

Section: Statistics-based Methods

mentioning

confidence: 99%

A review of keyphrase extraction

Papagiannopoulou

Tsoumakas

2019

WIREs Data Min & Knowl

128


Keyphrase extraction is a textual information processing task concerned with the automatic extraction of representative and characteristic phrases from a document that express all the key aspects of its content. Keyphrases constitute a succinct conceptual summary of a document, which is very useful in digital information management systems for semantic indexing, faceted search, document clustering and classification. This article introduces keyphrase extraction, provides a well‐structured review of the existing work, offers interesting insights on the different evaluation approaches, highlights open issues and presents a comparative experimental study of popular unsupervised techniques on five datasets. This article is categorized under: Ensemble Methods > Web Mining Ensemble Methods > Text Mining



“…We adopted the same evaluation procedure as used for the series of results recently introduced by YAKE authors [6]. Five fold cross validation was used to determine the overall performance, for which we measured Precision, Recall and F1 score, with the latter being reported in Table 2.…”

Section: Experimental Setting

mentioning

confidence: 99%

“…Five fold cross validation was used to determine the overall performance, for which we measured Precision, Recall and F1 score, with the latter being reported in Table 2. Keywords were stemmed prior to evaluation. As the number of keywords in the gold standard document is not equal to the number of extracted keywords (in our experiments k=10), in the recall we divide the correctly extracted keywords by the number of keywords parameter k, if in the gold standard number of keywords is higher than k. Selecting default configuration.…”

Section: Experimental Setting

mentioning

confidence: 99%
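The recall adjustment described in the citation statement above can be illustrated with a minimal Python sketch. It is not the citing authors' evaluation code: the function name `precision_recall_f1_at_k` is hypothetical, the inputs are assumed to be already stemmed and normalized, and the handling of empty inputs is an assumption made here for completeness.

```python
def precision_recall_f1_at_k(extracted, gold, k=10):
    """Score the top-k extracted keywords against a gold-standard set.

    Assumes both lists are already stemmed, as the quoted setup requires.
    """
    # De-duplicate while keeping rank order, then keep the first k keywords.
    top_k = list(dict.fromkeys(extracted))[:k]
    gold_set = set(gold)
    correct = sum(1 for keyword in top_k if keyword in gold_set)

    precision = correct / len(top_k) if top_k else 0.0

    # Quoted rule: when the gold standard holds more than k keywords,
    # divide by k instead of by the gold-standard size.
    recall_denominator = min(len(gold_set), k)
    recall = correct / recall_denominator if recall_denominator else 0.0

    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1
```

Capping the denominator at k keeps recall attainable: with k=10 and a gold standard of 30 keywords, an extractor that returns 10 correct keywords scores a recall of 1.0 rather than 0.33.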

“…The former, such as YAKE [7,6], KP-MINER [10] and RAKE [25], use statistical characteristics of the texts to capture keywords, while the latter, such as Topic Rank [3], TextRank [22], Topical PageRank [29] and Single Rank [30], build graphs to rank words based on their position in the graph. Among statistical approaches, the state-of-the-art keyword extraction algorithm is YAKE [7,6], which is also one of the best performing keyword extraction algorithms overall; it defines a set of five features capturing keyword characteristics which are heuristically combined to assign a single score to every keyword. On the other hand, among graph-based approaches, Topic Rank [3] can be considered state-of-the-art; candidate keywords are clustered into topics and used as vertices in the final graph, used for keyword extraction.…”

mentioning

confidence: 99%
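The graph-based family mentioned in the statement above ranks words by their position in a graph built from the text. The following is a simplified TextRank-style sketch in Python, given only to illustrate that idea; it does not reproduce any of the cited systems. The function name `rank_words_by_graph`, the co-occurrence window, the damping factor and the fixed iteration count are all assumptions, and steps such as part-of-speech filtering or topic clustering are omitted.

```python
from collections import defaultdict


def rank_words_by_graph(tokens, window=2, damping=0.85, iterations=50):
    """Rank words with PageRank over an undirected co-occurrence graph."""
    # Link each word to the words that follow it within the window.
    neighbours = defaultdict(set)
    for i, word in enumerate(tokens):
        for other in tokens[i + 1 : i + window]:
            if other != word:
                neighbours[word].add(other)
                neighbours[other].add(word)

    # Plain power iteration; every vertex starts with the same score.
    scores = {word: 1.0 for word in neighbours}
    for _ in range(iterations):
        scores = {
            word: (1 - damping)
            + damping * sum(scores[n] / len(neighbours[n]) for n in neighbours[word])
            for word in neighbours
        }
    return sorted(scores, key=scores.get, reverse=True)
```

For example, `rank_words_by_graph("keyword extraction ranks keyword graph keyword vertex".split())` places `keyword` first, because it is linked to every other word in that toy input.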


RaKUn: Rank-based Keyword Extraction via Unsupervised Learning and Meta Vertex Aggregation

Škrlj

Repar

Pollak

2019

Statistical Language and Speech Processing


Keyword extraction is used for summarizing the content of a document and supports efficient document retrieval, and is as such an indispensable part of modern text-based systems. We explore how load centrality, a graph-theoretic measure applied to graphs derived from a given text can be used to efficiently identify and rank keywords. Introducing meta vertices (aggregates of existing vertices) and systematic redundancy filters, the proposed method performs on par with state-of-the-art for the keyword extraction task on 14 diverse datasets. The proposed method is unsupervised, interpretable and can also be used for document visualization.

Keywords: keyword extraction · graph applications · vertex ranking · load centrality · information retrieval

1 Introduction and related work

Keywords are terms (i.e. expressions) that best describe the subject of a document [2]. A good keyword effectively summarizes the content of the document and allows it to be efficiently retrieved when needed. Traditionally, keyword assignment was a manual task, but with the emergence of large amounts of textual data, automatic keyword extraction methods have become indispensable. Despite a considerable effort from the research community, state-of-the-art keyword extraction algorithms leave much to be desired and their performance is still lower than on many other core NLP tasks [13]. The first keyword extraction methods mostly followed a supervised approach [14,24,31]: they first extract keyword features and then train a classifier on a gold standard dataset. For example, KEA [31], a state of the art supervised keyword extraction algorithm is based on the Naive Bayes machine learning algorithm. While these methods offer quite good performance, they rely on an annotated gold standard dataset and require a (relatively) long training process. In contrast, unsupervised approaches need no training and can be applied directly without relying on a gold standard document collection. They can be further divided into statistical and graph-based…

arXiv:1907.06458v1 [cs.CL] 15 Jul 2019


“…The next approach uses Yet Another Keyword Extractor (YAKE) [13], which is a statistical method for multi-lingual keyphrase extraction. Being an unsupervised method, YAKE avoids the problem of the long training process of other supervised methods and does not depend on any dictionaries for topic extraction.…”

Section: Interest Identification Using Yake

mentioning

confidence: 99%

Identifying Short-term Interests from Mobile App Adoption Pattern

Gaind

Varshney

Goel

et al.

2019

CyS


With the increase in an average user's dependence on their mobile devices, the reliance on collecting his browsing history from mobile browsers has also increased. This browsing history is highly utilized in the advertising industry for providing targeted ads in the purview of inferring his short-term interests and pushing relevant ads. However, the major limitation of such an extraction from mobile browsers is that they reset when the browser is closed or when the device is shut down/restarted; thus rendering existing methods to identify the user's short-term interests on mobile devices users, ineffective. In this paper, we propose an alternative method to identify such short-term interests by analysing their mobile app adoption (installation/uninstallation) patterns over a period of time. Such a method can be highly effective in pinpointing the user's ephemeral inclinations like buying/renting an apartment, buying/selling a car or a sudden increased interest in shopping (possibly due to a recent salary bonus, he received). Subsequently, these derived interests are also used for targeted experiments. Our experiments result in up to 93.68% higher click-through rate in comparison to the ads shown without any user-interest knowledge. Also, up to 51% higher revenue in the long term is expected as a result of the application of our proposed algorithm.
