An automatic keyphrase extraction system for scientific documents

You, Wei; Fontaine, Dominique; Barthès, Jean-Paul

doi:10.1007/s10115-012-0480-2

Cited by 45 publications

(23 citation statements)

References 26 publications


“…We use four benchmark datasets shown in Table 1 for empirical observations and comparisons. These datasets have been used extensively to evaluate keyword extraction algorithms [4,19,30,33,35,43]. Table 1 presents general properties of the four datasets, including number of documents in corpus, average document length, average number of gold-standard keywords along with standard deviation, and average percentage of candidate keywords.…”

Section: Methods (mentioning)

confidence: 99%

sCAKE: Semantic Connectivity Aware Keyword Extraction

Duari

Bhatnagar

2019

Information Sciences


Keyword extraction is an important task in several text analysis endeavours. In this paper, we present a critical discussion of the issues and challenges in graph-based keyword extraction methods, along with a comprehensive empirical analysis. We propose a parameterless method for constructing a graph of text that captures the contextual relation between words. A novel word scoring method is also proposed based on the connection between concepts. We demonstrate that both proposals are individually superior to those followed by the state-of-the-art graph-based keyword extraction algorithms. Combining the proposed graph construction and scoring methods leads to a novel, parameterless keyword extraction method (sCAKE) based on the semantic connectivity of words in the document. Motivated by the limited availability of NLP tools for several languages, we also design and present a language-agnostic keyword extraction (LAKE) method. We eliminate the need for NLP tools by using a statistical filter to identify candidate keywords before constructing the graph. We show that the resulting method is a competent solution for extracting keywords from documents in languages lacking sophisticated NLP support.



“…According to the results reported in [5], the use of the feature indicating phrase position at the beginning of a document works for academic papers and does not lead to better performance in case of book chapters and scientific webpages that do not have an abstract. In our research we show that precisely at the beginning of a scientific publication, in an abstract section, which is not included in a literary text, the major part of keyphrases is gathered.…”

Section: Introduction and Related Work (mentioning)

confidence: 85%

“…This fact should also be considered, as it allows to generate less candidate phrases during processing. In [5] it was mentioned that too many candidates negatively influence ranking and one of the most important tasks consists in elaboration of algorithms for the construction of small candidate sets.…”

Section: Introduction and Related Work (mentioning)

confidence: 99%

“…The first one deals with word (words sequences) ranking, selection of top-ranked units and phrase construction [1][2][3]. The second one is the most widely used and spans keyphrases construction from candidates, candidate ranking and selection of the best keyphrases or classification of candidates [4][5][6][7][8][9][10][11]. Candidate phrase building often uses n-grams, word sequences that satisfy a number of constraints, e.g.…”

Section: Introduction and Related Work (mentioning)

confidence: 99%


Keyphrase Extraction Abstracts Instead of Full Papers

Popova

Danilova

2014

2014 25th International Workshop on Database and Expert Systems Applications


In the present paper we consider the problem of keyphrase extraction from scientific articles. Finding an appropriate solution is important for organizing fast navigation in databases and for indexing, clustering and classifying academic papers. The base collection includes keyphrases selected by experts for each text (SemEval2010). It is shown that using abstracts instead of full texts improves on the results obtained by processing full texts or abstracts together with the introduction and conclusion sections. Our approach extracts keyphrases with linguistic (part-of-speech-based) patterns; the patterns are built on the basis of an auxiliary dataset. Using abstracts in this approach reduces the number of word sequences extracted with patterns, as compared to using full texts. This makes it possible to simplify or entirely omit the ranking stage. Ranking is usually needed because only 10-15 keyphrases have to be chosen out of many candidates. This stage is the most difficult, and its effectiveness depends on the number of selected keyphrase candidates. Using abstracts makes it possible to considerably reduce the number of candidate phrases while still yielding high recall.


“…to be part of a noun phrase, as was the case in the study by Barker and Cornacchia [5]. You et al [124] used the so-called core word expansion algorithm, which first finds a set of core words and the final set of candidate phrases are generated from these seed phrases. They claimed that their method might reduce the candidate set by about 75%.…”

Section: Generation Of Keyphrase Candidates (mentioning)

confidence: 99%

Machine Learning-based Extraction of Keyphrases and its Applications in Multiple Domains

Berend


Preface. Raw data of any form conveys no information unless it is processed in some intelligent way. Knowing the most important phrases of textual documents can provide a condensed representation of them, which can considerably ease their processing. However, manually determining the sets of important phrases for every single document in a large collection is a tedious and expensive task, and it often requires expert knowledge. Natural language processing techniques, mostly relying on machine learning, can fortunately help the automatic generation of keyphrases for documents. In this thesis, various models for the extraction of keyphrases from textual documents of various genres and languages are presented, and their potential end-application utilization is demonstrated in the form of a document visualization system. Although most of the earlier studies focused on the domain of scientific papers, we will introduce models for the extraction of keyphrases in two languages (i.e. English and Hungarian) and from various genres, including scientific publications, news articles and product reviews.

