2010
DOI: 10.1177/0165551510388080

A citation-based approach to automatic topical indexing of scientific literature

Abstract: Topical indexing of documents with keyphrases is a common method used for revealing the subject of scientific and research documents to both human readers and information retrieval tools, such as search engines. However, scientific documents that are manually indexed with keyphrases are still in the minority. This article describes a new unsupervised method for automatic keyphrase extraction from scientific documents which yields a performance on a par with human indexers. The method is based on identifying re…

Cited by 18 publications (15 citation statements)
References 23 publications
“…The overall results of these three rounds, as shown in Table 1, indicate underfitting in case of the second round and overfitting in case of the third round, both resulting in an underperforming model. Table 3 compares the performance of our machine annotator on the wiki-20 dataset with human annotators, a baseline machine annotator based on TFIDF, two un-supervised machine annotators: the work of Grineva et al [4], and CKE [5], and two supervised machine annotators: KEA++ (KEA-5.0) [13,17] and Maui [17]. The supervised and unsupervised machine annotators with the highest performance results appear in bold in the table.…”
Section: Experimental Results and Evaluation (mentioning)
confidence: 99%
“…Figure 5 shows the Google Word Cloud (GWC) for a book titled: "Data mining: practical machine learning tools and techniques". The majority of these key terms are domain-specific, semantically rich, and directly related to the core subject of the book, and we have already proved their application in automatic keyphrase extraction from scientific documents [20]. These key terms could be used to measure the relevance of a publication, which cites either the document to be classified or one of its references, to the document.…”
Section: Discussion (mentioning)
confidence: 99%
“…This feature reflects the observation that in a considerable number of cases the tag's equivalent concept may reappear at the end of the tag's wiki page in form of hyperlinks to external information sources. This feature is proven to be effective in similar applications, where the candidate concepts occurring close to the end of a document, e.g., conclusion and reference sections, are shown to be probabilistically more significant [47,50,51]. (4) Spread: the distance between the first and last occurrences of the candidate concept, measured in terms of the number of characters and normalized by the length of the wiki page.…”
Section: Features For Wikipedia Concepts (mentioning)
confidence: 99%
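
The Spread feature quoted in the last statement (distance between the first and last occurrences of a candidate concept, in characters, normalized by page length) can be illustrated with a minimal sketch. This is not the citing authors' implementation: the function name `spread`, the case-insensitive substring matching, and the choice to return 0.0 for absent concepts are all assumptions made for illustration.

```python
def spread(page_text: str, concept: str) -> float:
    """Illustrative Spread feature (hypothetical helper, not from the cited paper).

    Returns the character distance between the first and last occurrence of
    `concept` in `page_text`, divided by the page length. Returns 0.0 when the
    page or concept is empty or the concept does not occur (an assumption).
    """
    if not page_text or not concept:
        return 0.0
    # Assumption: match case-insensitively on plain substrings.
    haystack = page_text.lower()
    needle = concept.lower()
    first = haystack.find(needle)
    if first == -1:
        return 0.0
    last = haystack.rfind(needle)
    # Normalize by the length of the text that was actually searched.
    return (last - first) / len(haystack)
```

A concept that occurs only once gets a spread of 0.0 under this sketch, while one that appears both near the start and near the end of a page approaches 1.0.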