Document keyphrases as subject metadata: incorporating document key concepts in search results

Wu, Yi-Ming; Li, Quanzhi

doi:10.1007/s10791-008-9044-1

Cited by 24 publications

(17 citation statements)

References 22 publications


“…This is because despite the large overlap between MA and VK, there is no overlap between MA and HA1. Whereas when additional gold standards, HA2 and HA3, are available and used for evaluating the overall inter-indexer consistency score of the machine annotator, a more accurate estimation of the machine annotator's performance could be achieved. In fact, as can be seen in the illustration, if the quality of MA and HA1 were to be compared with each other by using HA2 and HA3 as gold standards, the overall quality of MA would be significantly higher than HA1. Whereas, in case of the former, additional sets of keyphrases are usually not available and need to be created manually, for example the small wiki-20 dataset used in this work has taken ninety man-hours to create.…”

Section: Experimental Results and Evaluation (mentioning)

confidence: 99%
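
The point made in the excerpt above, that scoring a machine annotator (MA) against several human annotators (HA1, HA2, HA3) gives a fairer picture than scoring it against one, can be illustrated with a small sketch. The excerpt does not name the consistency formula, so Rolling's measure, 2C / (A + B), is assumed here; the keyphrase sets are invented toy data and the function names are hypothetical.

```python
def rolling_consistency(a, b):
    """Rolling's inter-indexer consistency: 2C / (A + B).

    Assumption: the excerpt does not state which measure is used;
    Rolling's measure is a common choice and is used for illustration.
    """
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return 2 * len(a & b) / (len(a) + len(b))


def mean_consistency(candidate, gold_sets):
    """Average consistency of one annotator against several gold standards."""
    return sum(rolling_consistency(candidate, g) for g in gold_sets) / len(gold_sets)


# Toy keyphrase sets (invented for this example, not taken from wiki-20).
ma = {"keyphrase extraction", "wikipedia", "genetic algorithm", "indexing"}
ha1 = {"machine learning", "metadata", "search", "annotation"}
ha2 = {"keyphrase extraction", "wikipedia", "metadata", "annotation"}
ha3 = {"keyphrase extraction", "genetic algorithm", "indexing", "search"}

# With HA1 as the only gold standard, MA looks useless.
ma_vs_ha1_only = mean_consistency(ma, [ha1])

# With HA2 and HA3 as gold standards, MA and HA1 can be compared fairly.
ma_vs_others = mean_consistency(ma, [ha2, ha3])
ha1_vs_others = mean_consistency(ha1, [ha2, ha3])

print(ma_vs_ha1_only, ma_vs_others, ha1_vs_others)
```

In this toy setup MA shares nothing with HA1 yet agrees with HA2 and HA3 more than HA1 does, which mirrors the situation the excerpt describes.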

“…Annotating scientific documents with keyphrases as subject/topical metadata helps both humans and information retrieval systems to focus their search and discovery efforts on the most relevant items of interest and reduces the recall effort (i.e., ratio of desired to examined) [2,3]. However, despite the fact that authors of scientific literature, especially those published in journals and conference proceedings, are encouraged and often required by editors to provide a list of keyphrases, scientific documents with manually assigned keyphrases by either authors or professional annotators are still in the minority.…”

Section: Introduction (mentioning)

confidence: 99%

“…Nevertheless, we believe deploying these two approaches together would improve the overall performance of keyphrase indexing, as they estimate the semantic relatedness of topics very differently using two independent information sources in Wikipedia and therefore could complement each other. We measure the category-based relatedness of two Wikipedia topics as: (2) where D is the maximum depth of the taxonomy, i.e., 16 in case of the Wikipedia dump used in this work. The distance function returns the length of the shortest path between topic 1 and topic 2 in terms of the number of nodes along the path.…”

Section: Introduction (mentioning)

confidence: 99%
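
The distance function described in the excerpt above (shortest path between two topics, counted in nodes) can be sketched with a breadth-first search. Equation (2) itself is not reproduced in the excerpt, so the normalization in `category_relatedness` is an assumed stand-in, not the authors' formula; the toy taxonomy and all function names are hypothetical. Only the value D = 16 comes from the excerpt.

```python
from collections import deque


def build_undirected(edges):
    """Build an undirected adjacency map from (parent, child) pairs."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def shortest_path_nodes(graph, start, goal):
    """Number of nodes on the shortest path, endpoints included (None if unreachable)."""
    if start == goal:
        return 1
    seen = {start}
    queue = deque([(start, 1)])
    while queue:
        node, count = queue.popleft()
        for nxt in graph.get(node, ()):
            if nxt in seen:
                continue
            if nxt == goal:
                return count + 1
            seen.add(nxt)
            queue.append((nxt, count + 1))
    return None


def category_relatedness(graph, topic1, topic2, max_depth):
    """ASSUMED normalization: 1 for identical topics, decreasing linearly with
    path length. This is a placeholder, not equation (2) from the excerpt."""
    dist = shortest_path_nodes(graph, topic1, topic2)
    if dist is None:
        return 0.0
    return max(0.0, 1.0 - (dist - 1) / (2 * max_depth - 1))


# Toy taxonomy (invented); D = 16 is the depth quoted in the excerpt.
taxonomy = build_undirected([
    ("Computer science", "Information retrieval"),
    ("Computer science", "Machine learning"),
    ("Information retrieval", "Search engines"),
    ("Information retrieval", "Indexing"),
    ("Machine learning", "Genetic algorithms"),
])
D = 16

near = category_relatedness(taxonomy, "Search engines", "Indexing", D)
far = category_relatedness(taxonomy, "Search engines", "Genetic algorithms", D)
print(near, far)
```

Topics that share a parent category end up with a shorter path, and therefore a higher score, than topics that only meet at the root.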


Automatic keyphrase annotation of scientific documents using Wikipedia and genetic algorithms

Joorabchi; Mahdi (2013), Journal of Information Science

Topical annotation of documents with keyphrases is a proven method for revealing the subject of scientific and research documents to both human readers and information retrieval systems. This article describes a machine learning-based keyphrase annotation method for scientific documents which utilizes Wikipedia as a thesaurus for candidate selection from documents' content. We have devised a set of twenty statistical, positional, and semantical features for candidate phrases to capture and reflect various properties of those candidates which have the highest keyphraseness probability. We first introduce a simple unsupervised method for ranking and filtering the most probable keyphrases, and then evolve it into a novel supervised method using genetic algorithms. We have evaluated the performance of both methods on a third-party dataset of research papers. Reported experimental results show that the performance of our proposed methods, measured in terms of consistency with human annotators, is on a par with that achieved by humans and outperforms rival supervised and unsupervised methods.


“…We treat the noun phrases in the document as the candidate keyphrases [1]. To identify the noun phrases, documents should be tagged.…”

Section: Noun Phrase Identification (mentioning)

confidence: 99%
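
The idea in the excerpt above, tagging a document and then treating its noun phrases as candidate keyphrases, can be sketched as follows. The excerpt does not say which tagger or phrase pattern is used, so this sketch assumes the input is already tagged with Penn Treebank tags and uses an assumed pattern of optional adjectives followed by one or more nouns; the function name is hypothetical.

```python
def noun_phrases(tagged_tokens):
    """Extract candidate noun phrases from (word, tag) pairs.

    Assumed pattern (not from the source): zero or more adjectives (JJ*)
    followed by one or more nouns (NN*), using Penn Treebank tags.
    """
    phrases = []
    current = []
    has_noun = False

    def flush():
        # Emit the buffered phrase only if it contains at least one noun.
        nonlocal current, has_noun
        if has_noun:
            phrases.append(" ".join(current))
        current, has_noun = [], False

    for word, tag in tagged_tokens:
        if tag.startswith("NN"):
            current.append(word)
            has_noun = True
        elif tag.startswith("JJ"):
            if has_noun:
                # An adjective after a noun starts a new candidate phrase.
                flush()
            current.append(word)
        else:
            flush()
    flush()
    return phrases


# Hand-tagged toy sentence (tags supplied manually for the example).
tagged = [
    ("Automatic", "JJ"), ("keyphrase", "NN"), ("extraction", "NN"),
    ("helps", "VBZ"), ("medical", "JJ"), ("researchers", "NNS"),
    ("quickly", "RB"), (".", "."),
]
candidates = noun_phrases(tagged)
print(candidates)  # ['Automatic keyphrase extraction', 'medical researchers']
```

In practice the tags would come from a part-of-speech tagger rather than being written by hand, and the candidates would then be scored by whatever features the extraction method uses.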

“…A number of previous works have suggested that document keyphrases can be useful in various applications such as retrieval engines [1], [2], [3], browsing interfaces [4], thesaurus construction [5], and document classification and clustering [6].…”

Section: Introduction (mentioning)

confidence: 99%

Automatic Keyphrase Extraction from Medical Documents

Sarkar (2009), Lecture Notes in Computer Science

Abstract. Keyphrases provide semantic metadata that summarize documents and enable the reader to quickly determine whether a given article is in the reader's fields of interest. This paper presents an automatic keyphrase extraction method based on naive Bayesian learning that exploits a number of domain-specific features to boost keyphrase extraction performance in the medical domain. The proposed method has been compared to a popular keyphrase extraction algorithm called Kea.


Automatic keyphrase extraction and ontology mining for content-based tag recommendation

Pudota; Dattolo; Baruzzo; et al. (2010), Int. J. Intell. Syst.

Collaborative tagging offers the Web a potential way of organizing and sharing information and of enhancing the capabilities of existing search engines. However, because of the lack of automatic methodologies for generating tags and supporting the tagging activity, many resources on the Web are deficient in tag information, and recommending suitable tags is both a current open issue and an exciting challenge. This paper approaches the problem by applying a combined set of techniques and tools (tags, domain ontologies, and keyphrase extraction methods), thereby generating tags automatically. The proposed approach is implemented in the PIRATES (Personalized Intelligent tag Recommender and Annotator TEStbed) framework, a prototype system for personalized content retrieval, annotation, and classification. A case study application is developed using a domain ontology for software engineering.
