A study on automatically extracted keywords in text categorization

Hulth, Anette; Megyesi, Beáta

doi:10.3115/1220175.1220243

Cited by 129 publications

(75 citation statements)

References 13 publications

(12 reference statements)


“…In other words, its goal is to extract a set of phrases that are related to the main topics discussed in a given document (Tomokiyo and Hurst, 2003; Liu et al, 2009b; Ding et al, 2011; Zhao et al, 2011). Document keyphrases have enabled fast and accurate searching for a given document from a large text collection, and have exhibited their potential in improving many natural language processing (NLP) and information retrieval (IR) tasks, such as text summarization (Zhang et al, 2004), text categorization (Hulth and Megyesi, 2006), opinion mining (Berend, 2011), and document indexing .…”

Section: Introduction (mentioning)

confidence: 99%

Automatic Keyphrase Extraction: A Survey of the State of the Art

Hasan, Ng (2014)

Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)



“…To solve this labeling issue, some studies, based on text summary [17] and key phrase extraction approaches [24] identify text portions or key phrases according to their major theme [8]. Other methods focus on the identification of text portions related to the document title [29].…”

Section: Related Work (mentioning)

confidence: 99%

How Ontology Based Information Retrieval Systems May Benefit from Lexical Text Analysis

Ranwez, Duthil, et al. (2012)

New Trends of Research in Ontologies and Lexical Resources


“…We then use these new terms directly, or broken down into single terms (in case of multiword terms). This last feature is motivated by [10], who showed improved document classification results after breaking down multiwords for partial matches. In summary, we use the following four types of semantic features:…”

Section: Semantic Information (mentioning)

confidence: 99%

Automatic classification of sentences for evidence based medicine

Kim, Martínez, Cavedon (2010)

Proceedings of the ACM Fourth International Workshop on Data and Text Mining in Biomedical Informatics


AIM Given a set of pre-defined medical categories used in Evidence Based Medicine, we aim to automatically annotate sentences in medical abstracts with these labels. METHOD We construct a corpus of 1,000 medical abstracts annotated by hand with medical categories (e.g. "Intervention", "Outcome"). We explore the use of various features based on lexical, semantic, structural, and sequential information in the data, using Conditional Random Fields (CRF) for classification. RESULT For the classification tasks over all labels, our systems achieved micro-averaged F-scores of 80.9% and 66.9% in structured and unstructured datasets respectively, using sequential features. In labeling only key sentences, our systems produced F-scores of 89.3% and 74.0% in structured and unstructured datasets respectively, using the same sequential features. The results over an external dataset were lower (F-scores of 63.1% for all labels, and 83.8% for key sentences). CONCLUSION Of the features we used, the best for classifying any given sentence in an abstract are based on unigrams, section headings, and sequential information from preceding sentences. These features resulted in improved performance over a simple bag-of-words approach, and outperform feature sets used in previous work.
