Semi Supervised Graph Based Keyword Extraction Using Lexical Chains and Centrality Measures

Aggarwal, Ayush; Sharma, C.B.; Jain, Minni; Jain, Amita

doi:10.13053/cys-22-4-3077

Cited by 7 publications

(3 citation statements)

References 9 publications


“…Lexical chains describe sets of semantically related words. Lexical chains can be created using three steps: (1) select a set of candidate words, (2) determine a suitable chain by calculating the semantic relatedness among members of the chain, and (3) if a chain exists, add the word and update the chain; else, create a new chain to fit the word [54,55]. The second step can be performed using an existing database of synsets, such as the one included in the WordNet corpus [56].…”

Section: Keyword Extraction (mentioning)

confidence: 99%
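The three steps in the statement above can be sketched as follows. This is a minimal illustration, not the cited authors' implementation: the `TOY_SYNSETS` table, the `related` helper, and the "shares at least one synset" relatedness rule are all hypothetical stand-ins for a real synset database such as WordNet and for whatever semantic relatedness measure a given method uses.

```python
from typing import Dict, List, Set

# Hypothetical toy synset table standing in for a resource such as WordNet.
TOY_SYNSETS: Dict[str, Set[str]] = {
    "car": {"vehicle.n"},
    "automobile": {"vehicle.n"},
    "truck": {"vehicle.n"},
    "bank": {"finance.n", "river.n"},
    "loan": {"finance.n"},
    "shore": {"river.n"},
}


def related(a: str, b: str) -> bool:
    """Assumed relatedness rule: two words share at least one toy synset."""
    return bool(TOY_SYNSETS.get(a, set()) & TOY_SYNSETS.get(b, set()))


def build_chains(candidates: List[str]) -> List[List[str]]:
    """Greedily assign each candidate word to the first related chain."""
    chains: List[List[str]] = []
    for word in candidates:                  # step 1: walk the candidate words
        for chain in chains:                 # step 2: look for a suitable chain
            if any(related(word, member) for member in chain):
                chain.append(word)           # step 3a: add the word to the chain
                break
        else:
            chains.append([word])            # step 3b: no fit, start a new chain
    return chains


chains = build_chains(["car", "loan", "automobile", "bank", "truck", "shore"])
print(chains)
```

The greedy "first related chain" choice is a simplification for readability; an ambiguous word such as "bank" lands in whichever matching chain is examined first.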


Word synonym relationships for text analysis: A graph-based approach

Alrasheed

2021

PLoS ONE


Keyword extraction refers to the process of detecting the most relevant terms and expressions in a given text in a timely manner. In the information explosion era, keyword extraction has attracted increasing attention. The importance of keyword extraction in text summarization, text comparisons, and document categorization has led to an emphasis on graph-based keyword extraction techniques because they can capture more structural information compared to other classic text analysis methods. In this paper, we propose a simple unsupervised text mining approach that aims to extract a set of keywords from a given text and analyze its topic diversity using graph analysis tools. Initially, the text is represented as a directed graph using synonym relationships. Then, community detection and other measures are used to identify keywords in the text. The set of extracted keywords is used to assess topic diversity within the text and analyze its sentiment. The proposed approach relies on grouping semantically similar candidate words. This approach ensures that the set of extracted keywords is comprehensive. Differing from other graph-based keyword extraction approaches, the proposed method does not require user parameters during graph construction and word scoring. The proposed approach achieved significant results compared to other keyword extraction techniques.
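As a rough sketch of the pipeline the abstract outlines (synonym graph, grouping of related words, then keyword selection), the example below builds a graph from directed synonym edges and picks one word per group. Everything specific here is an assumption made for illustration: weakly connected components are swapped in for community detection, total degree is used as the word score, and the function name `keywords_by_group` and the sample edges are hypothetical. The paper's actual measures are not given in the abstract.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple


def keywords_by_group(edges: List[Tuple[str, str]]) -> List[str]:
    """Return one representative word per weakly connected group."""
    neighbours: Dict[str, Set[str]] = defaultdict(set)  # undirected view of the graph
    degree: Dict[str, int] = defaultdict(int)           # in-degree plus out-degree
    for src, dst in edges:
        neighbours[src].add(dst)
        neighbours[dst].add(src)
        degree[src] += 1
        degree[dst] += 1

    seen: Set[str] = set()
    keywords: List[str] = []
    for start in sorted(neighbours):
        if start in seen:
            continue
        # Depth-first search collects one weakly connected group
        # (an assumed stand-in for a community detection step).
        group, stack = [], [start]
        seen.add(start)
        while stack:
            node = stack.pop()
            group.append(node)
            for nxt in neighbours[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        # Highest degree wins; ties go to the alphabetically first word.
        keywords.append(min(group, key=lambda w: (-degree[w], w)))
    return keywords


sample_edges = [("car", "automobile"), ("truck", "car"),
                ("loan", "bank"), ("bank", "credit")]
print(keywords_by_group(sample_edges))
```

Groups are visited in alphabetical order of their first word, so the output order is deterministic for a given edge list.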



“…The second step can be performed using an existing database of synsets, such as the one included in the WordNet corpus [56]. Lexical chains and graph centrality measures were also used for keyword extraction in [55,57].…”

Section: Keyword Extraction (mentioning)

confidence: 99%

Word synonym relationships for text analysis: A graph-based approach

Alrasheed

2021

PLoS ONE


“…Various subsequent approaches use variants of term occurrence measures with probabilities, such as χ²-test, log likelihood (Dunning, 1993) and mutual information (Church and Hanks, 1990), or attempt to combine statistical measures with various types of linguistic and stop-word filters, so as to refine the keyword results. Considerations regarding term ambiguity and variation also led to rule-based approaches (Jacquemin, 2001) and resource-based approaches exploiting existing thesauri and lexica, such as UMLS (Hliaoutakis et al, 2009), or WordNet (Aggarwal et al, 2018). Knowledge poor statistical approaches, such as Latent Semantic Analysis (Deerwester et al, 1990) and Latent Dirichlet Allocation (Blei et al, 2003) attempt to detect document content in an unsupervised manner while reducing the dimensionality of the feature space of other bag-of-word approaches, but are also sensitive to sparse data and variation in short texts.…”

Section: Related Work (mentioning)

confidence: 99%

Term Based Semantic Clusters for Very Short Text Classification

Paalman, Mullick, Zervanou et al.

2019

Proceedings - Natural Language Processing in a Deep Learning World


Very short texts, such as tweets and invoices, present challenges in classification. Although term occurrences are strong indicators of content, in very short texts, the sparsity of these texts makes it difficult to capture important semantic relationships. A solution calls for a method that not only considers term occurrence, but also handles sparseness well. In this work, we introduce such an approach, the Term Based Semantic Clusters (TBSeC) that employs terms to create distinctive semantic concept clusters. These clusters are ranked using a semantic similarity function which in turn defines a semantic feature space that can be used for text classification. Our method is evaluated in an invoice classification task. Compared to well-known content representation methods the proposed method performs competitively.
