Labeling clusters from both linguistic and statistical perspectives: A hybrid approach

Li, Zhixing; Li, Juanzi; Liao, Yi; Wen, Siqiang; Tang, Jie

doi:10.1016/j.knosys.2014.12.019

Cited by 9 publications

(11 citation statements)

References 15 publications


“…Candidate phrases can be selected with statistical, graph-based, or topic-clustering approaches. Statistical phrase selection includes Term Frequency-Inverse Cluster Frequency (TF-ICF) weighting [27], Markov Chain computation [11], and scoring candidate phrases by Pointwise Mutual Information (PMI) [28]. Graph-based candidate phrase selection, which uses a graph as the text representation, includes TextRank [23]…”

Section: Cluster Labeling and Topic Clustering

unclassified
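The two statistical scores named in this statement can be sketched as follows. This is a minimal illustration, not the cited authors' implementation: it assumes the common TF-ICF form tf(t, c) * log(N / cf(t)), with N clusters and cf(t) the number of clusters containing term t, and bigram PMI log2(p(x, y) / (p(x) p(y))) estimated from raw counts. The function names `tf_icf` and `bigram_pmi` are hypothetical.

```python
import math
from collections import Counter


def tf_icf(clusters):
    """Score every term of every cluster with TF-ICF.

    clusters: list of clusters, each a flat list of tokens (all documents
    of the cluster concatenated). Assumed form: tf(t, c) * log(N / cf(t)).
    """
    n = len(clusters)
    cluster_freq = Counter()
    for tokens in clusters:
        cluster_freq.update(set(tokens))  # count each term once per cluster
    scores = []
    for tokens in clusters:
        tf = Counter(tokens)
        scores.append({t: f * math.log(n / cluster_freq[t]) for t, f in tf.items()})
    return scores


def bigram_pmi(tokens):
    """PMI of every adjacent word pair: log2(p(x, y) / (p(x) * p(y)))."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_uni = len(tokens)
    n_bi = len(tokens) - 1
    return {
        (x, y): math.log2((c / n_bi) / ((unigrams[x] / n_uni) * (unigrams[y] / n_uni)))
        for (x, y), c in bigrams.items()
    }
```

A term that appears in every cluster gets a TF-ICF score of zero, which is what makes the weighting favour cluster-specific terms. Plain PMI is known to favour rare word pairs, so a minimum frequency cut-off is usually applied before ranking candidate phrases.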


Ekstraksi Frasa Kunci pada Penggabungan Klaster berdasarkan Maximum-Common-Subgraph (Keyphrase Extraction in Cluster Merging Based on Maximum-Common-Subgraph)

Nurilham¹,

Purwitasari²,

Fatichah³

2018

Jurnal Nasional Teknik Elektro dan Teknologi Informasi (JNTETI)


Document clustering based on topic similarity helps users search a collection of scientific articles. Topic labels are necessary for describing the subjects of the document clusters. Clusters with related subjects or contextual similarities can be merged to produce more descriptive labels. The relations between words in one context can be modelled as a graph. Instead of single words, this paper proposes labeling clusters of scientific articles with phrases, merging the clusters based on graphs. The proposed method begins with K-Means++ to cluster the scientific articles. Candidate word phrases are then extracted from the document clusters using Frequent Phrase Mining, which is inspired by the Apriori algorithm. Each resulting cluster is represented by a graph built from its extracted phrases. An indicator value computed with the Maximum Common Subgraph (MCS) shows the similarity of the graph structures. Clusters are merged if their graphs share structural similarities. The topic labels of the merged clusters are keyphrases extracted with the TopicRank algorithm from the representation graph of the merged clusters. The contribution of this paper is a merging process that considers the topic distribution within clusters for phrase extraction. The proposed method is evaluated on the topic coherence of the merged cluster labels. The results show that the proposed method can improve topic coherence on the merged clusters, with the MCS graph size percentage as the key factor. Further observation shows that the merged cluster labels are consistent with the MCS graph.
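The MCS-based merging test described in this abstract can be sketched under simplifying assumptions. Computing a maximum common subgraph is NP-hard in general, but when every node is labelled by a unique phrase and the mapping must preserve labels, the MCS reduces to intersecting the node and edge sets. The co-occurrence graph built by the hypothetical `build_phrase_graph`, the size measure |V| + |E|, the ratio against the larger graph, and the merge `threshold` are all assumptions for illustration; the abstract does not specify them.

```python
def build_phrase_graph(documents):
    """Hypothetical representation graph for one cluster.

    documents: list of phrase lists, one per document. Nodes are phrases;
    an undirected edge joins two phrases that occur in the same document.
    Edges are stored as sorted tuples so (a, b) and (b, a) coincide.
    """
    nodes, edges = set(), set()
    for phrases in documents:
        unique = sorted(set(phrases))
        nodes.update(unique)
        for i, a in enumerate(unique):
            for b in unique[i + 1:]:
                edges.add((a, b))
    return nodes, edges


def mcs_size_ratio(g1, g2):
    """Size of the maximum common subgraph relative to the larger graph.

    With unique node labels and a label-preserving mapping, the MCS is the
    intersection of the node sets plus the intersection of the edge sets.
    Size is measured as |V| + |E| (an assumption).
    """
    nodes1, edges1 = g1
    nodes2, edges2 = g2
    common = len(nodes1 & nodes2) + len(edges1 & edges2)
    larger = max(len(nodes1) + len(edges1), len(nodes2) + len(edges2))
    return common / larger if larger else 0.0


def should_merge(g1, g2, threshold=0.3):
    """Merge rule with a hypothetical threshold, not taken from the paper."""
    return mcs_size_ratio(g1, g2) >= threshold
```

Every edge in both edge sets has its endpoints in both node sets, so the two intersections always form a valid common subgraph; this is why the labelled case avoids the general subgraph-isomorphism search.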


Section: Cluster Labeling and Topic Clustering

“…Single words as cluster labels are considered less intuitive, so word phrases are preferred because they are more descriptive as topic representations, combining linguistic and statistical approaches [11]. Clustering gives suboptimal results if some groups of documents still share contextual similarities such as synonymy, polysemy, or ambiguity [12].…”

unclassified

Ekstraksi Frasa Kunci pada Penggabungan Klaster berdasarkan Maximum-Common-Subgraph (Keyphrase Extraction in Cluster Merging Based on Maximum-Common-Subgraph)

Nurilham¹,

Purwitasari²,

Fatichah³

2018

Jurnal Nasional Teknik Elektro dan Teknologi Informasi (JNTETI)


“…Their results correctly labelled, on average, more than 88.79% of the cluster elements. Li et al. (2015) developed a combined approach that labels clusters from both linguistic and statistical perspectives. The performance of their approach is evaluated on the 20-Newsgroups (English) and NewsMiner (Chinese) datasets.…”

Section: Literature Survey

mentioning

confidence: 99%

Cluster labelling using chi-square-based keyword ranking and mutual information score: a hybrid approach

Roul

Sahay

2017

IJISDC


Cluster labelling is a technique that provides useful information about a cluster to end users. In this paper, we propose a novel approach that follows up on our previous work. Our earlier approach generates clusters of web documents using a modified Apriori approach, which is more efficient and faster than the traditional Apriori approach. To label the clusters, the proposed approach uses an effective feature selection technique that selects the top features of a cluster. Rather than labelling the cluster with a 'bag of words', we developed a concept-driven mechanism that uses Wikipedia, taking the top features of a cluster as input to generate possible candidate labels. A mutual information (MI) score is used to rank the candidate labels, and the topmost candidates are considered the potential labels of a cluster. Experimental results on two benchmark datasets demonstrate the efficiency of our approach.

Keywords: candidate label; chi-square; keyword ranking; mutual information; Wikipedia.

Reference to this paper should be made as follows: Roul, R.K. and Sahay, S.K. (2017) 'Cluster labelling using chi-square-based keyword ranking and mutual information score: a hybrid approach', Int.
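Chi-square keyword ranking, named in the title, is commonly computed from a 2x2 contingency table of document counts per term (inside or outside the cluster, containing the term or not). The sketch below assumes that standard formulation, chi2 = N (AD - BC)^2 / ((A+B)(C+D)(A+C)(B+D)); whether Roul and Sahay filter or normalise terms the same way is an assumption, and `chi_square_keywords` is a hypothetical name. The Wikipedia-based candidate generation and MI-based label ranking are not shown.

```python
from collections import Counter


def chi_square_keywords(docs, labels, cluster, top_k=5):
    """Rank terms for one cluster by the 2x2 chi-square statistic.

    docs: list of token lists; labels: cluster id of each document.
    Only terms positively associated with the cluster (AD > BC) are kept.
    """
    n = len(docs)
    n_in = sum(1 for lab in labels if lab == cluster)
    df_in, df_out = Counter(), Counter()  # document frequencies
    for tokens, lab in zip(docs, labels):
        (df_in if lab == cluster else df_out).update(set(tokens))
    scores = {}
    for term, a in df_in.items():      # A: in cluster, contains term
        b = df_out[term]               # B: outside cluster, contains term
        c = n_in - a                   # C: in cluster, lacks term
        d = (n - n_in) - b             # D: outside cluster, lacks term
        denom = (a + b) * (c + d) * (a + c) * (b + d)
        if denom == 0 or a * d <= b * c:
            continue  # undefined, or not positively associated
        scores[term] = n * (a * d - b * c) ** 2 / denom
    return sorted(scores, key=scores.get, reverse=True)[:top_k]
```

Discarding negatively associated terms matters because chi-square is symmetric: a term that is conspicuously absent from a cluster would otherwise score as highly as one that characterises it.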


“…Cluster labeling has been the subject of various research efforts in the literature [12,4,15–17]. These works have explored different techniques to achieve cluster labeling.…”

Section: Related Work

mentioning

confidence: 99%

“…The presented algorithm assigns a few labels to the clusters based on the cluster analysis information, the parent cluster, and statistics about the corpus. Recently, Li et al. [12] proposed a hybrid approach combining linguistic and statistical techniques to label clusters automatically. Although these approaches are very interesting, they are all dedicated to textual data and cannot work on quantitative data.…”

Section: Related Work

mentioning

confidence: 99%

Towards Ontology Reasoning for Topological Cluster Labeling

Chahdi

Grozavu

Mougenot

et al. 2016

Neural Information Processing


Abstract. In this paper, we present a new approach combining topological unsupervised learning with ontology-based reasoning to achieve both (i) automatic interpretation of clustering and (ii) scaling of ontology reasoning over large datasets. The interest of such an approach lies in using expert knowledge to automate cluster labeling and to give the clusters high-level semantics that meet the user's interest. The proposed approach is based on two steps. The first step performs topographic unsupervised learning based on the SOM (Self-Organizing Maps) algorithm. The second step integrates expert knowledge into the map using ontology reasoning over the prototypes and provides an automatic interpretation of the clusters. We apply our approach to the real problem of satellite image classification. The experiments highlight the capacity of our approach to obtain a semantically labeled topographic map, and the obtained results show very promising performance.
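The first step of this approach, training a Self-Organizing Map whose prototypes are later interpreted, can be sketched in plain Python. This is a minimal textbook SOM assuming a Gaussian neighbourhood and linearly decaying learning rate and radius; the grid size, decay schedule, and names (`train_som`, `best_matching_unit`) are hypothetical, and the ontology-reasoning step that labels the prototypes is not shown.

```python
import math
import random


def best_matching_unit(weights, x):
    """Return the grid cell whose prototype is closest to x (squared Euclidean)."""
    return min(weights, key=lambda cell: sum((w - v) ** 2 for w, v in zip(weights[cell], x)))


def train_som(data, rows, cols, epochs=50, lr0=0.5, sigma0=None, seed=0):
    """Train a rows x cols SOM; return {(row, col): prototype vector}."""
    rng = random.Random(seed)
    if sigma0 is None:
        sigma0 = max(rows, cols) / 2  # initial neighbourhood radius (assumed)
    # Initialise each prototype as a copy of a randomly chosen sample.
    weights = {(r, c): list(rng.choice(data)) for r in range(rows) for c in range(cols)}
    total = epochs * len(data)
    step = 0
    for _ in range(epochs):
        order = list(data)
        rng.shuffle(order)
        for x in order:
            frac = step / total
            lr = lr0 * (1 - frac)                  # linearly decaying learning rate
            sigma = max(sigma0 * (1 - frac), 0.5)  # shrinking radius, floored at 0.5
            bmu = best_matching_unit(weights, x)
            for (r, c), w in weights.items():
                # Gaussian neighbourhood measured on the grid, not in data space.
                d2 = (r - bmu[0]) ** 2 + (c - bmu[1]) ** 2
                h = math.exp(-d2 / (2 * sigma ** 2))
                for i in range(len(w)):
                    w[i] += lr * h * (x[i] - w[i])
            step += 1
    return weights
```

Because neighbouring grid cells are pulled toward the same samples, nearby prototypes end up describing similar data; this topological ordering is what lets a later step attach labels to prototypes rather than to individual samples.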



