Semi-supervised model-based document clustering: A comparative study

Su, Zhong

doi:10.1007/s10994-006-6540-7

Cited by 60 publications (37 citation statements)

References 23 publications (25 reference statements)


“…Cluster seeds are derived from the constraints to initialize the cluster centroids [7], [9]. In [11], a comparative study investigating the annealing process for various model-based semi-supervised document clustering approaches with labeled documents is presented. Recently, Yan et al. [12] investigated a semi-supervised fuzzy co-clustering approach.…”

Section: A. Semi-supervised Clustering (mentioning)

confidence: 99%
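The seeding idea in the statement above (cluster seeds derived from labeled constraints and used to initialize the cluster centroids) can be illustrated with a minimal seeded k-means sketch. It assumes documents are already dense numeric vectors (for example tf-idf rows) compared with squared Euclidean distance, and that every cluster has at least one labeled seed. The names `seeded_kmeans`, `mean` and `sq_dist` are hypothetical, and this is not the procedure of [7], [9] or of the model-based methods compared in [11].

```python
from typing import Dict, List, Sequence

Vector = List[float]


def mean(vectors: Sequence[Vector]) -> Vector:
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def seeded_kmeans(data: Sequence[Vector], seeds: Dict[int, int], k: int,
                  iters: int = 20) -> List[int]:
    """Return a cluster id in range(k) for every row of `data`.

    `seeds` maps a row index to its known label in range(k); each label
    needs at least one seed. Seeds only initialize the centroids; the
    iterations afterwards are ordinary (unconstrained) k-means.
    """
    # Initialization: centroid j is the mean of the seeds labeled j.
    centroids = [mean([data[i] for i, c in seeds.items() if c == j])
                 for j in range(k)]
    assign: List[int] = []
    for _ in range(iters):
        # Assignment step: every row goes to its nearest centroid.
        assign = [min(range(k), key=lambda j: sq_dist(x, centroids[j]))
                  for x in data]
        # Update step: recompute centroids, keeping the old one if a cluster empties.
        new_centroids = []
        for j in range(k):
            members = [x for x, a in zip(data, assign) if a == j]
            new_centroids.append(mean(members) if members else centroids[j])
        if new_centroids == centroids:
            break  # converged: assignments can no longer change
        centroids = new_centroids
    return assign


if __name__ == "__main__":
    docs = [[0.0, 0.1], [0.2, 0.0], [5.0, 5.1], [5.2, 4.9], [0.1, 0.2], [4.8, 5.0]]
    print(seeded_kmeans(docs, seeds={0: 0, 2: 1}, k=2))  # [0, 0, 1, 1, 0, 1]
```

Because the seeds only set the starting centroids, a labeled document can still end up in a different cluster after the iterations; variants that keep labeled documents fixed trade that flexibility for strict agreement with the labels.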

“…Experimental Performance. For comparative investigation, two state-of-the-art semi-supervised document clustering approaches [11], [12] that use labeled documents as supervised information were investigated, labeled constrained-DAMNL and SS-HFCR, respectively. Fig. 7 shows the experimental performance of our proposed LLDA model, constrained-DAMNL, and the SS-HFCR model on the re0 and Yahoo_k1 datasets.…”

Section: Real Document Datasets (mentioning)

confidence: 99%


A LDA-Based Approach for Semi-Supervised Document Clustering

Huang¹,

Zhou²,

Zhang³

2014

IJMLC


Abstract: In this paper, we develop an approach for semi-supervised document clustering based on Latent Dirichlet Allocation (LDA), named LLDA. A small number of labeled documents is used to indicate the user's document grouping preference. A generative model is investigated to jointly model the documents and the small number of document labels, and a variational inference algorithm is developed to infer the structure of the document collection. We explore the performance of the proposed approach on both a synthetic dataset and real document datasets. Our experiments indicate that the proposed approach performs well at grouping documents according to different user grouping preferences. A comparison with state-of-the-art semi-supervised clustering algorithms that use labeled instances shows that our approach is effective.

Index Terms: semi-supervised clustering, document clustering, latent Dirichlet allocation, generative model.


“…If clusters consist of documents on the same topic, each topic in the fixed categorization corresponds to one cluster in an absolute partition. This type of document clustering is the absolute type [20]. In the second case, there are multiple kinds of topic categorizations, and documents are clustered based on whichever categorization most appropriately summarizes the document set.…”

Section: Further Examples of Absolute and Relative Clustering Tasks (mentioning)

confidence: 99%

“…Various problem formalizations and algorithms have been proposed for supervised clustering. For example, users can represent their preferences for grouping structures by labels [20]. Data points that are assigned the same label are grouped into the same cluster, and data with different labels are separated into different clusters.…”

Section: Introduction (mentioning)

confidence: 99%
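The label-based supervision described in the statement above (same label, same cluster; different labels, different clusters) can be enforced by pinning labeled points during every k-means assignment step, in the style of constrained k-means. The sketch below is a minimal illustration under assumed inputs: numeric document vectors, squared Euclidean distance, and labels drawn from `range(k)` with every label present at least once. The function names are hypothetical, and the code is not taken from [20].

```python
from typing import Dict, List, Sequence

Vector = List[float]


def mean(vectors: Sequence[Vector]) -> Vector:
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def constrained_kmeans(data: Sequence[Vector], labels: Dict[int, int], k: int,
                       iters: int = 20) -> List[int]:
    """k-means in which labeled rows are pinned to their label.

    Rows sharing a label always share a cluster and rows with different
    labels never do; unlabeled rows go to the nearest centroid. Every
    label in range(k) must occur at least once in `labels`.
    """
    centroids = [mean([data[i] for i, c in labels.items() if c == j])
                 for j in range(k)]
    assign: List[int] = []
    for _ in range(iters):
        # Assignment step: labeled rows keep their label, the rest pick the nearest centroid.
        assign = [labels[i] if i in labels
                  else min(range(k), key=lambda j: sq_dist(x, centroids[j]))
                  for i, x in enumerate(data)]
        # Update step: no cluster can be empty, since each label is pinned somewhere.
        new_centroids = [mean([x for x, a in zip(data, assign) if a == j])
                         for j in range(k)]
        if new_centroids == centroids:
            break  # converged
        centroids = new_centroids
    return assign


if __name__ == "__main__":
    docs = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0], [0.5, 0.5]]
    # Row 4 lies next to cluster 0 but is labeled 1, so it stays in cluster 1.
    print(constrained_kmeans(docs, labels={0: 0, 2: 1, 4: 1}, k=2))  # [0, 0, 1, 1, 1]
```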

Absolute and relative clustering

Kobayashi

Akaho

2013

Proceedings of the 4th MultiClust Workshop on Multiple Clusterings, Multi-View Data, and Multi-Source Knowledge-Driven Clusteri


Research into (semi-)supervised clustering has been increasing. Supervised clustering aims to group similar data while being partially guided by the user's supervision. Supervised clustering admits many choices of formalization; for example, the supervision can take the form of labels on data points, must-link/cannot-link constraints, and so on. Given a real clustering task, such as grouping documents or segmenting images, users must confront the question "How should we mathematically formalize our task?" To help answer this question, we propose classifying real clusterings into absolute and relative clusterings, defined by the relationship between the resulting partition and the data set to be clustered. This categorization can be used to choose a type of task formalization.


“…Many semi-supervised algorithms have been proposed (Zhong, 2006), including co-training (Blum and Mitchell, 1998), the transductive support vector machine (Joachims, 1999), entropy minimization (Guerrero-Curieses and Cid-Sueiro, 2000), semi-supervised Expectation Maximization (Nigam et al., 2000), graph-based approaches (Blum and Chawla, 2001; Zhu et al., 2003), and clustering-based approaches (Zeng et al., 2003).…”

Section: Related Work (mentioning)

confidence: 99%

A Semi-supervised Learning Framework to Cluster Mixed Data Types

Abdullin

Nasraoui

2012

Proceedings of the International Conference on Knowledge Discovery and Information Retrieval


Abstract: We propose a semi-supervised framework to handle diverse data formats or data with mixed-type attributes. Our preliminary results in clustering data with mixed numerical and categorical attributes show that the proposed semi-supervised framework gives better clustering results in the categorical domain; the seeds obtained from clustering the numerical domain thus provide additional knowledge to the categorical clustering algorithm. Additional results show that our approach has the potential to outperform clustering either domain on its own, or clustering both domains after converting them to the same target domain.
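The abstract describes passing seeds from a clustering of the numerical attributes to a categorical clustering algorithm. One way to picture this, assumed here rather than taken from the paper, is to use the numerical partition as the starting assignment for k-modes (Hamming distance, with the per-attribute mode as each cluster's representative). The function `seeded_kmodes` and its inputs are hypothetical, and the numerical clustering step is not shown: its output is simply passed in as `init_assign`.

```python
from collections import Counter
from typing import List, Optional, Sequence

Row = Sequence[str]


def mode_row(rows: List[Row]) -> List[str]:
    """Per-attribute most frequent category (ties: first one encountered)."""
    return [Counter(col).most_common(1)[0][0] for col in zip(*rows)]


def hamming(a: Row, b: Row) -> int:
    """Number of attributes on which two categorical rows disagree."""
    return sum(x != y for x, y in zip(a, b))


def seeded_kmodes(rows: List[Row], init_assign: List[int], k: int,
                  iters: int = 20) -> List[int]:
    """k-modes started from `init_assign`, assumed to come from clustering
    the numerical attributes of the same records."""
    assign = list(init_assign)
    for _ in range(iters):
        # Update step: the representative of cluster j is its per-attribute mode.
        modes: List[Optional[List[str]]] = []
        for j in range(k):
            members = [r for r, a in zip(rows, assign) if a == j]
            modes.append(mode_row(members) if members else None)
        live = [j for j in range(k) if modes[j] is not None]
        # Assignment step: nearest mode by Hamming distance (ties go to the lowest id).
        new_assign = [min(live, key=lambda j: hamming(r, modes[j])) for r in rows]
        if new_assign == assign:
            break  # converged
        assign = new_assign
    return assign


if __name__ == "__main__":
    rows = [("red", "small", "round"), ("red", "small", "round"),
            ("blue", "large", "square"), ("blue", "large", "square"),
            ("red", "small", "square")]
    numeric_seed = [0, 0, 1, 1, 1]  # hypothetical partition from the numerical attributes
    print(seeded_kmodes(rows, numeric_seed, k=2))  # [0, 0, 1, 1, 0]
```

In this toy run the numerical partition places the last record in cluster 1, and the categorical step moves it to cluster 0, whose mode it matches on two of three attributes.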

