2006
DOI: 10.1007/s10994-006-6540-7

Semi-supervised model-based document clustering: A comparative study

Abstract: Semi-supervised learning has become an attractive methodology for improving classification models and is often viewed as using unlabeled data to aid supervised learning. However, it can also be viewed as using labeled data to help clustering, namely, semi-supervised clustering. Viewing semi-supervised learning from a clustering angle is useful in practical situations when the set of labels available in labeled data is not complete, i.e., unlabeled data contain new classes that are not present in labeled data. …

Cited by 60 publications (37 citation statements)
References 23 publications (25 reference statements)

“…Cluster seeds are derived from the constraints to initialize the cluster centroids [7], [9]. In [11], a comparative study of investigating annealing process for varies model-based semi-supervised document clustering approaches with labeled documents are presented. Recently, Yan et al [12] investigated a semi-supervised fuzzy co-clustering approach.…”
Section: A Semi-supervised Clustering
Type: mentioning
confidence: 99%
“…Experimental Performance. For comparative investigation, two state-of-the-art semi-supervised document clustering approaches [11], [12] that use labeled documents as supervised information were investigated, labeled as constrained-DAMNL and SS-HFCR respectively, Fig. 7 shows the experimental performances of our proposed LLDA model, the constrained-DAMNL, and the SS-HFCR model on the re0 and the Yahoo_k1 datasets.…”
Section: Real Document Datasets
Type: mentioning
confidence: 99%
“…If clusters consist of documents on the same topic, each topic in the fixed categorization corresponds to one cluster in an absolute partition. This type of document clustering is the absolute type [20]. In the second case, there are multiple kinds of topic categorizations, and documents are clustered based on one of the categorizations that would best appropriately summarize the document set.…”
Section: Further Examples Of Absolute and Relative Clustering Tasks
Type: mentioning
confidence: 99%
“…Various types of problem formalizations as well as algorithms have been proposed as methods of supervised clustering. For example, users represent their preferences for grouping structures by labels [20]. Data points that are assigned the same label are grouped into the same cluster, and data with different labels are separated into different clusters.…”
Section: Introduction
Type: mentioning
confidence: 99%
“…Many semi-supervised algorithms have been proposed (Zhong, 2006) including co-training (Blum and Mitchell, 1998), the transductive support vector machine (Joachims, 1999), entropy minimization (Guerrero-Curieses and Cid-Sueiro, 2000), semi-supervised Expectation Maximization (Nigam et al, 2000), graph-based approaches (Blum and Chawla, 2001; Zhu et al, 2003), and clustering-based approaches (Zeng et al, 2003).…”
Section: Related Work
Type: mentioning
confidence: 99%