An active learning framework for semi-supervised document clustering with language modeling

Huang, Ruizhang; Lam, Wai

doi:10.1016/j.datak.2008.08.008

Cited by 47 publications

(23 citation statements)

References 15 publications


“…(2) Incrementally updating the cluster tree: when the number of documents increases sequentially in a document set, it is inefficient to reform the cluster tree for each new insertion. That is, it is admirable to reflect the current state of the whole document set by incrementally updating the cluster tree [28][29][30]. Therefore, we intend to propose an efficient incremental clustering algorithm for assigning a new document to the most similar existing cluster in the future.…”

Section: Discussion (mentioning)

confidence: 99%

An integration of WordNet and fuzzy association rule mining for multi-label document clustering

Chen, Tseng, Liang

2010

Data & Knowledge Engineering


“…Nogueira et al introduce a new active semi-supervised hierarchical clustering method [13]. This strategy uses not only cluster-level constraints [9] where the user can indicate a pair of clusters to be merged but also an innovative concept called confidence. When there is lower confidence in a cluster merge the user can be queried and provide a cluster-level constraint.…”

Section: Related Work (mentioning)

confidence: 98%

$\mathcal{SHACUN}$ : Semi-supervised Hierarchical Active Clustering Based on Ranking Constraints

Ahmed, Nabli, Gargouri

2012

Advances in Data Mining. Applications and Theoretical Aspects


Abstract. Semi-supervised approaches have proven to be efficient in clustering tasks. They allow user input, thus enhancing the quality of the clustering. However, the user intervention is generally limited to integrate boolean constraints in form of must-link and cannot-link constraints between pairs of objects. This paper investigates the issue of satisfying ranked constraints in performing hierarchical clustering. SHACUN is a new introduced method for handling cases when some constraints are more important than others and must be firstly enforced. Carried out experiments on real log files used for decision-maker groupization in data warehouse confirm the soundness of our approach.


“…In this paper, the user is asked to give feedback at the feature level instead of the document level. Except active learning of document constraints such as [15], most semi-supervised clustering algorithms involve the user supervision outside the clustering process. In this way, all the document constraints are defined before the clustering starts.…”

Section: Related Work (mentioning)

confidence: 99%

Interactive document clustering with feature supervision through reweighting

Milios, Blustein

2014

IDA


Unsupervised document clustering groups documents into clusters without any user effort. However, the clusters produced are often found not in accord with user's perception of the document collection. In this paper we describe a novel framework and explore whether clustering performance can be improved by including user supervision at the feature level. Unlike existing semi-supervised clustering methods, which ask the user to label documents, this framework interactively asks the user to label features. The proposed method ranks all features based on the recent clusters using cluster-based feature selection and presents a list of highly ranked features to the user for labeling. The feature set for the next clustering iteration includes both features accepted by the user and other highly ranked features. The experimental results on several real datasets demonstrate that the feature set obtained using the new interactive framework can produce clusters that better match the user's expectations compared with the unsupervised version of the methods. Moreover, we quantify and evaluate the effect of reweighting previously accepted features and of user effort. Different underlying clustering algorithms such as K Means and Multinomial Naïve Bayes model are demonstrated to perform very well with the newly proposed framework.
