2009
DOI: 10.1007/978-3-642-04417-5_32

Avoiding Bias in Text Clustering Using Constrained K-means and May-Not-Links

Abstract: In this paper we present a new clustering algorithm that extends traditional batch k-means, enabling the introduction of domain knowledge in the form of Must, Cannot, May and May-Not rules between the data points. We have also applied the presented method to the task of avoiding bias in clustering. Evaluation carried out on standard collections showed considerable improvements in effectiveness over previous constrained and non-constrained algorithms for the given task.

Citations: cited by 6 publications (13 citation statements)
References: 13 publications
“…Firstly, we survey normalised cut, a very effective spectral clustering algorithm introduced by Shi and Malik in [5], and its constrained counterpart, constrained normalised cut, introduced by Ji et al in [4]. Afterwards, we outline soft constrained k-means, a constrained clustering algorithm based on k-means introduced by Ares et al in [3].…”
Section: Clustering Algorithms | Citation type: mentioning
confidence: 99%
“…In order to address these limitations, Ares et al introduced in [3] two kinds of non absolute constraints: May-Links and May-Not-Links, which indicate that two documents are, respectively, likely or not likely to be in the same cluster. The implementation of these constraints alters again the assignment process of the documents.…”
Section: Soft Constrained K-means | Citation type: mentioning
confidence: 99%
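
The statement above says that May-Links and May-Not-Links alter the document assignment step, but this excerpt does not give the actual scoring rule. The following is only a minimal sketch of how soft pairwise constraints could influence a k-means assignment pass. It is not the formulation of Ares et al.: the function name `soft_constrained_assign`, the `weight` parameter, and the additive bonus/penalty scheme are all assumptions made for illustration.

```python
from collections import defaultdict

import numpy as np


def soft_constrained_assign(X, centroids, may_links=(), may_not_links=(), weight=1.0):
    """One assignment pass with soft pairwise constraints (illustrative only).

    Assumption: a May-Link lowers the score of the cluster already holding the
    partner point by `weight`; a May-Not-Link raises it by `weight`. Points are
    processed in index order, so only already-assigned partners have an effect.
    """
    # Squared Euclidean distance from every point to every centroid, shape (n, k).
    dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)

    # Index the constraints by point so each lookup is cheap.
    likely, unlikely = defaultdict(list), defaultdict(list)
    for a, b in may_links:
        likely[a].append(b)
        likely[b].append(a)
    for a, b in may_not_links:
        unlikely[a].append(b)
        unlikely[b].append(a)

    labels = np.full(len(X), -1, dtype=int)  # -1 marks "not assigned yet"
    for i in range(len(X)):
        score = dists[i].copy()
        for j in likely[i]:
            if labels[j] >= 0:
                score[labels[j]] -= weight  # encourage joining the partner's cluster
        for j in unlikely[i]:
            if labels[j] >= 0:
                score[labels[j]] += weight  # discourage joining the partner's cluster
        labels[i] = int(np.argmin(score))
    return labels


# Hypothetical toy data: four points on a line, two centroids.
X = np.array([[0.0, 0.0], [0.1, 0.0], [1.0, 0.0], [0.45, 0.0]])
centroids = np.array([[0.0, 0.0], [1.0, 0.0]])

plain = soft_constrained_assign(X, centroids)
nudged = soft_constrained_assign(X, centroids, may_not_links=[(0, 3)])
```

In this toy data the last point is slightly closer to the first centroid, so the unconstrained pass places it with point 0, while a May-Not-Link between points 0 and 3 is enough to push it to the other cluster. Because the constraint is a preference rather than a hard rule, a sufficiently small `weight` would leave the assignment unchanged.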