2006
DOI: 10.1016/j.inffus.2005.01.008
Moderate diversity for better cluster ensembles

Cited by 188 publications (127 citation statements)
References 25 publications
“…Several studies focused on understanding how diversity is handled in various ensemble creation techniques such as AdaBoost or Bagging [11,12]. Finally, many techniques have been proposed for exploiting diversity to find good ensembles [13][14][15][16][17][18]. It was even proposed to deliberately overtrain the classifiers in order to create diversity between them [19].…”
Section: Diversity in Ensembles of Classifiers
confidence: 99%
“…Kuncheva also reported in [6] that the improvement in the best individual accuracy obtained by forcing diversity is negligible. In [14], Hadjitodorov showed that, in some particular cases, moderate diversity can produce a better ensemble than maximal diversity.…”
Section: Limits of Diversity Measures
confidence: 99%
“…In this section we introduce the pairwise similarity matrices between examples [28,45], since they are used to compute the stability measures proposed in this paper. The similarity matrix is a kind of distributed memory of the clusters, in which the joint cluster memberships of pairs of examples are stored.…”
Section: Similarity Matrix
confidence: 99%
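The pairwise similarity matrix described above (often called a co-association matrix in the cluster-ensemble literature) can be sketched as follows. This is an illustrative implementation, not the cited paper's code; the function name and interface are assumptions.

```python
import numpy as np

def coassociation_matrix(labelings):
    """Pairwise similarity (co-association) matrix for a cluster ensemble.

    labelings: list of 1-D integer arrays, each a clustering of the same
    n examples. Entry (i, j) of the result is the fraction of clusterings
    that place examples i and j in the same cluster -- the "distributed
    memory" of joint cluster memberships mentioned in the quote.
    """
    labelings = [np.asarray(labels) for labels in labelings]
    n = len(labelings[0])
    m = np.zeros((n, n))
    for labels in labelings:
        # Outer equality comparison: 1 where the pair shares a cluster
        # in this particular labeling, 0 otherwise.
        m += (labels[:, None] == labels[None, :]).astype(float)
    return m / len(labelings)
```

For example, with the two toy clusterings `[0, 0, 1]` and `[0, 1, 1]`, examples 0 and 1 co-occur in one of the two clusterings, so their entry is 0.5, while the diagonal is always 1.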
“…There are two factors that influence the performance of this approach: one is the accuracy of the individual clusterings (P_i) and the other is the diversity within the ensemble E. Accuracy is maintained by tuning a set of effective clustering methods to obtain the best set of results. Regarding the diversity of E, it was shown in [15] that a moderate level of dissimilarity among the ensemble members improves the consensus results. For this, we studied the diversity within E using the Rand Index (RI) similarity measure [16], and created a more effective subset of cluster solutions to represent the new ensemble, denoted here as E .…”
Section: Consensus Clustering (CC) Framework
confidence: 99%
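The Rand Index used in the quote to measure (dis)similarity between ensemble members can be sketched as below. This is a minimal illustration of the standard RI definition, not the cited framework's implementation; the function name is an assumption.

```python
import numpy as np
from itertools import combinations

def rand_index(labels_a, labels_b):
    """Rand Index between two clusterings of the same examples.

    A pair of examples counts as an agreement when both clusterings put
    the pair in the same cluster, or both put it in different clusters.
    RI is the fraction of agreeing pairs, so 1.0 means identical
    partitions (up to cluster relabeling) and low values mean diverse
    ensemble members.
    """
    labels_a, labels_b = np.asarray(labels_a), np.asarray(labels_b)
    n = len(labels_a)
    agree = 0
    for i, j in combinations(range(n), 2):
        same_a = labels_a[i] == labels_a[j]
        same_b = labels_b[i] == labels_b[j]
        agree += same_a == same_b
    return agree / (n * (n - 1) / 2)
```

Note that RI is invariant to cluster labels: `[0, 0, 1, 1]` and `[1, 1, 0, 0]` score 1.0, since they describe the same partition. Selecting a subset of clusterings with moderate pairwise RI is one way to operationalize the "moderate diversity" recommendation of the paper.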