2011
DOI: 10.1016/j.patrec.2010.11.006

Towards a standard methodology to evaluate internal cluster validity indices

Abstract: The evaluation and comparison of internal cluster validity indices is a critical problem in the clustering area. The methodology used in most of the evaluations assumes that the clustering algorithms work correctly. We propose an alternative methodology that does not make this often false assumption. We compared 7 internal cluster validity indices with both methodologies and concluded that the results obtained with the proposed methodology are more representative of the actual capabilities of the compared indi…

Cited by 57 publications (28 citation statements)
References 28 publications

Citation statements
“…Consequently, the smaller their values, the better the solution. This is exactly what the Deviation and Connectivity measures do, respectively (Handl and Knowles, 2007; Hruschka et al., 2009; Gurrutxaga et al., 2011).…”
Section: Deviation and Connectivity (supporting)
confidence: 54%
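For context, a minimal sketch of how Deviation and Connectivity, as described by Handl and Knowles, are typically computed. This is an illustrative NumPy implementation, not code from the cited papers, and the default of L = 10 neighbours is an assumption:

```python
import numpy as np

def deviation(X, labels):
    """Sum of distances from each point to its cluster centroid.
    Lower values indicate more compact clusters."""
    labels = np.asarray(labels)
    total = 0.0
    for c in np.unique(labels):
        members = X[labels == c]
        centroid = members.mean(axis=0)
        total += np.linalg.norm(members - centroid, axis=1).sum()
    return total

def connectivity(X, labels, L=10):
    """Penalty of 1/j accumulated whenever a point and its j-th nearest
    neighbour (j = 1..L) fall into different clusters.
    Lower values indicate better-connected clusters."""
    labels = np.asarray(labels)
    n = len(X)
    # Pairwise Euclidean distances (fine for small data sets).
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(dists, np.inf)
    penalty = 0.0
    for i in range(n):
        neighbours = np.argsort(dists[i])[:L]
        for j, nb in enumerate(neighbours, start=1):
            if labels[i] != labels[nb]:
                penalty += 1.0 / j
    return penalty
```

Both quantities are to be minimised, which is why the quoted passage notes that smaller values indicate better solutions.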
“…These facts often hinder the data analysis step: experts must evaluate all the different solutions generated by the algorithm, which is highly time consuming and rather arbitrary, since all of the solutions are potentially valid and the choice depends on the subjectivity of the expert. For this reason, the application of evaluation functions that automatically score clustering solutions has become key to helping experts select the best one (Gurrutxaga et al., 2011). These evaluation functions define metrics that measure cluster quality using the same features included in the data set.…”
Section: Looking For the Most Suitable Patterns (mentioning)
confidence: 99%
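As an illustration of such automatic scoring, a minimal sketch that ranks candidate k-means partitions with an internal index; the silhouette index, the synthetic data, and the range of k are placeholder choices, not taken from the cited work:

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Placeholder data; in practice this is the expert's data set.
X, _ = make_blobs(n_samples=300, centers=4, random_state=0)

# Generate candidate solutions and score each one using only the
# features in the data set (i.e., an internal evaluation function).
scores = {}
for k in range(2, 9):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    scores[k] = silhouette_score(X, labels)

best_k = max(scores, key=scores.get)
print(f"best k according to the silhouette index: {best_k}")
```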
“…VIC was tested on 50 different data sets, where it significantly outperformed other well-known cluster indices (Rodríguez et al., 2018). Unlike other internal indices, which tend to prefer clusters with specific shapes such as hyperspheres (Lago-Fernández & Corbacho, 2010; Halkidi & Vazirgiannis, 2008; Gurrutxaga et al., 2011), VIC does not assume a specific shape. Similarly, other indices tend to prefer a higher number of clusters (Dubes, 1987), whereas VIC does not.…”
Section: Validity Index Using Supervised Classifiers (mentioning)
confidence: 99%
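A rough sketch of the general idea behind classifier-based indices such as VIC: treat the cluster assignment as a class label and use cross-validated classifier performance as the score, on the intuition that well-separated clusters of any shape are easy to predict. This is not the exact VIC definition from Rodríguez et al. (2018); the choice of classifier and of accuracy as the metric are assumptions:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

def classifier_based_index(X, labels, cv=5):
    """Score a partition by how well a supervised classifier can
    predict the cluster labels from the features: higher
    cross-validated accuracy suggests better-separated clusters."""
    clf = RandomForestClassifier(n_estimators=100, random_state=0)
    return np.mean(cross_val_score(clf, X, labels, cv=cv))
```

Because the classifier learns arbitrary decision boundaries, this kind of index does not impose a hyperspherical cluster shape, which is the contrast the quoted passage draws with more traditional indices.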
“…Recently, it was suggested by [4] that internal evaluation measures should themselves be evaluated with external indices. They used k-means to generate different partitions and compared the partitions rated best by internal measures with those rated best by external measures (i.e., against the real partitions).…”
Section: Related Work (mentioning)
confidence: 99%
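A minimal sketch of that comparison protocol; the data set, the choice of silhouette as the internal measure and the adjusted Rand index as the external measure, and the range of k are assumptions made purely for illustration:

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.metrics import adjusted_rand_score, silhouette_score

# Labelled benchmark data, so an external measure can be computed.
X, y_true = load_iris(return_X_y=True)

# Generate several candidate partitions with k-means.
partitions = {k: KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
              for k in range(2, 9)}

# Best partition according to an internal measure (features only).
best_internal = max(partitions, key=lambda k: silhouette_score(X, partitions[k]))
# Best partition according to an external measure (real partition).
best_external = max(partitions, key=lambda k: adjusted_rand_score(y_true, partitions[k]))

print(f"internal pick: k={best_internal}, external pick: k={best_external}")
```

The degree to which the two picks agree across many data sets is what this kind of study uses to judge the internal index.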
“…[3]). However, the evaluation of cluster models is a difficult task because of the unsupervised nature of clustering [4].…”
Section: Introduction (mentioning)
confidence: 99%