2016
DOI: 10.3390/mca21030034

A Comparison of Information Criteria in Clustering Based on Mixture of Multivariate Normal Distributions

Abstract: Clustering analysis based on a mixture of multivariate normal distributions is commonly used in the clustering of multidimensional data sets. Model selection is one of the most important problems in mixture cluster analysis based on the mixture of multivariate normal distributions. Model selection involves the determination of the number of components (clusters) and the selection of an appropriate covariance structure in the mixture cluster analysis. In this study, the efficiency of information criteria that a…
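The model-selection setting described in the abstract — choosing both the number of components and the covariance structure of a multivariate normal mixture — can be illustrated with a short sketch. This is not the authors' code; it is a minimal example using scikit-learn's GaussianMixture (an assumed, generic tool) with its built-in BIC scoring:

```python
# Minimal sketch (not the paper's implementation): fit Gaussian mixtures over a
# grid of component counts and covariance structures, keep the lowest-BIC model.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Toy data: two well-separated bivariate normal clusters.
X = np.vstack([
    rng.normal(loc=[0, 0], scale=1.0, size=(100, 2)),
    rng.normal(loc=[5, 5], scale=1.0, size=(100, 2)),
])

best = None
for k in range(1, 6):                                   # candidate cluster counts
    for cov in ("full", "tied", "diag", "spherical"):   # covariance structures
        gm = GaussianMixture(n_components=k, covariance_type=cov,
                             random_state=0).fit(X)
        bic = gm.bic(X)
        if best is None or bic < best[0]:
            best = (bic, k, cov)

print(f"Lowest BIC: {best[0]:.1f} with k={best[1]}, covariance='{best[2]}'")
```

Other criteria such as AICc or KIC are not exposed by this API and would be computed from the fitted log-likelihood, as sketched further below.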

Cited by 23 publications (23 citation statements). References 19 publications.

“…The complexity reflects the fact that for each of the K groups, (P − J) independent allele frequencies are estimated, so that the total number of free parameters of the model is K(P − J). We also implemented the variant of the AIC for small sample sizes, defined as (Akogul & Erisoglu):

AICc = −2L + 2K(P − J)N / (N − K(P − J) − 1)

A popular alternative to AIC and AICc is the BIC (Schwarz), which also relies on a penalised deviance, albeit putting a stronger cost on complexity:

BIC = −2L + ln(N) · K(P − J)

Finally, we also implemented the Kullback Information Criterion (KIC, Cavanaugh), which gave the best overall results for detecting the number of clusters from mixtures of multivariate normal distributions (Akogul & Erisoglu):

KIC = −2L + 3(K(P − J) + 1)

All these statistics have similar behaviours in that lower values typically indicate better fits. In practice, a sharp decrease in the statistic values with increasing numbers of clusters is most likely to reveal the optimal number of clusters (Jombart et al.).…”
Section: Methods (mentioning; confidence: 99%)
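The three criteria quoted above reduce to simple functions of the maximised log-likelihood L, the number of free parameters k = K(P − J), and the sample size N. The helper below is a hedged sketch of those standard formulas, not the cited paper's code; the example log-likelihood values are hypothetical:

```python
import math

def information_criteria(loglik: float, k: int, n: int) -> dict:
    """Standard small-sample AICc, BIC, and KIC for a fitted model.

    loglik : maximised log-likelihood L
    k      : number of free parameters, e.g. K * (P - J) in the quoted text
    n      : sample size N
    """
    return {
        "AICc": -2 * loglik + 2 * k * n / (n - k - 1),   # small-sample AIC
        "BIC":  -2 * loglik + math.log(n) * k,           # Schwarz criterion
        "KIC":  -2 * loglik + 3 * (k + 1),               # Kullback Information Criterion
    }

# Hypothetical fits for 2 vs. 3 clusters: lower values indicate better fits,
# and a sharp drop when adding a cluster suggests the extra cluster is real.
print(information_criteria(loglik=-512.3, k=12, n=200))
print(information_criteria(loglik=-489.7, k=18, n=200))
```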
“…The alternative having the highest C-RIV value is the optimal number of clusters for the dataset. To form the pairwise comparison matrix of the criteria, the study of Akogul and Erisoglu [53] was used. In that study [53], the efficiency of the information criteria was examined.…”
Mentioning (confidence: 99%)
“…To form the pairwise comparison matrix of the criteria, the study of Akogul and Erisoglu [53] was used. In that study [53], the efficiency of the information criteria was examined. They also analyzed real datasets that are commonly used in clustering analysis.…”
Mentioning (confidence: 99%)
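The pairwise comparison matrix mentioned in these statements is the device used in Analytic Hierarchy Process-style weighting of the information criteria against one another. The cited study's actual comparison values and its C-RIV computation are not reproduced here; the sketch below only shows the generic mechanics of deriving priority weights from an assumed, illustrative 3 × 3 comparison matrix via its principal eigenvector:

```python
import numpy as np

# Illustrative (made-up) pairwise comparison matrix for three criteria, e.g.
# rows/columns could stand for AIC, BIC, KIC. Entry A[i, j] states how strongly
# criterion i is preferred over criterion j; A[j, i] = 1 / A[i, j].
A = np.array([
    [1.0, 1/2, 1/3],
    [2.0, 1.0, 1/2],
    [3.0, 2.0, 1.0],
])

# Priority weights = normalised principal eigenvector of the comparison matrix.
eigvals, eigvecs = np.linalg.eig(A)
principal = eigvecs[:, np.argmax(eigvals.real)].real
weights = principal / principal.sum()
print("Priority weights:", np.round(weights, 3))
```

In an AHP-style procedure, weights like these would then be combined with the criteria's votes across candidate cluster numbers to pick the best-supported alternative.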