So Hirai scite author profile

We are concerned with the issue of detecting changes of clustering structures from multivariate time series. From the viewpoint of the minimum description length (MDL) principle, we propose an algorithm that tracks changes of clustering structures so that the sum of the code-length for data and that for clustering changes is minimized. Here we employ a Gaussian mixture model (GMM) as a representation of clustering, and compute the code-length for data sequences using normalized maximum likelihood (NML) coding. The proposed algorithm enables us to deal with clustering dynamics, including merging, splitting, emergence, and disappearance of clusters, from a unifying view of the MDL principle. We empirically demonstrate using artificial data sets that our proposed method is able to detect cluster changes significantly more accurately than an existing statistical-test based method and AIC/BIC-based methods. We further use real customers' transaction data sets to demonstrate the validity of our algorithm in market analysis. We show that it is able to detect changes of customer groups, which correspond to changes of real market environments.
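One way to picture the trade-off described above is a dynamic-programming search over sequences of clustering structures. The sketch below is not the authors' algorithm: it assumes the per-step data code-lengths are already available as a table (`data_cost`, a hypothetical name) and charges a fixed, hypothetical `change_cost` whenever the structure switches. The abstract does not say how the code-length for changes is actually computed.

```python
def track_structure(data_cost, change_cost):
    """Find the structure sequence with the smallest total code-length.

    data_cost[t][k] is the (assumed, precomputed) code-length of the data
    at step t under candidate structure k. change_cost is a hypothetical
    fixed penalty paid whenever the structure differs from the previous step.
    Returns (path, total) where path[t] is the chosen structure index.
    """
    n_models = len(data_cost[0])
    best = list(data_cost[0])   # best total cost of a path ending in each k
    back = []                   # back-pointers, one list per later step

    for row in data_cost[1:]:
        new_best, pointers = [], []
        for k in range(n_models):
            # Staying in k is free; arriving from another structure costs extra.
            def arrival(j, k=k):
                return best[j] + (0.0 if j == k else change_cost)
            prev = min(range(n_models), key=arrival)
            new_best.append(arrival(prev) + row[k])
            pointers.append(prev)
        best = new_best
        back.append(pointers)

    # Trace the cheapest path backwards from the best final structure.
    k = min(range(n_models), key=lambda j: best[j])
    total = best[k]
    path = [k]
    for pointers in reversed(back):
        k = pointers[k]
        path.append(k)
    path.reverse()
    return path, total


# Toy table: structure 0 fits the first two steps, structure 1 the last two.
costs = [[1.0, 5.0], [1.0, 5.0], [5.0, 1.0], [5.0, 1.0]]
path, total = track_structure(costs, change_cost=2.0)
```

In this toy table, one switch between the second and third step is cheaper than staying in either structure throughout, so the search reports a single change. Raising `change_cost` makes the search more reluctant to report a change.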

Detecting Latent Structure Uncertainty with Structural Entropy

Hirai

Yamanishi

2018

Efficient computation of normalized maximum likelihood coding for Gaussian mixtures with its applications to optimal clustering

Hirai

Yamanishi

2011

This paper addresses the issue of estimating the number of mixture components of a Gaussian mixture model from a given data sequence. Our approach is to compute the normalized maximum likelihood (NML) code-length for the data sequence relative to a Gaussian mixture model, then to find the mixture size that minimizes the NML code-length. Here the minimization of the NML code-length is known as Rissanen's minimum description length (MDL) principle. For discrete domains, Kontkanen and Myllymäki proposed a method of efficient computation of the NML code-length for specific models; for continuous domains, however, it has remained open how to compute the NML code-length efficiently. We propose a method for efficient computation of the NML code-length for Gaussian mixture models. We develop it using an approximation of the NML code-length under the restriction of the domain and using the technique of a generating function. We apply it to the issue of determining the optimal number of clusters in clustering using a Gaussian mixture model, where the mixture size is the number of clusters. We use artificial data sets and benchmark data sets to empirically demonstrate that our estimate of the mixture size converges to the true one significantly faster than AIC and BIC.
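The NML code-length referred to above is the negative maximized log-likelihood of the data plus the logarithm of a normalizer taken over all data sets of the same length. For the discrete case the abstract mentions, the multinomial normalizer satisfies a linear-time recurrence due to Kontkanen and Myllymäki, C_{k+2}(n) = C_{k+1}(n) + (n/k) C_k(n). The sketch below illustrates that discrete case only. It is not the Gaussian mixture method the paper proposes, and the function names are hypothetical.

```python
import math


def binomial_nml_normalizer(n):
    """C_2(n): sum of the maximized binomial likelihood over all counts h.

    Requires n >= 1. Python evaluates 0.0 ** 0 as 1.0, which is the
    convention needed for the boundary terms h = 0 and h = n.
    """
    total = 0.0
    for h in range(n + 1):
        p = h / n
        total += math.comb(n, h) * p ** h * (1 - p) ** (n - h)
    return total


def multinomial_nml_normalizer(k, n):
    """C_k(n) for k categories and n samples via the recurrence
    C_{j+2}(n) = C_{j+1}(n) + (n / j) * C_j(n), starting from C_1 and C_2."""
    if k == 1:
        return 1.0
    prev, curr = 1.0, binomial_nml_normalizer(n)  # C_1(n), C_2(n)
    for j in range(1, k - 1):
        prev, curr = curr, curr + (n / j) * prev
    return curr


def nml_code_length(counts):
    """NML code-length (in nats) of a discrete sample summarized by counts."""
    n = sum(counts)
    k = len(counts)
    # Maximized log-likelihood: each category's probability is its frequency.
    log_ml = sum(c * math.log(c / n) for c in counts if c > 0)
    return -log_ml + math.log(multinomial_nml_normalizer(k, n))


length = nml_code_length([7, 2, 1])
```

Model selection in the MDL sense then amounts to computing this code-length for each candidate model and keeping the one with the smallest value.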

Correction to Efficient Computation of Normalized Maximum Likelihood Codes for Gaussian Mixture Models With Its Applications to Clustering [Nov 13 7718-7727]

Hirai

Yamanishi

2019

IEEE Trans. Inform. Theory

So Hirai

Efficient Computation of Normalized Maximum Likelihood Codes for Gaussian Mixture Models With Its Applications to Clustering

Detecting changes of clustering structures using normalized maximum likelihood coding
