A numerical framework for clustering of time series via a Markov chain Monte Carlo (MCMC) method is presented. It combines concepts from recently introduced variational time series analysis and regularized clustering functional minimization [I. Horenko, SIAM J. Sci. Comput., 32 (2010), pp. 62-83] with MCMC. A conceptual advantage of the combined framework is that it addresses the two main problems of existing clustering methods, namely the nonconvexity and the ill-posedness of the respective functionals, in a unified way. Clustering of the time series and minimization of the regularized clustering functional are based on the generation of samples from an appropriately chosen Boltzmann distribution in the space of cluster affiliation paths using simulated annealing and the Metropolis algorithm. The presented method is applied to sets of generic ill-posed clustering problems, and the results are compared to those obtained by the standard methods. As demonstrated in numerical examples, the presented MCMC formulation of the regularized clustering problem allows us to avoid the locality of the obtained minimizers, provides good clustering results even for very ill-posed problems with overlapping clusters, and is the computationally superior method for long time series.
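To make the sampling idea concrete, the following is a minimal sketch of simulated annealing with Metropolis acceptance over cluster affiliation paths. It is an illustration under stated assumptions, not the functional or algorithm of the paper: the energy (squared fit error plus a penalty `eps` on the number of regime switches), the fixed scalar cluster centers, the geometric cooling schedule, and the names `clustering_energy` and `anneal_path` are all hypothetical choices made here for brevity.

```python
import numpy as np


def clustering_energy(x, path, centers, eps):
    """Illustrative regularized functional (an assumption, not the paper's):
    squared fit error plus eps times the number of regime switches."""
    fit = np.sum((x - centers[path]) ** 2)
    switches = np.count_nonzero(np.diff(path))
    return fit + eps * switches


def anneal_path(x, centers, eps=1.0, n_sweeps=200, t_start=1.0, t_end=1e-3, seed=0):
    """Sample affiliation paths from exp(-energy / T) while T is lowered."""
    rng = np.random.default_rng(seed)
    n, k = len(x), len(centers)
    path = rng.integers(k, size=n)              # random initial hard affiliations
    energy = clustering_energy(x, path, centers, eps)
    # Geometric cooling schedule (one of many possible choices).
    for temp in np.geomspace(t_start, t_end, n_sweeps):
        for _ in range(n):
            proposal = path.copy()
            proposal[rng.integers(n)] = rng.integers(k)   # relabel one time step
            new_energy = clustering_energy(x, proposal, centers, eps)
            delta = new_energy - energy
            # Metropolis rule: always accept downhill moves, accept uphill
            # moves with probability exp(-delta / temp).
            if delta <= 0 or rng.random() < np.exp(-delta / temp):
                path, energy = proposal, new_energy
    return path, energy
```

Recomputing the full energy for each proposal keeps the sketch short but costs O(n) per step; a practical implementation would evaluate only the local change caused by relabeling a single time step.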
Introduction. Cluster modeling is widely used in many application areas such as computational and statistical physics [42,15], climate/weather research [22,23,10,12,13,45,8], and finance [21,38,48]. In the context of time series analysis, the aim is usually to detect a hidden process switching between different regimes of a system's behavior, which helps to predict a certain outcome of future events. In most cases the only given information is observation data, which we can regard as a time series. Then the determination of the model and the data-based description of the regime behavior can be formulated as an optimization problem [3,16]. The main issue thereby is to compute a hidden path, weighting the influence of the data on the various possible cluster models and, therefore, specifying the transitions between the regimes. This can be rather difficult since (i) the underlying problem is ill-posed, due to the high number of unknowns in relation to the known parameters, and (ii) the results obtained with a local minimization algorithm depend on the initial parameters, since