1. Introduction

Clustering is usually formulated as an optimization problem by defining a cost function to be minimized. Most of these cost functions are nonconvex and have several local minima. Traditional techniques [1,2,3,4] are essentially descent algorithms, as the cost is reduced at each iteration. Therefore, they tend to get trapped in local minima.

Simulated annealing, or stochastic relaxation [5], is a known technique for avoiding local minima of nonconvex optimization problems. However, escaping local minima requires very slow annealing schedules [6], which are not realistic for many practical applications.

In previous work [7,8] we proposed the concept of deterministic annealing for the problem of clustering and vector quantization. Our approach is strongly motivated by the physical analogy and is based on principles of statistical mechanics and information theory [9,10]. In this paper we extend our clustering method to constrained clustering. Sections 2 and 3 give a brief presentation of our clustering approach. Section 4 presents the constrained clustering approach, and Sections 5 and 6 give two examples to which this approach can be applied.
2. Clustering by Deterministic Annealing

In previous works [7,8] we suggested a deterministic annealing approach to clustering and vector quantization. This approach is briefly summarized in this section and the following one.

The method uses a fuzzy formulation in which each data point x is associated in probability with each cluster C_j. The cluster C_j is represented by its "cluster centroid" y_j, and the energy (cost, or distortion) of associating the data point x with the cluster C_j is d(x, y_j). If the set Y = {y_j} of cluster representatives is given, the expected energy of the system is

E = \sum_x \sum_j P(x \in C_j) \, d(x, y_j),    (1)

where P(x ∈ C_j) is the probability that the data point x belongs to the cluster C_j. Since we do not have any prior knowledge about the data probability distribution, we apply the principle of maximum entropy. As is well known, the probability distribution which maximizes the entropy under the expectation constraint (1) is the Gibbs distribution

P(x \in C_j) = \frac{e^{-\beta d(x, y_j)}}{Z_x},

where Z_x is the partition function

Z_x = \sum_j e^{-\beta d(x, y_j)}.

The Lagrange multiplier β is determined by the given value of E in (1), and is inversely proportional to the "temperature". For a given set of cluster representatives {y_j}, it is assumed that the probabilities relating different data points to their clusters are independent. Hence the total partition function is Z = \prod_x Z_x, and the free energy F is given by

F = -\frac{1}{\beta} \log Z(Y) = -\frac{1}{\beta} \sum_x \log \sum_j e^{-\beta d(x, y_j)}.
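As a minimal numerical sketch of the quantities above, the following computes the association probabilities P(x ∈ C_j), the per-point partition function Z_x, and the free energy F. The squared Euclidean distortion d(x, y) = ||x - y||^2 and the function names (`gibbs_probs`, `free_energy`) are illustrative assumptions, not part of the original formulation.

```python
import math

def gibbs_probs(x, centroids, beta):
    """Return ([P(x in C_j) for all j], Z_x) for one data point x.

    Assumes squared Euclidean distortion d(x, y) = ||x - y||^2.
    """
    d = [sum((xi - yi) ** 2 for xi, yi in zip(x, y)) for y in centroids]
    w = [math.exp(-beta * dj) for dj in d]   # unnormalized Gibbs weights
    Zx = sum(w)                              # per-point partition function Z_x
    return [wj / Zx for wj in w], Zx

def free_energy(data, centroids, beta):
    """F = -(1/beta) * sum_x log Z_x, with Z = prod_x Z_x."""
    return -sum(math.log(gibbs_probs(x, centroids, beta)[1]) for x in data) / beta
```

The behavior matches the temperature interpretation of β: at β → 0 (high temperature) every point is associated with all clusters almost uniformly, while for large β each point is assigned nearly deterministically to its closest centroid.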