An exact algorithm is proposed for minimum sum-of-squares nonhierarchical clustering, i.e., for partitioning a given set of points from a Euclidean m-space into a given number of clusters in order to minimize the sum of squared distances from all points to the centroid of the cluster to which they belong. This problem is expressed as a constrained hyperbolic program in 0-1 variables. The resolution method combines an interior point algorithm, i.e., a weighted analytic center column generation method, with branch-and-bound. The auxiliary problem of determining the entering column (i.e., the oracle) is an unconstrained hyperbolic program in 0-1 variables with a quadratic numerator and linear denominator. It is solved through a sequence of unconstrained quadratic programs in 0-1 variables. To accelerate resolution, variable neighborhood search heuristics are used both to get a good initial solution and to solve the auxiliary problem quickly as long as global optimality is not reached. Estimated bounds for the dual variables are deduced from the heuristic solution and used in the resolution process as a trust region. Proved minimum sum-of-squares partitions are determined for the first time for several fairly large data sets from the literature, including Fisher's 150 irises.
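The criterion being minimized can be stated concretely. As a minimal sketch (not the exact algorithm of this paper, whose formulation is a hyperbolic 0-1 program), the following Python function evaluates the sum-of-squares objective for a given partition; the sample points and labels are illustrative assumptions, not data from the paper:

```python
import numpy as np

def mssc_objective(points, labels):
    """Sum of squared Euclidean distances from each point to the
    centroid of the cluster to which it is assigned (the MSSC criterion)."""
    total = 0.0
    for k in np.unique(labels):
        cluster = points[labels == k]          # points assigned to cluster k
        centroid = cluster.mean(axis=0)        # cluster centroid
        total += ((cluster - centroid) ** 2).sum()
    return total

# Hypothetical data: two well-separated pairs of points in the plane.
pts = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]])
natural = np.array([0, 0, 1, 1])   # groups nearby points together
mixed = np.array([0, 1, 0, 1])     # splits each natural group
print(mssc_objective(pts, natural))  # 1.0
print(mssc_objective(pts, mixed))    # 50.0
```

The natural partition yields a far smaller objective than the mixed one, which is exactly the behavior the exact algorithm certifies globally rather than heuristically.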
Introduction.

Cluster analysis addresses the following general problem: Given a set of entities, find subsets, or clusters, which are homogeneous and/or well separated (Hartigan [25], Gordon [15], Kaufman and Rousseeuw [28], Mirkin [36]). This problem has many applications in engineering, medicine, and both the natural and the social sciences. The concepts of homogeneity and separation can be made precise in many ways. Moreover, a priori constraints, or in other words a structure, can be imposed on the clusters. This leads to many clustering problems and even more algorithms.

The most studied and used methods of cluster analysis belong to two categories: hierarchical clustering and partitioning. Hierarchical clustering algorithms yield a hierarchy of partitions, whose clusters are either disjoint or nested one within the other. These algorithms are agglomerative or, less often, divisive. In the first case, they proceed from an initial partition, in which each cluster contains a single entity, by successively merging pairs of clusters until all entities are in the same one. In the second case, they proceed from an initial partition with all entities in the same cluster, by successively bipartitioning one cluster at a time until all entities are isolated, one in each cluster. The best partition is then chosen from the hierarchy of partitions obtained, usually in an informal way. A graphical representation of results,