Probabilistic distance clustering is an iterative method for probabilistic clustering of data. Given clusters, their centers, and the distances of data points from these centers, the probability of cluster membership at any point is assumed to be inversely proportional to the distance from (the center of) the cluster in question. This assumption is the working principle.
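The working principle can be illustrated with a minimal sketch. It assumes Euclidean distance and reads "inversely proportional" as each probability being proportional to the reciprocal of the distance, normalized so the probabilities sum to 1. The function name `membership_probabilities` and the `eps` threshold are hypothetical, introduced only for this example.

```python
import math

def membership_probabilities(point, centers, eps=1e-12):
    """Return one membership probability per cluster for a single point.

    Assumption: p_k is proportional to 1 / d_k, where d_k is the
    Euclidean distance from the point to center k.
    """
    dists = [math.dist(point, c) for c in centers]
    # A point lying on a center is assigned to that cluster outright,
    # which also avoids dividing by zero.
    for k, d in enumerate(dists):
        if d < eps:
            return [1.0 if j == k else 0.0 for j in range(len(centers))]
    inverse = [1.0 / d for d in dists]
    total = sum(inverse)
    return [w / total for w in inverse]

centers = [(0.0, 0.0), (4.0, 0.0)]
probs = membership_probabilities((1.0, 0.0), centers)
print(probs)  # distances 1 and 3 give roughly 0.75 and 0.25
```

Under this reading, the product of probability and distance is the same for every cluster at a given point, which is one way to state the principle.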
The method is a generalization, to several centers, of the Weiszfeld method for solving the Fermat–Weber location problem. At each iteration, the distances (Euclidean, Mahalanobis, etc.) from the cluster centers are computed for all data points, and the centers are updated as convex combinations of these points, with weights determined by the above principle. Computations stop when the centers stop moving.
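The iteration described above might be sketched as follows. This is an illustration under stated assumptions, not a reference implementation: it uses Euclidean distance, and it assumes each point's weight for cluster k is p_k² / d_k, a choice the text does not spell out. The names `pdc`, `tol`, `max_iter`, and `eps` are hypothetical.

```python
import math

def pdc(points, centers, tol=1e-6, max_iter=200, eps=1e-12):
    """Iterate center updates until no center moves more than tol."""
    centers = [tuple(c) for c in centers]
    dim = len(centers[0])
    for _ in range(max_iter):
        sums = [[0.0] * dim for _ in centers]
        totals = [0.0] * len(centers)
        for x in points:
            # Clamp distances so a point lying on a center does not divide by zero.
            dists = [max(math.dist(x, c), eps) for c in centers]
            inverse = [1.0 / d for d in dists]
            norm = sum(inverse)
            for k, d in enumerate(dists):
                p = inverse[k] / norm   # membership probability, proportional to 1/d
                w = p * p / d           # ASSUMED weight of point x for center k
                totals[k] += w
                for j in range(dim):
                    sums[k][j] += w * x[j]
        # Each new center is a convex combination of the data points.
        new_centers = [tuple(s / totals[k] for s in sums[k])
                       for k in range(len(centers))]
        shift = max(math.dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:  # the centers have stopped moving
            break
    return centers

points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11)]
result = pdc(points, centers=[(2.0, 2.0), (8.0, 8.0)])
print(result)
```

With two well-separated groups of points, as in this toy data, the two centers should settle close to the middle of each group.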
Progress is monitored by the joint distance function, a measure of distance from all cluster centers, which evolves during the iterations and captures the data in its low contours.
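As a hedged sketch of how such monitoring could work, the code below assumes the joint distance function at a point equals the common value of probability times distance implied by the working principle, that is, 1 / Σ(1/d_k). For two clusters this reduces to d₁d₂ / (d₁ + d₂). Both this formula and the names `joint_distance` and `total_joint_distance` are assumptions made for illustration.

```python
import math

def joint_distance(point, centers):
    """Assumed joint distance: 1 / sum(1 / d_k) over all centers."""
    dists = [math.dist(point, c) for c in centers]
    if min(dists) == 0.0:
        return 0.0  # the point lies on a center
    return 1.0 / sum(1.0 / d for d in dists)

def total_joint_distance(points, centers):
    """Sum of the joint distance over the data, usable as a progress measure."""
    return sum(joint_distance(x, centers) for x in points)

points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11)]
poor = total_joint_distance(points, [(2.0, 2.0), (8.0, 8.0)])
good = total_joint_distance(points, [(0.5, 0.5), (10.5, 10.5)])
print(poor, good)  # the better-placed centers give the smaller total
```

The function is small wherever a point is close to at least one center, which matches the description of the data sitting in its low contours.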
The method is simple, fast (requiring a small number of cheap iterations), and insensitive to outliers.